LlamaIndex.TS + AxonFlow Integration

Overview

LlamaIndex.TS is the TypeScript/JavaScript version of the popular LlamaIndex framework for building LLM-powered applications with data connectivity. It provides a comprehensive toolkit for data ingestion, indexing, and querying, with support for RAG (Retrieval-Augmented Generation) workflows.

AxonFlow adds real-time policy enforcement, audit trails, and compliance controls to ensure LlamaIndex.TS applications operate within enterprise governance requirements.

Together, they enable enterprises to build governed RAG applications with full observability and compliance.


Why Use AxonFlow with LlamaIndex.TS?

LlamaIndex.TS Strengths

  • Native TypeScript/JavaScript for modern web stacks
  • Comprehensive data connectors (PDFs, databases, APIs)
  • Advanced indexing strategies (vector, keyword, tree)
  • Query engines with response synthesis
  • Streaming support for real-time applications
  • Agent and tool support

AxonFlow Strengths

  • Real-time inference governance (policy enforcement at request time)
  • RAG-specific policies (source attribution, data access controls)
  • Cross-system audit trails (track queries and responses)
  • Cost control (budget limits per application or user)
  • PII protection (automatic masking in retrieved context)

The Perfect Combination

LlamaIndex.TS handles: Data ingestion, indexing, retrieval, response synthesis
AxonFlow handles: Governance, compliance, audit trails, access control

Integration Architecture

AxonFlow integrates with LlamaIndex.TS using Gateway Mode, which wraps LLM calls with policy pre-checks and audit logging:

[LlamaIndex.TS Query Engine]
         |
         v
[AxonFlow Pre-Check] --> Policy Evaluation
         |
         v  (if approved)
[LLM Provider (OpenAI/Anthropic)]
         |
         v
[AxonFlow Audit] --> Compliance Logging
         |
         v
[Response to LlamaIndex.TS]

Note: AxonFlow uses its own API for governance, not an OpenAI-compatible endpoint. Integration requires wrapping your LLM calls with AxonFlow's pre-check and audit endpoints.


Quick Start

Prerequisites

  • AxonFlow running locally or deployed (see Getting Started)
  • Node.js 18+
  • LlamaIndex.TS installed

Install Dependencies

npm install llamaindex @axonflow/sdk openai

AxonFlow API Overview

AxonFlow Gateway Mode uses two main endpoints:

Endpoint                      Purpose
POST /api/policy/pre-check    Policy evaluation before the LLM call
POST /api/audit/llm-call      Audit logging after the LLM call completes

Required Headers:

  • Content-Type: application/json
  • X-Client-Secret: your-client-secret
  • X-License-Key: your-license-key (optional, for enterprise features)
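
If you are not using the SDK, the same flow can be driven over raw HTTP. Below is a minimal sketch using fetch, assuming the request and response bodies mirror the SDK fields used throughout this guide; the raw API's exact field names may differ, so prefer the @axonflow/sdk client shown in the next section.

const AXONFLOW_URL = 'http://localhost:8080';
const headers = {
  'Content-Type': 'application/json',
  'X-Client-Secret': process.env.AXONFLOW_CLIENT_SECRET!,
};

// 1. Policy evaluation before the LLM call
const preCheckRes = await fetch(`${AXONFLOW_URL}/api/policy/pre-check`, {
  method: 'POST',
  headers,
  body: JSON.stringify({ userToken: 'user-123', query: 'What is the vacation policy?' }),
});
const preCheck = await preCheckRes.json();

if (preCheck.approved) {
  // ... call your LLM provider here ...

  // 2. Audit logging after the LLM call completes
  await fetch(`${AXONFLOW_URL}/api/audit/llm-call`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      contextId: preCheck.contextId,
      responseSummary: '...',
      provider: 'openai',
      model: 'gpt-4',
    }),
  });
}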

TypeScript SDK Integration

Create Governed LlamaIndex Service

import {
  Document,
  VectorStoreIndex,
  OpenAI,
  Settings,
} from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';

interface GovernedQueryResult {
  response: string;
  sources: string[];
  contextId: string;
  blocked?: boolean;
  reason?: string;
}

class GovernedLlamaIndex {
  private axonflow: GatewayModeClient;
  private index: VectorStoreIndex | null = null;

  constructor(
    axonflowUrl: string,
    clientSecret: string,
    openaiKey: string,
    licenseKey?: string
  ) {
    this.axonflow = new GatewayModeClient({
      agentUrl: axonflowUrl,
      clientId: 'llamaindex-ts-app',
      clientSecret,
      licenseKey,
    });

    // Configure LlamaIndex to use OpenAI
    Settings.llm = new OpenAI({
      apiKey: openaiKey,
      model: 'gpt-4',
    });
  }

  async createIndex(documents: Document[]): Promise<void> {
    this.index = await VectorStoreIndex.fromDocuments(documents);
  }

  async governedQuery(
    userToken: string,
    query: string,
    context?: Record<string, unknown>
  ): Promise<GovernedQueryResult> {
    if (!this.index) {
      throw new Error('Index not initialized. Call createIndex first.');
    }

    const startTime = Date.now();

    // 1. Pre-check with AxonFlow
    const preCheck = await this.axonflow.preCheck({
      userToken,
      query,
      context: {
        ...context,
        framework: 'llamaindex-ts',
        query_type: 'rag',
      },
    });

    if (!preCheck.approved) {
      return {
        response: '',
        sources: [],
        contextId: preCheck.contextId,
        blocked: true,
        reason: preCheck.blockReason,
      };
    }

    try {
      // 2. Execute LlamaIndex query
      const queryEngine = this.index.asQueryEngine();
      const result = await queryEngine.query({ query });

      const response = result.response;
      const sources = result.sourceNodes?.map(
        (node) => node.node.metadata?.source || 'unknown'
      ) || [];

      const latencyMs = Date.now() - startTime;

      // 3. Audit the call
      await this.axonflow.auditLLMCall({
        contextId: preCheck.contextId,
        responseSummary: response.slice(0, 200),
        provider: 'openai',
        model: 'gpt-4',
        tokenUsage: {
          promptTokens: 500, // Rough estimate for a RAG prompt; use real usage if available
          completionTokens: 200,
          totalTokens: 700,
        },
        latencyMs,
        metadata: {
          source_count: sources.length,
          sources: sources.slice(0, 5), // Log first 5 sources
        },
      });

      return {
        response,
        sources,
        contextId: preCheck.contextId,
      };

    } catch (error) {
      const latencyMs = Date.now() - startTime;
      await this.axonflow.auditLLMCall({
        contextId: preCheck.contextId,
        responseSummary: `Error: ${error instanceof Error ? error.message : 'Unknown'}`,
        provider: 'openai',
        model: 'gpt-4',
        tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
        latencyMs,
        metadata: { error: true },
      });
      throw error;
    }
  }
}

// Usage
const llamaindex = new GovernedLlamaIndex(
  'http://localhost:8080',
  process.env.AXONFLOW_CLIENT_SECRET!,
  process.env.OPENAI_API_KEY!
);

// Create index from documents
const documents = [
  new Document({ text: 'Company policies document...', metadata: { source: 'policies.pdf' } }),
  new Document({ text: 'Employee handbook...', metadata: { source: 'handbook.pdf' } }),
];

await llamaindex.createIndex(documents);

// Query with governance
const result = await llamaindex.governedQuery(
  'user-123',
  'What is the vacation policy?',
  { department: 'hr' }
);

console.log(result.response);
console.log('Sources:', result.sources);

Integration Patterns

Pattern 1: Governed Chat Engine

Create a governed chat engine with conversation memory:

import {
  VectorStoreIndex,
  ContextChatEngine,
  ChatMessage,
} from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';

class GovernedChatEngine {
  private axonflow: GatewayModeClient;
  private chatEngine: ContextChatEngine | null = null;
  private conversationHistory: ChatMessage[] = [];

  constructor(axonflowUrl: string, clientSecret: string) {
    this.axonflow = new GatewayModeClient({
      agentUrl: axonflowUrl,
      clientId: 'llamaindex-chat',
      clientSecret,
    });
  }

  async initialize(index: VectorStoreIndex): Promise<void> {
    this.chatEngine = index.asContextChatEngine();
  }

  async chat(
    userToken: string,
    message: string,
    context?: Record<string, unknown>
  ): Promise<{ response: string; contextId: string }> {
    if (!this.chatEngine) {
      throw new Error('Chat engine not initialized');
    }

    const startTime = Date.now();

    // Pre-check
    const preCheck = await this.axonflow.preCheck({
      userToken,
      query: message,
      context: {
        ...context,
        conversation_length: this.conversationHistory.length,
        query_type: 'chat',
      },
    });

    if (!preCheck.approved) {
      throw new Error(`Chat blocked: ${preCheck.blockReason}`);
    }

    try {
      // Execute chat
      const response = await this.chatEngine.chat({
        message,
        chatHistory: this.conversationHistory,
      });

      // Update history
      this.conversationHistory.push(
        { role: 'user', content: message },
        { role: 'assistant', content: response.response }
      );

      const latencyMs = Date.now() - startTime;

      // Audit
      await this.axonflow.auditLLMCall({
        contextId: preCheck.contextId,
        responseSummary: response.response.slice(0, 200),
        provider: 'openai',
        model: 'gpt-4',
        tokenUsage: { promptTokens: 300, completionTokens: 150, totalTokens: 450 }, // rough estimates
        latencyMs,
      });

      return {
        response: response.response,
        contextId: preCheck.contextId,
      };

    } catch (error) {
      const latencyMs = Date.now() - startTime;
      await this.axonflow.auditLLMCall({
        contextId: preCheck.contextId,
        responseSummary: `Error: ${error instanceof Error ? error.message : 'Unknown'}`,
        provider: 'openai',
        model: 'gpt-4',
        tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
        latencyMs,
        metadata: { error: true },
      });
      throw error;
    }
  }

  clearHistory(): void {
    this.conversationHistory = [];
  }
}
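
A usage sketch, assuming the documents array from the Quick Start:

const chat = new GovernedChatEngine(
  'http://localhost:8080',
  process.env.AXONFLOW_CLIENT_SECRET!
);

const index = await VectorStoreIndex.fromDocuments(documents);
await chat.initialize(index);

const reply = await chat.chat('user-123', 'Summarize the vacation policy', { department: 'hr' });
console.log(reply.response);

chat.clearHistory(); // start a fresh conversation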

Pattern 2: Multi-Index Router with Governance

Route queries to different indexes based on context:

import { VectorStoreIndex, Document } from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';

interface IndexConfig {
  name: string;
  dataTier: 'public' | 'internal' | 'restricted';
  index: VectorStoreIndex;
}

class GovernedIndexRouter {
  private axonflow: GatewayModeClient;
  private indexes: Map<string, IndexConfig> = new Map();

  constructor(axonflowUrl: string, clientSecret: string) {
    this.axonflow = new GatewayModeClient({
      agentUrl: axonflowUrl,
      clientId: 'llamaindex-router',
      clientSecret,
    });
  }

  async addIndex(
    name: string,
    documents: Document[],
    dataTier: 'public' | 'internal' | 'restricted'
  ): Promise<void> {
    const index = await VectorStoreIndex.fromDocuments(documents);
    this.indexes.set(name, { name, dataTier, index });
  }

  async query(
    userToken: string,
    query: string,
    indexName: string
  ): Promise<{ response: string; sources: string[] }> {
    const indexConfig = this.indexes.get(indexName);
    if (!indexConfig) {
      throw new Error(`Index '${indexName}' not found`);
    }

    const startTime = Date.now();

    // Pre-check with index-specific context
    const preCheck = await this.axonflow.preCheck({
      userToken,
      query,
      context: {
        index_name: indexConfig.name,
        data_tier: indexConfig.dataTier,
        query_type: 'rag',
      },
    });

    if (!preCheck.approved) {
      throw new Error(`Query blocked: ${preCheck.blockReason}`);
    }

    const queryEngine = indexConfig.index.asQueryEngine();
    const result = await queryEngine.query({ query });

    const latencyMs = Date.now() - startTime;

    await this.axonflow.auditLLMCall({
      contextId: preCheck.contextId,
      responseSummary: result.response.slice(0, 200),
      provider: 'openai',
      model: 'gpt-4',
      tokenUsage: { promptTokens: 400, completionTokens: 150, totalTokens: 550 }, // rough estimates
      latencyMs,
      metadata: {
        index_name: indexConfig.name,
        data_tier: indexConfig.dataTier,
      },
    });

    return {
      response: result.response,
      sources: result.sourceNodes?.map((n) => n.node.metadata?.source || 'unknown') || [],
    };
  }
}

// Usage
const router = new GovernedIndexRouter(
  'http://localhost:8080',
  process.env.AXONFLOW_CLIENT_SECRET!
);

// Add different indexes
await router.addIndex('public-docs', publicDocuments, 'public');
await router.addIndex('internal-wiki', internalDocuments, 'internal');
await router.addIndex('financial-reports', financialDocuments, 'restricted');

// Query with governance
const result = await router.query('user-123', 'What are our Q4 projections?', 'financial-reports');

Pattern 3: Streaming with Governance

Handle streaming responses with governance:

import { VectorStoreIndex } from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';

class GovernedStreamingQuery {
  private axonflow: GatewayModeClient;
  private index: VectorStoreIndex;

  constructor(
    axonflow: GatewayModeClient,
    index: VectorStoreIndex
  ) {
    this.axonflow = axonflow;
    this.index = index;
  }

  async *streamQuery(
    userToken: string,
    query: string
  ): AsyncGenerator<string, void, unknown> {
    const startTime = Date.now();
    let fullResponse = '';

    // Pre-check before streaming
    const preCheck = await this.axonflow.preCheck({
      userToken,
      query,
      context: { streaming: true },
    });

    if (!preCheck.approved) {
      throw new Error(`Query blocked: ${preCheck.blockReason}`);
    }

    try {
      const queryEngine = this.index.asQueryEngine();
      const stream = await queryEngine.query({
        query,
        stream: true,
      });

      for await (const chunk of stream) {
        fullResponse += chunk.response;
        yield chunk.response;
      }

      const latencyMs = Date.now() - startTime;

      // Audit after streaming completes
      await this.axonflow.auditLLMCall({
        contextId: preCheck.contextId,
        responseSummary: fullResponse.slice(0, 200),
        provider: 'openai',
        model: 'gpt-4',
        tokenUsage: {
          promptTokens: 400, // rough estimate
          completionTokens: fullResponse.split(' ').length, // word count as a crude token proxy
          totalTokens: 400 + fullResponse.split(' ').length,
        },
        latencyMs,
        metadata: { streaming: true },
      });

    } catch (error) {
      const latencyMs = Date.now() - startTime;
      await this.axonflow.auditLLMCall({
        contextId: preCheck.contextId,
        responseSummary: `Stream error: ${error instanceof Error ? error.message : 'Unknown'}`,
        provider: 'openai',
        model: 'gpt-4',
        tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
        latencyMs,
        metadata: { error: true, streaming: true },
      });
      throw error;
    }
  }
}

// Usage
const streamingQuery = new GovernedStreamingQuery(axonflow, index);

for await (const chunk of streamingQuery.streamQuery('user-123', 'Explain our benefits')) {
  process.stdout.write(chunk);
}

Express.js API Example

Complete API server with governed LlamaIndex.TS:

import express from 'express';
import { Document, VectorStoreIndex, Settings, OpenAI } from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';

const app = express();
app.use(express.json());

// Initialize
Settings.llm = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'gpt-4',
});

const axonflow = new GatewayModeClient({
  agentUrl: process.env.AXONFLOW_URL || 'http://localhost:8080',
  clientId: 'llamaindex-api',
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET!,
});

let index: VectorStoreIndex | null = null;

// Initialize index
app.post('/api/index', async (req, res) => {
  try {
    const { documents } = req.body;
    const docs = documents.map(
      (d: { text: string; source: string }) =>
        new Document({ text: d.text, metadata: { source: d.source } })
    );
    index = await VectorStoreIndex.fromDocuments(docs);
    res.json({ success: true, documentCount: docs.length });
  } catch (error) {
    res.status(500).json({ error: 'Failed to create index' });
  }
});

// Governed query
app.post('/api/query', async (req, res) => {
  if (!index) {
    return res.status(400).json({ error: 'Index not initialized' });
  }

  const { user_token, query, context } = req.body;
  const startTime = Date.now();

  try {
    // Pre-check
    const preCheck = await axonflow.preCheck({
      userToken: user_token,
      query,
      context: { ...context, framework: 'llamaindex-ts' },
    });

    if (!preCheck.approved) {
      return res.status(403).json({
        error: 'Query blocked by policy',
        reason: preCheck.blockReason,
      });
    }

    // Execute query
    const queryEngine = index.asQueryEngine();
    const result = await queryEngine.query({ query });

    const latencyMs = Date.now() - startTime;

    // Audit
    await axonflow.auditLLMCall({
      contextId: preCheck.contextId,
      responseSummary: result.response.slice(0, 200),
      provider: 'openai',
      model: 'gpt-4',
      tokenUsage: { promptTokens: 400, completionTokens: 150, totalTokens: 550 }, // rough estimates
      latencyMs,
    });

    res.json({
      response: result.response,
      sources: result.sourceNodes?.map((n) => n.node.metadata?.source || 'unknown') || [],
      context_id: preCheck.contextId,
    });

  } catch (error) {
    res.status(500).json({ error: 'Query failed' });
  }
});

app.listen(3000, () => console.log('LlamaIndex.TS API running on :3000'));
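
A sample client call against the /api/query route (request fields match the handler above):

const res = await fetch('http://localhost:3000/api/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    user_token: 'user-123',
    query: 'What is the vacation policy?',
    context: { department: 'hr' },
  }),
});

if (res.status === 403) {
  // Blocked by policy: the handler returns the block reason
  const { reason } = await res.json();
  console.error('Blocked:', reason);
} else {
  const { response, sources, context_id } = await res.json();
  console.log(response, sources, context_id);
}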

AxonFlow Policy Configuration

Create policies for LlamaIndex.TS applications:

{
  "policies": [
    {
      "name": "llamaindex-rag-policy",
      "description": "Policy for LlamaIndex RAG queries",
      "enabled": true,
      "rules": [
        {
          "type": "pii_protection",
          "config": {
            "fields": ["email", "phone", "ssn"],
            "action": "mask"
          }
        },
        {
          "type": "rate_limit",
          "config": {
            "requests_per_minute": 30,
            "action": "throttle"
          }
        }
      ]
    },
    {
      "name": "llamaindex-restricted-policy",
      "description": "Policy for restricted data indexes",
      "enabled": true,
      "rules": [
        {
          "type": "content_filter",
          "config": {
            "blocked_patterns": ["confidential", "internal only"],
            "action": "block"
          }
        }
      ]
    }
  ]
}

Best Practices

1. Always Use Context IDs

const preCheck = await axonflow.preCheck({ userToken, query });
const contextId = preCheck.contextId; // Store immediately

// ... execute query ...

await axonflow.auditLLMCall({ contextId, ... }); // Use same contextId

2. Include Source Attribution in Audits

await axonflow.auditLLMCall({
  contextId,
  responseSummary: response,
  metadata: {
    source_count: sources.length,
    sources: sources.slice(0, 5), // First 5 sources
  },
  // ...
});

3. Handle Streaming Properly

Pre-check before the stream starts; audit once after it completes, using the full accumulated response.
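
A condensed sketch of that ordering, reusing the axonflow client and index from the earlier examples (tokenUsage values are placeholder estimates, as elsewhere in this guide; see Pattern 3 for the full implementation):

async function streamWithGovernance(userToken: string, query: string): Promise<void> {
  const startTime = Date.now();

  // 1. Pre-check once, before any tokens are generated
  const preCheck = await axonflow.preCheck({ userToken, query, context: { streaming: true } });
  if (!preCheck.approved) throw new Error(`Blocked: ${preCheck.blockReason}`);

  // 2. Stream, accumulating the full response for the audit
  const stream = await index.asQueryEngine().query({ query, stream: true });
  let fullResponse = '';
  for await (const chunk of stream) {
    fullResponse += chunk.response;
    process.stdout.write(chunk.response);
  }

  // 3. Audit exactly once, after the stream completes
  await axonflow.auditLLMCall({
    contextId: preCheck.contextId,
    responseSummary: fullResponse.slice(0, 200),
    provider: 'openai',
    model: 'gpt-4',
    tokenUsage: { promptTokens: 400, completionTokens: 150, totalTokens: 550 }, // placeholders
    latencyMs: Date.now() - startTime,
    metadata: { streaming: true },
  });
}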

4. Use Index-Specific Context

const preCheck = await axonflow.preCheck({
  userToken,
  query,
  context: {
    index_name: 'financial-reports',
    data_tier: 'restricted',
    department: 'finance',
  },
});

Troubleshooting

Common Issues

Issue: Pre-check returns 401 Unauthorized

  • Verify X-Client-Secret header is correct
  • Check X-License-Key if using enterprise features
  • Ensure client_id is registered in AxonFlow

Issue: Audit calls failing

  • Verify context_id is from a valid pre-check (not expired)
  • Check that AxonFlow agent is healthy (/health endpoint)
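
A quick health probe, assuming the default local port from the Quick Start:

const health = await fetch('http://localhost:8080/health');
console.log(health.ok ? 'AxonFlow agent healthy' : `Agent unhealthy: ${health.status}`);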

Issue: Streaming audits missing

  • Ensure audit is called after stream completes
  • Collect full response during streaming for audit

Additional Resources