LlamaIndex.TS + AxonFlow Integration
Overview
LlamaIndex.TS is the TypeScript/JavaScript version of the popular LlamaIndex framework for building LLM-powered applications with data connectivity. It provides a comprehensive toolkit for ingestion, indexing, and querying with support for RAG (Retrieval-Augmented Generation) workflows.
AxonFlow adds real-time policy enforcement, audit trails, and compliance controls to ensure LlamaIndex.TS applications operate within enterprise governance requirements.
Together, they enable enterprises to build governed RAG applications with full observability and compliance.
Why Use AxonFlow with LlamaIndex.TS?
LlamaIndex.TS Strengths
- Native TypeScript/JavaScript for modern web stacks
- Comprehensive data connectors (PDFs, databases, APIs)
- Advanced indexing strategies (vector, keyword, tree)
- Query engines with response synthesis
- Streaming support for real-time applications
- Agent and tool support
AxonFlow Strengths
- Real-time inference governance (policy enforcement at request time)
- RAG-specific policies (source attribution, data access controls)
- Cross-system audit trails (track queries and responses)
- Cost control (budget limits per application or user)
- PII protection (automatic masking in retrieved context)
The Perfect Combination
LlamaIndex.TS handles: Data ingestion, indexing, retrieval, response synthesis
AxonFlow handles: Governance, compliance, audit trails, access control
Integration Architecture
AxonFlow integrates with LlamaIndex.TS using Gateway Mode, which wraps LLM calls with policy pre-checks and audit logging:
[LlamaIndex.TS Query Engine]
|
v
[AxonFlow Pre-Check] --> Policy Evaluation
|
v (if approved)
[LLM Provider (OpenAI/Anthropic)]
|
v
[AxonFlow Audit] --> Compliance Logging
|
v
[Response to LlamaIndex.TS]
Note: AxonFlow uses its own API for governance, not an OpenAI-compatible endpoint. Integration requires wrapping your LLM calls with AxonFlow's pre-check and audit endpoints.
Quick Start
Prerequisites
- AxonFlow running locally or deployed (see Getting Started)
- Node.js 18+
- LlamaIndex.TS installed
Install Dependencies
npm install llamaindex @axonflow/sdk openai
AxonFlow API Overview
AxonFlow Gateway Mode uses two main endpoints:
| Endpoint | Purpose |
|---|---|
| POST /api/policy/pre-check | Policy evaluation before the LLM call |
| POST /api/audit/llm-call | Audit logging after the LLM call completes |
Required Headers:
- Content-Type: application/json
- X-Client-Secret: your-client-secret
- X-License-Key: your-license-key (optional, for enterprise features)
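If you are not using the TypeScript SDK, the same flow can be driven by calling these two endpoints directly with the headers above. The sketch below is illustrative only: the request-body field names (userToken, query, contextId, responseSummary, and so on) are assumptions that mirror the SDK parameters used later in this guide, so verify them against your AxonFlow API reference.
// Minimal sketch of driving Gateway Mode over raw HTTP (Node 18+ global fetch).
// NOTE: body field names are assumptions mirroring the SDK shapes used below;
// check the AxonFlow API reference for the exact schema.
const headers = {
  'Content-Type': 'application/json',
  'X-Client-Secret': process.env.AXONFLOW_CLIENT_SECRET!,
};
// 1. Policy evaluation before the LLM call
const preCheckRes = await fetch('http://localhost:8080/api/policy/pre-check', {
  method: 'POST',
  headers,
  body: JSON.stringify({ userToken: 'user-123', query: 'What is the vacation policy?' }),
});
const preCheck = await preCheckRes.json();
// 2. Audit logging after the LLM call completes
if (preCheck.approved) {
  await fetch('http://localhost:8080/api/audit/llm-call', {
    method: 'POST',
    headers,
    body: JSON.stringify({
      contextId: preCheck.contextId,
      responseSummary: 'First 200 characters of the LLM response',
      provider: 'openai',
      model: 'gpt-4',
    }),
  });
}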
TypeScript SDK Integration
Create Governed LlamaIndex Service
import {
  Document,
  VectorStoreIndex,
  OpenAI,
  Settings,
} from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';
interface GovernedQueryResult {
response: string;
sources: string[];
contextId: string;
blocked?: boolean;
reason?: string;
}
class GovernedLlamaIndex {
private axonflow: GatewayModeClient;
private index: VectorStoreIndex | null = null;
constructor(
axonflowUrl: string,
clientSecret: string,
openaiKey: string,
licenseKey?: string
) {
this.axonflow = new GatewayModeClient({
agentUrl: axonflowUrl,
clientId: 'llamaindex-ts-app',
clientSecret,
licenseKey,
});
// Configure LlamaIndex to use OpenAI
Settings.llm = new OpenAI({
apiKey: openaiKey,
model: 'gpt-4',
});
}
async createIndex(documents: Document[]): Promise<void> {
this.index = await VectorStoreIndex.fromDocuments(documents);
}
async governedQuery(
userToken: string,
query: string,
context?: Record<string, unknown>
): Promise<GovernedQueryResult> {
if (!this.index) {
throw new Error('Index not initialized. Call createIndex first.');
}
const startTime = Date.now();
// 1. Pre-check with AxonFlow
const preCheck = await this.axonflow.preCheck({
userToken,
query,
context: {
...context,
framework: 'llamaindex-ts',
query_type: 'rag',
},
});
if (!preCheck.approved) {
return {
response: '',
sources: [],
contextId: preCheck.contextId,
blocked: true,
reason: preCheck.blockReason,
};
}
try {
// 2. Execute LlamaIndex query
const queryEngine = this.index.asQueryEngine();
const result = await queryEngine.query({ query });
const response = result.response;
const sources = result.sourceNodes?.map(
(node) => node.node.metadata?.source || 'unknown'
) || [];
const latencyMs = Date.now() - startTime;
// 3. Audit the call
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: response.slice(0, 200),
provider: 'openai',
model: 'gpt-4',
tokenUsage: {
promptTokens: 500, // Estimate for RAG
completionTokens: 200,
totalTokens: 700,
},
latencyMs,
metadata: {
source_count: sources.length,
sources: sources.slice(0, 5), // Log first 5 sources
},
});
return {
response,
sources,
contextId: preCheck.contextId,
};
} catch (error) {
const latencyMs = Date.now() - startTime;
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: `Error: ${error instanceof Error ? error.message : 'Unknown'}`,
provider: 'openai',
model: 'gpt-4',
tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
latencyMs,
metadata: { error: true },
});
throw error;
}
}
}
// Usage
const llamaindex = new GovernedLlamaIndex(
'http://localhost:8080',
process.env.AXONFLOW_CLIENT_SECRET!,
process.env.OPENAI_API_KEY!
);
// Create index from documents
const documents = [
new Document({ text: 'Company policies document...', metadata: { source: 'policies.pdf' } }),
new Document({ text: 'Employee handbook...', metadata: { source: 'handbook.pdf' } }),
];
await llamaindex.createIndex(documents);
// Query with governance
const result = await llamaindex.governedQuery(
'user-123',
'What is the vacation policy?',
{ department: 'hr' }
);
console.log(result.response);
console.log('Sources:', result.sources);
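Because governedQuery returns a blocked flag instead of throwing, callers should check it before using the response:
const hrResult = await llamaindex.governedQuery(
  'user-456',
  'Summarize recent disciplinary actions',
  { department: 'hr' }
);
if (hrResult.blocked) {
  // The query never reached the LLM; surface the policy reason instead.
  console.warn(`Query blocked by policy: ${hrResult.reason}`);
} else {
  console.log(hrResult.response);
  console.log('Sources:', hrResult.sources);
}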
Integration Patterns
Pattern 1: Governed Chat Engine
Create a governed chat engine with conversation memory:
import {
Document,
VectorStoreIndex,
ContextChatEngine,
ChatMessage,
} from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';
class GovernedChatEngine {
private axonflow: GatewayModeClient;
private chatEngine: ContextChatEngine | null = null;
private conversationHistory: ChatMessage[] = [];
constructor(axonflowUrl: string, clientSecret: string) {
this.axonflow = new GatewayModeClient({
agentUrl: axonflowUrl,
clientId: 'llamaindex-chat',
clientSecret,
});
}
async initialize(index: VectorStoreIndex): Promise<void> {
this.chatEngine = index.asContextChatEngine();
}
async chat(
userToken: string,
message: string,
context?: Record<string, unknown>
): Promise<{ response: string; contextId: string }> {
if (!this.chatEngine) {
throw new Error('Chat engine not initialized');
}
const startTime = Date.now();
// Pre-check
const preCheck = await this.axonflow.preCheck({
userToken,
query: message,
context: {
...context,
conversation_length: this.conversationHistory.length,
query_type: 'chat',
},
});
if (!preCheck.approved) {
throw new Error(`Chat blocked: ${preCheck.blockReason}`);
}
try {
// Execute chat
const response = await this.chatEngine.chat({
message,
chatHistory: this.conversationHistory,
});
// Update history
this.conversationHistory.push(
{ role: 'user', content: message },
{ role: 'assistant', content: response.response }
);
const latencyMs = Date.now() - startTime;
// Audit
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: response.response.slice(0, 200),
provider: 'openai',
model: 'gpt-4',
tokenUsage: { promptTokens: 300, completionTokens: 150, totalTokens: 450 },
latencyMs,
});
return {
response: response.response,
contextId: preCheck.contextId,
};
} catch (error) {
const latencyMs = Date.now() - startTime;
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: `Error: ${error instanceof Error ? error.message : 'Unknown'}`,
provider: 'openai',
model: 'gpt-4',
tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
latencyMs,
metadata: { error: true },
});
throw error;
}
}
clearHistory(): void {
this.conversationHistory = [];
}
}
Pattern 2: Multi-Index Router with Governance
Route queries to separate indexes and pass each index's data tier to the policy check:
import { VectorStoreIndex, Document } from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';
interface IndexConfig {
name: string;
dataTier: 'public' | 'internal' | 'restricted';
index: VectorStoreIndex;
}
class GovernedIndexRouter {
private axonflow: GatewayModeClient;
private indexes: Map<string, IndexConfig> = new Map();
constructor(axonflowUrl: string, clientSecret: string) {
this.axonflow = new GatewayModeClient({
agentUrl: axonflowUrl,
clientId: 'llamaindex-router',
clientSecret,
});
}
async addIndex(
name: string,
documents: Document[],
dataTier: 'public' | 'internal' | 'restricted'
): Promise<void> {
const index = await VectorStoreIndex.fromDocuments(documents);
this.indexes.set(name, { name, dataTier, index });
}
async query(
userToken: string,
query: string,
indexName: string
): Promise<{ response: string; sources: string[] }> {
const indexConfig = this.indexes.get(indexName);
if (!indexConfig) {
throw new Error(`Index '${indexName}' not found`);
}
const startTime = Date.now();
// Pre-check with index-specific context
const preCheck = await this.axonflow.preCheck({
userToken,
query,
context: {
index_name: indexConfig.name,
data_tier: indexConfig.dataTier,
query_type: 'rag',
},
});
if (!preCheck.approved) {
throw new Error(`Query blocked: ${preCheck.blockReason}`);
}
const queryEngine = indexConfig.index.asQueryEngine();
const result = await queryEngine.query({ query });
const latencyMs = Date.now() - startTime;
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: result.response.slice(0, 200),
provider: 'openai',
model: 'gpt-4',
tokenUsage: { promptTokens: 400, completionTokens: 150, totalTokens: 550 },
latencyMs,
metadata: {
index_name: indexConfig.name,
data_tier: indexConfig.dataTier,
},
});
return {
response: result.response,
sources: result.sourceNodes?.map((n) => n.node.metadata?.source || 'unknown') || [],
};
}
}
// Usage
const router = new GovernedIndexRouter(
'http://localhost:8080',
process.env.AXONFLOW_CLIENT_SECRET!
);
// Add different indexes
await router.addIndex('public-docs', publicDocuments, 'public');
await router.addIndex('internal-wiki', internalDocuments, 'internal');
await router.addIndex('financial-reports', financialDocuments, 'restricted');
// Query with governance
const result = await router.query('user-123', 'What are our Q4 projections?', 'financial-reports');
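The router throws when a query is blocked, so callers that expect denials on restricted tiers should wrap the call:
try {
  const restricted = await router.query('intern-042', 'Show detailed payroll data', 'financial-reports');
  console.log(restricted.response);
} catch (error) {
  // Queries denied by policy (and downstream failures) surface here.
  console.warn(error instanceof Error ? error.message : String(error));
}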
Pattern 3: Streaming with Governance
Handle streaming responses with governance:
import { VectorStoreIndex } from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';
class GovernedStreamingQuery {
private axonflow: GatewayModeClient;
private index: VectorStoreIndex;
constructor(
axonflow: GatewayModeClient,
index: VectorStoreIndex
) {
this.axonflow = axonflow;
this.index = index;
}
async *streamQuery(
userToken: string,
query: string
): AsyncGenerator<string, void, unknown> {
const startTime = Date.now();
let fullResponse = '';
// Pre-check before streaming
const preCheck = await this.axonflow.preCheck({
userToken,
query,
context: { streaming: true },
});
if (!preCheck.approved) {
throw new Error(`Query blocked: ${preCheck.blockReason}`);
}
try {
const queryEngine = this.index.asQueryEngine();
const stream = await queryEngine.query({
query,
stream: true,
});
for await (const chunk of stream) {
fullResponse += chunk.response;
yield chunk.response;
}
const latencyMs = Date.now() - startTime;
// Audit after streaming completes
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: fullResponse.slice(0, 200),
provider: 'openai',
model: 'gpt-4',
tokenUsage: {
  promptTokens: 400, // Estimate for RAG context
  completionTokens: fullResponse.split(' ').length, // rough word-count estimate
  totalTokens: 400 + fullResponse.split(' ').length,
},
latencyMs,
metadata: { streaming: true },
});
} catch (error) {
const latencyMs = Date.now() - startTime;
await this.axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: `Stream error: ${error instanceof Error ? error.message : 'Unknown'}`,
provider: 'openai',
model: 'gpt-4',
tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
latencyMs,
metadata: { error: true, streaming: true },
});
throw error;
}
}
}
// Usage
const streamingQuery = new GovernedStreamingQuery(axonflow, index);
for await (const chunk of streamingQuery.streamQuery('user-123', 'Explain our benefits')) {
process.stdout.write(chunk);
}
Express.js API Example
Complete API server with governed LlamaIndex.TS:
import express from 'express';
import { Document, VectorStoreIndex, Settings, OpenAI } from 'llamaindex';
import { GatewayModeClient } from '@axonflow/sdk';
const app = express();
app.use(express.json());
// Initialize
Settings.llm = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-4',
});
const axonflow = new GatewayModeClient({
agentUrl: process.env.AXONFLOW_URL || 'http://localhost:8080',
clientId: 'llamaindex-api',
clientSecret: process.env.AXONFLOW_CLIENT_SECRET!,
});
let index: VectorStoreIndex | null = null;
// Initialize index
app.post('/api/index', async (req, res) => {
try {
const { documents } = req.body;
const docs = documents.map(
(d: { text: string; source: string }) =>
new Document({ text: d.text, metadata: { source: d.source } })
);
index = await VectorStoreIndex.fromDocuments(docs);
res.json({ success: true, documentCount: docs.length });
} catch (error) {
res.status(500).json({ error: 'Failed to create index' });
}
});
// Governed query
app.post('/api/query', async (req, res) => {
if (!index) {
return res.status(400).json({ error: 'Index not initialized' });
}
const { user_token, query, context } = req.body;
const startTime = Date.now();
try {
// Pre-check
const preCheck = await axonflow.preCheck({
userToken: user_token,
query,
context: { ...context, framework: 'llamaindex-ts' },
});
if (!preCheck.approved) {
return res.status(403).json({
error: 'Query blocked by policy',
reason: preCheck.blockReason,
});
}
// Execute query
const queryEngine = index.asQueryEngine();
const result = await queryEngine.query({ query });
const latencyMs = Date.now() - startTime;
// Audit
await axonflow.auditLLMCall({
contextId: preCheck.contextId,
responseSummary: result.response.slice(0, 200),
provider: 'openai',
model: 'gpt-4',
tokenUsage: { promptTokens: 400, completionTokens: 150, totalTokens: 550 },
latencyMs,
});
res.json({
response: result.response,
sources: result.sourceNodes?.map((n) => n.node.metadata?.source) || [],
context_id: preCheck.contextId,
});
} catch (error) {
res.status(500).json({ error: 'Query failed' });
}
});
app.listen(3000, () => console.log('LlamaIndex.TS API running on :3000'));
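A client can then call the governed route like any other JSON API. A minimal sketch, assuming the server above is running on localhost:3000:
const res = await fetch('http://localhost:3000/api/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    user_token: 'user-123',
    query: 'What is the vacation policy?',
    context: { department: 'hr' },
  }),
});
if (res.status === 403) {
  // Blocked by policy: the route returns the block reason.
  const { reason } = await res.json();
  console.warn(`Blocked by policy: ${reason}`);
} else {
  const { response, sources, context_id } = await res.json();
  console.log(response, sources, context_id);
}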
AxonFlow Policy Configuration
Create policies for LlamaIndex.TS applications:
{
"policies": [
{
"name": "llamaindex-rag-policy",
"description": "Policy for LlamaIndex RAG queries",
"enabled": true,
"rules": [
{
"type": "pii_protection",
"config": {
"fields": ["email", "phone", "ssn"],
"action": "mask"
}
},
{
"type": "rate_limit",
"config": {
"requests_per_minute": 30,
"action": "throttle"
}
}
]
},
{
"name": "llamaindex-restricted-policy",
"description": "Policy for restricted data indexes",
"enabled": true,
"rules": [
{
"type": "content_filter",
"config": {
"blocked_patterns": ["confidential", "internal only"],
"action": "block"
}
}
]
}
]
}
Best Practices
1. Always Use Context IDs
const preCheck = await axonflow.preCheck({ userToken, query });
const contextId = preCheck.contextId; // Store immediately
// ... execute query ...
await axonflow.auditLLMCall({ contextId, ... }); // Use same contextId
2. Include Source Attribution in Audits
await axonflow.auditLLMCall({
contextId,
responseSummary: response,
metadata: {
source_count: sources.length,
sources: sources.slice(0, 5), // First 5 sources
},
// ...
});
3. Handle Streaming Properly
Run the pre-check before streaming starts, then audit after the stream completes using the accumulated full response.
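A condensed sketch of that ordering, assuming axonflow, stream, userToken, and query are set up as in Pattern 3:
const preCheck = await axonflow.preCheck({
  userToken,
  query,
  context: { streaming: true },
});
if (!preCheck.approved) {
  throw new Error(`Query blocked: ${preCheck.blockReason}`);
}
let fullResponse = '';
for await (const chunk of stream) {
  fullResponse += chunk.response; // accumulate for the audit record
  process.stdout.write(chunk.response);
}
// Audit once, only after the stream has fully completed
await axonflow.auditLLMCall({
  contextId: preCheck.contextId,
  responseSummary: fullResponse.slice(0, 200),
  metadata: { streaming: true },
  // ...
});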
4. Use Index-Specific Context
const preCheck = await axonflow.preCheck({
userToken,
query,
context: {
index_name: 'financial-reports',
data_tier: 'restricted',
department: 'finance',
},
});
Troubleshooting
Common Issues
Issue: Pre-check returns 401 Unauthorized
- Verify the X-Client-Secret header is correct
- Check the X-License-Key header if using enterprise features
- Ensure the client_id is registered in AxonFlow
Issue: Audit calls failing
- Verify the context_id comes from a valid pre-check (not expired)
- Check that the AxonFlow agent is healthy (/health endpoint); see the sketch below
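A quick way to confirm the agent is reachable (assuming the default /health endpoint on your agent URL):
const health = await fetch(`${process.env.AXONFLOW_URL || 'http://localhost:8080'}/health`);
console.log(health.ok ? 'AxonFlow agent healthy' : `Agent unhealthy: HTTP ${health.status}`);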
Issue: Streaming audits missing
- Ensure audit is called after stream completes
- Collect full response during streaming for audit