
LlamaIndex.TS + AxonFlow Integration

How to use this doc:

  • Understanding the problem? Read the first four sections
  • Ready to implement? Jump to Integration Pattern

What LlamaIndex.TS Does Well

LlamaIndex.TS is the TypeScript/JavaScript version of the popular LlamaIndex framework for building LLM applications with data connectivity. Its strengths are substantial:

Native TypeScript/JavaScript: First-class support for modern web stacks. Type safety, async/await patterns, and Node.js ecosystem integration.

Comprehensive Data Connectors: Load documents from PDFs, databases, APIs, and more. The ingestion pipeline handles parsing, chunking, and indexing.

Advanced Indexing Strategies: Vector, keyword, and tree-based indexes. Choose the right retrieval strategy for your use case.

Query Engines: Response synthesis with source attribution. Build RAG systems that cite their sources.

Streaming Support: Real-time responses for chat applications. Tokens stream as they're generated.

Agent and Tool Support: Build agents that use tools to accomplish tasks. LlamaIndex handles the orchestration.
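
Taken together, a minimal (ungoverned) flow is only a few lines. The sketch below is illustrative: it assumes the default in-memory vector store, an OPENAI_API_KEY in the environment, and a runtime that allows top-level await.

import { Document, VectorStoreIndex } from "llamaindex";

// Index a single in-memory document, then ask a question against it.
const docs = [
  new Document({
    text: "Refund requests are accepted within 30 days of purchase.",
    metadata: { source: "faq.md" },
  }),
];
const index = await VectorStoreIndex.fromDocuments(docs);

const engine = index.asQueryEngine();
const answer = await engine.query({ query: "How long is the refund window?" });
console.log(answer.response);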


What LlamaIndex.TS Doesn't Try to Solve

LlamaIndex.TS focuses on data connectivity and retrieval. These concerns are explicitly out of scope:

Production Requirement                  | LlamaIndex.TS's Position
Policy enforcement before queries       | Not provided—queries execute based on index configuration
PII detection in retrieved documents    | Not addressed—retrieval returns whatever matches
SQL injection prevention                | Not provided—must implement at data source level
Per-user or per-query cost attribution  | Not tracked—requires external monitoring
Audit trails                            | Not built in—must implement logging externally
Source access control                   | Not addressed—any index can be queried
Token budget enforcement                | Not provided—queries can consume unlimited tokens

This isn't a criticism—it's a design choice. LlamaIndex.TS handles retrieval. Governance is a separate concern.


Where Teams Hit Production Friction

Based on real enterprise deployments, here are the blockers that appear after the prototype works:

1. The Runaway RAG Query

A query triggers similarity search. 50 documents match. All 50 are sent as context. The LLM generates a response. Repeat for every user query. Token costs spiral.

LlamaIndex.TS retrieved as requested. Nothing was watching how much it retrieved.
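
Until a governance layer exists, the only option is a hand-rolled check like the sketch below. The cap value and the query are illustrative; it reuses the query API shown later in this doc, and it only detects the problem after the tokens have been spent. Capping the retriever's top-k (for example via a similarityTopK setting, where your llamaindex version exposes one) is the preventive half.

// Illustrative after-the-fact guard: notice when a query pulled in too much context.
const MAX_SOURCE_CHUNKS = 10; // assumed budget, tune per use case

const queryEngine = index.asQueryEngine();
const result = await queryEngine.query({ query: "Summarize our refund policy" });

const sourceCount = result.sourceNodes?.length ?? 0;
if (sourceCount > MAX_SOURCE_CHUNKS) {
  // The LLM call has already happened; this surfaces the spend, it does not prevent it.
  console.warn(
    `Query used ${sourceCount} retrieved chunks; token cost is unbounded without a pre-query policy.`
  );
}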

2. The "What Documents Were Used?" Question

An agent answered a regulatory question. The team needs to know:

  • What sources were retrieved?
  • What was the similarity score for each?
  • What was actually sent to the LLM?
  • Who asked the question?

LlamaIndex.TS returned source nodes. Without custom logging, the retrieval context is gone.
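
A sketch of the kind of logging teams end up writing themselves is below. The log shape and field names are illustrative assumptions; the similarity score is read from each source node where the node carries one.

import { VectorStoreIndex } from "llamaindex";

interface RetrievalLogEntry {
  userId: string;
  query: string;
  retrievedAt: string;
  sources: { source: string; score?: number }[];
}

// Capture the retrieval context at query time; once the response is returned, it is gone.
const retrievalLog: RetrievalLogEntry[] = [];

async function queryWithLog(index: VectorStoreIndex, userId: string, query: string) {
  const result = await index.asQueryEngine().query({ query });
  retrievalLog.push({
    userId,
    query,
    retrievedAt: new Date().toISOString(),
    sources: (result.sourceNodes ?? []).map((scored) => ({
      source: scored.node.metadata?.source ?? "unknown",
      score: scored.score,
    })),
  });
  return result;
}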

3. The PII in the Index

Customer support documents were indexed without preprocessing. Now customer emails, phone numbers, and addresses are being retrieved and sent to the LLM as context.

LlamaIndex.TS indexed what it was given. There's no built-in PII detection in the retrieval pipeline.
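
If documents must be scrubbed before they ever reach the index, that is custom preprocessing today. The sketch below is deliberately simplistic: the two regexes catch only obvious email and phone-number shapes and are no substitute for real PII detection, which is the gap a governance layer fills.

import { Document } from "llamaindex";

// Illustrative pre-indexing scrub; these patterns will miss plenty of real-world PII.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

function scrubPII(text: string): string {
  return text
    .replace(EMAIL_PATTERN, "[REDACTED_EMAIL]")
    .replace(PHONE_PATTERN, "[REDACTED_PHONE]");
}

const rawTickets = [
  { text: "Customer jane@example.com called from +1 555 010 7788 about billing.", source: "ticket-42.txt" },
];
const cleanDocuments = rawTickets.map(
  (t) => new Document({ text: scrubPII(t.text), metadata: { source: t.source } })
);
// cleanDocuments can now be passed to VectorStoreIndex.fromDocuments(...)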

4. The Security Review Block

Security review: BLOCKED
- No audit trail for document retrieval
- PII can be retrieved and sent to LLM
- No policy enforcement on queries
- Cost controls missing
- Access control for indexes not implemented

The RAG system worked perfectly. It can't ship.

5. The Cross-Index Leakage

A multi-tenant deployment shares infrastructure. User A's query accidentally retrieves documents from User B's namespace due to a misconfigured filter. LlamaIndex.TS returned the results—there's no tenant isolation at the framework level.
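
One blunt but effective pattern is hard isolation: a separate index per tenant, so a misconfigured filter cannot leak across namespaces. The sketch below is illustrative (an in-memory map keyed by tenant id); metadata filtering is the lighter-weight alternative, but the filter is exactly what was misconfigured in the scenario above.

import { Document, VectorStoreIndex } from "llamaindex";

// Illustrative hard isolation: one index per tenant, so a query can only ever
// see documents that were indexed for that tenant.
const tenantIndexes = new Map<string, VectorStoreIndex>();

async function indexForTenant(tenantId: string, documents: Document[]): Promise<void> {
  tenantIndexes.set(tenantId, await VectorStoreIndex.fromDocuments(documents));
}

async function queryAsTenant(tenantId: string, query: string): Promise<string> {
  const index = tenantIndexes.get(tenantId);
  if (!index) {
    throw new Error(`No index provisioned for tenant ${tenantId}`);
  }
  const result = await index.asQueryEngine().query({ query });
  return result.response;
}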


How AxonFlow Plugs In

AxonFlow doesn't replace LlamaIndex.TS. It sits underneath it—providing the governance layer that LlamaIndex.TS intentionally doesn't include:

┌─────────────────┐
│    Your App     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  LlamaIndex.TS  │  <-- Data Ingestion, Indexing, Retrieval
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │    PII    │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────┐
│  LLM Provider   │
└─────────────────┘

What this gives you:

  • Every query logged with retrieved sources and user context
  • PII detected and blocked before retrieval reaches the LLM
  • SQL injection attempts blocked in database queries
  • Cost tracked per query, per index, per user
  • Full audit trail for the complete retrieval history

What stays the same:

  • Your LlamaIndex.TS code doesn't change
  • Index configurations work as before
  • No new abstractions to learn

Integration Pattern

Wrap LlamaIndex.TS queries with AxonFlow governance:

import { Document, VectorStoreIndex, Settings, OpenAI } from "llamaindex";
import { AxonFlow } from "@axonflow/sdk";

interface GovernedQueryResult {
  response: string;
  sources: string[];
  contextId: string;
  blocked?: boolean;
  reason?: string;
}

class GovernedLlamaIndex {
  private axonflow: AxonFlow;
  private index: VectorStoreIndex | null = null;

  constructor() {
    this.axonflow = new AxonFlow({
      endpoint: process.env.AXONFLOW_ENDPOINT!,
      tenant: "llamaindex-ts-app",
    });

    Settings.llm = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4",
    });
  }

  async createIndex(documents: Document[]): Promise<void> {
    this.index = await VectorStoreIndex.fromDocuments(documents);
  }

  async governedQuery(
    userToken: string,
    query: string,
    context?: Record<string, unknown>
  ): Promise<GovernedQueryResult> {
    if (!this.index) {
      throw new Error("Index not initialized. Call createIndex first.");
    }

    const startTime = Date.now();

    // 1. Pre-check with AxonFlow
    const approval = await this.axonflow.getPolicyApprovedContext({
      userToken,
      query,
      context: { ...context, framework: "llamaindex-ts", query_type: "rag" },
    });

    if (!approval.approved) {
      return {
        response: "",
        sources: [],
        contextId: approval.contextId,
        blocked: true,
        reason: approval.blockReason,
      };
    }

    // 2. Execute LlamaIndex query
    const queryEngine = this.index.asQueryEngine();
    const result = await queryEngine.query({ query });

    const response = result.response;
    const sources =
      result.sourceNodes?.map(
        (node) => node.node.metadata?.source || "unknown"
      ) || [];

    const latencyMs = Date.now() - startTime;

    // 3. Audit the call
    await this.axonflow.auditLLMCall({
      contextId: approval.contextId,
      responseSummary: response.slice(0, 200),
      provider: "openai",
      model: "gpt-4",
      // Placeholder token counts; replace with actual usage reported by your LLM provider
      tokenUsage: { promptTokens: 500, completionTokens: 200, totalTokens: 700 },
      latencyMs,
      metadata: { source_count: sources.length, sources: sources.slice(0, 5) },
    });

    return { response, sources, contextId: approval.contextId };
  }
}

// Usage
const llamaindex = new GovernedLlamaIndex();

const documents = [
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
];
await llamaindex.createIndex(documents);

const result = await llamaindex.governedQuery(
  "user-123",
  "What is the vacation policy?",
  { department: "hr" }
);
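
Because governedQuery returns a result rather than throwing when a policy blocks the call, callers should branch on the blocked flag:

if (result.blocked) {
  // Surface the policy decision instead of showing an empty answer.
  console.warn(`Query blocked (context ${result.contextId}): ${result.reason}`);
} else {
  console.log(result.response);
  console.log("Sources:", result.sources);
}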

More Examples

Pattern                   | Language   | Link
Multi-Index Router        | TypeScript | llamaindex-ts/router
Streaming with Governance | TypeScript | llamaindex-ts/streaming
Chat Engine               | TypeScript | llamaindex-ts/chat