LlamaIndex.TS + AxonFlow Integration
What The Current SDK Surface Gives RAG Teams
For LlamaIndex.TS, the value is not a separate RAG-specific adapter. It is that the same TypeScript SDK client already used in this integration now exposes:
- explainDecision() for blocked or reviewed retrieval requests
- audit search filters for decisionId, policyName, and overrideId
- richer audit and decision correlation around the same governed SDK calls your retrieval layer already uses
That gives RAG teams a much better post-block and post-incident workflow: understand a denial through Decision Explainability and correlate the resulting audit trail, instead of treating every blocked retrieval as a black box.
For LlamaIndex.TS teams, that is useful because many governance issues are not visible at query time alone. A retrieval can be blocked because of the user prompt, because of the document set being pulled into context, or because a prior exception changed what was allowed temporarily. Explainability and decision-scoped audit search make it much easier to answer "why was this retrieval denied?" or "why did this result set behave differently?" without combing through application logs and vector-store traces manually.
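As a rough sketch of that workflow: after a governed retrieval is denied, the same SDK client can be asked to explain the decision and to pull the correlated audit entries. Only explainDecision() and the decisionId, policyName, and overrideId filters are part of the SDK surface described above; the audit-search method name, the argument and response shapes, and the placeholder decision ID below are illustrative assumptions.

import { AxonFlow } from "@axonflow/sdk";

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT!,
  clientId: process.env.AXONFLOW_CLIENT_ID!,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

// Decision identifier returned with a blocked retrieval (placeholder value).
const decisionId = "dec_abc123";

// Ask why the retrieval was denied. explainDecision() is part of the SDK surface;
// the argument shape here is an assumption.
const explanation = await axonflow.explainDecision({ decisionId });

// Correlate the surrounding audit trail for the same decision.
// "searchAuditEvents" is a placeholder name for the SDK's audit search,
// which supports decisionId, policyName, and overrideId filters.
const relatedAudit = await axonflow.searchAuditEvents({ decisionId });

console.log(explanation, relatedAudit);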
The important limitation is that standard LlamaIndex.TS integrations do not gain LangGraph-style checkpoints or step-level recovery from these additions alone. If a governed retrieval call is blocked, the application still has to decide how to recover. Explainability and audit search make that operationally tractable, but they do not create workflow resume semantics by themselves.
Why RAG Systems Need Governance
LlamaIndex.TS is the TypeScript/JavaScript implementation of the LlamaIndex framework, purpose-built for retrieval-augmented generation (RAG). It handles the full RAG pipeline: loading documents from PDFs, databases, and APIs; parsing and chunking content; creating vector embeddings; building searchable indexes; and synthesizing LLM responses with source attribution. For TypeScript and Node.js teams building RAG applications, it is the most comprehensive option available.
The governance challenge for RAG systems is different from simple LLM calls. In a RAG pipeline, sensitive data enters the LLM context through two paths: the user's query and the retrieved documents. A perfectly innocent query like "What are our top customer accounts?" can cause the retrieval step to pull documents containing customer emails, phone numbers, and addresses, which are then sent to the LLM as context. Without governance, there is no PII detection on retrieved documents, no access control on which indexes a user can query, and no audit trail showing what documents were retrieved and sent to the LLM.
Cost control is also a distinct concern for RAG. A broad similarity search might retrieve 50 documents, all of which get stuffed into the LLM context window. This drives up token costs significantly compared to a direct LLM call. Budget enforcement at the query level prevents individual queries from consuming excessive tokens through over-retrieval.
AxonFlow integrates with LlamaIndex.TS through the TypeScript SDK using gateway mode. The integration wraps LlamaIndex.TS query engine calls with getPolicyApprovedContext() before the query and auditLLMCall() after, including metadata about which sources were retrieved. AxonFlow's PII detection scans both the user query and the context sent to the LLM, catching sensitive data that enters through retrieval. Your existing LlamaIndex.TS index configurations, document loaders, and query engines remain unchanged.
What LlamaIndex.TS Does Well
LlamaIndex.TS is the TypeScript/JavaScript version of the popular LlamaIndex framework for building LLM applications with data connectivity. Its strengths are substantial:
Native TypeScript/JavaScript: First-class support for modern web stacks. Type safety, async/await patterns, and Node.js ecosystem integration.
Comprehensive Data Connectors: Load documents from PDFs, databases, APIs, and more. The ingestion pipeline handles parsing, chunking, and indexing.
Advanced Indexing Strategies: Vector, keyword, and tree-based indexes. Choose the right retrieval strategy for your use case.
Query Engines: Response synthesis with source attribution. Build RAG systems that cite their sources.
Streaming Support: Real-time responses for chat applications. Tokens stream as they're generated.
Agent and Tool Support: Build agents that use tools to accomplish tasks. LlamaIndex handles the orchestration.
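For orientation, here is what those pieces look like in a minimal ungoverned query. This uses the same fromDocuments, asQueryEngine, and query calls that the governed integration later in this guide wraps; an OPENAI_API_KEY in the environment is assumed for embeddings and response synthesis.

import { Document, VectorStoreIndex } from "llamaindex";

// Build an in-memory vector index over a single document.
const index = await VectorStoreIndex.fromDocuments([
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
]);

// Query with response synthesis and source attribution.
const queryEngine = index.asQueryEngine();
const result = await queryEngine.query({ query: "What is the vacation policy?" });

console.log(result.response);
console.log(result.sourceNodes?.map((node) => node.node.metadata?.source));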
What LlamaIndex.TS Doesn't Try to Solve
LlamaIndex.TS focuses on data connectivity and retrieval. These concerns are explicitly out of scope:
| Production Requirement | LlamaIndex.TS's Position |
|---|---|
| Policy enforcement before queries | Not provided—queries execute based on index configuration |
| PII detection in retrieved documents | Not addressed—retrieval returns whatever matches |
| SQL injection prevention | Not provided—must implement at data source level |
| Per-user or per-query cost attribution | Not tracked—requires external monitoring |
| Audit trails | Not built in—must implement logging externally |
| Source access control | Not addressed—any index can be queried |
| Token budget enforcement | Not provided—queries can consume unlimited tokens |
This isn't a criticism—it's a design choice. LlamaIndex.TS handles retrieval. Governance is a separate concern.
Where Teams Hit Production Friction
Based on real enterprise deployments, here are the blockers that appear after the prototype works:
1. The Runaway RAG Query
A query triggers similarity search. 50 documents match. All 50 are sent as context. The LLM generates a response. Repeat for every user query. Token costs spiral.
LlamaIndex.TS retrieved as requested. Nothing was watching how much it retrieved.
2. The "What Documents Were Used?" Question
A regulatory question was answered by an agent. The team needs:
- What sources were retrieved?
- What was the similarity score for each?
- What was actually sent to the LLM?
- Who asked the question?
LlamaIndex.TS returned source nodes. Without custom logging, the retrieval context is gone.
3. The PII in the Index
Customer support documents were indexed without preprocessing. Now customer emails, phone numbers, and addresses are being retrieved and sent to the LLM as context.
LlamaIndex.TS indexed what it was given. There's no built-in PII detection in the retrieval pipeline.
4. The Security Review Block
Security review: BLOCKED
- No audit trail for document retrieval
- PII can be retrieved and sent to LLM
- No policy enforcement on queries
- Cost controls missing
- Access control for indexes not implemented
The RAG system worked perfectly. It can't ship.
5. The Cross-Index Leakage
A multi-tenant deployment shares infrastructure. User A's query accidentally retrieves documents from User B's namespace due to a misconfigured filter. LlamaIndex.TS returned the results—there's no tenant isolation at the framework level.
How AxonFlow Plugs In
AxonFlow doesn't replace LlamaIndex.TS. It sits underneath it—providing the governance layer that LlamaIndex.TS intentionally doesn't include:
┌─────────────────┐
│    Your App     │
└────────┬────────┘
         │
         v
┌─────────────────┐
│  LlamaIndex.TS  │  <-- Data Ingestion, Indexing, Retrieval
└────────┬────────┘
         │
         v
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │    PII    │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 v
┌─────────────────┐
│  LLM Provider   │
└─────────────────┘
What this gives you:
- Every query logged with retrieved sources and user context
- PII detected and blocked before retrieval reaches the LLM
- SQL injection attempts blocked in database queries
- Cost tracked per query, per index, per user
- Full audit trail for the complete retrieval history
What stays the same:
- Your LlamaIndex.TS code doesn't change
- Index configurations work as before
- No new abstractions to learn
Integration Pattern
Wrap LlamaIndex.TS queries with AxonFlow governance:
import { Document, VectorStoreIndex, Settings, OpenAI } from "llamaindex";
import { AxonFlow } from "@axonflow/sdk";

interface GovernedQueryResult {
  response: string;
  sources: string[];
  contextId: string;
  blocked?: boolean;
  reason?: string;
}

class GovernedLlamaIndex {
  private axonflow: AxonFlow;
  private index: VectorStoreIndex | null = null;

  constructor() {
    this.axonflow = new AxonFlow({
      endpoint: process.env.AXONFLOW_ENDPOINT!,
      clientId: process.env.AXONFLOW_CLIENT_ID!,
      clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
    });

    Settings.llm = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4",
    });
  }

  async createIndex(documents: Document[]): Promise<void> {
    this.index = await VectorStoreIndex.fromDocuments(documents);
  }

  async governedQuery(
    userToken: string,
    query: string,
    context?: Record<string, unknown>
  ): Promise<GovernedQueryResult> {
    if (!this.index) {
      throw new Error("Index not initialized. Call createIndex first.");
    }

    const startTime = Date.now();

    // 1. Pre-check with AxonFlow
    const approval = await this.axonflow.getPolicyApprovedContext({
      userToken,
      query,
      context: { ...context, framework: "llamaindex-ts", query_type: "rag" },
    });

    if (!approval.approved) {
      return {
        response: "",
        sources: [],
        contextId: approval.contextId,
        blocked: true,
        reason: approval.blockReason,
      };
    }

    // 2. Execute LlamaIndex query
    const queryEngine = this.index.asQueryEngine();
    const result = await queryEngine.query({ query });

    const response = result.response;
    const sources =
      result.sourceNodes?.map(
        (node) => node.node.metadata?.source || "unknown"
      ) || [];
    const latencyMs = Date.now() - startTime;

    // 3. Audit the call
    await this.axonflow.auditLLMCall({
      contextId: approval.contextId,
      responseSummary: response.slice(0, 200),
      provider: "openai",
      model: "gpt-4",
      // Placeholder token counts; in practice, derive these from the provider's usage metadata.
      tokenUsage: { promptTokens: 500, completionTokens: 200, totalTokens: 700 },
      latencyMs,
      metadata: { source_count: sources.length, sources: sources.slice(0, 5) },
    });

    return { response, sources, contextId: approval.contextId };
  }
}

// Usage
const llamaindex = new GovernedLlamaIndex();

const documents = [
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
];
await llamaindex.createIndex(documents);

const result = await llamaindex.governedQuery(
  "user-123",
  "What is the vacation policy?",
  { department: "hr" }
);
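When a query is blocked, the wrapper returns instead of throwing; the blocked, reason, and contextId fields from GovernedQueryResult let the calling code decide how to recover, since (as noted earlier) there is no automatic resume:

if (result.blocked) {
  // Surface the policy reason and keep the contextId so the decision can be
  // explained and correlated in the audit trail later.
  console.warn(`Query blocked: ${result.reason} (contextId: ${result.contextId})`);
} else {
  console.log(result.response, result.sources);
}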
RAG Policy Patterns
Retrieval-augmented generation introduces unique governance challenges: sensitive data can enter the LLM context through retrieved documents even when the user query itself is benign. AxonFlow addresses this with policies that target the RAG pipeline specifically.
Pattern 1: PII in Retrieved Documents
Block or redact PII found in documents before they reach the LLM:
// The governed query automatically checks both the user query
// AND the retrieved context against PII policies
const result = await llamaindex.governedQuery(
  "analyst-user",
  "What are our top customer accounts?",
  { query_type: "rag", data_source: "customer_db" }
);
// If retrieved documents contain PII (emails, phone numbers, SSNs),
// AxonFlow detects and redacts them before the LLM sees the context.
// The audit trail records which fields were redacted.
Pattern 2: Source Access Control
Enforce per-user access to specific document indexes:
// Create a policy that restricts index access by department
// POST http://localhost:8081/api/v1/policies
// {
//   "name": "hr-docs-restricted",
//   "pattern": "data_source.*hr_documents",
//   "action": "block",
//   "conditions": [
//     {"field": "context.department", "operator": "not_equals", "value": "hr"}
//   ]
// }

// Non-HR users are blocked from querying HR document indexes
const result = await llamaindex.governedQuery(
  "engineering-user",
  "What are the salary bands?",
  { query_type: "rag", data_source: "hr_documents", department: "engineering" }
);
// result.blocked === true
// result.reason === "Access to hr_documents restricted to HR department"
Pattern 3: Retrieval Volume Limits
Prevent excessive document retrieval that drives up token costs:
// AxonFlow tracks token usage per query. Configure budget policies
// that limit per-query spend, preventing runaway RAG retrieval:
//
// Budget policy: max $0.50 per individual query
// When exceeded: block (rather than sending 50 documents to the LLM)
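On the application side, a complementary control is capping how many nodes LlamaIndex.TS retrieves per query in the first place. A minimal sketch follows; similarityTopK is a standard LlamaIndex.TS retriever option, though the exact option shapes can vary between versions, and the sample document is illustrative.

import { Document, VectorStoreIndex } from "llamaindex";

const index = await VectorStoreIndex.fromDocuments([
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
]);

// Cap retrieval at 5 nodes so a broad similarity match cannot pull
// dozens of documents into the LLM context.
const retriever = index.asRetriever({ similarityTopK: 5 });
const queryEngine = index.asQueryEngine({ retriever });

const result = await queryEngine.query({ query: "What are our top customer accounts?" });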
These patterns help RAG systems comply with data access policies and keep sensitive information from leaking through the retrieval pipeline.
More Examples
| Pattern | Language | Link |
|---|---|---|
| Multi-Index Router | TypeScript | llamaindex-ts/router |
| Streaming with Governance | TypeScript | llamaindex-ts/streaming |
| Chat Engine | TypeScript | llamaindex-ts/chat |
