LlamaIndex.TS + AxonFlow Integration
What The Current SDK Surface Gives RAG Teams
For LlamaIndex.TS, the value is not a separate RAG-specific adapter. It is that the same TypeScript SDK client already used in this integration now exposes:
- explainDecision() for blocked or reviewed retrieval requests
- audit search filters for decisionId, policyName, and overrideId
- richer audit and decision correlation around the same governed SDK calls your retrieval layer already uses
That gives RAG teams a much better post-block and post-incident workflow: understand a denial through Decision Explainability and correlate the resulting audit trail, instead of treating every blocked retrieval as a black box.
For LlamaIndex.TS teams, that is useful because many governance issues are not visible at query time alone. A retrieval can be blocked because of the user prompt, because of the document set being pulled into context, or because a prior exception changed what was allowed temporarily. Explainability and decision-scoped audit search make it much easier to answer "why was this retrieval denied?" or "why did this result set behave differently?" without combing through application logs and vector-store traces manually.
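As a rough sketch of that workflow: after a governed retrieval is denied, the same SDK client can be asked to explain the decision and to pull the correlated audit entries. Only explainDecision() and the decisionId, policyName, and overrideId filters are part of the SDK surface described above; the audit-search method name, the argument and response shapes, and the placeholder decision ID below are illustrative assumptions.

import { AxonFlow } from "@axonflow/sdk";

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT!,
  clientId: process.env.AXONFLOW_CLIENT_ID!,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

// Decision identifier returned with a blocked retrieval (placeholder value).
const decisionId = "dec_abc123";

// Ask why the retrieval was denied. explainDecision() is part of the SDK surface;
// the argument shape here is an assumption.
const explanation = await axonflow.explainDecision({ decisionId });

// Correlate the surrounding audit trail for the same decision.
// "searchAuditEvents" is a placeholder name for the SDK's audit search,
// which supports decisionId, policyName, and overrideId filters.
const relatedAudit = await axonflow.searchAuditEvents({ decisionId });

console.log(explanation, relatedAudit);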
The important limitation is that standard LlamaIndex.TS integrations do not gain LangGraph-style checkpoints or step-level recovery from these additions alone. If a governed retrieval call is blocked, the application still has to decide how to recover. Explainability and audit search make that operationally tractable, but they do not create workflow resume semantics by themselves.
Why RAG Systems Need Governance
LlamaIndex.TS is the TypeScript/JavaScript implementation of the LlamaIndex framework, purpose-built for retrieval-augmented generation (RAG). It handles the full RAG pipeline: loading documents from PDFs, databases, and APIs; parsing and chunking content; creating vector embeddings; building searchable indexes; and synthesizing LLM responses with source attribution. For TypeScript and Node.js teams building RAG applications, it is the most comprehensive option available.
The governance challenge for RAG systems is different from simple LLM calls. In a RAG pipeline, sensitive data enters the LLM context through two paths: the user's query and the retrieved documents. A perfectly innocent query like "What are our top customer accounts?" can cause the retrieval step to pull documents containing customer emails, phone numbers, and addresses, which are then sent to the LLM as context. Without governance, there is no PII detection on retrieved documents, no access control on which indexes a user can query, and no audit trail showing what documents were retrieved and sent to the LLM.
Cost control is also a distinct concern for RAG. A broad similarity search might retrieve 50 documents, all of which get stuffed into the LLM context window. This drives up token costs significantly compared to a direct LLM call. Budget enforcement at the query level prevents individual queries from consuming excessive tokens through over-retrieval.
AxonFlow integrates with LlamaIndex.TS through the TypeScript SDK using gateway mode. The integration wraps LlamaIndex.TS query engine calls with getPolicyApprovedContext() before the query and auditLLMCall() after, including metadata about which sources were retrieved. AxonFlow's PII detection scans both the user query and the context sent to the LLM, catching sensitive data that enters through retrieval. Your existing LlamaIndex.TS index configurations, document loaders, and query engines remain unchanged.
What LlamaIndex.TS Does Well
LlamaIndex.TS is the TypeScript/JavaScript version of the popular LlamaIndex framework for building LLM applications with data connectivity. Its strengths are substantial:
Native TypeScript/JavaScript: First-class support for modern web stacks. Type safety, async/await patterns, and Node.js ecosystem integration.
Comprehensive Data Connectors: Load documents from PDFs, databases, APIs, and more. The ingestion pipeline handles parsing, chunking, and indexing.
Advanced Indexing Strategies: Vector, keyword, and tree-based indexes. Choose the right retrieval strategy for your use case.
Query Engines: Response synthesis with source attribution. Build RAG systems that cite their sources.
Streaming Support: Real-time responses for chat applications. Tokens stream as they're generated.
Agent and Tool Support: Build agents that use tools to accomplish tasks. LlamaIndex handles the orchestration.
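For orientation, here is what those pieces look like in a minimal ungoverned query. This uses the same fromDocuments, asQueryEngine, and query calls that the governed integration later in this guide wraps; an OPENAI_API_KEY in the environment is assumed for embeddings and response synthesis.

import { Document, VectorStoreIndex } from "llamaindex";

// Build an in-memory vector index over a single document.
const index = await VectorStoreIndex.fromDocuments([
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
]);

// Query with response synthesis and source attribution.
const queryEngine = index.asQueryEngine();
const result = await queryEngine.query({ query: "What is the vacation policy?" });

console.log(result.response);
console.log(result.sourceNodes?.map((node) => node.node.metadata?.source));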
What LlamaIndex.TS Doesn't Try to Solve
LlamaIndex.TS focuses on data connectivity and retrieval. These concerns are explicitly out of scope:
| Production Requirement | LlamaIndex.TS's Position |
|---|---|
| Policy enforcement before queries | Not provided—queries execute based on index configuration |
| PII detection in retrieved documents | Not addressed—retrieval returns whatever matches |
| SQL injection prevention | Not provided—must implement at data source level |
| Per-user or per-query cost attribution | Not tracked—requires external monitoring |
| Audit trails | Not built in—must implement logging externally |
| Source access control | Not addressed—any index can be queried |
| Token budget enforcement | Not provided—queries can consume unlimited tokens |
This isn't a criticism—it's a design choice. LlamaIndex.TS handles retrieval. Governance is a separate concern.
Where Teams Hit Production Friction
Based on real enterprise deployments, here are the blockers that appear after the prototype works:
1. The Runaway RAG Query
A query triggers similarity search. 50 documents match. All 50 are sent as context. The LLM generates a response. Repeat for every user query. Token costs spiral.
LlamaIndex.TS retrieved as requested. Nothing was watching how much it retrieved.
2. The "What Documents Were Used?" Question
A regulatory question was answered by an agent. The team needs:
- What sources were retrieved?
- What was the similarity score for each?
- What was actually sent to the LLM?
- Who asked the question?
LlamaIndex.TS returned source nodes. Without custom logging, the retrieval context is gone.
3. The PII in the Index
Customer support documents were indexed without preprocessing. Now customer emails, phone numbers, and addresses are being retrieved and sent to the LLM as context.
LlamaIndex.TS indexed what it was given. There's no built-in PII detection in the retrieval pipeline.
4. The Security Review Block
Security review: BLOCKED
- No audit trail for document retrieval
- PII can be retrieved and sent to LLM
- No policy enforcement on queries
- Cost controls missing
- Access control for indexes not implemented
The RAG system worked perfectly. It can't ship.
5. The Cross-Index Leakage
A multi-tenant deployment shares infrastructure. User A's query accidentally retrieves documents from User B's namespace due to a misconfigured filter. LlamaIndex.TS returned the results—there's no tenant isolation at the framework level.
How AxonFlow Plugs In
AxonFlow doesn't replace LlamaIndex.TS. It sits underneath it—providing the governance layer that LlamaIndex.TS intentionally doesn't include:
┌─────────────────┐
│    Your App     │
└────────┬────────┘
         │
         v
┌─────────────────┐
│  LlamaIndex.TS  │  <-- Data Ingestion, Indexing, Retrieval
└────────┬────────┘
         │
         v
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │    PII    │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 v
┌─────────────────┐
│  LLM Provider   │
└─────────────────┘
What this gives you:
- Every query logged with retrieved sources and user context
- PII detected and blocked before retrieval reaches the LLM
- SQL injection attempts blocked in database queries
- Cost tracked per query, per index, per user
- Full audit trail for the complete retrieval history
What stays the same:
- Your LlamaIndex.TS code doesn't change
- Index configurations work as before
- No new abstractions to learn
Integration Pattern
Wrap LlamaIndex.TS queries with AxonFlow governance:
import { Document, VectorStoreIndex, Settings, OpenAI } from "llamaindex";
import { AxonFlow } from "@axonflow/sdk";

interface GovernedQueryResult {
  response: string;
  sources: string[];
  contextId: string;
  blocked?: boolean;
  reason?: string;
}

class GovernedLlamaIndex {
  private axonflow: AxonFlow;
  private index: VectorStoreIndex | null = null;

  constructor() {
    this.axonflow = new AxonFlow({
      endpoint: process.env.AXONFLOW_ENDPOINT!,
      clientId: process.env.AXONFLOW_CLIENT_ID!,
      clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
    });

    Settings.llm = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4",
    });
  }

  async createIndex(documents: Document[]): Promise<void> {
    this.index = await VectorStoreIndex.fromDocuments(documents);
  }

  async governedQuery(
    userToken: string,
    query: string,
    context?: Record<string, unknown>
  ): Promise<GovernedQueryResult> {
    if (!this.index) {
      throw new Error("Index not initialized. Call createIndex first.");
    }

    const startTime = Date.now();

    // 1. Pre-check with AxonFlow
    const approval = await this.axonflow.getPolicyApprovedContext({
      userToken,
      query,
      context: { ...context, framework: "llamaindex-ts", query_type: "rag" },
    });

    if (!approval.approved) {
      return {
        response: "",
        sources: [],
        contextId: approval.contextId,
        blocked: true,
        reason: approval.blockReason,
      };
    }

    // 2. Execute LlamaIndex query
    const queryEngine = this.index.asQueryEngine();
    const result = await queryEngine.query({ query });

    const response = result.response;
    const sources =
      result.sourceNodes?.map(
        (node) => node.node.metadata?.source || "unknown"
      ) || [];
    const latencyMs = Date.now() - startTime;

    // 3. Audit the call
    await this.axonflow.auditLLMCall({
      contextId: approval.contextId,
      responseSummary: response.slice(0, 200),
      provider: "openai",
      model: "gpt-4",
      // Placeholder token counts; in practice, derive these from the provider's usage metadata.
      tokenUsage: { promptTokens: 500, completionTokens: 200, totalTokens: 700 },
      latencyMs,
      metadata: { source_count: sources.length, sources: sources.slice(0, 5) },
    });

    return { response, sources, contextId: approval.contextId };
  }
}

// Usage
const llamaindex = new GovernedLlamaIndex();

const documents = [
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
];
await llamaindex.createIndex(documents);

const result = await llamaindex.governedQuery(
  "user-123",
  "What is the vacation policy?",
  { department: "hr" }
);
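When a query is blocked, the wrapper returns instead of throwing; the blocked, reason, and contextId fields from GovernedQueryResult let the calling code decide how to recover, since (as noted earlier) there is no automatic resume:

if (result.blocked) {
  // Surface the policy reason and keep the contextId so the decision can be
  // explained and correlated in the audit trail later.
  console.warn(`Query blocked: ${result.reason} (contextId: ${result.contextId})`);
} else {
  console.log(result.response, result.sources);
}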
RAG Policy Patterns
Retrieval-augmented generation introduces unique governance challenges: sensitive data can enter the LLM context through retrieved documents even when the user query itself is benign. AxonFlow addresses this with policies that target the RAG pipeline specifically.
Pattern 1: PII in Retrieved Documents
Block or redact PII found in documents before they reach the LLM:
// The governed query automatically checks both the user query
// AND the retrieved context against PII policies
const result = await llamaindex.governedQuery(
  "analyst-user",
  "What are our top customer accounts?",
  { query_type: "rag", data_source: "customer_db" }
);
// If retrieved documents contain PII (emails, phone numbers, SSNs),
// AxonFlow detects and redacts them before the LLM sees the context.
// The audit trail records which fields were redacted.
Pattern 2: Source Access Control
Enforce per-user access to specific document indexes:
// Create a policy that restricts index access by department
// POST http://localhost:8081/api/v1/policies
// {
//   "name": "hr-docs-restricted",
//   "pattern": "data_source.*hr_documents",
//   "action": "block",
//   "conditions": [
//     {"field": "context.department", "operator": "not_equals", "value": "hr"}
//   ]
// }

// Non-HR users are blocked from querying HR document indexes
const result = await llamaindex.governedQuery(
  "engineering-user",
  "What are the salary bands?",
  { query_type: "rag", data_source: "hr_documents", department: "engineering" }
);
// result.blocked === true
// result.reason === "Access to hr_documents restricted to HR department"
Pattern 3: Retrieval Volume Limits
Prevent excessive document retrieval that drives up token costs:
// AxonFlow tracks token usage per query. Configure budget policies
// that limit per-query spend, preventing runaway RAG retrieval:
//
// Budget policy: max $0.50 per individual query
// When exceeded: block (rather than sending 50 documents to the LLM)
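On the application side, a complementary control is capping how many nodes LlamaIndex.TS retrieves per query in the first place. A minimal sketch follows; similarityTopK is a standard LlamaIndex.TS retriever option, though the exact option shapes can vary between versions, and the sample document is illustrative.

import { Document, VectorStoreIndex } from "llamaindex";

const index = await VectorStoreIndex.fromDocuments([
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
]);

// Cap retrieval at 5 nodes so a broad similarity match cannot pull
// dozens of documents into the LLM context.
const retriever = index.asRetriever({ similarityTopK: 5 });
const queryEngine = index.asQueryEngine({ retriever });

const result = await queryEngine.query({ query: "What are our top customer accounts?" });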
These patterns help RAG systems comply with data access policies and keep sensitive information from leaking through the retrieval pipeline.
More Examples
| Pattern | Language | Link |
|---|---|---|
| Multi-Index Router | TypeScript | llamaindex-ts/router |
| Streaming with Governance | TypeScript | llamaindex-ts/streaming |
| Chat Engine | TypeScript | llamaindex-ts/chat |
