
LlamaIndex.TS + AxonFlow Integration

How to use this doc:

  • Understanding the problem? Read the first four sections
  • Ready to implement? Jump to Integration Pattern

What LlamaIndex.TS Does Well

LlamaIndex.TS is the TypeScript/JavaScript version of the popular LlamaIndex framework for building LLM applications with data connectivity. Its strengths are substantial:

Native TypeScript/JavaScript: First-class support for modern web stacks. Type safety, async/await patterns, and Node.js ecosystem integration.

Comprehensive Data Connectors: Load documents from PDFs, databases, APIs, and more. The ingestion pipeline handles parsing, chunking, and indexing.

Advanced Indexing Strategies: Vector, keyword, and tree-based indexes. Choose the right retrieval strategy for your use case.

Query Engines: Response synthesis with source attribution. Build RAG systems that cite their sources.

Streaming Support: Real-time responses for chat applications. Tokens stream as they're generated.

Agent and Tool Support: Build agents that use tools to accomplish tasks. LlamaIndex handles the orchestration.
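
Taken together, a minimal (ungoverned) flow is only a few lines. The sketch below is illustrative: it assumes the default in-memory vector store, an OPENAI_API_KEY in the environment, and a runtime that allows top-level await.

import { Document, VectorStoreIndex } from "llamaindex";

// Index a single in-memory document, then ask a question against it.
const docs = [
  new Document({
    text: "Refund requests are accepted within 30 days of purchase.",
    metadata: { source: "faq.md" },
  }),
];
const index = await VectorStoreIndex.fromDocuments(docs);

const engine = index.asQueryEngine();
const answer = await engine.query({ query: "How long is the refund window?" });
console.log(answer.response);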


What LlamaIndex.TS Doesn't Try to Solve

LlamaIndex.TS focuses on data connectivity and retrieval. These concerns are explicitly out of scope:

Production Requirement                  | LlamaIndex.TS's Position
Policy enforcement before queries       | Not provided—queries execute based on index configuration
PII detection in retrieved documents    | Not addressed—retrieval returns whatever matches
SQL injection prevention                | Not provided—must implement at data source level
Per-user or per-query cost attribution  | Not tracked—requires external monitoring
Audit trails                            | Not built in—must implement logging externally
Source access control                   | Not addressed—any index can be queried
Token budget enforcement                | Not provided—queries can consume unlimited tokens

This isn't a criticism—it's a design choice. LlamaIndex.TS handles retrieval. Governance is a separate concern.


Where Teams Hit Production Friction

Based on real enterprise deployments, here are the blockers that appear after the prototype works:

1. The Runaway RAG Query

A query triggers similarity search. 50 documents match. All 50 are sent as context. The LLM generates a response. Repeat for every user query. Token costs spiral.

LlamaIndex.TS retrieved as requested. Nothing was watching how much it retrieved.
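
Until a governance layer exists, the only option is a hand-rolled check like the sketch below. The cap value and the query are illustrative; it reuses the query API shown later in this doc, and it only detects the problem after the tokens have been spent. Capping the retriever's top-k (for example via a similarityTopK setting, where your llamaindex version exposes one) is the preventive half.

// Illustrative after-the-fact guard: notice when a query pulled in too much context.
const MAX_SOURCE_CHUNKS = 10; // assumed budget, tune per use case

const queryEngine = index.asQueryEngine();
const result = await queryEngine.query({ query: "Summarize our refund policy" });

const sourceCount = result.sourceNodes?.length ?? 0;
if (sourceCount > MAX_SOURCE_CHUNKS) {
  // The LLM call has already happened; this surfaces the spend, it does not prevent it.
  console.warn(
    `Query used ${sourceCount} retrieved chunks; token cost is unbounded without a pre-query policy.`
  );
}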

2. The "What Documents Were Used?" Question

An agent answered a regulatory question. The team needs to know:

  • What sources were retrieved?
  • What was the similarity score for each?
  • What was actually sent to the LLM?
  • Who asked the question?

LlamaIndex.TS returned source nodes. Without custom logging, the retrieval context is gone.
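
A sketch of the kind of logging teams end up writing themselves is below. The log shape and field names are illustrative assumptions; the similarity score is read from each source node where the node carries one.

import { VectorStoreIndex } from "llamaindex";

interface RetrievalLogEntry {
  userId: string;
  query: string;
  retrievedAt: string;
  sources: { source: string; score?: number }[];
}

// Capture the retrieval context at query time; once the response is returned, it is gone.
const retrievalLog: RetrievalLogEntry[] = [];

async function queryWithLog(index: VectorStoreIndex, userId: string, query: string) {
  const result = await index.asQueryEngine().query({ query });
  retrievalLog.push({
    userId,
    query,
    retrievedAt: new Date().toISOString(),
    sources: (result.sourceNodes ?? []).map((scored) => ({
      source: scored.node.metadata?.source ?? "unknown",
      score: scored.score,
    })),
  });
  return result;
}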

3. The PII in the Index

Customer support documents were indexed without preprocessing. Now customer emails, phone numbers, and addresses are being retrieved and sent to the LLM as context.

LlamaIndex.TS indexed what it was given. There's no built-in PII detection in the retrieval pipeline.
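
If documents must be scrubbed before they ever reach the index, that is custom preprocessing today. The sketch below is deliberately simplistic: the two regexes catch only obvious email and phone-number shapes and are no substitute for real PII detection, which is the gap a governance layer fills.

import { Document } from "llamaindex";

// Illustrative pre-indexing scrub; these patterns will miss plenty of real-world PII.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

function scrubPII(text: string): string {
  return text
    .replace(EMAIL_PATTERN, "[REDACTED_EMAIL]")
    .replace(PHONE_PATTERN, "[REDACTED_PHONE]");
}

const rawTickets = [
  { text: "Customer jane@example.com called from +1 555 010 7788 about billing.", source: "ticket-42.txt" },
];
const cleanDocuments = rawTickets.map(
  (t) => new Document({ text: scrubPII(t.text), metadata: { source: t.source } })
);
// cleanDocuments can now be passed to VectorStoreIndex.fromDocuments(...)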

4. The Security Review Block

Security review: BLOCKED
- No audit trail for document retrieval
- PII can be retrieved and sent to LLM
- No policy enforcement on queries
- Cost controls missing
- Access control for indexes not implemented

The RAG system worked perfectly. It can't ship.

5. The Cross-Index Leakage

A multi-tenant deployment shares infrastructure. User A's query accidentally retrieves documents from User B's namespace due to a misconfigured filter. LlamaIndex.TS returned the results—there's no tenant isolation at the framework level.
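
One blunt but effective pattern is hard isolation: a separate index per tenant, so a misconfigured filter cannot leak across namespaces. The sketch below is illustrative (an in-memory map keyed by tenant id); metadata filtering is the lighter-weight alternative, but the filter is exactly what was misconfigured in the scenario above.

import { Document, VectorStoreIndex } from "llamaindex";

// Illustrative hard isolation: one index per tenant, so a query can only ever
// see documents that were indexed for that tenant.
const tenantIndexes = new Map<string, VectorStoreIndex>();

async function indexForTenant(tenantId: string, documents: Document[]): Promise<void> {
  tenantIndexes.set(tenantId, await VectorStoreIndex.fromDocuments(documents));
}

async function queryAsTenant(tenantId: string, query: string): Promise<string> {
  const index = tenantIndexes.get(tenantId);
  if (!index) {
    throw new Error(`No index provisioned for tenant ${tenantId}`);
  }
  const result = await index.asQueryEngine().query({ query });
  return result.response;
}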


How AxonFlow Plugs In

AxonFlow doesn't replace LlamaIndex.TS. It sits underneath it—providing the governance layer that LlamaIndex.TS intentionally doesn't include:

┌─────────────────┐
│    Your App     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  LlamaIndex.TS  │  <-- Data Ingestion, Indexing, Retrieval
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │    PII    │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────┐
│  LLM Provider   │
└─────────────────┘

What this gives you:

  • Every query logged with retrieved sources and user context
  • PII detected and blocked before retrieval reaches the LLM
  • SQL injection attempts blocked in database queries
  • Cost tracked per query, per index, per user
  • Full audit trail for the complete retrieval history

What stays the same:

  • Your LlamaIndex.TS code doesn't change
  • Index configurations work as before
  • No new abstractions to learn

Integration Pattern

Wrap LlamaIndex.TS queries with AxonFlow governance:

import { Document, VectorStoreIndex, Settings, OpenAI } from "llamaindex";
import { AxonFlow } from "@axonflow/sdk";

interface GovernedQueryResult {
  response: string;
  sources: string[];
  contextId: string;
  blocked?: boolean;
  reason?: string;
}

class GovernedLlamaIndex {
  private axonflow: AxonFlow;
  private index: VectorStoreIndex | null = null;

  constructor() {
    this.axonflow = new AxonFlow({
      endpoint: process.env.AXONFLOW_ENDPOINT!,
      tenant: "llamaindex-ts-app",
    });

    Settings.llm = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4",
    });
  }

  async createIndex(documents: Document[]): Promise<void> {
    this.index = await VectorStoreIndex.fromDocuments(documents);
  }

  async governedQuery(
    userToken: string,
    query: string,
    context?: Record<string, unknown>
  ): Promise<GovernedQueryResult> {
    if (!this.index) {
      throw new Error("Index not initialized. Call createIndex first.");
    }

    const startTime = Date.now();

    // 1. Pre-check with AxonFlow
    const approval = await this.axonflow.getPolicyApprovedContext({
      userToken,
      query,
      context: { ...context, framework: "llamaindex-ts", query_type: "rag" },
    });

    if (!approval.approved) {
      return {
        response: "",
        sources: [],
        contextId: approval.contextId,
        blocked: true,
        reason: approval.blockReason,
      };
    }

    // 2. Execute LlamaIndex query
    const queryEngine = this.index.asQueryEngine();
    const result = await queryEngine.query({ query });

    const response = result.response;
    const sources =
      result.sourceNodes?.map(
        (node) => node.node.metadata?.source || "unknown"
      ) || [];

    const latencyMs = Date.now() - startTime;

    // 3. Audit the call
    await this.axonflow.auditLLMCall({
      contextId: approval.contextId,
      responseSummary: response.slice(0, 200),
      provider: "openai",
      model: "gpt-4",
      // Placeholder token counts; replace with actual usage reported by your LLM provider
      tokenUsage: { promptTokens: 500, completionTokens: 200, totalTokens: 700 },
      latencyMs,
      metadata: { source_count: sources.length, sources: sources.slice(0, 5) },
    });

    return { response, sources, contextId: approval.contextId };
  }
}

// Usage
const llamaindex = new GovernedLlamaIndex();

const documents = [
  new Document({
    text: "Company policies...",
    metadata: { source: "policies.pdf" },
  }),
];
await llamaindex.createIndex(documents);

const result = await llamaindex.governedQuery(
  "user-123",
  "What is the vacation policy?",
  { department: "hr" }
);
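
Because governedQuery returns a result rather than throwing when a policy blocks the call, callers should branch on the blocked flag:

if (result.blocked) {
  // Surface the policy decision instead of showing an empty answer.
  console.warn(`Query blocked (context ${result.contextId}): ${result.reason}`);
} else {
  console.log(result.response);
  console.log("Sources:", result.sources);
}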

More Examples

Pattern                   | Language   | Link
Multi-Index Router        | TypeScript | llamaindex-ts/router
Streaming with Governance | TypeScript | llamaindex-ts/streaming
Chat Engine               | TypeScript | llamaindex-ts/chat