Gateway Mode - Lowest Latency AI Governance
Gateway Mode provides the lowest-latency governance option: your application makes direct LLM calls while AxonFlow handles policy enforcement and audit logging separately.
Gateway Mode only evaluates static policies (PII detection, SQL injection blocking, dangerous query prevention). Custom policies created in the Customer Portal UI or via the Orchestrator API will NOT be enforced.
Static policies include:
- SQL injection detection (`sql_injection`)
- PII detection (SSN, credit cards, Aadhaar, PAN, email, phone)
- Dangerous query blocking (`DROP TABLE`, `TRUNCATE`, etc.)
- Admin access controls
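For example, a pre-check on a prompt that embeds a dangerous statement is blocked before any LLM call happens. A minimal TypeScript sketch (using a client configured as in the Quick Start below; the exact blockReason wording depends on your deployment):
// Hypothetical prompt embedding a dangerous SQL statement
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-jwt',
  query: 'DROP TABLE customers; -- clean up old data',
  dataSources: ['postgres']
});

console.log(ctx.approved);    // false - rejected by the dangerous-query static policy
console.log(ctx.blockReason); // deployment-specific explanation of the block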
Not supported in Gateway Mode:
- Custom dynamic policies created via UI or API
- Response content filtering
Need custom policies? Use Proxy Mode instead.
Why Gateway Mode?
| Benefit | Description |
|---|---|
| Lowest Latency | Direct LLM calls, only 10-20ms overhead for pre-check |
| Your LLM Keys | Use your own API keys and accounts |
| Full Control | Choose any provider, model, or configuration |
| Complete Audit Trail | Every LLM call logged with context |
| Cost Tracking | Automatic token usage and cost estimation |
How It Works
- Your app calls `getPolicyApprovedContext()` or `preCheck()`
- AxonFlow evaluates policies and returns approval
- If approved, you make your LLM call directly
- Call `auditLLMCall()` to log the audit trail
Quick Start
TypeScript
import { AxonFlow } from '@axonflow/sdk'; // v1.7.1+
import OpenAI from 'openai';
const axonflow = new AxonFlow({
endpoint: process.env.AXONFLOW_ENDPOINT,
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function queryWithGovernance(userToken: string, query: string) {
// 1. Pre-check: Get policy-approved context
const ctx = await axonflow.getPolicyApprovedContext({
userToken,
query,
dataSources: ['postgres']
});
if (!ctx.approved) {
throw new Error(`Query blocked: ${ctx.blockReason}`);
}
// 2. Make LLM call with approved data
const startTime = Date.now();
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: query }]
});
const latencyMs = Date.now() - startTime;
// 3. Audit the call
await axonflow.auditLLMCall({
contextId: ctx.contextId,
responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
provider: 'openai',
model: 'gpt-4',
tokenUsage: {
promptTokens: response.usage?.prompt_tokens || 0,
completionTokens: response.usage?.completion_tokens || 0,
totalTokens: response.usage?.total_tokens || 0
},
latencyMs
});
return response.choices[0].message.content;
}
Go
import (
	"fmt"
	"os"
	"time"

	"github.com/getaxonflow/axonflow-sdk-go" // v1.10.0+
)
client := axonflow.NewClient(axonflow.AxonFlowConfig{
Endpoint: os.Getenv("AXONFLOW_ENDPOINT"),
ClientID: os.Getenv("AXONFLOW_CLIENT_ID"),
ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})
// 1. Pre-check
ctx, err := client.GetPolicyApprovedContext(
userToken,
query,
[]string{"postgres"},
nil,
)
if err != nil {
return err
}
if !ctx.Approved {
return fmt.Errorf("blocked: %s", ctx.BlockReason)
}
// 2. Make LLM call (your code)
startTime := time.Now()
llmResponse, err := openaiClient.CreateChatCompletion(...)
if err != nil {
	return err
}
latencyMs := time.Since(startTime).Milliseconds()

// Truncate the response summary safely (slicing a shorter string would panic)
summary := llmResponse.Choices[0].Message.Content
if len(summary) > 100 {
	summary = summary[:100]
}

// 3. Audit
_, err = client.AuditLLMCall(
	ctx.ContextID,
	summary,
"openai",
"gpt-4",
axonflow.TokenUsage{
PromptTokens: llmResponse.Usage.PromptTokens,
CompletionTokens: llmResponse.Usage.CompletionTokens,
TotalTokens: llmResponse.Usage.TotalTokens,
},
latencyMs,
nil,
)
Python
from axonflow import AxonFlow, TokenUsage # v0.5.0+
from openai import AsyncOpenAI
import os
import time
openai = AsyncOpenAI()
async with AxonFlow(
endpoint=os.environ.get("AXONFLOW_ENDPOINT", "http://localhost:8080"),
client_id=os.environ["AXONFLOW_CLIENT_ID"],
client_secret=os.environ["AXONFLOW_CLIENT_SECRET"]
) as client:
# 1. Pre-check
    query = "Analyze customer data"
    ctx = await client.get_policy_approved_context(
        user_token="user-jwt",
        query=query,
        data_sources=["postgres"]
)
if not ctx.approved:
raise Exception(f"Blocked: {ctx.block_reason}")
# 2. Direct LLM call
start = time.time()
response = await openai.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": query}]
)
latency_ms = int((time.time() - start) * 1000)
# 3. Audit
await client.audit_llm_call(
context_id=ctx.context_id,
response_summary=response.choices[0].message.content[:100],
provider="openai",
model="gpt-4",
token_usage=TokenUsage(
prompt_tokens=response.usage.prompt_tokens,
completion_tokens=response.usage.completion_tokens,
total_tokens=response.usage.total_tokens
),
latency_ms=latency_ms
)
Java
import com.getaxonflow.sdk.AxonFlow; // v2.7.1+
import com.getaxonflow.sdk.AuditRequest;
import com.getaxonflow.sdk.PolicyApprovalRequest;
import com.getaxonflow.sdk.PolicyApprovalResult;
import com.getaxonflow.sdk.TokenUsage;
import com.openai.OpenAIClient;

import java.util.List;
AxonFlow axonflow = AxonFlow.builder()
.endpoint(System.getenv("AXONFLOW_ENDPOINT"))
.clientId(System.getenv("AXONFLOW_CLIENT_ID"))
.clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
.build();
OpenAIClient openai = new OpenAIClient(System.getenv("OPENAI_API_KEY"));
// 1. Pre-check
PolicyApprovalResult ctx = axonflow.getPolicyApprovedContext(
PolicyApprovalRequest.builder()
.userToken("user-123")
.query("Analyze customer data")
.dataSources(List.of("postgres"))
.build()
);
if (!ctx.isApproved()) {
throw new RuntimeException("Blocked: " + ctx.getBlockReason());
}
// 2. Direct LLM call
long startTime = System.currentTimeMillis();
ChatCompletion response = openai.chat().completions().create(
ChatCompletionRequest.builder()
.model("gpt-4")
.messages(List.of(
ChatMessage.user("Analyze customer data")
))
.build()
);
long latencyMs = System.currentTimeMillis() - startTime;
// 3. Audit
String content = response.choices().get(0).message().content();
axonflow.auditLLMCall(
    AuditRequest.builder()
        .contextId(ctx.getContextId())
        .responseSummary(content.substring(0, Math.min(100, content.length())))
.provider("openai")
.model("gpt-4")
.tokenUsage(TokenUsage.builder()
.promptTokens(response.usage().promptTokens())
.completionTokens(response.usage().completionTokens())
.totalTokens(response.usage().totalTokens())
.build())
.latencyMs(latencyMs)
.build()
);
When to Use Gateway Mode
Choose Gateway Mode when:
- You need the lowest possible latency
- You want to use your own LLM API keys
- Your compliance allows client-side LLM calls
- You're integrating with existing LLM workflows
Choose Proxy Mode when:
- You need response filtering (PII detection)
- You require 100% guaranteed audit trails
- You want automatic LLM failover
- Compliance requires all calls through a proxy
Configuration
const axonflow = new AxonFlow({
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
endpoint: process.env.AXONFLOW_ENDPOINT,
tenant: 'your-tenant-id',
mode: 'production'
});
API Reference
Pre-Check Methods
getPolicyApprovedContext() / preCheck()
Both methods are equivalent; preCheck() is simply a shorter alias.
TypeScript:
const ctx = await axonflow.getPolicyApprovedContext({
userToken: string, // User's JWT token
query: string, // The query/prompt
dataSources?: string[], // Data sources being accessed
context?: object // Additional context for policy evaluation
});
Go:
ctx, err := client.GetPolicyApprovedContext(
userToken string,
query string,
dataSources []string,
context map[string]interface{},
)
Python:
ctx = await client.get_policy_approved_context(
user_token="...",
query="...",
data_sources=["..."],
)
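Since the two methods are equivalent, preCheck() can be called the same way. A minimal TypeScript sketch, assuming it accepts the same options object as getPolicyApprovedContext():
// preCheck() as a drop-in alias (assumed to take the same options object)
const ctx = await axonflow.preCheck({
  userToken: 'user-jwt',
  query: 'Analyze customer data',
  dataSources: ['postgres']
});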
Response:
{
contextId: string; // Unique ID to correlate with audit
approved: boolean; // Whether request was approved
approvedData: object; // Filtered data (if policies modified it)
policies: string[]; // Policies that were evaluated
expiresAt: Date; // When this approval expires
blockReason?: string; // Reason for blocking (if !approved)
rateLimitInfo?: { // Rate limit info (if applicable)
limit: number;
remaining: number;
resetAt: Date;
}
}
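A rough TypeScript sketch of how these fields might be used, given a ctx returned by the pre-check (field names as documented above; the actual handling is up to your application):
if (!ctx.approved) {
  // Surface which policies blocked the request and why
  console.warn(`Blocked by [${ctx.policies.join(', ')}]: ${ctx.blockReason}`);
  if (ctx.rateLimitInfo) {
    const { limit, remaining, resetAt } = ctx.rateLimitInfo;
    console.warn(`Rate limit: ${remaining}/${limit} remaining, resets at ${resetAt.toISOString()}`);
  }
} else if (ctx.expiresAt.getTime() < Date.now()) {
  // The approval has expired - run the pre-check again before calling the LLM
  console.warn('Approval expired; repeat getPolicyApprovedContext() before the LLM call');
}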
Audit Method
auditLLMCall()
TypeScript:
await axonflow.auditLLMCall({
contextId: string, // From pre-check response
responseSummary: string, // Summary of LLM response
provider: string, // e.g., "openai", "anthropic"
model: string, // e.g., "gpt-4", "claude-3-opus"
tokenUsage: {
promptTokens: number,
completionTokens: number,
totalTokens: number
},
latencyMs: number, // Call duration in milliseconds
metadata?: object // Optional additional metadata
});
Go:
result, err := client.AuditLLMCall(
contextId string,
responseSummary string,
provider string,
model string,
tokenUsage TokenUsage,
latencyMs int64,
metadata map[string]interface{},
)
Python:
await client.audit_llm_call(
context_id="...",
response_summary="...",
provider="openai",
model="gpt-4",
token_usage=TokenUsage(...),
latency_ms=250,
)
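The optional metadata field can carry your own correlation data alongside the audit record. A short TypeScript sketch, reusing ctx, response, and latencyMs from the Quick Start example above (the metadata keys shown are hypothetical examples, not fields required by the API):
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
  provider: 'openai',
  model: 'gpt-4',
  tokenUsage: {
    promptTokens: response.usage?.prompt_tokens || 0,
    completionTokens: response.usage?.completion_tokens || 0,
    totalTokens: response.usage?.total_tokens || 0
  },
  latencyMs,
  metadata: {
    requestId: 'req-1234',      // e.g. your own request/trace ID (hypothetical key)
    feature: 'customer-report'  // e.g. product-area tag (hypothetical key)
  }
});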
Audit Reliability
Gateway Mode audit logging provides high reliability through a multi-layer architecture:
Compliance Mode (Default)
In compliance mode, audit operations are synchronous:
- Pre-check context is written to database before response returns
- LLM call audits are written before acknowledgment
- Guarantees 100% audit trail persistence
- Required for EU AI Act and HIPAA compliance
Performance Mode (Optional)
For high-throughput systems, enable performance mode:
- Audit operations are queued and processed asynchronously
- ~99.9%+ audit reliability with disk fallback
- Latency reduced to ~10-20ms overhead
Automatic Retry & Recovery
All audit operations include:
- Exponential Backoff Retry - 3 attempts (100ms, 200ms, 400ms delays)
- Disk Fallback - If database unavailable, entries written to fallback file
- Automatic Recovery - On restart, entries from fallback file are replayed to database
This ensures audit trails are never lost, even during database outages.
Audit Entry → Queue → Retry (3x) → ✅ DB Success
└─→ ❌ Fallback File → Recovered on Restart
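A simplified TypeScript sketch of this pattern, for illustration only: it mirrors the documented behaviour (three attempts with 100/200/400 ms backoff, then a JSONL fallback file) but is not AxonFlow's actual implementation, and writeAuditToDb plus the entry shape are hypothetical placeholders.
import { appendFile } from 'node:fs/promises';

const FALLBACK_PATH = '/var/lib/axonflow/audit/audit_fallback.jsonl';

async function persistAuditEntry(entry: object, writeAuditToDb: (e: object) => Promise<void>) {
  for (const delayMs of [100, 200, 400]) {
    try {
      await writeAuditToDb(entry); // DB success
      return;
    } catch {
      // Exponential backoff before the next attempt
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // All retries failed: append to the fallback file; entries are replayed on restart
  await appendFile(FALLBACK_PATH, JSON.stringify(entry) + '\n');
}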
Configuration
# Enable performance mode (optional)
AGENT_PERFORMANCE_MODE=true
# Fallback file location
AUDIT_FALLBACK_PATH=/var/lib/axonflow/audit/audit_fallback.jsonl
Cost Tracking
AxonFlow automatically estimates costs based on provider and model:
| Provider | Model | Estimated Cost |
|---|---|---|
| OpenAI | gpt-4 | $0.03/1K tokens |
| OpenAI | gpt-3.5-turbo | $0.002/1K tokens |
| Anthropic | claude-3-sonnet | $0.003/1K tokens |
| Bedrock | Various | Based on AWS pricing |
| Ollama | Local | $0 (self-hosted) |
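As a rough TypeScript sketch of how a per-call estimate follows from these rates (the rates mirror the table above; actual provider pricing may change):
// Documented per-1K-token estimates from the table above
const ratePer1kTokens: Record<string, number> = {
  'openai/gpt-4': 0.03,
  'openai/gpt-3.5-turbo': 0.002,
  'anthropic/claude-3-sonnet': 0.003
};

function estimateCostUsd(provider: string, model: string, totalTokens: number): number {
  const rate = ratePer1kTokens[`${provider}/${model}`] ?? 0; // unknown or self-hosted models -> $0
  return (totalTokens / 1000) * rate;
}

// Example: a gpt-4 call that used 1,500 tokens ≈ 1.5 * $0.03 = $0.045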
Comparison with Proxy Mode
| Feature | Proxy Mode | Gateway Mode |
|---|---|---|
| Integration Effort | Minimal | Moderate |
| Latency Overhead | Higher (~50-100ms) | Lower (~10-20ms) |
| Static Policies (PII, SQL injection) | ✅ Yes | ✅ Yes |
| Dynamic Policies (custom rules) | ✅ Yes | ❌ No |
| Response Filtering | ✅ Yes | ❌ No |
| Audit Coverage | 100% automatic | Manual (call audit API) |
| LLM Control | Limited | Full |
| Best For | Simple apps, custom policies | Frameworks, performance |
See Choosing a Mode for detailed guidance.
Next Steps
- Choosing a Mode - Decision guide
- Proxy Mode - Simpler alternative
- LLM Interceptors - Wrapper functions for LLM clients
- TypeScript SDK - Full TypeScript documentation
- Python SDK - Full Python documentation
- Go SDK - Full Go documentation
- Java SDK - Full Java documentation
- Authentication - License key setup
Enterprise customers: See the Gateway Mode Migration Guide for detailed architecture, migration steps, and troubleshooting.