Gateway Mode - Lowest Latency AI Governance

Gateway Mode provides the lowest-latency governance: you make LLM calls directly, while AxonFlow enforces policies before the call and logs the audit trail after it.

Why Gateway Mode?

| Benefit | Description |
| --- | --- |
| Lowest Latency | Direct LLM calls, only 10-20ms overhead for pre-check |
| Your LLM Keys | Use your own API keys and accounts |
| Full Control | Choose any provider, model, or configuration |
| Complete Audit Trail | Every LLM call logged with context |
| Cost Tracking | Automatic token usage and cost estimation |

How It Works

  1. Your app calls getPolicyApprovedContext() or preCheck()
  2. AxonFlow evaluates policies and returns approval
  3. If approved, you make your LLM call directly
  4. Call auditLLMCall() to log the audit trail

Quick Start

TypeScript

import { AxonFlow } from '@axonflow/sdk';
import OpenAI from 'openai';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  licenseKey: process.env.AXONFLOW_LICENSE_KEY
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function queryWithGovernance(userToken: string, query: string) {
  // 1. Pre-check: Get policy-approved context
  const ctx = await axonflow.getPolicyApprovedContext({
    userToken,
    query,
    dataSources: ['postgres']
  });

  if (!ctx.approved) {
    throw new Error(`Query blocked: ${ctx.blockReason}`);
  }

  // 2. Make LLM call with approved data
  const startTime = Date.now();
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: query }]
  });
  const latencyMs = Date.now() - startTime;

  // 3. Audit the call
  await axonflow.auditLLMCall({
    contextId: ctx.contextId,
    responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
    provider: 'openai',
    model: 'gpt-4',
    tokenUsage: {
      promptTokens: response.usage?.prompt_tokens || 0,
      completionTokens: response.usage?.completion_tokens || 0,
      totalTokens: response.usage?.total_tokens || 0
    },
    latencyMs
  });

  return response.choices[0].message.content;
}

Go

import (
    "fmt"
    "os"
    "time"

    "github.com/getaxonflow/axonflow-sdk-go"
)

client := axonflow.NewClient(axonflow.AxonFlowConfig{
    AgentURL:     os.Getenv("AXONFLOW_AGENT_URL"),
    ClientID:     os.Getenv("AXONFLOW_CLIENT_ID"),
    ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})

// 1. Pre-check
ctx, err := client.GetPolicyApprovedContext(
    userToken,
    query,
    []string{"postgres"},
    nil,
)
if err != nil {
    return err
}
if !ctx.Approved {
    return fmt.Errorf("blocked: %s", ctx.BlockReason)
}

// 2. Make LLM call (your code)
startTime := time.Now()
llmResponse, err := openaiClient.CreateChatCompletion(...)
latencyMs := time.Since(startTime).Milliseconds()

// 3. Audit (truncate the summary safely; a bare [:100] slice panics on short responses)
summary := llmResponse.Choices[0].Message.Content
if len(summary) > 100 {
    summary = summary[:100]
}
_, err = client.AuditLLMCall(
    ctx.ContextID,
    summary,
    "openai",
    "gpt-4",
    axonflow.TokenUsage{
        PromptTokens:     llmResponse.Usage.PromptTokens,
        CompletionTokens: llmResponse.Usage.CompletionTokens,
        TotalTokens:      llmResponse.Usage.TotalTokens,
    },
    latencyMs,
    nil,
)

Python

from axonflow import AxonFlow, TokenUsage
from openai import AsyncOpenAI
import os
import time

openai = AsyncOpenAI()
query = "Analyze customer data"

async with AxonFlow(
    agent_url=os.environ["AXONFLOW_AGENT_URL"],
    client_id=os.environ["AXONFLOW_CLIENT_ID"],
    client_secret=os.environ["AXONFLOW_CLIENT_SECRET"]
) as client:
    # 1. Pre-check
    ctx = await client.get_policy_approved_context(
        user_token="user-jwt",
        query=query,
        data_sources=["postgres"]
    )

    if not ctx.approved:
        raise Exception(f"Blocked: {ctx.block_reason}")

    # 2. Direct LLM call
    start = time.time()
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": query}]
    )
    latency_ms = int((time.time() - start) * 1000)

    # 3. Audit
    await client.audit_llm_call(
        context_id=ctx.context_id,
        response_summary=(response.choices[0].message.content or "")[:100],
        provider="openai",
        model="gpt-4",
        token_usage=TokenUsage(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
            total_tokens=response.usage.total_tokens
        ),
        latency_ms=latency_ms
    )

When to Use Gateway Mode

Choose Gateway Mode when:

  • You need the lowest possible latency
  • You want to use your own LLM API keys
  • Your compliance allows client-side LLM calls
  • You're integrating with existing LLM workflows

Choose Proxy Mode when:

  • You need response filtering (PII detection)
  • You require 100% guaranteed audit trails
  • You want automatic LLM failover
  • Compliance requires all calls through a proxy

Configuration

const axonflow = new AxonFlow({
  licenseKey: process.env.AXONFLOW_LICENSE_KEY,
  endpoint: process.env.AXONFLOW_AGENT_URL,
  tenant: 'your-tenant-id',
  mode: 'production'
});

API Reference

Pre-Check Methods

getPolicyApprovedContext() / preCheck()

Both methods are equivalent; preCheck() is an alias with a shorter name.

TypeScript:

const ctx = await axonflow.getPolicyApprovedContext({
  userToken: string,       // User's JWT token
  query: string,           // The query/prompt
  dataSources?: string[],  // Data sources being accessed
  context?: object         // Additional context for policy evaluation
});

Go:

ctx, err := client.GetPolicyApprovedContext(
    userToken string,
    query string,
    dataSources []string,
    context map[string]interface{},
)

Python:

ctx = await client.get_policy_approved_context(
    user_token="...",
    query="...",
    data_sources=["..."],
)

Response:

{
  contextId: string;       // Unique ID to correlate with audit
  approved: boolean;       // Whether request was approved
  approvedData: object;    // Filtered data (if policies modified it)
  policies: string[];      // Policies that were evaluated
  expiresAt: Date;         // When this approval expires
  blockReason?: string;    // Reason for blocking (if !approved)
  rateLimitInfo?: {        // Rate limit info (if applicable)
    limit: number;
    remaining: number;
    resetAt: Date;
  }
}
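
Taken together, here is a minimal TypeScript sketch of handling this response shape before spending tokens. It assumes the field types above and uses the documented preCheck() alias; the literal userToken and query values are placeholders.

// Sketch only: field names follow the response shape documented above.
const ctx = await axonflow.preCheck({
  userToken: 'user-jwt',
  query: 'Analyze customer data',
  dataSources: ['postgres']
});

if (!ctx.approved) {
  // Surface the policy decision and any rate-limit details to the caller
  let message = `Blocked: ${ctx.blockReason}`;
  if (ctx.rateLimitInfo) {
    message += ` (rate limit resets at ${ctx.rateLimitInfo.resetAt.toISOString()})`;
  }
  throw new Error(message);
}

// Approvals are time-bound; re-run the pre-check once expiresAt has passed
if (ctx.expiresAt.getTime() < Date.now()) {
  throw new Error('Policy approval expired; re-run the pre-check');
}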

Audit Method

auditLLMCall()

TypeScript:

await axonflow.auditLLMCall({
  contextId: string,       // From pre-check response
  responseSummary: string, // Summary of LLM response
  provider: string,        // e.g., "openai", "anthropic"
  model: string,           // e.g., "gpt-4", "claude-3-opus"
  tokenUsage: {
    promptTokens: number,
    completionTokens: number,
    totalTokens: number
  },
  latencyMs: number,       // Call duration in milliseconds
  metadata?: object        // Optional additional metadata
});

Go:

result, err := client.AuditLLMCall(
    contextId string,
    responseSummary string,
    provider string,
    model string,
    tokenUsage TokenUsage,
    latencyMs int64,
    metadata map[string]interface{},
)

Python:

await client.audit_llm_call(
    context_id="...",
    response_summary="...",
    provider="openai",
    model="gpt-4",
    token_usage=TokenUsage(...),
    latency_ms=250,
)
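
The optional metadata field is the only parameter not shown in the Quick Start. Continuing from that TypeScript example, a sketch of attaching request-scoped details; the metadata keys here are illustrative, not a required schema.

// Sketch only: metadata keys are illustrative, not a required schema.
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
  provider: 'openai',
  model: 'gpt-4',
  tokenUsage: {
    promptTokens: response.usage?.prompt_tokens || 0,
    completionTokens: response.usage?.completion_tokens || 0,
    totalTokens: response.usage?.total_tokens || 0
  },
  latencyMs,
  metadata: {
    requestId: 'req-1234',       // e.g. correlate with your own tracing
    feature: 'customer-support'  // e.g. tag which product surface made the call
  }
});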

Audit Reliability

Gateway Mode audit logging provides high reliability through a multi-layer architecture:

Compliance Mode (Default)

In compliance mode, audit operations are synchronous:

  • Pre-check context is written to database before response returns
  • LLM call audits are written before acknowledgment
  • Guarantees 100% audit trail persistence
  • Required for EU AI Act and HIPAA compliance

Performance Mode (Optional)

For high-throughput systems, enable performance mode:

  • Audit operations are queued and processed asynchronously
  • ~99.9%+ audit reliability with disk fallback
  • Latency reduced to ~10-20ms overhead

Automatic Retry & Recovery

All audit operations include:

  1. Exponential Backoff Retry - 3 attempts (100ms, 200ms, 400ms delays)
  2. Disk Fallback - If database unavailable, entries written to fallback file
  3. Automatic Recovery - On restart, entries from fallback file are replayed to database

This ensures audit trails are never lost, even during database outages.

Audit Entry → Queue → Retry (3x) → ✅ DB Success
                          └─→ ❌ Fallback File → Recovered on Restart
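
The sketch below illustrates this retry-then-fallback pattern in TypeScript. It is an illustration of the idea described above, not AxonFlow's internal code; writeAuditToDb is a hypothetical stand-in for the database write, and the fallback path matches the default shown in the configuration below.

import { appendFileSync } from 'fs';

// Hypothetical stand-in for the real database write
declare function writeAuditToDb(entry: object): Promise<void>;

// Illustration of the retry-with-fallback pattern, not AxonFlow's implementation
async function persistAuditEntry(entry: object): Promise<void> {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      await writeAuditToDb(entry);  // success: entry persisted
      return;
    } catch {
      // Exponential backoff between attempts: 100ms, 200ms, 400ms
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
  // All retries failed: append to the JSONL fallback file, replayed on restart
  appendFileSync('/var/lib/axonflow/audit/audit_fallback.jsonl',
                 JSON.stringify(entry) + '\n');
}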

Configuration

# Enable performance mode (optional)
AGENT_PERFORMANCE_MODE=true

# Fallback file location
AUDIT_FALLBACK_PATH=/var/lib/axonflow/audit/audit_fallback.jsonl

Cost Tracking

AxonFlow automatically estimates costs based on provider and model:

| Provider | Model | Estimated Cost |
| --- | --- | --- |
| OpenAI | gpt-4 | $0.03/1K tokens |
| OpenAI | gpt-3.5-turbo | $0.002/1K tokens |
| Anthropic | claude-3-sonnet | $0.003/1K tokens |
| Bedrock | Various | Based on AWS pricing |
| Ollama | Local | $0 (self-hosted) |
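
For example, with gpt-4 at the $0.03/1K-token rate above, a call that used 1,500 total tokens is estimated at 1,500 / 1,000 × $0.03 = $0.045. A minimal sketch of that arithmetic; the rate table is copied from the table above, and AxonFlow performs this estimation for you.

// Sketch of the estimation arithmetic; AxonFlow computes this automatically.
const costPer1kTokens: Record<string, number> = {
  'gpt-4': 0.03,
  'gpt-3.5-turbo': 0.002,
  'claude-3-sonnet': 0.003
};

function estimateCost(model: string, totalTokens: number): number {
  const rate = costPer1kTokens[model] ?? 0;
  return (totalTokens / 1000) * rate;
}

estimateCost('gpt-4', 1500); // => 0.045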

Comparison with Proxy Mode

| Feature | Proxy Mode | Gateway Mode |
| --- | --- | --- |
| Integration Effort | Minimal | Moderate |
| Latency Overhead | Higher (~50-100ms) | Lower (~10-20ms) |
| Response Filtering | Yes | No |
| Audit Coverage | 100% automatic | Manual (call audit API) |
| LLM Control | Limited | Full |
| Best For | Simple apps | Frameworks, performance |

See Choosing a Mode for detailed guidance.

Next Steps


Enterprise customers: See the Gateway Mode Migration Guide for detailed architecture, migration steps, and troubleshooting.