# Gateway Mode - Lowest Latency AI Governance
Gateway Mode provides the lowest latency governance by letting you make direct LLM calls while AxonFlow handles policy enforcement and audit logging separately.
## Why Gateway Mode?
| Benefit | Description |
|---|---|
| Lowest Latency | Direct LLM calls, only 10-20ms overhead for pre-check |
| Your LLM Keys | Use your own API keys and accounts |
| Full Control | Choose any provider, model, or configuration |
| Complete Audit Trail | Every LLM call logged with context |
| Cost Tracking | Automatic token usage and cost estimation |
## How It Works

1. Your app calls `getPolicyApprovedContext()` or `preCheck()` - AxonFlow evaluates policies and returns an approval
2. If approved, you make your LLM call directly
3. Call `auditLLMCall()` to log the audit trail
## Quick Start

### TypeScript

```typescript
import { AxonFlow } from '@axonflow/sdk';
import OpenAI from 'openai';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  licenseKey: process.env.AXONFLOW_LICENSE_KEY
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function queryWithGovernance(userToken: string, query: string) {
  // 1. Pre-check: Get policy-approved context
  const ctx = await axonflow.getPolicyApprovedContext({
    userToken,
    query,
    dataSources: ['postgres']
  });

  if (!ctx.approved) {
    throw new Error(`Query blocked: ${ctx.blockReason}`);
  }

  // 2. Make LLM call with approved data
  const startTime = Date.now();
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: query }]
  });
  const latencyMs = Date.now() - startTime;

  // 3. Audit the call
  await axonflow.auditLLMCall({
    contextId: ctx.contextId,
    responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
    provider: 'openai',
    model: 'gpt-4',
    tokenUsage: {
      promptTokens: response.usage?.prompt_tokens || 0,
      completionTokens: response.usage?.completion_tokens || 0,
      totalTokens: response.usage?.total_tokens || 0
    },
    latencyMs
  });

  return response.choices[0].message.content;
}
```
### Go

```go
import (
	"fmt"
	"os"
	"time"

	"github.com/getaxonflow/axonflow-sdk-go"
)

client := axonflow.NewClient(axonflow.AxonFlowConfig{
	AgentURL:     os.Getenv("AXONFLOW_AGENT_URL"),
	ClientID:     os.Getenv("AXONFLOW_CLIENT_ID"),
	ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})

// 1. Pre-check
ctx, err := client.GetPolicyApprovedContext(
	userToken,
	query,
	[]string{"postgres"},
	nil,
)
if err != nil {
	return err
}
if !ctx.Approved {
	return fmt.Errorf("blocked: %s", ctx.BlockReason)
}

// 2. Make LLM call (your code)
startTime := time.Now()
llmResponse, err := openaiClient.CreateChatCompletion(...)
if err != nil {
	return err
}
latencyMs := time.Since(startTime).Milliseconds()

// 3. Audit (truncate the summary safely)
summary := llmResponse.Choices[0].Message.Content
if len(summary) > 100 {
	summary = summary[:100]
}
_, err = client.AuditLLMCall(
	ctx.ContextID,
	summary,
	"openai",
	"gpt-4",
	axonflow.TokenUsage{
		PromptTokens:     llmResponse.Usage.PromptTokens,
		CompletionTokens: llmResponse.Usage.CompletionTokens,
		TotalTokens:      llmResponse.Usage.TotalTokens,
	},
	latencyMs,
	nil,
)
```
### Python

```python
import os
import time

from axonflow import AxonFlow, TokenUsage
from openai import AsyncOpenAI

openai = AsyncOpenAI()
query = "Analyze customer data"

async with AxonFlow(
    agent_url=os.environ["AXONFLOW_AGENT_URL"],
    client_id=os.environ["AXONFLOW_CLIENT_ID"],
    client_secret=os.environ["AXONFLOW_CLIENT_SECRET"]
) as client:
    # 1. Pre-check
    ctx = await client.get_policy_approved_context(
        user_token="user-jwt",
        query=query,
        data_sources=["postgres"]
    )
    if not ctx.approved:
        raise Exception(f"Blocked: {ctx.block_reason}")

    # 2. Direct LLM call
    start = time.time()
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": query}]
    )
    latency_ms = int((time.time() - start) * 1000)

    # 3. Audit
    await client.audit_llm_call(
        context_id=ctx.context_id,
        response_summary=(response.choices[0].message.content or "")[:100],
        provider="openai",
        model="gpt-4",
        token_usage=TokenUsage(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
            total_tokens=response.usage.total_tokens
        ),
        latency_ms=latency_ms
    )
```
## When to Use Gateway Mode
Choose Gateway Mode when:
- You need the lowest possible latency
- You want to use your own LLM API keys
- Your compliance allows client-side LLM calls
- You're integrating with existing LLM workflows
Choose Proxy Mode when:
- You need response filtering (PII detection)
- You require 100% guaranteed audit trails
- You want automatic LLM failover
- Compliance requires all calls through a proxy
## Configuration

```typescript
const axonflow = new AxonFlow({
  licenseKey: process.env.AXONFLOW_LICENSE_KEY,
  endpoint: process.env.AXONFLOW_AGENT_URL,
  tenant: 'your-tenant-id',
  mode: 'production'
});
```
## API Reference

### Pre-Check Methods

#### `getPolicyApprovedContext()` / `preCheck()`

The two methods are equivalent; `preCheck()` is simply a shorter alias. A usage sketch follows the three signatures below.

TypeScript:

```typescript
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: string,      // User's JWT token
  query: string,          // The query/prompt
  dataSources?: string[], // Data sources being accessed
  context?: object        // Additional context for policy evaluation
});
```
Go:

```go
ctx, err := client.GetPolicyApprovedContext(
	userToken string,
	query string,
	dataSources []string,
	context map[string]interface{},
)
```
Python:

```python
ctx = await client.get_policy_approved_context(
    user_token="...",
    query="...",
    data_sources=["..."],
)
```
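As a quick illustration of the alias and the optional `context` argument, here is a hedged TypeScript sketch. The `context` keys shown (`department`, `purpose`) are hypothetical examples of policy-evaluation metadata, not a required schema; pass whatever fields your policies expect.

```typescript
// Sketch: preCheck() accepts the same arguments as getPolicyApprovedContext().
const ctx = await axonflow.preCheck({
  userToken: 'user-jwt',
  query: 'Summarize open support tickets',
  dataSources: ['postgres'],
  context: {
    department: 'support',        // hypothetical attribute your policies might key on
    purpose: 'internal-analytics' // hypothetical attribute, shown for illustration only
  }
});
```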
Response:

```typescript
{
  contextId: string;      // Unique ID to correlate with audit
  approved: boolean;      // Whether request was approved
  approvedData: object;   // Filtered data (if policies modified it)
  policies: string[];     // Policies that were evaluated
  expiresAt: Date;        // When this approval expires
  blockReason?: string;   // Reason for blocking (if !approved)
  rateLimitInfo?: {       // Rate limit info (if applicable)
    limit: number;
    remaining: number;
    resetAt: Date;
  }
}
```
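A short sketch of how these response fields might be consumed in application code; the handling shown is one possible pattern, not SDK behavior:

```typescript
// Sketch: consuming the pre-check response fields described above.
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-jwt',
  query: 'Analyze customer data'
});

if (!ctx.approved) {
  // blockReason is only set when the request was denied
  throw new Error(`Blocked by [${ctx.policies.join(', ')}]: ${ctx.blockReason}`);
}

if (ctx.rateLimitInfo && ctx.rateLimitInfo.remaining === 0) {
  // Back off until the rate limit window resets
  console.warn(`Rate limited until ${ctx.rateLimitInfo.resetAt.toISOString()}`);
}

if (ctx.expiresAt.getTime() < Date.now()) {
  // Approvals are time-boxed; re-run the pre-check if yours has expired
  throw new Error('Policy approval expired - run the pre-check again');
}
```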
### Audit Method

#### `auditLLMCall()`

TypeScript:

```typescript
await axonflow.auditLLMCall({
  contextId: string,       // From pre-check response
  responseSummary: string, // Summary of LLM response
  provider: string,        // e.g., "openai", "anthropic"
  model: string,           // e.g., "gpt-4", "claude-3-opus"
  tokenUsage: {
    promptTokens: number,
    completionTokens: number,
    totalTokens: number
  },
  latencyMs: number,       // Call duration in milliseconds
  metadata?: object        // Optional additional metadata
});
```
Go:

```go
result, err := client.AuditLLMCall(
	contextId string,
	responseSummary string,
	provider string,
	model string,
	tokenUsage TokenUsage,
	latencyMs int64,
	metadata map[string]interface{},
)
```
Python:

```python
await client.audit_llm_call(
    context_id="...",
    response_summary="...",
    provider="openai",
    model="gpt-4",
    token_usage=TokenUsage(...),
    latency_ms=250,
)
```
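If you want to attach extra context to an audit entry, the optional `metadata` object is the place for it. A minimal sketch follows; the keys shown (`feature`, `region`) are illustrative assumptions, not a required schema, and the example assumes `ctx` came from a prior pre-check:

```typescript
// Sketch: passing optional metadata along with the required audit fields.
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: 'First 100 characters of the model response',
  provider: 'openai',
  model: 'gpt-4',
  tokenUsage: { promptTokens: 512, completionTokens: 128, totalTokens: 640 },
  latencyMs: 850,
  metadata: {
    feature: 'support-summary', // hypothetical: which product feature made the call
    region: 'eu-west-1'         // hypothetical: deployment region for compliance reporting
  }
});
```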
## Audit Reliability

Gateway Mode audit logging provides high reliability through a multi-layer architecture.

### Compliance Mode (Default)

In compliance mode, audit operations are synchronous:
- Pre-check context is written to database before response returns
- LLM call audits are written before acknowledgment
- Guarantees 100% audit trail persistence
- Required for EU AI Act and HIPAA compliance
### Performance Mode (Optional)
For high-throughput systems, enable performance mode:
- Audit operations are queued and processed asynchronously
- ~99.9%+ audit reliability with disk fallback
- Latency reduced to ~10-20ms overhead
### Automatic Retry & Recovery
All audit operations include:
- Exponential Backoff Retry - 3 attempts (100ms, 200ms, 400ms delays)
- Disk Fallback - If database unavailable, entries written to fallback file
- Automatic Recovery - On restart, entries from fallback file are replayed to database
This is designed to keep audit trails intact even during database outages.
```text
Audit Entry → Queue → Retry (3x) → ✅ DB Success
                         └─→ ❌ Fallback File → Recovered on Restart
```
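To make the flow above concrete, here is a rough TypeScript sketch of the retry-plus-fallback pattern it describes. This is not the agent's internal code; the function and parameter names are invented for illustration, and only the retry counts, delays, and fallback path mirror the documentation:

```typescript
import { appendFile } from 'node:fs/promises';

// Path matches the AUDIT_FALLBACK_PATH default shown in the configuration below.
const FALLBACK_PATH = '/var/lib/axonflow/audit/audit_fallback.jsonl';

// Persist one audit entry: try the database up to 3 times with exponential
// backoff (100ms, 200ms, 400ms), then append to a JSONL fallback file that
// gets replayed on restart.
async function persistAuditEntry(
  entry: Record<string, unknown>,
  writeToDb: (e: Record<string, unknown>) => Promise<void>
): Promise<void> {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      await writeToDb(entry); // ✅ DB success
      return;
    } catch {
      // 100ms, 200ms, 400ms delays between attempts
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
  // ❌ Database unavailable: write to the fallback file for recovery on restart.
  await appendFile(FALLBACK_PATH, JSON.stringify(entry) + '\n');
}
```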
### Configuration

```bash
# Enable performance mode (optional)
AGENT_PERFORMANCE_MODE=true

# Fallback file location
AUDIT_FALLBACK_PATH=/var/lib/axonflow/audit/audit_fallback.jsonl
```
## Cost Tracking
AxonFlow automatically estimates costs based on provider and model:
| Provider | Model | Estimated Cost |
|---|---|---|
| OpenAI | gpt-4 | $0.03/1K tokens |
| OpenAI | gpt-3.5-turbo | $0.002/1K tokens |
| Anthropic | claude-3-sonnet | $0.003/1K tokens |
| Bedrock | Various | Based on AWS pricing |
| Ollama | Local | $0 (self-hosted) |
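For a sense of the arithmetic, here is a hedged sketch of how an estimate could be derived from the rates in the table. Real pricing often splits prompt and completion rates, and AxonFlow's internal calculation may differ; this only illustrates the math:

```typescript
// Sketch: estimating cost from total tokens using the per-1K rates above.
const COST_PER_1K_TOKENS: Record<string, number> = {
  'openai/gpt-4': 0.03,
  'openai/gpt-3.5-turbo': 0.002,
  'anthropic/claude-3-sonnet': 0.003,
};

function estimateCostUsd(provider: string, model: string, totalTokens: number): number {
  const rate = COST_PER_1K_TOKENS[`${provider}/${model}`] ?? 0; // unknown/local models fall back to $0
  return (totalTokens / 1000) * rate;
}

// Example: 640 total tokens on gpt-4 ≈ $0.0192
console.log(estimateCostUsd('openai', 'gpt-4', 640));
```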
## Comparison with Proxy Mode
| Feature | Proxy Mode | Gateway Mode |
|---|---|---|
| Integration Effort | Minimal | Moderate |
| Latency Overhead | Higher (~50-100ms) | Lower (~10-20ms) |
| Response Filtering | Yes | No |
| Audit Coverage | 100% automatic | Manual (call audit API) |
| LLM Control | Limited | Full |
| Best For | Simple apps | Frameworks, performance |
See Choosing a Mode for detailed guidance.
## Next Steps
- Choosing a Mode - Decision guide
- Proxy Mode - Simpler alternative
- TypeScript SDK - Full TypeScript documentation
- Go SDK - Full Go documentation
- Python SDK - Full Python documentation
- Authentication - License key setup
Enterprise customers: See the Gateway Mode Migration Guide for detailed architecture, migration steps, and troubleshooting.