Gateway Mode - Lowest Latency AI Governance
Gateway Mode provides the lowest-latency governance option: your application calls the LLM directly, while AxonFlow handles policy enforcement and audit logging separately.
Gateway Mode only evaluates static policies (PII detection, SQL injection blocking, dangerous query prevention). Custom policies created in the Customer Portal UI or via the Orchestrator API will NOT be enforced.
Static policies include:
- SQL injection detection (`sql_injection`)
- PII detection (SSN, credit cards, Aadhaar, PAN, email, phone)
- Dangerous query blocking (`DROP TABLE`, `TRUNCATE`, etc.)
- Admin access controls
Not supported in Gateway Mode:
- Custom dynamic policies created via UI or API
- Response content filtering
Need custom policies? Use Proxy Mode instead.
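For a rough sense of what the static PII checks look for, here is a simplified pattern-matching sketch. This is illustrative only -- it is not AxonFlow's actual detection logic, which is considerably more robust:

```typescript
// Simplified sketch of the kind of patterns static PII detection matches.
// Illustrative only; AxonFlow's real detectors are more robust than these regexes.
const piiPatterns: Record<string, RegExp> = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,          // e.g. 123-45-6789
  creditCard: /\b(?:\d[ -]?){13,16}\b/,  // 13-16 digit card numbers
  email: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/,
};

// Returns the names of the PII categories found in a query, if any.
function detectPII(query: string): string[] {
  return Object.entries(piiPatterns)
    .filter(([, pattern]) => pattern.test(query))
    .map(([name]) => name);
}
```

A query containing a match for any category would fail the pre-check with a `blockReason` naming the violated policy.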
Prerequisites
| Language | Minimum Version | SDK Package | Install |
|---|---|---|---|
| TypeScript | Node.js 18+ | @axonflow/sdk v3.8.0 | npm install @axonflow/sdk |
| Python | 3.9+ | axonflow v3.8.0 | pip install axonflow |
| Go | 1.21+ | github.com/getaxonflow/axonflow-sdk-go/v3 v3.8.0 | go get github.com/getaxonflow/axonflow-sdk-go/v3 |
| Java | 11+ | com.getaxonflow:axonflow-sdk v3.8.0 | Add to pom.xml or build.gradle |
You also need:
- A running AxonFlow Agent (local Docker or SaaS endpoint)
- `AXONFLOW_CLIENT_ID` and `AXONFLOW_CLIENT_SECRET` credentials
- Your own LLM provider API key (e.g., `OPENAI_API_KEY`)
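Since Gateway Mode reads these values from the environment, it can help to fail fast at startup when one is missing. A minimal sketch (the `missingEnv` helper is ours, not part of the SDK):

```typescript
// Report which required environment variables are unset.
// The helper is illustrative, not part of the AxonFlow SDK.
function missingEnv(env: Record<string, string | undefined>, keys: string[]): string[] {
  return keys.filter((key) => !env[key]);
}

const required = ["AXONFLOW_CLIENT_ID", "AXONFLOW_CLIENT_SECRET", "OPENAI_API_KEY"];
const missing = missingEnv(process.env, required);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```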
Why Gateway Mode?
| Benefit | Description |
|---|---|
| Lowest Latency | Direct LLM calls, only 10-20ms overhead for pre-check |
| Your LLM Keys | Use your own API keys and accounts |
| Full Control | Choose any provider, model, or configuration |
| Complete Audit Trail | Every LLM call logged with context |
| Cost Tracking | Automatic token usage and cost estimation |
How It Works
1. Your app calls `getPolicyApprovedContext()` or `preCheck()`
2. AxonFlow evaluates policies and returns an approval decision
3. If approved, you make your LLM call directly
4. Call `auditLLMCall()` to log the audit trail
Quick Start
TypeScript
import { AxonFlow } from '@axonflow/sdk'; // v3.8.0+
import OpenAI from 'openai';
const axonflow = new AxonFlow({
endpoint: process.env.AXONFLOW_ENDPOINT,
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function queryWithGovernance(userToken: string, query: string) {
// 1. Pre-check: Get policy-approved context
const ctx = await axonflow.getPolicyApprovedContext({
userToken,
query,
dataSources: ['postgres']
});
if (!ctx.approved) {
throw new Error(`Query blocked: ${ctx.blockReason}`);
}
// 2. Make LLM call with approved data
const startTime = Date.now();
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: query }]
});
const latencyMs = Date.now() - startTime;
// 3. Audit the call
await axonflow.auditLLMCall({
contextId: ctx.contextId,
responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
provider: 'openai',
model: 'gpt-4',
tokenUsage: {
promptTokens: response.usage?.prompt_tokens || 0,
completionTokens: response.usage?.completion_tokens || 0,
totalTokens: response.usage?.total_tokens || 0
},
latencyMs
});
return response.choices[0].message.content;
}
Go
import (
    "fmt"
    "os"
    "time"

    "github.com/getaxonflow/axonflow-sdk-go/v3" // v3.8.0+
)
client := axonflow.NewClient(axonflow.AxonFlowConfig{
Endpoint: os.Getenv("AXONFLOW_ENDPOINT"),
ClientID: os.Getenv("AXONFLOW_CLIENT_ID"),
ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})
// 1. Pre-check
ctx, err := client.GetPolicyApprovedContext(
userToken,
query,
[]string{"postgres"},
nil,
)
if err != nil {
return err
}
if !ctx.Approved {
return fmt.Errorf("blocked: %s", ctx.BlockReason)
}
// 2. Make LLM call (your code)
startTime := time.Now()
llmResponse, err := openaiClient.CreateChatCompletion(...)
if err != nil {
    return err
}
latencyMs := time.Since(startTime).Milliseconds()
// 3. Audit
_, err = client.AuditLLMCall(
ctx.ContextID,
    llmResponse.Choices[0].Message.Content[:min(100, len(llmResponse.Choices[0].Message.Content))], // min avoids a panic on short responses (Go 1.21+)
"openai",
"gpt-4",
axonflow.TokenUsage{
PromptTokens: llmResponse.Usage.PromptTokens,
CompletionTokens: llmResponse.Usage.CompletionTokens,
TotalTokens: llmResponse.Usage.TotalTokens,
},
latencyMs,
nil,
)
Python
from axonflow import AxonFlow, TokenUsage  # v3.8.0+
from openai import AsyncOpenAI
import os
import time

openai = AsyncOpenAI()

async with AxonFlow(
    endpoint=os.environ.get("AXONFLOW_ENDPOINT", "http://localhost:8080"),
    client_id=os.environ["AXONFLOW_CLIENT_ID"],
    client_secret=os.environ["AXONFLOW_CLIENT_SECRET"]
) as client:
    # 1. Pre-check
    query = "Analyze customer data"
    ctx = await client.get_policy_approved_context(
        user_token="user-jwt",
        query=query,
        data_sources=["postgres"]
    )
    if not ctx.approved:
        raise Exception(f"Blocked: {ctx.block_reason}")

    # 2. Direct LLM call
    start = time.time()
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": query}]
    )
    latency_ms = int((time.time() - start) * 1000)

    # 3. Audit
    await client.audit_llm_call(
        context_id=ctx.context_id,
        response_summary=(response.choices[0].message.content or "")[:100],
        provider="openai",
        model="gpt-4",
        token_usage=TokenUsage(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
            total_tokens=response.usage.total_tokens
        ),
        latency_ms=latency_ms
    )
Java
import com.getaxonflow.sdk.AxonFlow; // v3.8.0+
import com.getaxonflow.sdk.AuditRequest;
import com.getaxonflow.sdk.PolicyApprovalRequest;
import com.getaxonflow.sdk.PolicyApprovalResult;
import com.getaxonflow.sdk.TokenUsage;
import com.openai.OpenAIClient;
import java.util.List;
AxonFlow axonflow = AxonFlow.builder()
.endpoint(System.getenv("AXONFLOW_ENDPOINT"))
.clientId(System.getenv("AXONFLOW_CLIENT_ID"))
.clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
.build();
OpenAIClient openai = new OpenAIClient(System.getenv("OPENAI_API_KEY"));
// 1. Pre-check
PolicyApprovalResult ctx = axonflow.getPolicyApprovedContext(
PolicyApprovalRequest.builder()
.userToken("user-123")
.query("Analyze customer data")
.dataSources(List.of("postgres"))
.build()
);
if (!ctx.isApproved()) {
throw new RuntimeException("Blocked: " + ctx.getBlockReason());
}
// 2. Direct LLM call
long startTime = System.currentTimeMillis();
ChatCompletion response = openai.chat().completions().create(
ChatCompletionRequest.builder()
.model("gpt-4")
.messages(List.of(
ChatMessage.user("Analyze customer data")
))
.build()
);
long latencyMs = System.currentTimeMillis() - startTime;
// 3. Audit
String summary = response.choices().get(0).message().content();
axonflow.auditLLMCall(
    AuditRequest.builder()
        .contextId(ctx.getContextId())
        .responseSummary(summary.substring(0, Math.min(100, summary.length()))) // avoid out-of-bounds on short responses
.provider("openai")
.model("gpt-4")
.tokenUsage(TokenUsage.builder()
.promptTokens(response.usage().promptTokens())
.completionTokens(response.usage().completionTokens())
.totalTokens(response.usage().totalTokens())
.build())
.latencyMs(latencyMs)
.build()
);
Pre-Check Response Fields
The `getPolicyApprovedContext()` / `preCheck()` call returns an object with the following fields:
| Field | Type | Description |
|---|---|---|
| `contextId` | string | Unique identifier to correlate pre-check with audit. Pass this to `auditLLMCall()`. |
| `approved` | boolean | `true` if the request passed all policy checks. |
| `approvedData` | object | Filtered/sanitized data safe to send to the LLM. May differ from the original if policies modified content. |
| `policies` | string[] | List of policy names that were evaluated during the pre-check. |
| `expiresAt` | Date / time.Time | When this approval expires. Make your LLM call before this time. |
| `blockReason` | string (optional) | Human-readable reason the request was blocked. Only present when `approved` is false. |
| `rateLimitInfo` | object (optional) | Rate limit status: `limit` (max requests), `remaining` (requests left), `resetAt` (reset time). |
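Putting these fields together, a typical caller checks `approved`, honors `expiresAt`, and backs off when the rate limit is exhausted. A sketch using the field names from the table (the `nextAction` helper is illustrative, not part of the SDK):

```typescript
// Shapes follow the field table above; `nextAction` is an illustrative helper.
interface RateLimitInfo { limit: number; remaining: number; resetAt: Date; }
interface PreCheckResult {
  contextId: string;
  approved: boolean;
  expiresAt: Date;
  blockReason?: string;
  rateLimitInfo?: RateLimitInfo;
}

// Decide how to proceed based on the pre-check response fields.
function nextAction(ctx: PreCheckResult, now: Date = new Date()): string {
  if (!ctx.approved) return `blocked: ${ctx.blockReason ?? "unknown"}`;
  if (ctx.expiresAt.getTime() <= now.getTime()) return "expired: re-run pre-check";
  if (ctx.rateLimitInfo && ctx.rateLimitInfo.remaining === 0) {
    return `rate-limited until ${ctx.rateLimitInfo.resetAt.toISOString()}`;
  }
  return "proceed";
}
```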
Error Handling
TypeScript
try {
const ctx = await axonflow.getPolicyApprovedContext({
userToken: 'user-123',
query: prompt,
});
if (!ctx.approved) {
// Policy blocked the request -- do not make LLM call
console.log('Blocked:', ctx.blockReason);
return;
}
const start = Date.now();
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: prompt }],
});
await axonflow.auditLLMCall({
contextId: ctx.contextId,
responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
provider: 'openai',
model: 'gpt-4',
tokenUsage: {
promptTokens: response.usage?.prompt_tokens || 0,
completionTokens: response.usage?.completion_tokens || 0,
totalTokens: response.usage?.total_tokens || 0,
},
latencyMs: Date.now() - start,
});
} catch (error) {
if (error.code === 'ECONNREFUSED') {
console.error('Cannot reach AxonFlow Agent - check endpoint');
} else if (error.code === 'TIMEOUT') {
console.error('Pre-check timed out');
} else if (error.status === 401) {
console.error('Authentication failed - check credentials');
} else {
console.error('Unexpected error:', error.message);
}
}
Go
ctx, err := client.GetPolicyApprovedContext(userToken, query, dataSources, nil)
if err != nil {
// Network error, timeout, or auth failure on pre-check
log.Printf("Pre-check failed: %v", err)
return
}
if !ctx.Approved {
log.Printf("Blocked: %s", ctx.BlockReason)
return
}
// Make LLM call...
start := time.Now()
llmResp, err := openaiClient.CreateChatCompletion(context.Background(), req)
if err != nil {
log.Printf("LLM call failed: %v", err)
return
}
// Audit -- log errors but do not fail the request
_, auditErr := client.AuditLLMCall(
ctx.ContextID, summary, "openai", "gpt-4", tokenUsage,
time.Since(start).Milliseconds(), nil,
)
if auditErr != nil {
log.Printf("Warning: audit logging failed: %v", auditErr)
}
When to Use Gateway Mode
Choose Gateway Mode when:
- You need the lowest possible latency
- You want to use your own LLM API keys
- Your compliance allows client-side LLM calls
- You're integrating with existing LLM workflows
Choose Proxy Mode when:
- You need response filtering (PII detection)
- You require 100% guaranteed audit trails
- You want automatic LLM failover
- Compliance requires all calls through a proxy
Configuration
const axonflow = new AxonFlow({
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
endpoint: process.env.AXONFLOW_ENDPOINT,
tenant: 'your-tenant-id',
mode: 'production'
});
API Reference
Pre-Check Methods
getPolicyApprovedContext() / preCheck()
Both methods are equivalent; `preCheck()` is an alias provided as a simpler entry point.
TypeScript:
const ctx = await axonflow.getPolicyApprovedContext({
userToken: string, // User's JWT token
query: string, // The query/prompt
dataSources?: string[], // Data sources being accessed
context?: object // Additional context for policy evaluation
});
Go:
ctx, err := client.GetPolicyApprovedContext(
userToken string,
query string,
dataSources []string,
context map[string]interface{},
)
Python:
ctx = await client.get_policy_approved_context(
user_token="...",
query="...",
data_sources=["..."],
)
Response:
{
contextId: string; // Unique ID to correlate with audit
approved: boolean; // Whether request was approved
approvedData: object; // Filtered data (if policies modified it)
policies: string[]; // Policies that were evaluated
expiresAt: Date; // When this approval expires
blockReason?: string; // Reason for blocking (if !approved)
rateLimitInfo?: { // Rate limit info (if applicable)
limit: number;
remaining: number;
resetAt: Date;
}
}
Audit Method
auditLLMCall()
TypeScript:
await axonflow.auditLLMCall({
contextId: string, // From pre-check response
responseSummary: string, // Summary of LLM response
provider: string, // e.g., "openai", "anthropic"
model: string, // e.g., "gpt-4", "claude-opus-4"
tokenUsage: {
promptTokens: number,
completionTokens: number,
totalTokens: number
},
latencyMs: number, // Call duration in milliseconds
metadata?: object // Optional additional metadata
});
Go:
result, err := client.AuditLLMCall(
contextId string,
responseSummary string,
provider string,
model string,
tokenUsage TokenUsage,
latencyMs int64,
metadata map[string]interface{},
)
Python:
await client.audit_llm_call(
context_id="...",
response_summary="...",
provider="openai",
model="gpt-4",
token_usage=TokenUsage(...),
latency_ms=250,
)
Audit Reliability
Gateway Mode audit logging provides high reliability through a multi-layer architecture:
Compliance Mode (Default)
In compliance mode, audit operations are synchronous:
- Pre-check context is written to database before response returns
- LLM call audits are written before acknowledgment
- Guarantees 100% audit trail persistence
- Required for EU AI Act and HIPAA compliance
Performance Mode (Optional)
For high-throughput systems, enable performance mode:
- Audit operations are queued and processed asynchronously
- ~99.9%+ audit reliability with disk fallback
- Latency reduced to ~10-20ms overhead
Automatic Retry & Recovery
All audit operations include:
- Exponential Backoff Retry - 3 attempts (100ms, 200ms, 400ms delays)
- Disk Fallback - If database unavailable, entries written to fallback file
- Automatic Recovery - On restart, entries from fallback file are replayed to database
This design preserves audit trails even during temporary database outages.
Audit Entry → Queue → Retry (3x) → ✅ DB Success
└─→ ❌ Fallback File → Recovered on Restart
Configuration
# Enable performance mode (optional)
AGENT_PERFORMANCE_MODE=true
# Fallback file location
AUDIT_FALLBACK_PATH=/var/lib/axonflow/audit/audit_fallback.jsonl
Cost Tracking
AxonFlow automatically estimates costs based on provider and model:
| Provider | Model | Estimated Cost |
|---|---|---|
| OpenAI | gpt-4 | $0.03/1K tokens |
| OpenAI | gpt-4o-mini | $0.002/1K tokens |
| Anthropic | claude-sonnet-4 | $0.003/1K tokens |
| Bedrock | Various | Based on AWS pricing |
| Ollama | Local | $0 (self-hosted) |
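As a rough client-side approximation of this estimate, using the per-1K-token rates from the table (AxonFlow computes costs server-side, and real provider pricing typically prices prompt and completion tokens differently):

```typescript
// Rough per-call cost estimate from the per-1K-token rates in the table above.
// Illustrative only: AxonFlow computes this server-side, and real provider
// pricing usually distinguishes prompt vs completion tokens.
const ratesPer1K: Record<string, number> = {
  "openai/gpt-4": 0.03,
  "openai/gpt-4o-mini": 0.002,
  "anthropic/claude-sonnet-4": 0.003,
  "ollama/local": 0,
};

function estimateCostUSD(provider: string, model: string, totalTokens: number): number {
  const rate = ratesPer1K[`${provider}/${model}`] ?? 0;
  return (totalTokens / 1000) * rate;
}
```

For example, a 1,500-token `gpt-4` call comes out to roughly $0.045 under these rates.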
Comparison with Proxy Mode
| Feature | Proxy Mode | Gateway Mode |
|---|---|---|
| Integration Effort | Minimal | Moderate |
| Latency Overhead | Higher (~50-100ms) | Lower (~10-20ms) |
| Static Policies (PII, SQL injection) | ✅ Yes | ✅ Yes |
| Dynamic Policies (custom rules) | ✅ Yes | ❌ No |
| Response Filtering | ✅ Yes | ❌ No |
| Audit Coverage | 100% automatic | Manual (call audit API) |
| LLM Control | Limited | Full |
| Best For | Simple apps, custom policies | Frameworks, performance |
See Choosing a Mode for detailed guidance.
Next Steps
- Choosing a Mode - Decision guide
- Proxy Mode - Simpler alternative
- LLM Interceptors - Wrapper functions for LLM clients
- TypeScript SDK - Full TypeScript documentation
- Python SDK - Full Python documentation
- Go SDK - Full Go documentation
- Java SDK - Full Java documentation
- Authentication - License key setup
Enterprise customers: See the Gateway Mode Migration Guide for detailed architecture, migration steps, and troubleshooting.
