Gateway Mode - Lowest Latency AI Governance
Gateway Mode provides the lowest-latency governance option: your application makes direct LLM calls while AxonFlow handles policy enforcement and audit logging separately.
Gateway Mode only evaluates static policies (PII detection, SQL injection blocking, dangerous query prevention). Custom policies created in the Customer Portal UI or via the Orchestrator API will NOT be enforced.
Static policies include:
- SQL injection detection (`sql_injection`)
- PII detection (SSN, credit cards, Aadhaar, PAN, email, phone)
- Dangerous query blocking (`DROP TABLE`, `TRUNCATE`, etc.)
- Admin access controls
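For example, a pre-check on a prompt that embeds a dangerous statement is blocked before any LLM call happens. A minimal TypeScript sketch (using a client configured as in the Quick Start below; the exact blockReason wording depends on your deployment):
// Hypothetical prompt embedding a dangerous SQL statement
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-jwt',
  query: 'DROP TABLE customers; -- clean up old data',
  dataSources: ['postgres']
});

console.log(ctx.approved);    // false - rejected by the dangerous-query static policy
console.log(ctx.blockReason); // deployment-specific explanation of the block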
Not supported in Gateway Mode:
- Custom dynamic policies created via UI or API
- Response content filtering
Need custom policies? Use Proxy Mode instead.
Why Gateway Mode?
| Benefit | Description |
|---|---|
| Lowest Latency | Direct LLM calls, only 10-20ms overhead for pre-check |
| Your LLM Keys | Use your own API keys and accounts |
| Full Control | Choose any provider, model, or configuration |
| Complete Audit Trail | Every LLM call logged with context |
| Cost Tracking | Automatic token usage and cost estimation |
How It Works
- Your app calls `getPolicyApprovedContext()` or `preCheck()`
- AxonFlow evaluates policies and returns approval
- If approved, you make your LLM call directly
- Call `auditLLMCall()` to log the audit trail
Quick Start
TypeScript
import { AxonFlow } from '@axonflow/sdk'; // v1.7.1+
import OpenAI from 'openai';
const axonflow = new AxonFlow({
endpoint: process.env.AXONFLOW_ENDPOINT,
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function queryWithGovernance(userToken: string, query: string) {
// 1. Pre-check: Get policy-approved context
const ctx = await axonflow.getPolicyApprovedContext({
userToken,
query,
dataSources: ['postgres']
});
if (!ctx.approved) {
throw new Error(`Query blocked: ${ctx.blockReason}`);
}
// 2. Make LLM call with approved data
const startTime = Date.now();
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: query }]
});
const latencyMs = Date.now() - startTime;
// 3. Audit the call
await axonflow.auditLLMCall({
contextId: ctx.contextId,
responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
provider: 'openai',
model: 'gpt-4',
tokenUsage: {
promptTokens: response.usage?.prompt_tokens || 0,
completionTokens: response.usage?.completion_tokens || 0,
totalTokens: response.usage?.total_tokens || 0
},
latencyMs
});
return response.choices[0].message.content;
}
Go
import (
	"fmt"
	"os"
	"time"

	"github.com/getaxonflow/axonflow-sdk-go" // v1.10.0+
)
client := axonflow.NewClient(axonflow.AxonFlowConfig{
Endpoint: os.Getenv("AXONFLOW_ENDPOINT"),
ClientID: os.Getenv("AXONFLOW_CLIENT_ID"),
ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})
// 1. Pre-check
ctx, err := client.GetPolicyApprovedContext(
userToken,
query,
[]string{"postgres"},
nil,
)
if err != nil {
return err
}
if !ctx.Approved {
return fmt.Errorf("blocked: %s", ctx.BlockReason)
}
// 2. Make LLM call (your code)
startTime := time.Now()
llmResponse, err := openaiClient.CreateChatCompletion(...)
if err != nil {
	return err
}
latencyMs := time.Since(startTime).Milliseconds()

// Truncate the response summary safely (slicing a shorter string would panic)
summary := llmResponse.Choices[0].Message.Content
if len(summary) > 100 {
	summary = summary[:100]
}

// 3. Audit
_, err = client.AuditLLMCall(
	ctx.ContextID,
	summary,
"openai",
"gpt-4",
axonflow.TokenUsage{
PromptTokens: llmResponse.Usage.PromptTokens,
CompletionTokens: llmResponse.Usage.CompletionTokens,
TotalTokens: llmResponse.Usage.TotalTokens,
},
latencyMs,
nil,
)
Python
from axonflow import AxonFlow, TokenUsage # v0.5.0+
from openai import AsyncOpenAI
import os
import time
openai = AsyncOpenAI()
async with AxonFlow(
endpoint=os.environ.get("AXONFLOW_ENDPOINT", "http://localhost:8080"),
client_id=os.environ["AXONFLOW_CLIENT_ID"],
client_secret=os.environ["AXONFLOW_CLIENT_SECRET"]
) as client:
# 1. Pre-check
    query = "Analyze customer data"
    ctx = await client.get_policy_approved_context(
        user_token="user-jwt",
        query=query,
        data_sources=["postgres"]
)
if not ctx.approved:
raise Exception(f"Blocked: {ctx.block_reason}")
# 2. Direct LLM call
start = time.time()
response = await openai.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": query}]
)
latency_ms = int((time.time() - start) * 1000)
# 3. Audit
await client.audit_llm_call(
context_id=ctx.context_id,
response_summary=response.choices[0].message.content[:100],
provider="openai",
model="gpt-4",
token_usage=TokenUsage(
prompt_tokens=response.usage.prompt_tokens,
completion_tokens=response.usage.completion_tokens,
total_tokens=response.usage.total_tokens
),
latency_ms=latency_ms
)
Java
import com.getaxonflow.sdk.AxonFlow; // v2.7.1+
import com.getaxonflow.sdk.AuditRequest;
import com.getaxonflow.sdk.PolicyApprovalRequest;
import com.getaxonflow.sdk.PolicyApprovalResult;
import com.getaxonflow.sdk.TokenUsage;
import com.openai.OpenAIClient;

import java.util.List;
AxonFlow axonflow = AxonFlow.builder()
.endpoint(System.getenv("AXONFLOW_ENDPOINT"))
.clientId(System.getenv("AXONFLOW_CLIENT_ID"))
.clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
.build();
OpenAIClient openai = new OpenAIClient(System.getenv("OPENAI_API_KEY"));
// 1. Pre-check
PolicyApprovalResult ctx = axonflow.getPolicyApprovedContext(
PolicyApprovalRequest.builder()
.userToken("user-123")
.query("Analyze customer data")
.dataSources(List.of("postgres"))
.build()
);
if (!ctx.isApproved()) {
throw new RuntimeException("Blocked: " + ctx.getBlockReason());
}
// 2. Direct LLM call
long startTime = System.currentTimeMillis();
ChatCompletion response = openai.chat().completions().create(
ChatCompletionRequest.builder()
.model("gpt-4")
.messages(List.of(
ChatMessage.user("Analyze customer data")
))
.build()
);
long latencyMs = System.currentTimeMillis() - startTime;
// 3. Audit
String content = response.choices().get(0).message().content();
axonflow.auditLLMCall(
    AuditRequest.builder()
        .contextId(ctx.getContextId())
        .responseSummary(content.substring(0, Math.min(100, content.length())))
.provider("openai")
.model("gpt-4")
.tokenUsage(TokenUsage.builder()
.promptTokens(response.usage().promptTokens())
.completionTokens(response.usage().completionTokens())
.totalTokens(response.usage().totalTokens())
.build())
.latencyMs(latencyMs)
.build()
);
When to Use Gateway Mode
Choose Gateway Mode when:
- You need the lowest possible latency
- You want to use your own LLM API keys
- Your compliance allows client-side LLM calls
- You're integrating with existing LLM workflows
Choose Proxy Mode when:
- You need response filtering (PII detection)
- You require 100% guaranteed audit trails
- You want automatic LLM failover
- Compliance requires all calls through a proxy
Configuration
const axonflow = new AxonFlow({
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
endpoint: process.env.AXONFLOW_ENDPOINT,
tenant: 'your-tenant-id',
mode: 'production'
});
API Reference
Pre-Check Methods
getPolicyApprovedContext() / preCheck()
Both methods are equivalent; preCheck() is simply a shorter alias.
TypeScript:
const ctx = await axonflow.getPolicyApprovedContext({
userToken: string, // User's JWT token
query: string, // The query/prompt
dataSources?: string[], // Data sources being accessed
context?: object // Additional context for policy evaluation
});
Go:
ctx, err := client.GetPolicyApprovedContext(
userToken string,
query string,
dataSources []string,
context map[string]interface{},
)
Python:
ctx = await client.get_policy_approved_context(
user_token="...",
query="...",
data_sources=["..."],
)
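Since the two methods are equivalent, preCheck() can be called the same way. A minimal TypeScript sketch, assuming it accepts the same options object as getPolicyApprovedContext():
// preCheck() as a drop-in alias (assumed to take the same options object)
const ctx = await axonflow.preCheck({
  userToken: 'user-jwt',
  query: 'Analyze customer data',
  dataSources: ['postgres']
});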
Response:
{
contextId: string; // Unique ID to correlate with audit
approved: boolean; // Whether request was approved
approvedData: object; // Filtered data (if policies modified it)
policies: string[]; // Policies that were evaluated
expiresAt: Date; // When this approval expires
blockReason?: string; // Reason for blocking (if !approved)
rateLimitInfo?: { // Rate limit info (if applicable)
limit: number;
remaining: number;
resetAt: Date;
}
}
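A rough TypeScript sketch of how these fields might be used, given a ctx returned by the pre-check (field names as documented above; the actual handling is up to your application):
if (!ctx.approved) {
  // Surface which policies blocked the request and why
  console.warn(`Blocked by [${ctx.policies.join(', ')}]: ${ctx.blockReason}`);
  if (ctx.rateLimitInfo) {
    const { limit, remaining, resetAt } = ctx.rateLimitInfo;
    console.warn(`Rate limit: ${remaining}/${limit} remaining, resets at ${resetAt.toISOString()}`);
  }
} else if (ctx.expiresAt.getTime() < Date.now()) {
  // The approval has expired - run the pre-check again before calling the LLM
  console.warn('Approval expired; repeat getPolicyApprovedContext() before the LLM call');
}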
Audit Method
auditLLMCall()
TypeScript:
await axonflow.auditLLMCall({
contextId: string, // From pre-check response
responseSummary: string, // Summary of LLM response
provider: string, // e.g., "openai", "anthropic"
model: string, // e.g., "gpt-4", "claude-3-opus"
tokenUsage: {
promptTokens: number,
completionTokens: number,
totalTokens: number
},
latencyMs: number, // Call duration in milliseconds
metadata?: object // Optional additional metadata
});
Go:
result, err := client.AuditLLMCall(
contextId string,
responseSummary string,
provider string,
model string,
tokenUsage TokenUsage,
latencyMs int64,
metadata map[string]interface{},
)
Python:
await client.audit_llm_call(
context_id="...",
response_summary="...",
provider="openai",
model="gpt-4",
token_usage=TokenUsage(...),
latency_ms=250,
)
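The optional metadata field can carry your own correlation data alongside the audit record. A short TypeScript sketch, reusing ctx, response, and latencyMs from the Quick Start example above (the metadata keys shown are hypothetical examples, not fields required by the API):
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
  provider: 'openai',
  model: 'gpt-4',
  tokenUsage: {
    promptTokens: response.usage?.prompt_tokens || 0,
    completionTokens: response.usage?.completion_tokens || 0,
    totalTokens: response.usage?.total_tokens || 0
  },
  latencyMs,
  metadata: {
    requestId: 'req-1234',      // e.g. your own request/trace ID (hypothetical key)
    feature: 'customer-report'  // e.g. product-area tag (hypothetical key)
  }
});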
Audit Reliability
Gateway Mode audit logging provides high reliability through a multi-layer architecture:
Compliance Mode (Default)
In compliance mode, audit operations are synchronous:
- Pre-check context is written to database before response returns
- LLM call audits are written before acknowledgment
- Guarantees 100% audit trail persistence
- Required for EU AI Act and HIPAA compliance
Performance Mode (Optional)
For high-throughput systems, enable performance mode:
- Audit operations are queued and processed asynchronously
- ~99.9%+ audit reliability with disk fallback
- Latency reduced to ~10-20ms overhead
Automatic Retry & Recovery
All audit operations include:
- Exponential Backoff Retry - 3 attempts (100ms, 200ms, 400ms delays)
- Disk Fallback - If database unavailable, entries written to fallback file
- Automatic Recovery - On restart, entries from fallback file are replayed to database
This ensures audit trails are never lost, even during database outages.
Audit Entry → Queue → Retry (3x) → ✅ DB Success
└─→ ❌ Fallback File → Recovered on Restart
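A simplified TypeScript sketch of this pattern, for illustration only: it mirrors the documented behaviour (three attempts with 100/200/400 ms backoff, then a JSONL fallback file) but is not AxonFlow's actual implementation, and writeAuditToDb plus the entry shape are hypothetical placeholders.
import { appendFile } from 'node:fs/promises';

const FALLBACK_PATH = '/var/lib/axonflow/audit/audit_fallback.jsonl';

async function persistAuditEntry(entry: object, writeAuditToDb: (e: object) => Promise<void>) {
  for (const delayMs of [100, 200, 400]) {
    try {
      await writeAuditToDb(entry); // DB success
      return;
    } catch {
      // Exponential backoff before the next attempt
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // All retries failed: append to the fallback file; entries are replayed on restart
  await appendFile(FALLBACK_PATH, JSON.stringify(entry) + '\n');
}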
Configuration
# Enable performance mode (optional)
AGENT_PERFORMANCE_MODE=true
# Fallback file location
AUDIT_FALLBACK_PATH=/var/lib/axonflow/audit/audit_fallback.jsonl
Cost Tracking
AxonFlow automatically estimates costs based on provider and model:
| Provider | Model | Estimated Cost |
|---|---|---|
| OpenAI | gpt-4 | $0.03/1K tokens |
| OpenAI | gpt-3.5-turbo | $0.002/1K tokens |
| Anthropic | claude-3-sonnet | $0.003/1K tokens |
| Bedrock | Various | Based on AWS pricing |
| Ollama | Local | $0 (self-hosted) |
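As a rough TypeScript sketch of how a per-call estimate follows from these rates (the rates mirror the table above; actual provider pricing may change):
// Documented per-1K-token estimates from the table above
const ratePer1kTokens: Record<string, number> = {
  'openai/gpt-4': 0.03,
  'openai/gpt-3.5-turbo': 0.002,
  'anthropic/claude-3-sonnet': 0.003
};

function estimateCostUsd(provider: string, model: string, totalTokens: number): number {
  const rate = ratePer1kTokens[`${provider}/${model}`] ?? 0; // unknown or self-hosted models -> $0
  return (totalTokens / 1000) * rate;
}

// Example: a gpt-4 call that used 1,500 tokens ≈ 1.5 * $0.03 = $0.045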
Comparison with Proxy Mode
| Feature | Proxy Mode | Gateway Mode |
|---|---|---|
| Integration Effort | Minimal | Moderate |
| Latency Overhead | Higher (~50-100ms) | Lower (~10-20ms) |
| Static Policies (PII, SQL injection) | ✅ Yes | ✅ Yes |
| Dynamic Policies (custom rules) | ✅ Yes | ❌ No |
| Response Filtering | ✅ Yes | ❌ No |
| Audit Coverage | 100% automatic | Manual (call audit API) |
| LLM Control | Limited | Full |
| Best For | Simple apps, custom policies | Frameworks, performance |
See Choosing a Mode for detailed guidance.
Next Steps
- Choosing a Mode - Decision guide
- Proxy Mode - Simpler alternative
- LLM Interceptors - Wrapper functions for LLM clients
- TypeScript SDK - Full TypeScript documentation
- Python SDK - Full Python documentation
- Go SDK - Full Go documentation
- Java SDK - Full Java documentation
- Authentication - License key setup
Enterprise customers: See the Gateway Mode Migration Guide for detailed architecture, migration steps, and troubleshooting.