
Gateway Mode - Lowest Latency AI Governance

Gateway Mode provides the lowest-latency governance by letting you make direct LLM calls while AxonFlow handles policy enforcement and audit logging separately.

Static Policies Only

Gateway Mode only evaluates static policies (PII detection, SQL injection blocking, dangerous query prevention). Custom policies created in the Customer Portal UI or via the Orchestrator API will NOT be enforced.

Static policies include:

  • SQL injection detection (sql_injection)
  • PII detection (SSN, credit cards, Aadhaar, PAN, email, phone)
  • Dangerous query blocking (DROP TABLE, TRUNCATE, etc.)
  • Admin access controls

Not supported in Gateway Mode:

  • Custom dynamic policies created via UI or API
  • Response content filtering

Need custom policies? Use Proxy Mode instead.
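
To make the static checks concrete, the sketch below shows the kind of pattern matching a dangerous-query policy performs. It is illustrative only: the patterns and the isDangerousQuery helper are assumptions for this example, not AxonFlow's actual rules.

```typescript
// Illustrative only: a simplified stand-in for the kind of pattern
// matching a static dangerous-query policy performs. Not the real
// AxonFlow implementation.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\bDROP\s+TABLE\b/i,
  /\bTRUNCATE\b/i,
  /\bDELETE\s+FROM\s+\w+\s*;?\s*$/i, // DELETE without a WHERE clause
];

function isDangerousQuery(query: string): boolean {
  return DANGEROUS_PATTERNS.some((p) => p.test(query));
}
```

A real static policy engine evaluates many more signals (PII patterns, SQL injection heuristics, admin-access rules), but the shape of the check is the same: inspect the query before any LLM call is made.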

Prerequisites

| Language   | Minimum Version | SDK Package                                       | Install                                          |
|------------|-----------------|---------------------------------------------------|--------------------------------------------------|
| TypeScript | Node.js 18+     | @axonflow/sdk v3.8.0                              | npm install @axonflow/sdk                        |
| Python     | 3.9+            | axonflow v3.8.0                                   | pip install axonflow                             |
| Go         | 1.21+           | github.com/getaxonflow/axonflow-sdk-go/v3 v3.8.0  | go get github.com/getaxonflow/axonflow-sdk-go/v3 |
| Java       | 11+             | com.getaxonflow:axonflow-sdk v3.8.0               | Add to pom.xml or build.gradle                   |

You also need:

  • A running AxonFlow Agent (local Docker or SaaS endpoint)
  • AXONFLOW_CLIENT_ID and AXONFLOW_CLIENT_SECRET
  • Your own LLM provider API key (e.g., OPENAI_API_KEY)

Why Gateway Mode?

| Benefit              | Description                                            |
|----------------------|--------------------------------------------------------|
| Lowest Latency       | Direct LLM calls, only ~10-20ms overhead for pre-check |
| Your LLM Keys        | Use your own API keys and accounts                     |
| Full Control         | Choose any provider, model, or configuration           |
| Complete Audit Trail | Every LLM call logged with context                     |
| Cost Tracking        | Automatic token usage and cost estimation              |

How It Works

  1. Your app calls getPolicyApprovedContext() or preCheck()
  2. AxonFlow evaluates policies and returns approval
  3. If approved, you make your LLM call directly
  4. Call auditLLMCall() to log the audit trail

Quick Start

TypeScript

import { AxonFlow } from '@axonflow/sdk';  // v3.8.0+
import OpenAI from 'openai';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function queryWithGovernance(userToken: string, query: string) {
  // 1. Pre-check: Get policy-approved context
  const ctx = await axonflow.getPolicyApprovedContext({
    userToken,
    query,
    dataSources: ['postgres']
  });

  if (!ctx.approved) {
    throw new Error(`Query blocked: ${ctx.blockReason}`);
  }

  // 2. Make LLM call with approved data
  const startTime = Date.now();
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: query }]
  });
  const latencyMs = Date.now() - startTime;

  // 3. Audit the call
  await axonflow.auditLLMCall({
    contextId: ctx.contextId,
    responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
    provider: 'openai',
    model: 'gpt-4',
    tokenUsage: {
      promptTokens: response.usage?.prompt_tokens || 0,
      completionTokens: response.usage?.completion_tokens || 0,
      totalTokens: response.usage?.total_tokens || 0
    },
    latencyMs
  });

  return response.choices[0].message.content;
}

Go

import (
    "fmt"
    "os"
    "time"

    axonflow "github.com/getaxonflow/axonflow-sdk-go/v3" // v3.8.0+
)

client := axonflow.NewClient(axonflow.AxonFlowConfig{
    Endpoint:     os.Getenv("AXONFLOW_ENDPOINT"),
    ClientID:     os.Getenv("AXONFLOW_CLIENT_ID"),
    ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})

// 1. Pre-check
ctx, err := client.GetPolicyApprovedContext(
    userToken,
    query,
    []string{"postgres"},
    nil,
)
if err != nil {
    return err
}
if !ctx.Approved {
    return fmt.Errorf("blocked: %s", ctx.BlockReason)
}

// 2. Make LLM call (your code)
startTime := time.Now()
llmResponse, err := openaiClient.CreateChatCompletion(...)
latencyMs := time.Since(startTime).Milliseconds()

// 3. Audit (truncate the summary safely; the content may be shorter than 100 chars)
summary := llmResponse.Choices[0].Message.Content
if len(summary) > 100 {
    summary = summary[:100]
}
_, err = client.AuditLLMCall(
    ctx.ContextID,
    summary,
    "openai",
    "gpt-4",
    axonflow.TokenUsage{
        PromptTokens:     llmResponse.Usage.PromptTokens,
        CompletionTokens: llmResponse.Usage.CompletionTokens,
        TotalTokens:      llmResponse.Usage.TotalTokens,
    },
    latencyMs,
    nil,
)

Python

import os
import time

from axonflow import AxonFlow, TokenUsage  # v3.8.0+
from openai import AsyncOpenAI

openai = AsyncOpenAI()
query = "Analyze customer data"

async with AxonFlow(
    endpoint=os.environ.get("AXONFLOW_ENDPOINT", "http://localhost:8080"),
    client_id=os.environ["AXONFLOW_CLIENT_ID"],
    client_secret=os.environ["AXONFLOW_CLIENT_SECRET"]
) as client:
    # 1. Pre-check
    ctx = await client.get_policy_approved_context(
        user_token="user-jwt",
        query=query,
        data_sources=["postgres"]
    )

    if not ctx.approved:
        raise Exception(f"Blocked: {ctx.block_reason}")

    # 2. Direct LLM call
    start = time.time()
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": query}]
    )
    latency_ms = int((time.time() - start) * 1000)

    # 3. Audit (guard against a None message content)
    await client.audit_llm_call(
        context_id=ctx.context_id,
        response_summary=(response.choices[0].message.content or "")[:100],
        provider="openai",
        model="gpt-4",
        token_usage=TokenUsage(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
            total_tokens=response.usage.total_tokens
        ),
        latency_ms=latency_ms
    )

Java

import com.getaxonflow.sdk.AxonFlow;  // v3.8.0+
import com.getaxonflow.sdk.AuditRequest;
import com.getaxonflow.sdk.PolicyApprovalRequest;
import com.getaxonflow.sdk.PolicyApprovalResult;
import com.getaxonflow.sdk.TokenUsage;
import com.openai.OpenAIClient;

AxonFlow axonflow = AxonFlow.builder()
    .endpoint(System.getenv("AXONFLOW_ENDPOINT"))
    .clientId(System.getenv("AXONFLOW_CLIENT_ID"))
    .clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
    .build();

OpenAIClient openai = new OpenAIClient(System.getenv("OPENAI_API_KEY"));

// 1. Pre-check
PolicyApprovalResult ctx = axonflow.getPolicyApprovedContext(
    PolicyApprovalRequest.builder()
        .userToken("user-123")
        .query("Analyze customer data")
        .dataSources(List.of("postgres"))
        .build()
);

if (!ctx.isApproved()) {
    throw new RuntimeException("Blocked: " + ctx.getBlockReason());
}

// 2. Direct LLM call
long startTime = System.currentTimeMillis();
ChatCompletion response = openai.chat().completions().create(
    ChatCompletionRequest.builder()
        .model("gpt-4")
        .messages(List.of(
            ChatMessage.user("Analyze customer data")
        ))
        .build()
);
long latencyMs = System.currentTimeMillis() - startTime;

// 3. Audit (truncate safely; the response may be shorter than 100 chars)
String content = response.choices().get(0).message().content();
axonflow.auditLLMCall(
    AuditRequest.builder()
        .contextId(ctx.getContextId())
        .responseSummary(content.substring(0, Math.min(content.length(), 100)))
        .provider("openai")
        .model("gpt-4")
        .tokenUsage(TokenUsage.builder()
            .promptTokens(response.usage().promptTokens())
            .completionTokens(response.usage().completionTokens())
            .totalTokens(response.usage().totalTokens())
            .build())
        .latencyMs(latencyMs)
        .build()
);

Pre-Check Response Fields

The getPolicyApprovedContext() / preCheck() call returns an object with the following fields:

| Field         | Type              | Description |
|---------------|-------------------|-------------|
| contextId     | string            | Unique identifier to correlate pre-check with audit. Pass this to auditLLMCall(). |
| approved      | boolean           | true if the request passed all policy checks. |
| approvedData  | object            | Filtered/sanitized data safe to send to the LLM. May differ from the original if policies modified content. |
| policies      | string[]          | List of policy names that were evaluated during the pre-check. |
| expiresAt     | Date / time.Time  | When this approval expires. Make your LLM call before this time. |
| blockReason   | string (optional) | Human-readable reason the request was blocked. Only present when approved is false. |
| rateLimitInfo | object (optional) | Rate limit status: limit (max requests), remaining (requests left), resetAt (reset time). |
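
Putting these fields together, a caller typically gates the LLM call on approved, expiresAt, and any rate-limit headroom. The guard below is a hedged sketch: the PreCheckResult interface mirrors the table above, but shouldCallLLM is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper built from the pre-check fields above; the
// PreCheckResult shape mirrors the documented response, but this
// guard is an illustration, not part of the AxonFlow SDK.
interface PreCheckResult {
  contextId: string;
  approved: boolean;
  expiresAt: Date;
  blockReason?: string;
  rateLimitInfo?: { limit: number; remaining: number; resetAt: Date };
}

function shouldCallLLM(
  ctx: PreCheckResult,
  now: Date = new Date()
): { ok: boolean; reason?: string } {
  if (!ctx.approved) {
    return { ok: false, reason: ctx.blockReason ?? "blocked by policy" };
  }
  if (now >= ctx.expiresAt) {
    return { ok: false, reason: "approval expired" }; // re-run the pre-check
  }
  if (ctx.rateLimitInfo && ctx.rateLimitInfo.remaining <= 0) {
    return { ok: false, reason: "rate limit exhausted" };
  }
  return { ok: true };
}
```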

Error Handling

TypeScript

try {
  const ctx = await axonflow.getPolicyApprovedContext({
    userToken: 'user-123',
    query: prompt,
  });

  if (!ctx.approved) {
    // Policy blocked the request -- do not make LLM call
    console.log('Blocked:', ctx.blockReason);
    return;
  }

  const start = Date.now();
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
  });

  await axonflow.auditLLMCall({
    contextId: ctx.contextId,
    responseSummary: response.choices[0].message.content?.substring(0, 100) || '',
    provider: 'openai',
    model: 'gpt-4',
    tokenUsage: {
      promptTokens: response.usage?.prompt_tokens || 0,
      completionTokens: response.usage?.completion_tokens || 0,
      totalTokens: response.usage?.total_tokens || 0,
    },
    latencyMs: Date.now() - start,
  });
} catch (error) {
  if (error.code === 'ECONNREFUSED') {
    console.error('Cannot reach AxonFlow Agent - check endpoint');
  } else if (error.code === 'TIMEOUT') {
    console.error('Pre-check timed out');
  } else if (error.status === 401) {
    console.error('Authentication failed - check credentials');
  } else {
    console.error('Unexpected error:', error.message);
  }
}

Go

ctx, err := client.GetPolicyApprovedContext(userToken, query, dataSources, nil)
if err != nil {
    // Network error, timeout, or auth failure on pre-check
    log.Printf("Pre-check failed: %v", err)
    return
}

if !ctx.Approved {
    log.Printf("Blocked: %s", ctx.BlockReason)
    return
}

// Make LLM call...
start := time.Now()
llmResp, err := openaiClient.CreateChatCompletion(context.Background(), req)
if err != nil {
    log.Printf("LLM call failed: %v", err)
    return
}

// Audit -- log errors but do not fail the request
_, auditErr := client.AuditLLMCall(
    ctx.ContextID, summary, "openai", "gpt-4", tokenUsage,
    time.Since(start).Milliseconds(), nil,
)
if auditErr != nil {
    log.Printf("Warning: audit logging failed: %v", auditErr)
}

When to Use Gateway Mode

Choose Gateway Mode when:

  • You need the lowest possible latency
  • You want to use your own LLM API keys
  • Your compliance allows client-side LLM calls
  • You're integrating with existing LLM workflows

Choose Proxy Mode when:

  • You need response filtering (PII detection)
  • You require 100% guaranteed audit trails
  • You want automatic LLM failover
  • Compliance requires all calls through a proxy

Configuration

const axonflow = new AxonFlow({
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
  endpoint: process.env.AXONFLOW_ENDPOINT,
  tenant: 'your-tenant-id',
  mode: 'production'
});

API Reference

Pre-Check Methods

getPolicyApprovedContext() / preCheck()

The two methods are equivalent; preCheck() is simply a shorter alias.

TypeScript:

const ctx = await axonflow.getPolicyApprovedContext({
  userToken: string,       // User's JWT token
  query: string,           // The query/prompt
  dataSources?: string[],  // Data sources being accessed
  context?: object         // Additional context for policy evaluation
});

Go:

ctx, err := client.GetPolicyApprovedContext(
    userToken string,
    query string,
    dataSources []string,
    context map[string]interface{},
)

Python:

ctx = await client.get_policy_approved_context(
    user_token="...",
    query="...",
    data_sources=["..."],
)

Response:

{
  contextId: string;       // Unique ID to correlate with audit
  approved: boolean;       // Whether request was approved
  approvedData: object;    // Filtered data (if policies modified it)
  policies: string[];      // Policies that were evaluated
  expiresAt: Date;         // When this approval expires
  blockReason?: string;    // Reason for blocking (if !approved)
  rateLimitInfo?: {        // Rate limit info (if applicable)
    limit: number;
    remaining: number;
    resetAt: Date;
  }
}

Audit Method

auditLLMCall()

TypeScript:

await axonflow.auditLLMCall({
  contextId: string,        // From pre-check response
  responseSummary: string,  // Summary of LLM response
  provider: string,         // e.g., "openai", "anthropic"
  model: string,            // e.g., "gpt-4", "claude-opus-4"
  tokenUsage: {
    promptTokens: number,
    completionTokens: number,
    totalTokens: number
  },
  latencyMs: number,        // Call duration in milliseconds
  metadata?: object         // Optional additional metadata
});

Go:

result, err := client.AuditLLMCall(
    contextId string,
    responseSummary string,
    provider string,
    model string,
    tokenUsage TokenUsage,
    latencyMs int64,
    metadata map[string]interface{},
)

Python:

await client.audit_llm_call(
    context_id="...",
    response_summary="...",
    provider="openai",
    model="gpt-4",
    token_usage=TokenUsage(...),
    latency_ms=250,
)

Audit Reliability

Gateway Mode audit logging provides high reliability through a multi-layer architecture:

Compliance Mode (Default)

In compliance mode, audit operations are synchronous:

  • Pre-check context is written to database before response returns
  • LLM call audits are written before acknowledgment
  • Guarantees 100% audit trail persistence
  • Required for EU AI Act and HIPAA compliance

Performance Mode (Optional)

For high-throughput systems, enable performance mode:

  • Audit operations are queued and processed asynchronously
  • ~99.9%+ audit reliability with disk fallback
  • Latency reduced to ~10-20ms overhead

Automatic Retry & Recovery

All audit operations include:

  1. Exponential Backoff Retry - 3 attempts (100ms, 200ms, 400ms delays)
  2. Disk Fallback - If database unavailable, entries written to fallback file
  3. Automatic Recovery - On restart, entries from fallback file are replayed to database

This ensures audit trails are never lost, even during database outages.

Audit Entry → Queue → Retry (3x) → ✅ DB Success
                          └─→ ❌ Fallback File → Recovered on Restart
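
The retry-and-fallback flow above can be sketched as follows. The backoff schedule and the write/fallback hooks are assumptions modeled on the described behavior, not AxonFlow internals.

```typescript
// Illustrative sketch of the retry-then-fallback audit flow described
// above. The delay schedule and hook signatures are assumptions.
async function auditWithRetry(
  write: () => Promise<void>,          // attempt to persist to the database
  fallback: () => Promise<void>,       // append to the fallback file instead
  delaysMs: number[] = [100, 200, 400] // exponential backoff schedule
): Promise<"db" | "fallback"> {
  for (let attempt = 0; ; attempt++) {
    try {
      await write();
      return "db"; // persisted successfully
    } catch {
      if (attempt >= delaysMs.length) break; // retries exhausted
      await new Promise((r) => setTimeout(r, delaysMs[attempt]));
    }
  }
  await fallback(); // database unavailable: write to the fallback file
  return "fallback";
}
```

On restart, entries written by the fallback path would be replayed to the database, which is what makes the audit trail durable across outages.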

Configuration

# Enable performance mode (optional)
AGENT_PERFORMANCE_MODE=true

# Fallback file location
AUDIT_FALLBACK_PATH=/var/lib/axonflow/audit/audit_fallback.jsonl

Cost Tracking

AxonFlow automatically estimates costs based on provider and model:

| Provider  | Model           | Estimated Cost        |
|-----------|-----------------|-----------------------|
| OpenAI    | gpt-4           | $0.03/1K tokens       |
| OpenAI    | gpt-4o-mini     | $0.002/1K tokens      |
| Anthropic | claude-sonnet-4 | $0.003/1K tokens      |
| Bedrock   | Various         | Based on AWS pricing  |
| Ollama    | Local           | $0 (self-hosted)      |
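
As a rough sketch of how such an estimate is computed, the helper below multiplies total tokens by the per-1K rate from the table. The estimateCostUSD function and the rate-table keys are illustrative assumptions, not the SDK's API.

```typescript
// Illustrative cost estimate: rates are copied from the table above;
// the function and its key format are assumptions, not the SDK's API.
const RATES_PER_1K_USD: Record<string, number> = {
  "openai/gpt-4": 0.03,
  "openai/gpt-4o-mini": 0.002,
  "anthropic/claude-sonnet-4": 0.003,
  "ollama/local": 0, // self-hosted
};

function estimateCostUSD(provider: string, model: string, totalTokens: number): number {
  const rate = RATES_PER_1K_USD[`${provider}/${model}`] ?? 0; // unknown models: no estimate
  return (totalTokens / 1000) * rate;
}
```

For example, a gpt-4 call using 2,000 total tokens would be estimated at (2000 / 1000) × $0.03 = $0.06.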

Comparison with Proxy Mode

| Feature                              | Proxy Mode                   | Gateway Mode             |
|--------------------------------------|------------------------------|--------------------------|
| Integration Effort                   | Minimal                      | Moderate                 |
| Latency Overhead                     | Higher (~50-100ms)           | Lower (~10-20ms)         |
| Static Policies (PII, SQL injection) | ✅ Yes                       | ✅ Yes                   |
| Dynamic Policies (custom rules)      | ✅ Yes                       | ❌ No                    |
| Response Filtering                   | ✅ Yes                       | ❌ No                    |
| Audit Coverage                       | 100% automatic               | Manual (call audit API)  |
| LLM Control                          | Limited                      | Full                     |
| Best For                             | Simple apps, custom policies | Frameworks, performance  |

See Choosing a Mode for detailed guidance.

Next Steps


Enterprise customers: See the Gateway Mode Migration Guide for detailed architecture, migration steps, and troubleshooting.