LLM Provider Routing
Configure how AxonFlow routes requests across LLM providers using simple environment variables.
Overview
AxonFlow's provider routing allows operators to:
- Optimize costs by preferring cheaper providers
- Meet compliance requirements (e.g., HIPAA with Bedrock-only routing)
- Improve performance by favoring faster providers
- Configure failover for high availability
Quick Start
Set environment variables on the AxonFlow Orchestrator:
# Force all traffic through Bedrock (HIPAA compliance)
LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
# Or distribute traffic by weight
LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=bedrock:60,anthropic:30,openai:10
Configuration
Environment Variables
| Variable | Values | Default | Description |
|---|---|---|---|
| LLM_ROUTING_STRATEGY | weighted, round_robin, failover, cost_optimized* | weighted | Routing strategy |
| PROVIDER_WEIGHTS | provider:weight,... | Equal weights | Traffic distribution |
| DEFAULT_LLM_PROVIDER | bedrock, openai, etc. | None | Primary provider for failover |
| PROVIDER_COSTS* | provider:cost,... | See defaults | Cost per 1K tokens for cost_optimized |
* Enterprise only
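PROVIDER_WEIGHTS and PROVIDER_COSTS share the same provider:value,... format. The following Go sketch shows one way to parse that format; it is illustrative only and the parseProviderValues helper is a hypothetical name, not part of AxonFlow.
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// parseProviderValues turns "bedrock:60,anthropic:30,openai:10" into
// map[string]float64{"bedrock": 60, "anthropic": 30, "openai": 10}.
func parseProviderValues(raw string) (map[string]float64, error) {
    out := make(map[string]float64)
    if strings.TrimSpace(raw) == "" {
        return out, nil // unset variable means no explicit configuration
    }
    for _, pair := range strings.Split(raw, ",") {
        name, value, ok := strings.Cut(strings.TrimSpace(pair), ":")
        if !ok {
            return nil, fmt.Errorf("malformed pair %q", pair)
        }
        v, err := strconv.ParseFloat(value, 64)
        if err != nil {
            return nil, fmt.Errorf("invalid number in %q: %w", pair, err)
        }
        out[strings.TrimSpace(name)] = v
    }
    return out, nil
}

func main() {
    weights, err := parseProviderValues(os.Getenv("PROVIDER_WEIGHTS"))
    if err != nil {
        fmt.Println("config error:", err)
        return
    }
    fmt.Println(weights)
}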
Docker Compose
services:
  axonflow-orchestrator:
    environment:
      # Routing configuration
      LLM_ROUTING_STRATEGY: ${LLM_ROUTING_STRATEGY:-weighted}
      PROVIDER_WEIGHTS: ${PROVIDER_WEIGHTS:-}
      DEFAULT_LLM_PROVIDER: ${DEFAULT_LLM_PROVIDER:-}
      # Provider API keys
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
Routing Strategies
Weighted (Default)
Distributes requests based on configured weights. Weights are normalized to sum to 1.0.
PROVIDER_WEIGHTS=openai:50,anthropic:30,bedrock:20
# Results in: ~50% OpenAI, ~30% Anthropic, ~20% Bedrock
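The pick itself can be read as a weighted random draw. A minimal Go sketch of that idea, not AxonFlow's actual code (pickWeighted is a hypothetical name):
package main

import (
    "fmt"
    "math/rand"
)

// pickWeighted draws a provider with probability proportional to its weight,
// which is equivalent to first normalizing the weights to sum to 1.0.
func pickWeighted(weights map[string]float64) string {
    var total float64
    for _, w := range weights {
        total += w
    }
    r := rand.Float64() * total
    for name, w := range weights {
        r -= w
        if r <= 0 {
            return name
        }
    }
    return "" // only reached for an empty weight map
}

func main() {
    weights := map[string]float64{"openai": 50, "anthropic": 30, "bedrock": 20}
    counts := map[string]int{}
    for i := 0; i < 10000; i++ {
        counts[pickWeighted(weights)]++
    }
    fmt.Println(counts) // roughly 5000 openai, 3000 anthropic, 2000 bedrock
}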
Use cases:
- Gradual migration between providers
- Cost optimization by favoring cheaper providers
- A/B testing new providers
Round Robin
Cycles through healthy providers equally, ignoring weights.
LLM_ROUTING_STRATEGY=round_robin
# Cycles: openai -> anthropic -> bedrock -> openai -> ...
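A minimal sketch of a round-robin pick that skips unhealthy providers; illustrative only, and the roundRobin type is a hypothetical name rather than AxonFlow's implementation:
package main

import (
    "fmt"
    "sync/atomic"
)

type roundRobin struct {
    providers []string
    next      atomic.Uint64
}

// pick advances a shared counter and returns the next healthy provider
// in cyclic order, or "" if every provider is unhealthy.
func (rr *roundRobin) pick(healthy func(string) bool) string {
    for i := 0; i < len(rr.providers); i++ {
        idx := (rr.next.Add(1) - 1) % uint64(len(rr.providers))
        if p := rr.providers[idx]; healthy(p) {
            return p
        }
    }
    return ""
}

func main() {
    rr := &roundRobin{providers: []string{"openai", "anthropic", "bedrock"}}
    allHealthy := func(string) bool { return true }
    for i := 0; i < 6; i++ {
        fmt.Print(rr.pick(allHealthy), " ") // openai anthropic bedrock openai ...
    }
    fmt.Println()
}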
Use cases:
- Even load distribution
- Testing provider parity
- Avoiding rate limits
Failover
Uses primary provider exclusively, falls back to others only on failure.
LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
# Always uses Bedrock, falls back to others if Bedrock unhealthy
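A minimal sketch of the failover decision, assuming the server tracks per-provider health (pickFailover is a hypothetical helper, not AxonFlow's actual code):
package main

import "fmt"

// pickFailover always prefers the default provider and only considers the
// remaining providers when the default is unhealthy.
func pickFailover(defaultProvider string, providers []string, healthy func(string) bool) string {
    if healthy(defaultProvider) {
        return defaultProvider
    }
    for _, p := range providers {
        if p != defaultProvider && healthy(p) {
            return p
        }
    }
    return "" // no healthy provider available
}

func main() {
    providers := []string{"bedrock", "anthropic", "openai"}
    allUp := func(string) bool { return true }
    bedrockDown := func(p string) bool { return p != "bedrock" }
    fmt.Println(pickFailover("bedrock", providers, allUp))       // bedrock
    fmt.Println(pickFailover("bedrock", providers, bedrockDown)) // anthropic
}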
Use cases:
- HIPAA/compliance (Bedrock-only)
- Primary/backup configuration
- Predictable routing
Cost Optimized (Enterprise)
Automatically routes to the cheapest healthy provider based on configured cost per 1K tokens.
LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,bedrock:0.02,anthropic:0.025,openai:0.03
Default costs (if PROVIDER_COSTS not set):
| Provider | Cost per 1K tokens |
|---|---|
| ollama | $0.00 (self-hosted) |
| bedrock | $0.02 |
| gemini | $0.025 |
| anthropic | $0.025 |
| openai | $0.03 |
Use cases:
- Cost optimization without manual weight tuning
- Automatic selection of cheapest healthy provider
- Hybrid deployments with local (Ollama) + cloud fallback
How it works (see the sketch after this list):
- On each request, evaluates all healthy providers
- Selects the provider with the lowest configured cost
- Providers without cost config default to $1.00 (deprioritized)
- Falls back to first healthy provider if all costs equal
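A minimal sketch of that selection, assuming the costs are already parsed into a map (pickCheapest is a hypothetical helper, not AxonFlow's actual code):
package main

import "fmt"

// pickCheapest returns the healthy provider with the lowest configured cost.
// Providers without a configured cost are treated as costing $1.00, and ties
// resolve to the first healthy provider in the list.
func pickCheapest(providers []string, costs map[string]float64, healthy func(string) bool) string {
    best, bestCost := "", 0.0
    for _, p := range providers {
        if !healthy(p) {
            continue
        }
        cost, ok := costs[p]
        if !ok {
            cost = 1.00 // unconfigured providers are deprioritized
        }
        if best == "" || cost < bestCost {
            best, bestCost = p, cost
        }
    }
    return best // "" if no provider is healthy
}

func main() {
    costs := map[string]float64{"ollama": 0, "bedrock": 0.02, "anthropic": 0.025, "openai": 0.03}
    providers := []string{"ollama", "bedrock", "anthropic", "openai"}
    allUp := func(string) bool { return true }
    ollamaDown := func(p string) bool { return p != "ollama" }
    fmt.Println(pickCheapest(providers, costs, allUp))      // ollama
    fmt.Println(pickCheapest(providers, costs, ollamaDown)) // bedrock
}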
Examples
HIPAA Compliance (Healthcare)
Force all traffic through AWS Bedrock:
LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
PROVIDER_WEIGHTS=bedrock:100
Cost Optimization (Community)
Prefer cheaper providers via manual weights:
LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=bedrock:60,anthropic:30,openai:10
Cost Optimization (Enterprise)
Automatic selection of cheapest healthy provider:
LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,bedrock:0.02,anthropic:0.025,openai:0.03
# Automatically uses cheapest healthy provider
High Availability
Round-robin with automatic failover on unhealthy providers:
LLM_ROUTING_STRATEGY=round_robin
# Unhealthy providers automatically skipped
SDK Usage
Clients can provide hints for provider selection, but the server makes final routing decisions.
Request-Level Pinning (Advanced)
Per-request provider semantics are:
- context.provider → provider preference (fallback allowed).
- context.strict_provider=true → hard pin for that request (no fallback).
You can also set a server default:
LLM_STRICT_PROVIDER_DEFAULT=false # Recommended default
When set to true, strict pinning is enabled by default unless a request explicitly sets context.strict_provider=false.
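A minimal sketch of how the effective pin could be resolved from the request context and the server default; routingHint and resolveHint are hypothetical names, not AxonFlow's actual types:
package main

import "fmt"

type routingHint struct {
    provider string // preferred provider; "" lets the server decide
    strict   bool   // true = hard pin for this request, no fallback
}

// resolveHint applies the documented precedence: an explicit
// context.strict_provider on the request overrides LLM_STRICT_PROVIDER_DEFAULT.
func resolveHint(ctx map[string]interface{}, serverStrictDefault bool) routingHint {
    h := routingHint{strict: serverStrictDefault}
    if p, ok := ctx["provider"].(string); ok {
        h.provider = p
    }
    if s, ok := ctx["strict_provider"].(bool); ok {
        h.strict = s
    }
    return h
}

func main() {
    ctx := map[string]interface{}{"provider": "bedrock", "strict_provider": true}
    fmt.Printf("%+v\n", resolveHint(ctx, false)) // {provider:bedrock strict:true}
}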
TypeScript
import { AxonFlow } from "@axonflow/sdk";
const client = new AxonFlow({
  endpoint: "http://localhost:8080",
  tenant: "demo",
});

// Default routing (server decides)
const response = await client.proxy({
  query: "What is 2 + 2?",
  requestType: "chat",
});

// Request specific provider (hint)
const bedrockResponse = await client.proxy({
  query: "What is the capital of France?",
  requestType: "chat",
  context: {
    provider: "bedrock", // Provider preference
    strict_provider: true, // Optional hard pin
    model: "anthropic.claude-haiku-4-5-20251001-v1:0",
  },
});

// Check provider health
const health = await client.health();
console.log("Providers:", health.providers);
Python
import asyncio
from axonflow import AxonFlow
async def main():
    client = AxonFlow(
        endpoint="http://localhost:8080",
        tenant="demo",
    )

    # Default routing (server decides)
    response = await client.proxy(
        query="What is 2 + 2?",
        request_type="chat",
    )

    # Request specific provider (hint)
    bedrock_response = await client.proxy(
        query="What is the capital of France?",
        request_type="chat",
        context={"provider": "bedrock"},
    )

    # Check provider health
    health = await client.health()
    print("Status:", health.status)

asyncio.run(main())
Go
package main
import (
    "context"
    "fmt"
    "log"

    "github.com/getaxonflow/axonflow-sdk-go/v3"
)

func main() {
    client, err := axonflow.NewClient(
        axonflow.WithEndpoint("http://localhost:8080"),
        axonflow.WithTenant("demo"),
    )
    if err != nil {
        log.Fatal(err)
    }
    ctx := context.Background()

    // Default routing (server decides)
    resp, err := client.Proxy(ctx, &axonflow.ProxyRequest{
        Query:       "What is 2 + 2?",
        RequestType: "chat",
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Response:", resp.Response)

    // Request specific provider (hint)
    bedrockResp, err := client.Proxy(ctx, &axonflow.ProxyRequest{
        Query:       "What is the capital of France?",
        RequestType: "chat",
        Context: map[string]interface{}{
            "provider": "bedrock",
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Provider:", bedrockResp.Metadata["provider"])
}
Java
import com.getaxonflow.sdk.AxonFlow;
import com.getaxonflow.sdk.AxonFlowConfig; // import added; assumed to live in the same package
import com.getaxonflow.sdk.ProxyRequest;
import com.getaxonflow.sdk.RequestType;
import java.util.Map;

public class ProviderRouting {
    public static void main(String[] args) {
        var client = new AxonFlow(
            AxonFlowConfig.builder()
                .endpoint("http://localhost:8080")
                .tenant("demo")
                .build()
        );

        // Default routing (server decides)
        var response = client.proxy(ProxyRequest.builder()
            .query("What is 2 + 2?")
            .requestType(RequestType.CHAT)
            .build());

        // Request specific provider (hint)
        var bedrockResponse = client.proxy(ProxyRequest.builder()
            .query("What is the capital of France?")
            .requestType(RequestType.CHAT)
            .context(Map.of("provider", "bedrock"))
            .build());
    }
}
Health Checking
Check provider health via the health endpoint:
curl http://localhost:8080/health
Response includes provider status:
{
"status": "healthy",
"providers": {
"openai": {"healthy": true},
"anthropic": {"healthy": true},
"bedrock": {"healthy": true}
}
}
Unhealthy providers are automatically excluded from routing.
Automatic Failover
When a provider fails (see the sketch after this list):
- Request retried with next available provider
- After threshold failures, provider marked unhealthy
- Traffic routes to healthy providers only
- Health checks restore provider when recovered
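A minimal sketch of failure-threshold health tracking under these assumptions; the threshold value and the providerHealth type are hypothetical, not AxonFlow's actual implementation:
package main

import "fmt"

const failureThreshold = 3 // assumed value; the real threshold is not documented here

type providerHealth struct {
    consecutiveFailures int
    unhealthy           bool
}

// recordFailure marks the provider unhealthy once the threshold is reached,
// which excludes it from routing.
func (h *providerHealth) recordFailure() {
    h.consecutiveFailures++
    if h.consecutiveFailures >= failureThreshold {
        h.unhealthy = true
    }
}

// recordHealthCheckSuccess restores the provider to the routing pool.
func (h *providerHealth) recordHealthCheckSuccess() {
    h.consecutiveFailures = 0
    h.unhealthy = false
}

func main() {
    var bedrock providerHealth
    for i := 0; i < failureThreshold; i++ {
        bedrock.recordFailure()
    }
    fmt.Println("unhealthy:", bedrock.unhealthy) // true
    bedrock.recordHealthCheckSuccess()
    fmt.Println("unhealthy:", bedrock.unhealthy) // false
}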
Community vs Enterprise
| Feature | Community | Enterprise |
|---|---|---|
| Routing strategies | weighted, round_robin, failover | + cost_optimized |
| Configuration | Environment variables | + Customer Portal UI |
| Scope | Global (all tenants) | Per-tenant settings |
| Cost tracking | - | Per-provider cost metrics |
| Runtime configuration | Restart required | Hot reload via API |
For per-tenant configuration and advanced routing, see Enterprise Provider Features.
Complete Examples
Full working examples are available in the AxonFlow repository:
SDK Examples
HTTP/curl Example
For users who prefer direct HTTP calls without an SDK:
# Quick test with curl
curl -X POST http://localhost:8080/api/request \
-H "Content-Type: application/json" \
-d '{
"query": "What is 2 + 2?",
"user_token": "demo-user",
"client_id": "curl-example",
"request_type": "llm_chat",
"context": {"provider": "openai"}
}'
Each example demonstrates default routing, provider hints, model overrides, and health checking.