LLM Provider Routing

Configure how AxonFlow routes requests across LLM providers using simple environment variables.

Overview

AxonFlow's provider routing allows operators to:

  • Optimize costs by preferring cheaper providers
  • Meet compliance requirements (e.g., HIPAA with Bedrock-only routing)
  • Improve performance by favoring faster providers
  • Configure failover for high availability

Quick Start

Set environment variables on the AxonFlow Orchestrator:

# Force all traffic through Bedrock (HIPAA compliance)
LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock

# Or distribute traffic by weight
LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=bedrock:60,anthropic:30,openai:10

Configuration

Environment Variables

| Variable | Values | Default | Description |
| --- | --- | --- | --- |
| LLM_ROUTING_STRATEGY | weighted, round_robin, failover, cost_optimized* | weighted | Routing strategy |
| PROVIDER_WEIGHTS | provider:weight,... | Equal weights | Traffic distribution |
| DEFAULT_LLM_PROVIDER | bedrock, openai, etc. | None | Primary provider for failover |
| PROVIDER_COSTS* | provider:cost,... | See defaults | Cost per 1K tokens for cost_optimized |

* Enterprise only

Docker Compose

services:
  axonflow-orchestrator:
    environment:
      # Routing configuration
      LLM_ROUTING_STRATEGY: ${LLM_ROUTING_STRATEGY:-weighted}
      PROVIDER_WEIGHTS: ${PROVIDER_WEIGHTS:-}
      DEFAULT_LLM_PROVIDER: ${DEFAULT_LLM_PROVIDER:-}

      # Provider API keys
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}

Routing Strategies

Weighted (Default)

Distributes requests based on configured weights. Weights are normalized to sum to 1.0.

PROVIDER_WEIGHTS=openai:50,anthropic:30,bedrock:20
# Results in: ~50% OpenAI, ~30% Anthropic, ~20% Bedrock

Use cases:

  • Gradual migration between providers
  • Cost optimization by favoring cheaper providers
  • A/B testing new providers
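
For intuition, weighted routing can be pictured as a proportional random draw over the healthy providers. The sketch below is illustrative only (plain Python, not AxonFlow's internal code); pick_weighted and the example weights are hypothetical.

import random

# Illustrative only: a proportional draw over healthy providers.
# Relative weights are normalized implicitly by the draw, matching the
# "weights are normalized to sum to 1.0" behavior described above.
def pick_weighted(weights: dict[str, float]) -> str:
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Roughly 50% / 30% / 20% over many requests
print(pick_weighted({"openai": 50, "anthropic": 30, "bedrock": 20}))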

Round Robin

Cycles through healthy providers equally, ignoring weights.

LLM_ROUTING_STRATEGY=round_robin
# Cycles: openai -> anthropic -> bedrock -> openai -> ...

Use cases:

  • Even load distribution
  • Testing provider parity
  • Avoiding rate limits
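
Conceptually, round-robin is just cycling an index over the currently healthy providers. A minimal sketch (illustrative, not AxonFlow's internal code):

import itertools

# Illustrative round-robin: each healthy provider takes every Nth request
# in turn; configured weights play no role here.
healthy = ["openai", "anthropic", "bedrock"]
cycle = itertools.cycle(healthy)

for _ in range(4):
    print(next(cycle))  # openai, anthropic, bedrock, openai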

Failover

Uses the primary provider exclusively and falls back to others only on failure.

LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
# Always uses Bedrock, falls back to others if Bedrock unhealthy

Use cases:

  • HIPAA/compliance (Bedrock-only)
  • Primary/backup configuration
  • Predictable routing
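
Failover selection reduces to: use the default provider while it is healthy, otherwise take the first healthy alternative. A minimal sketch under that assumption (not AxonFlow's actual code):

# Illustrative failover: prefer the configured default provider and only
# fall back to another healthy provider when the default is unhealthy.
def pick_failover(default: str, providers: list[str], healthy: set[str]) -> str:
    if default in healthy:
        return default
    for name in providers:
        if name in healthy:
            return name
    raise RuntimeError("no healthy providers available")

# Bedrock is unhealthy here, so the request falls back to anthropic.
print(pick_failover("bedrock", ["bedrock", "anthropic", "openai"], {"anthropic", "openai"}))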

Cost Optimized (Enterprise)

Automatically routes to the cheapest healthy provider based on configured cost per 1K tokens.

LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,bedrock:0.02,anthropic:0.025,openai:0.03

Default costs (if PROVIDER_COSTS not set):

| Provider | Cost per 1K tokens |
| --- | --- |
| ollama | $0.00 (self-hosted) |
| bedrock | $0.02 |
| gemini | $0.025 |
| anthropic | $0.025 |
| openai | $0.03 |

Use cases:

  • Cost optimization without manual weight tuning
  • Automatic selection of cheapest healthy provider
  • Hybrid deployments with local (Ollama) + cloud fallback

How it works:

  1. On each request, evaluates all healthy providers
  2. Selects the provider with the lowest configured cost
  3. Providers without cost config default to $1.00 (deprioritized)
  4. Falls back to first healthy provider if all costs equal
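
The selection steps above condense into a few lines. This is an illustrative sketch in plain Python, not the Enterprise implementation:

# Illustrative cost_optimized selection; costs are USD per 1K tokens.
DEFAULT_COST = 1.00  # providers without a configured cost are deprioritized

def pick_cheapest(healthy: list[str], costs: dict[str, float]) -> str:
    # min() is stable, so when all costs are equal the first healthy
    # provider wins, matching the fallback behavior described above.
    return min(healthy, key=lambda name: costs.get(name, DEFAULT_COST))

costs = {"ollama": 0.0, "bedrock": 0.02, "anthropic": 0.025, "openai": 0.03}
print(pick_cheapest(["bedrock", "openai", "ollama"], costs))  # -> ollama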

Examples

HIPAA Compliance (Healthcare)

Force all traffic through AWS Bedrock:

LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
PROVIDER_WEIGHTS=bedrock:100

Cost Optimization (Community)

Prefer cheaper providers via manual weights:

LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=bedrock:60,anthropic:30,openai:10

Cost Optimization (Enterprise)

Automatic selection of cheapest healthy provider:

LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,bedrock:0.02,anthropic:0.025,openai:0.03
# Automatically uses cheapest healthy provider

High Availability

Round-robin with automatic failover on unhealthy providers:

LLM_ROUTING_STRATEGY=round_robin
# Unhealthy providers automatically skipped

SDK Usage

Clients can provide hints for provider selection, but the server makes final routing decisions.

Request-Level Pinning (Advanced)

Per-request provider semantics are:

  • context.provider → provider preference (fallback allowed).
  • context.strict_provider=true → hard pin for that request (no fallback).

You can also set a server default:

LLM_STRICT_PROVIDER_DEFAULT=false  # Recommended default

When set to true, strict pinning is enabled by default unless a request explicitly sets context.strict_provider=false.
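
In other words, the request's context.strict_provider, when present, overrides the server-wide default. A hypothetical sketch of that resolution (not AxonFlow's internal code):

import os

# Illustrative resolution of the effective strict-pinning flag.
def strict_pinning(context: dict) -> bool:
    server_default = os.getenv("LLM_STRICT_PROVIDER_DEFAULT", "false").lower() == "true"
    return bool(context.get("strict_provider", server_default))

print(strict_pinning({"provider": "bedrock"}))                            # uses server default
print(strict_pinning({"provider": "bedrock", "strict_provider": True}))   # hard pin, no fallback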

TypeScript

import { AxonFlow } from "@axonflow/sdk";

const client = new AxonFlow({
  endpoint: "http://localhost:8080",
  tenant: "demo",
});

// Default routing (server decides)
const response = await client.proxy({
  query: "What is 2 + 2?",
  requestType: "chat",
});

// Request specific provider (hint)
const bedrockResponse = await client.proxy({
  query: "What is the capital of France?",
  requestType: "chat",
  context: {
    provider: "bedrock", // Provider preference
    strict_provider: true, // Optional hard pin
    model: "anthropic.claude-haiku-4-5-20251001-v1:0",
  },
});

// Check provider health
const health = await client.health();
console.log("Providers:", health.providers);

Python

import asyncio
from axonflow import AxonFlow

async def main():
    client = AxonFlow(
        endpoint="http://localhost:8080",
        tenant="demo",
    )

    # Default routing (server decides)
    response = await client.proxy(
        query="What is 2 + 2?",
        request_type="chat",
    )

    # Request specific provider (hint)
    bedrock_response = await client.proxy(
        query="What is the capital of France?",
        request_type="chat",
        context={"provider": "bedrock"},
    )

    # Check provider health
    health = await client.health()
    print("Status:", health.status)

asyncio.run(main())

Go

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/getaxonflow/axonflow-sdk-go/v3"
)

func main() {
    client, err := axonflow.NewClient(
        axonflow.WithEndpoint("http://localhost:8080"),
        axonflow.WithTenant("demo"),
    )
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Default routing (server decides)
    resp, err := client.Proxy(ctx, &axonflow.ProxyRequest{
        Query:       "What is 2 + 2?",
        RequestType: "chat",
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Response:", resp.Response)

    // Request specific provider (hint)
    bedrockResp, err := client.Proxy(ctx, &axonflow.ProxyRequest{
        Query:       "What is the capital of France?",
        RequestType: "chat",
        Context: map[string]interface{}{
            "provider": "bedrock",
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Provider:", bedrockResp.Metadata["provider"])
}

Java

import com.getaxonflow.sdk.AxonFlow;
import com.getaxonflow.sdk.AxonFlowConfig;
import com.getaxonflow.sdk.ProxyRequest;
import com.getaxonflow.sdk.RequestType;
import java.util.Map;

public class ProviderRouting {
    public static void main(String[] args) {
        var client = new AxonFlow(
            AxonFlowConfig.builder()
                .endpoint("http://localhost:8080")
                .tenant("demo")
                .build()
        );

        // Default routing (server decides)
        var response = client.proxy(ProxyRequest.builder()
            .query("What is 2 + 2?")
            .requestType(RequestType.CHAT)
            .build());

        // Request specific provider (hint)
        var bedrockResponse = client.proxy(ProxyRequest.builder()
            .query("What is the capital of France?")
            .requestType(RequestType.CHAT)
            .context(Map.of("provider", "bedrock"))
            .build());
    }
}

Health Checking

Check provider health via the health endpoint:

curl http://localhost:8080/health

Response includes provider status:

{
  "status": "healthy",
  "providers": {
    "openai": {"healthy": true},
    "anthropic": {"healthy": true},
    "bedrock": {"healthy": true}
  }
}

Unhealthy providers are automatically excluded from routing.

Automatic Failover

When a provider fails:

  1. Request retried with next available provider
  2. After threshold failures, provider marked unhealthy
  3. Traffic routes to healthy providers only
  4. Health checks restore provider when recovered
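
One way to picture the health bookkeeping is a per-provider failure counter that trips after a threshold and resets once the provider recovers. The sketch below is illustrative; the ProviderHealth class and the threshold value of 3 are assumptions, not documented AxonFlow behavior.

# Illustrative health bookkeeping; FAILURE_THRESHOLD = 3 is an assumed value.
FAILURE_THRESHOLD = 3

class ProviderHealth:
    def __init__(self) -> None:
        self.failures: dict[str, int] = {}

    def record_failure(self, provider: str) -> None:
        self.failures[provider] = self.failures.get(provider, 0) + 1

    def record_success(self, provider: str) -> None:
        # A successful request or health check restores the provider.
        self.failures[provider] = 0

    def is_healthy(self, provider: str) -> bool:
        return self.failures.get(provider, 0) < FAILURE_THRESHOLD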

Community vs Enterprise

| Feature | Community | Enterprise |
| --- | --- | --- |
| Routing strategies | weighted, round_robin, failover | + cost_optimized |
| Configuration | Environment variables | + Customer Portal UI |
| Scope | Global (all tenants) | Per-tenant settings |
| Cost tracking | - | Per-provider cost metrics |
| Runtime configuration | Restart required | Hot reload via API |

For per-tenant configuration and advanced routing, see Enterprise Provider Features.

Complete Examples

Full working examples are available in the AxonFlow repository:

SDK Examples

HTTP/curl Example

For users who prefer direct HTTP calls without an SDK:

# Quick test with curl
curl -X POST http://localhost:8080/api/request \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is 2 + 2?",
    "user_token": "demo-user",
    "client_id": "curl-example",
    "request_type": "llm_chat",
    "context": {"provider": "openai"}
  }'

Each example demonstrates default routing, provider hints, model overrides, and health checking.