LiteLLM + AxonFlow Integration
Prerequisites: Python 3.10+, a running AxonFlow instance (see Getting Started), and the axonflow-litellm package (pip install axonflow-litellm).
What This Integration Gives You
LiteLLM lets you call 100+ LLM providers through a unified API. The axonflow-litellm package adds AxonFlow governance (policy enforcement, human-in-the-loop approval gates, and an audit trail) to every call through a drop-in wrapper around litellm.completion() and litellm.acompletion().
| Capability | How it surfaces |
|---|---|
| Pre-LLM policy enforcement | A pre_check runs before every completion. Denials raise PolicyDeniedError. |
| HITL approval gate | When policy returns require_approval, the wrapper polls until the request is approved, rejected, or timed out. |
| Audit trail | Every LLM call is recorded to AxonFlow with model, provider, token usage, and latency. |
| Fail-open resilience | By default, an AxonFlow outage does not break LLM calls. Set fail_open=False for high-stakes workloads. |
| Circuit breaker | After repeated AxonFlow failures the breaker opens; a half-open probe after the recovery window prevents a thundering herd when AxonFlow recovers. |
The package reuses the published axonflow Python SDK (v8.2+) and inherits its auth, retry, and observability. There is no second HTTP client.
Quick Start
```bash
pip install axonflow-litellm
```

```python
from axonflow_litellm import AxonFlowLogger, AxonFlowLoggerConfig, PolicyDeniedError

logger = AxonFlowLogger(AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",
    client_id="my-app",
    client_secret="...",
))

try:
    response = logger.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize quarterly earnings"}],
    )
    print(response.choices[0].message.content)
except PolicyDeniedError as e:
    print(f"Blocked: {e.reason}")
```
All litellm.completion() keyword arguments (model, messages, temperature, tools, and so on) are forwarded to LiteLLM.
Two Integration Modes
Governance Mode (recommended)
Use logger.completion() or logger.acompletion() as drop-in replacements:
```python
# Sync
response = logger.completion(model="gpt-4o", messages=[...])

# Async (recommended for production); must run inside a coroutine
response = await logger.acompletion(
    model="claude-sonnet-4-6",
    messages=[...],
    user_token="jwt-from-your-auth",  # identity used for policy evaluation on this call
)
```
Flow: pre-check → HITL (if required) → LLM call → audit.
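The governance-mode flow can be pictured as a single function. This is a hypothetical, simplified sketch, not the axonflow-litellm source: pre_check, wait_for_approval, call_llm, and record_audit are injected stand-ins for the real AxonFlow and LiteLLM calls, PolicyDeniedError is a local stand-in class, and the "allow" / "deny" / "require_approval" decision labels are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


class PolicyDeniedError(Exception):
    """Local stand-in for the package's exception; carries reason and policies."""

    def __init__(self, reason: str, policies: list[str]):
        super().__init__(reason)
        self.reason = reason
        self.policies = policies


@dataclass
class Decision:
    action: str                      # assumed labels: "allow", "deny", "require_approval"
    reason: str = ""
    policies: tuple[str, ...] = ()


def governed_completion(
    pre_check: Callable[[dict], Decision],
    wait_for_approval: Callable[[Decision], None],
    call_llm: Callable[..., Any],
    record_audit: Callable[[dict, Any], None],
    **kwargs: Any,
) -> Any:
    """pre-check -> HITL (if required) -> LLM call -> audit."""
    decision = pre_check(kwargs)
    if decision.action == "deny":
        raise PolicyDeniedError(decision.reason, list(decision.policies))
    if decision.action == "require_approval":
        wait_for_approval(decision)  # returns on approval, raises otherwise
    response = call_llm(**kwargs)    # caller kwargs (model, messages, ...) forwarded as-is
    record_audit(kwargs, response)
    return response


# Example: an allowed request is forwarded and then audited.
audit_log: list[tuple[str, Any]] = []
resp = governed_completion(
    pre_check=lambda req: Decision("allow"),
    wait_for_approval=lambda d: None,
    call_llm=lambda **kw: f"echo:{kw['model']}",
    record_audit=lambda req, r: audit_log.append((req["model"], r)),
    model="gpt-4o",
    messages=[],
)
```

Injecting the four steps as parameters keeps the ordering visible and makes each step easy to replace in tests.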
Audit-Only Mode
Register as a LiteLLM callback for observability without blocking:
```python
import litellm

litellm.callbacks = [logger]  # observe and record every call; cannot block

# Inside a coroutine
response = await litellm.acompletion(model="gpt-4o", messages=[...])
```
Every call is recorded to AxonFlow. Policy denials are logged as warnings but cannot block the request. This is a LiteLLM SDK limitation: exceptions raised inside callbacks are swallowed instead of being propagated to the caller.
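The limitation is easier to see in a simplified model of callback dispatch. The sketch below is hypothetical and is not LiteLLM's code: it assumes, as the note above states, that the dispatcher catches any exception a callback raises, so even a callback that raises on every call leaves the completion untouched. dispatch_callbacks and completion_with_callbacks are illustrative names.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("callback-dispatch-sketch")


def dispatch_callbacks(callbacks: list[Callable[[dict], None]], event: dict) -> None:
    """Run every callback; an exception is logged and swallowed, never re-raised."""
    for cb in callbacks:
        try:
            cb(event)
        except Exception:
            log.warning("callback %r raised; ignoring", cb, exc_info=True)


def completion_with_callbacks(callbacks: list[Callable[[dict], None]],
                              call_llm: Callable[[], Any]) -> Any:
    dispatch_callbacks(callbacks, {"stage": "pre_call"})  # a "deny" raised here is swallowed
    response = call_llm()                                  # so the request still goes out
    dispatch_callbacks(callbacks, {"stage": "success", "response": response})
    return response


# Example: a callback that "denies" every request cannot stop it.
seen_stages: list[str] = []


def deny_everything(event: dict) -> None:
    seen_stages.append(event["stage"])
    raise PermissionError("policy denied")


result = completion_with_callbacks([deny_everything], lambda: "model output")
```

This is why blocking enforcement requires the logger.completion() / logger.acompletion() wrappers, which sit in the call path instead of beside it.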
HITL Approval Flow
When a policy returns require_approval:
- The wrapper creates a HITL queue entry via create_hitl_request.
- It polls get_hitl_request every 2 seconds (configurable with approval_poll_interval_seconds).
- On approved, it proceeds to the LLM call.
- On rejected or expired, it raises ApprovalRejected.
- On timeout (the client-side deadline set by approval_max_wait_seconds), it raises ApprovalTimeout.
```python
from axonflow_litellm import ApprovalRejected, ApprovalTimeout, PolicyDeniedError

try:
    response = logger.completion(model="gpt-4o", messages=[...])
except ApprovalTimeout:
    print("No reviewer responded in time")
except ApprovalRejected:
    print("Reviewer denied the request")
except PolicyDeniedError as e:
    print(f"Policy denied: {e.reason}")
```
Set enable_hitl_polling=False to deny immediately on require_approval without creating a queue entry.
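The polling steps above amount to a deadline-bounded loop. This is a hypothetical sketch, not the package's implementation: get_status stands in for a get_hitl_request call and is assumed to return "pending", "approved", "rejected", or "expired" (the "pending" label is an assumption), and the two exception classes are local stand-ins. It measures the deadline with time.monotonic() so wall-clock adjustments cannot shorten or extend the wait.

```python
import time
from typing import Callable


class ApprovalRejected(Exception):
    pass


class ApprovalTimeout(Exception):
    pass


def wait_for_approval(
    get_status: Callable[[], str],
    poll_interval_seconds: float = 2.0,  # mirrors approval_poll_interval_seconds
    max_wait_seconds: float = 300.0,     # mirrors approval_max_wait_seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    deadline = clock() + max_wait_seconds
    while True:
        status = get_status()
        if status == "approved":
            return  # caller proceeds to the LLM call
        if status in ("rejected", "expired"):
            raise ApprovalRejected(status)
        remaining = deadline - clock()
        if remaining <= 0:
            raise ApprovalTimeout(f"no decision within {max_wait_seconds}s")
        sleep(min(poll_interval_seconds, remaining))  # never sleep past the deadline


# Example: approved on the third poll (interval shortened for the demo).
replies = iter(["pending", "pending", "approved"])
wait_for_approval(lambda: next(replies), poll_interval_seconds=0.01)
```

Passing sleep and clock as parameters lets the timeout path be exercised without actually waiting five minutes.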
Configuration
```python
from axonflow_litellm import AxonFlowLoggerConfig

config = AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",    # AxonFlow agent URL
    client_id="my-app",                  # AxonFlow client identifier
    client_secret="...",                 # AxonFlow client secret

    # Governance behavior
    fail_open=True,                      # Allow LLM calls when AxonFlow is unreachable
    call_timeout_seconds=5.0,            # Timeout for each AxonFlow hook call
    breaker_failure_threshold=5,         # Consecutive failures before the circuit opens
    breaker_recovery_seconds=30.0,       # Time the circuit stays open before a half-open probe

    # HITL
    enable_hitl_polling=True,            # Enable the HITL approval flow
    approval_poll_interval_seconds=2.0,  # Interval between get_hitl_request polls
    approval_max_wait_seconds=300.0,     # Client-side deadline for a reviewer decision

    # Identity
    default_user_token="anonymous",      # Default token for policy evaluation
    tenant_id=None,                      # AxonFlow tenant identifier
    extra_context={},                    # Additional context sent with every pre-check
)
```
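breaker_failure_threshold and breaker_recovery_seconds describe a conventional three-state circuit breaker (closed, open, half-open). The class below is a generic, single-threaded sketch of that pattern, assuming it maps onto the two settings as their comments describe; it is not taken from the axonflow-litellm source. While the breaker is open, calls skip AxonFlow entirely (presumably falling back to the fail_open behavior, which is an assumption). After the recovery window exactly one probe is let through, and only its result closes or reopens the breaker, so a recovering AxonFlow is not hit by every waiting request at once.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open":
            if self.clock() - self.opened_at < self.recovery_seconds:
                return False          # still cooling down: skip AxonFlow
            self.state = "half_open"
        # Half-open: exactly one probe goes through; everyone else is refused.
        if self.probe_in_flight:
            return False
        self.probe_in_flight = True
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self.failures = 0             # "consecutive" count restarts on success
        self.probe_in_flight = False

    def record_failure(self) -> None:
        self.probe_in_flight = False
        if self.state == "half_open":
            self._open()              # failed probe: wait another recovery window
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

A production version would also need a lock (or an asyncio-safe equivalent) around the state transitions.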
Fail-Open vs. Fail-Closed
Default (fail_open=True): AxonFlow timeout or outage → LLM call proceeds normally. Best for most workloads.
High-stakes (fail_open=False): AxonFlow timeout or outage → raises PolicyDeniedError. No LLM call proceeds without explicit policy approval.
Exceptions
| Exception | When raised |
|---|---|
| PolicyDeniedError | Policy denied the request, or AxonFlow was unreachable with fail_open=False |
| ApprovalRejected | A HITL reviewer rejected the request, or the HITL request expired server-side |
| ApprovalTimeout | The client-side deadline (approval_max_wait_seconds) passed while polling |
All carry .reason (string) and .policies (list of policy IDs).
Using with an Existing AxonFlow Client
If your application already manages an AxonFlow client instance:
```python
from axonflow import AxonFlow
from axonflow_litellm import AxonFlowLogger, AxonFlowLoggerConfig

client = AxonFlow(endpoint="http://localhost:8080", client_id="my-app")
logger = AxonFlowLogger.from_client(client, AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",
    client_id="my-app",
))
```
The caller owns the client lifecycle, so logger.aclose() is a no-op; close the client yourself when your application shuts down.
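The rule behind this is common resource-ownership practice: a wrapper closes only the resources it created itself. The sketch below illustrates that rule with hypothetical FakeClient and Wrapper classes; the real AxonFlowLogger and AxonFlow client APIs may differ in names and details.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an SDK client with an async close method."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class Wrapper:
    def __init__(self, client: FakeClient, owns_client: bool) -> None:
        self._client = client
        self._owns_client = owns_client

    @classmethod
    def create(cls) -> "Wrapper":
        # The wrapper builds the client, so it is responsible for closing it.
        return cls(FakeClient(), owns_client=True)

    @classmethod
    def from_client(cls, client: FakeClient) -> "Wrapper":
        # The caller supplied the client, so the caller keeps ownership.
        return cls(client, owns_client=False)

    async def aclose(self) -> None:
        if self._owns_client:
            await self._client.aclose()
        # Otherwise a no-op: closing a shared client would break its other users.


async def demo() -> tuple[bool, bool]:
    shared = FakeClient()
    await Wrapper.from_client(shared).aclose()
    owned = Wrapper.create()
    await owned.aclose()
    return shared.closed, owned._client.closed


result = asyncio.run(demo())
```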
MCP Governance
LiteLLM focuses on LLM completions, and this integration governs those calls. For MCP tool governance, see MCP Connectors Overview.
