Ollama Setup
Ollama is available in AxonFlow Community and Enterprise and is the default choice for local, self-hosted, and air-gapped model execution.
Runtime Defaults
AxonFlow's Ollama configuration defaults to:
- Endpoint: http://localhost:11434
- Model: llama3.2:latest
Quick Start
Start Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2:latest
Configure AxonFlow
export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llama3.2:latest
export OLLAMA_TIMEOUT_SECONDS=300 # Request timeout in seconds (default: 300)
The default timeout for Ollama is 300 seconds (5 minutes), intentionally higher than the 120-second default for cloud providers. Local models can be slower to load on first inference.
YAML Configuration
version: "1.0"
llm_providers:
  ollama:
    enabled: true
    config:
      endpoint: ${OLLAMA_ENDPOINT:-http://localhost:11434}
      model: ${OLLAMA_MODEL:-llama3.2:latest}
Good Fits
- Local development without external provider keys
- On-prem or air-gapped deployments
- Teams that want predictable spend and infrastructure-level control
- Enterprises that need data sovereignty, private AI infrastructure, or internal developer platforms built on local models
Proxy Mode
import { AxonFlow } from '@axonflow/sdk';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

// The context block routes this call to the local Ollama provider and model.
const response = await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: 'Explain the design tradeoffs in this distributed system.',
  requestType: 'chat',
  context: {
    provider: 'ollama',
    model: 'llama3.2:latest',
  },
});

console.log(response.data);
Operational Notes
- Ollama is a strong option for community deployments that cannot send prompts to a third-party cloud provider.
- Proxy Mode is usually the simplest starting point.
- Gateway Mode is also possible if your application already uses the Ollama HTTP API directly and you want AxonFlow only for pre-check and audit.
Gateway Mode Example
import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const prompt = 'Explain the design tradeoffs in this distributed system.';

// 1. Pre-check: ask AxonFlow whether this request is allowed before calling the model
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: prompt,
});
if (!ctx.approved) throw new Error(ctx.blockReason);

// 2. Your direct Ollama call
const start = Date.now();
const response = await ollama.chat({
  model: 'llama3.2:latest',
  messages: [{ role: 'user', content: prompt }],
});

// 3. Audit: Ollama reports token counts as prompt_eval_count / eval_count on
// non-streaming responses; fall back to 0 if they are absent.
const promptTokens = response.prompt_eval_count ?? 0;
const completionTokens = response.eval_count ?? 0;

await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.message.content.substring(0, 100),
  provider: 'ollama',
  model: 'llama3.2:latest',
  tokenUsage: { promptTokens, completionTokens, totalTokens: promptTokens + completionTokens },
  latencyMs: Date.now() - start,
});
Health Check
You can verify the local Ollama instance directly:
curl http://localhost:11434/api/tags
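If you prefer to script this check, the sketch below uses Node's built-in fetch (Node 18+) to call the same /api/tags endpoint and print the locally available models. The endpoint fallback mirrors the documented default.

const endpoint = process.env.OLLAMA_ENDPOINT ?? 'http://localhost:11434';

// Query Ollama's /api/tags endpoint, which lists the models pulled on this host.
const res = await fetch(`${endpoint}/api/tags`);
if (!res.ok) {
  throw new Error(`Ollama health check failed: HTTP ${res.status}`);
}

const { models } = (await res.json()) as { models: Array<{ name: string }> };
console.log('Available models:', models.map((m) => m.name).join(', '));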
Troubleshooting
Connection errors
- Confirm Ollama is listening on the same host or network as AxonFlow.
- Verify that the OLLAMA_ENDPOINT value includes the correct host and port.
Slow responses
- Start with a smaller local model.
- Check whether inference is CPU-only or GPU-backed.
- Measure end-to-end latency in your application path before assuming the governance layer is the bottleneck; the sketch below shows one way to compare direct and proxied calls.
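One way to isolate the governance overhead is to time the same prompt twice: once directly against Ollama and once through AxonFlow's Proxy Mode. This is a rough sketch that reuses the ollama npm package and the SDK client shown earlier; the prompt and model are placeholders.

import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const prompt = 'Summarize the last deployment incident in two sentences.';

// Direct call to Ollama.
// Note: the first call also pays the model load cost, so run each path more
// than once before drawing conclusions.
const directStart = Date.now();
await ollama.chat({ model: 'llama3.2:latest', messages: [{ role: 'user', content: prompt }] });
const directMs = Date.now() - directStart;

// Same prompt routed through AxonFlow's Proxy Mode.
const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const proxiedStart = Date.now();
await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: prompt,
  requestType: 'chat',
  context: { provider: 'ollama', model: 'llama3.2:latest' },
});
const proxiedMs = Date.now() - proxiedStart;

console.log(`Direct: ${directMs} ms, via AxonFlow: ${proxiedMs} ms`);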
