Ollama Setup
Ollama is available in AxonFlow Community and Enterprise and is the default choice for local, self-hosted, and air-gapped model execution.
Runtime Defaults
AxonFlow's Ollama configuration defaults to:
- Endpoint: http://localhost:11434
- Model: llama3.2:latest
Quick Start
Start Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2:latest
Configure AxonFlow
export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llama3.2:latest
export OLLAMA_TIMEOUT_SECONDS=300 # Request timeout in seconds (default: 300)
Note: The default timeout for Ollama is 300 seconds (5 minutes), intentionally higher than the 120-second default for cloud providers, because local models can be slow to load on first inference.
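The two defaults mentioned here (300 seconds for Ollama, 120 seconds for cloud providers) can be mirrored in application code when you set your own client-side timeouts. The following is a minimal sketch: `requestTimeoutMs` is a hypothetical helper, not part of the AxonFlow SDK, and falling back to the default on an invalid override is an assumption rather than documented AxonFlow behavior.

```typescript
// Hypothetical helper, not an AxonFlow API. The defaults are the ones documented above.
const DEFAULT_TIMEOUT_SECONDS = { ollama: 300, cloud: 120 };

// Returns a timeout in milliseconds for a provider, honoring an optional
// override such as the raw value of OLLAMA_TIMEOUT_SECONDS.
function requestTimeoutMs(provider: string, override?: string): number {
  const fallback =
    provider === "ollama"
      ? DEFAULT_TIMEOUT_SECONDS.ollama
      : DEFAULT_TIMEOUT_SECONDS.cloud;
  const parsed = Number(override);
  // Assumption: a missing, non-numeric, or non-positive override uses the default.
  const seconds = isFinite(parsed) && parsed > 0 ? parsed : fallback;
  return seconds * 1000;
}
```

A caller might pass `process.env.OLLAMA_TIMEOUT_SECONDS` as the override so the application and AxonFlow agree on how long a cold model load is allowed to take.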
YAML Configuration
version: "1.0"
llm_providers:
  ollama:
    enabled: true
    config:
      endpoint: ${OLLAMA_ENDPOINT:-http://localhost:11434}
      model: ${OLLAMA_MODEL:-llama3.2:latest}
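The `${NAME:-default}` placeholders follow the familiar shell convention: use the environment variable when it is set and non-empty, otherwise use the text after `:-`. The sketch below illustrates that convention only. `expandDefaults` is a hypothetical function, and whether AxonFlow's own loader treats empty values exactly this way is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Expands ${NAME:-default} placeholders with shell-style semantics:
// the default applies when NAME is unset or empty.
function expandDefaults(value: string, env: Env): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}/g,
    (_match: string, name: string, fallback: string): string => {
      const current = env[name];
      return current !== undefined && current !== "" ? current : fallback;
    }
  );
}
```

With an empty environment, the `endpoint` value above resolves to `http://localhost:11434`. With `OLLAMA_ENDPOINT` set, that value is used instead.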
Good Fits
- Local development without external provider keys
- On-prem or air-gapped deployments
- Teams that want predictable spend and infrastructure-level control
- Enterprises that need data sovereignty, private AI infrastructure, or internal developer platforms built on local models
Proxy Mode
import { AxonFlow } from '@axonflow/sdk';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const response = await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: 'Explain the design tradeoffs in this distributed system.',
  requestType: 'chat',
  context: {
    provider: 'ollama',
    model: 'llama3.2:latest',
  },
});

console.log(response.data);
Operational Notes
- Ollama is a strong option for community deployments that cannot send prompts to a third-party cloud provider.
- Proxy Mode is usually the simplest starting point.
- Gateway Mode is also possible if your application already uses the Ollama HTTP API directly and you want AxonFlow only for pre-check and audit.
Gateway Mode Example
import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

// Example prompt; in a real application this comes from the user request.
const prompt = 'Explain the design tradeoffs in this distributed system.';

// 1. Pre-check: ask AxonFlow whether policy allows this request
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: prompt,
});
if (!ctx.approved) throw new Error(ctx.blockReason);

// 2. Your direct Ollama call
const start = Date.now();
const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: prompt }],
});

// 3. Audit: report the outcome against the approved context
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.message.content.substring(0, 100),
  provider: 'ollama',
  model: 'llama3.2',
  // Placeholder counts; replace with real values when your client exposes them.
  tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
  latencyMs: Date.now() - start,
});
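The audit call in this example passes zeroed token counts. Ollama's chat response reports `prompt_eval_count` and `eval_count`, so the audit record can carry real numbers. The sketch below shows one way to map them. `toTokenUsage` and the two interfaces are hypothetical names introduced for illustration, and treating a missing count as zero is an assumption.

```typescript
// Subset of the Ollama chat response fields used here.
interface OllamaCounts {
  prompt_eval_count?: number; // tokens evaluated in the prompt
  eval_count?: number; // tokens generated in the response
}

// Shape taken from the tokenUsage object in the Gateway Mode example.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Hypothetical helper: maps Ollama's counters onto the audit payload.
function toTokenUsage(res: OllamaCounts): TokenUsage {
  // Assumption: a missing counter is reported as zero rather than omitted.
  const promptTokens = res.prompt_eval_count ?? 0;
  const completionTokens = res.eval_count ?? 0;
  return {
    promptTokens,
    completionTokens,
    totalTokens: promptTokens + completionTokens,
  };
}
```

In the example, `tokenUsage: toTokenUsage(response)` would then replace the literal zeros.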
Health Check
You can verify the local Ollama instance directly:
curl http://localhost:11434/api/tags
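A successful response from `/api/tags` lists the locally available models under a `models` array, each with a `name` such as `llama3.2:latest`. Beyond confirming that the server is reachable, it can be useful to check that the configured model has actually been pulled. The following is a small sketch of that check. `hasModel` is a hypothetical helper, and it assumes that a model name without a tag refers to `:latest`.

```typescript
// Subset of the /api/tags response body used by this check.
interface TagsResponse {
  models?: Array<{ name: string }>;
}

// Returns true when the given model appears in the /api/tags payload.
function hasModel(tags: TagsResponse, model: string): boolean {
  // Assumption: an untagged name such as "llama3.2" means "llama3.2:latest".
  const wanted = model.indexOf(":") === -1 ? `${model}:latest` : model;
  return (tags.models ?? []).some((m) => m.name === wanted);
}
```

The payload can come from the `curl` command above or from a `fetch` call in your application. If `hasModel` returns false for the value of `OLLAMA_MODEL`, pulling the model is the likely fix.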
Troubleshooting
Connection errors
- Confirm Ollama is listening on the same host or network as AxonFlow.
- Verify the OLLAMA_ENDPOINT value includes the correct port.
Slow responses
- Start with a smaller local model.
- Check whether inference is CPU-only or GPU-backed.
- Measure end-to-end latency in your application path before assuming the governance layer is the bottleneck.
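One way to act on the last point is to compare the latency your application observes with the timings Ollama itself reports. Ollama responses include `total_duration` and `load_duration` in nanoseconds. The sketch below splits an end-to-end measurement into time spent inside Ollama and everything else. `latencyBreakdown` is a hypothetical helper, and the remainder covers network hops, AxonFlow, and your own code together, so it is an upper bound on governance overhead rather than a direct measurement.

```typescript
// Subset of Ollama's response timing fields, reported in nanoseconds.
interface OllamaTimings {
  total_duration?: number; // total time spent handling the request
  load_duration?: number; // portion spent loading the model
}

interface LatencyBreakdown {
  modelMs: number; // time inside Ollama, including model load
  loadMs: number; // model load time alone
  overheadMs: number; // everything outside Ollama
}

// Hypothetical helper: splits an end-to-end latency into Ollama time and the rest.
function latencyBreakdown(endToEndMs: number, t: OllamaTimings): LatencyBreakdown {
  const NS_PER_MS = 1e6;
  const modelMs = (t.total_duration ?? 0) / NS_PER_MS;
  const loadMs = (t.load_duration ?? 0) / NS_PER_MS;
  return { modelMs, loadMs, overheadMs: Math.max(0, endToEndMs - modelMs) };
}
```

A large `loadMs` on the first request points at a cold model load. A large `modelMs` on later requests points at the model size or CPU-only inference.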
Next Steps
Provider Rollout Checklist
Before this provider path carries production traffic, connect it to the runtime operating model:
- Decide whether calls should use Proxy Mode, Gateway Mode, or Direct Orchestrator.
- Document credential ownership with the Provider And Credential Matrix.
- Use Provider Routing for failover, pinning, and cost-aware routing choices.
- Review Community vs Evaluation vs Enterprise when runtime provider management, Bedrock, or portal operations become requirements.
