Ollama Setup

Ollama is available in AxonFlow Community and Enterprise and is the default choice for local, self-hosted, and air-gapped model execution.

Runtime Defaults

AxonFlow's Ollama configuration defaults to:

  • Endpoint: http://localhost:11434
  • Model: llama3.2:latest

Quick Start

Start Ollama

# Run the Ollama server in Docker, then pull the default model
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2:latest

Configure AxonFlow

export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llama3.2:latest
export OLLAMA_TIMEOUT_SECONDS=300 # Request timeout in seconds (default: 300)
Note: the default timeout for Ollama is 300 seconds (5 minutes), intentionally higher than the 120-second default for cloud providers, because local models can be slower to load on first inference.

YAML Configuration

version: "1.0"

llm_providers:
ollama:
enabled: true
config:
endpoint: ${OLLAMA_ENDPOINT:-http://localhost:11434}
model: ${OLLAMA_MODEL:-llama3.2:latest}

Good Fits

  • Local development without external provider keys
  • On-prem or air-gapped deployments
  • Teams that want predictable spend and infrastructure-level control
  • Enterprises that need data sovereignty, private AI infrastructure, or internal developer platforms built on local models

Proxy Mode

In Proxy Mode, your application makes a single call to AxonFlow; AxonFlow runs the policy check, forwards the request to Ollama, and records the audit entry.

import { AxonFlow } from '@axonflow/sdk';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const response = await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: 'Explain the design tradeoffs in this distributed system.',
  requestType: 'chat',
  context: {
    provider: 'ollama',
    model: 'llama3.2:latest',
  },
});

console.log(response.data);

Operational Notes

  • Ollama is a strong option for community deployments that cannot send prompts to a third-party cloud provider.
  • Proxy Mode is usually the simplest starting point.
  • Gateway Mode is also possible if your application already uses the Ollama HTTP API directly and you want AxonFlow only for pre-check and audit.

Gateway Mode Example

import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const prompt = 'Explain the design tradeoffs in this distributed system.';

// 1. Pre-check
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: prompt,
});

if (!ctx.approved) throw new Error(ctx.blockReason);

// 2. Your direct Ollama call
const start = Date.now();
const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: prompt }],
});

// 3. Audit, using the token counters Ollama reports on completed (non-streaming) responses
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.message.content.substring(0, 100),
  provider: 'ollama',
  model: 'llama3.2',
  tokenUsage: {
    promptTokens: response.prompt_eval_count ?? 0,
    completionTokens: response.eval_count ?? 0,
    totalTokens: (response.prompt_eval_count ?? 0) + (response.eval_count ?? 0),
  },
  latencyMs: Date.now() - start,
});

Health Check

You can verify the local Ollama instance directly:

curl http://localhost:11434/api/tags
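
If you prefer a programmatic check, here is a minimal TypeScript sketch (assuming Node 18+ for the built-in fetch) that lists the models the instance has pulled:

const endpoint = process.env.OLLAMA_ENDPOINT ?? 'http://localhost:11434';

// List the models this Ollama instance has pulled.
const res = await fetch(`${endpoint}/api/tags`);
if (!res.ok) {
  throw new Error(`Ollama is not reachable at ${endpoint}: HTTP ${res.status}`);
}

const { models } = await res.json();
console.log(models.map((m: { name: string }) => m.name));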

Troubleshooting

Connection errors

  • Confirm Ollama is listening on a host and port that AxonFlow can actually reach; if AxonFlow runs in a container and Ollama runs on the host, localhost inside the container does not point at the host.
  • Verify that OLLAMA_ENDPOINT includes the correct scheme, host, and port.

Slow responses

  • Start with a smaller local model.
  • Check whether inference is CPU-only or GPU-backed.
  • Measure end-to-end latency in your application path before assuming the governance layer is the bottleneck; the sketch below shows one way to compare the two.
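
A minimal sketch for that last point, reusing the Proxy Mode setup from above: time a direct Ollama call and the same prompt through AxonFlow, and compare. The difference approximates the overhead added in front of the model for that prompt.

import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const prompt = 'Explain the design tradeoffs in this distributed system.';

// Warm the model first so load time does not skew the comparison.
await ollama.chat({ model: 'llama3.2', messages: [{ role: 'user', content: prompt }] });

// Direct call to Ollama.
let start = Date.now();
await ollama.chat({ model: 'llama3.2', messages: [{ role: 'user', content: prompt }] });
const directMs = Date.now() - start;

// Same prompt through AxonFlow Proxy Mode.
start = Date.now();
await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: prompt,
  requestType: 'chat',
  context: { provider: 'ollama', model: 'llama3.2:latest' },
});
const proxiedMs = Date.now() - start;

console.log({ directMs, proxiedMs, overheadMs: proxiedMs - directMs });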

Next Steps