Ollama Setup

Ollama is available in AxonFlow Community and Enterprise and is the default choice for local, self-hosted, and air-gapped model execution.

Runtime Defaults

AxonFlow's Ollama configuration defaults to:

  • Endpoint: http://localhost:11434
  • Model: llama3.2:latest

Quick Start

Start Ollama

docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2:latest

Configure AxonFlow

export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llama3.2:latest
export OLLAMA_TIMEOUT_SECONDS=300 # Request timeout in seconds (default: 300)
Note: The default timeout for Ollama is 300 seconds (5 minutes), intentionally higher than the 120-second default for cloud providers, because local models can be slow to load on first inference.
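
As a rough illustration of how these three variables and their defaults fit together, the sketch below resolves them from an environment-like object. `resolveOllamaSettings` and `OllamaSettings` are hypothetical names used only for this example; they are not part of the AxonFlow SDK, and AxonFlow's own parsing may differ.

```typescript
// Hypothetical helper for illustration; not an AxonFlow API.
interface OllamaSettings {
  endpoint: string;
  model: string;
  timeoutMs: number;
}

// Resolves settings from an env-like object (for example process.env),
// falling back to the defaults documented on this page.
function resolveOllamaSettings(
  env: Record<string, string | undefined>,
): OllamaSettings {
  const seconds = Number(env.OLLAMA_TIMEOUT_SECONDS);
  // Unset, non-numeric, or non-positive values fall back to 300 seconds.
  const timeoutSeconds = isFinite(seconds) && seconds > 0 ? seconds : 300;

  return {
    endpoint: env.OLLAMA_ENDPOINT || 'http://localhost:11434',
    model: env.OLLAMA_MODEL || 'llama3.2:latest',
    timeoutMs: timeoutSeconds * 1000,
  };
}
```

In a Node.js application this could be called as `resolveOllamaSettings(process.env)`.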

YAML Configuration

version: "1.0"

llm_providers:
  ollama:
    enabled: true
    config:
      endpoint: ${OLLAMA_ENDPOINT:-http://localhost:11434}
      model: ${OLLAMA_MODEL:-llama3.2:latest}
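
The `${VAR:-default}` placeholders above use shell-style parameter expansion: the variable's value is used when it is set and non-empty, otherwise the text after `:-` applies. How AxonFlow performs this substitution internally is not described here; the sketch below only illustrates the semantics, and `expandPlaceholders` is a hypothetical name.

```typescript
// Hypothetical illustration of ${VAR:-default} semantics; not AxonFlow code.
function expandPlaceholders(
  template: string,
  env: Record<string, string | undefined>,
): string {
  // Matches ${NAME} and ${NAME:-default}. The default may contain any
  // character except "}", so URLs with colons and slashes are fine.
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const value = env[name];
      // As in a POSIX shell, ":-" treats an empty value like an unset one.
      if (value !== undefined && value !== '') return value;
      return fallback !== undefined ? fallback : '';
    },
  );
}
```

With no variables set, the `endpoint` line resolves to `http://localhost:11434`; with `OLLAMA_ENDPOINT` exported, the exported value wins.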

Good Fits

  • Local development without external provider keys
  • On-prem or air-gapped deployments
  • Teams that want predictable spend and infrastructure-level control
  • Enterprises that need data sovereignty, private AI infrastructure, or internal developer platforms built on local models

Proxy Mode

import { AxonFlow } from '@axonflow/sdk';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const response = await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: 'Explain the design tradeoffs in this distributed system.',
  requestType: 'chat',
  context: {
    provider: 'ollama',
    model: 'llama3.2:latest',
  },
});

console.log(response.data);

Operational Notes

  • Ollama is a strong option for community deployments that cannot send prompts to a third-party cloud provider.
  • Proxy Mode is usually the simplest starting point.
  • Gateway Mode is also possible if your application already uses the Ollama HTTP API directly and you want AxonFlow only for pre-check and audit.

Gateway Mode Example

import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

// The prompt your application wants to send to the model.
const prompt = 'Explain the design tradeoffs in this distributed system.';

// 1. Pre-check: ask AxonFlow whether this request is allowed by policy.
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: prompt,
});

if (!ctx.approved) throw new Error(ctx.blockReason);

// 2. Your direct Ollama call
const start = Date.now();
const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: prompt }],
});

// 3. Audit: report the outcome back to AxonFlow.
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.message.content.substring(0, 100),
  provider: 'ollama',
  model: 'llama3.2',
  // Placeholder counts; replace with real values if you track token usage.
  tokenUsage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
  latencyMs: Date.now() - start,
});
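
The audit call above reports zero token usage. Ollama chat responses carry `prompt_eval_count` and `eval_count` fields, so one possible way to fill in real numbers is a small mapping helper like the sketch below. `toTokenUsage` is a hypothetical name, the output shape simply mirrors the `tokenUsage` object in the example, and the counts are treated as optional on the assumption that they may be absent on some responses.

```typescript
// Subset of the Ollama chat response used here.
interface OllamaCounts {
  prompt_eval_count?: number; // tokens evaluated in the prompt
  eval_count?: number; // tokens generated in the response
}

// Mirrors the tokenUsage shape passed to auditLLMCall in the example above.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Hypothetical helper: maps Ollama's counters to the audit payload,
// treating missing counters as zero.
function toTokenUsage(response: OllamaCounts): TokenUsage {
  const promptTokens = response.prompt_eval_count ?? 0;
  const completionTokens = response.eval_count ?? 0;
  return {
    promptTokens,
    completionTokens,
    totalTokens: promptTokens + completionTokens,
  };
}
```

In the Gateway Mode example this would replace the hard-coded object with `tokenUsage: toTokenUsage(response)`.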

Health Check

You can verify the local Ollama instance directly:

curl http://localhost:11434/api/tags
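
The same check can be done from application code. The sketch below assumes Node.js 18 or later (for the global `fetch`) and relies on `/api/tags` returning a `models` array whose entries have a `name` field. `hasModel` and `checkOllama` are hypothetical helpers, not part of AxonFlow or the Ollama client library.

```typescript
// Subset of the /api/tags response used here.
interface OllamaTagsResponse {
  models?: Array<{ name: string }>;
}

// Returns true when the requested model appears in the tag list.
// A name without a tag is compared as "<name>:latest".
function hasModel(tags: OllamaTagsResponse, model: string): boolean {
  const wanted = model.indexOf(':') === -1 ? model + ':latest' : model;
  return (tags.models ?? []).some((m) => m.name === wanted);
}

// Queries the Ollama instance and reports whether it is reachable
// and already has the model pulled.
async function checkOllama(endpoint: string, model: string): Promise<boolean> {
  const base = endpoint.replace(/\/+$/, ''); // drop trailing slashes
  const res = await fetch(base + '/api/tags');
  if (!res.ok) return false;
  const tags = (await res.json()) as OllamaTagsResponse;
  return hasModel(tags, model);
}
```

A `false` result from `checkOllama('http://localhost:11434', 'llama3.2:latest')` usually means the model has not been pulled yet; a thrown network error means the instance itself is unreachable.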

Troubleshooting

Connection errors

  • Confirm Ollama is listening on the same host or network as AxonFlow.
  • Verify the OLLAMA_ENDPOINT value includes the correct port.
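
Both checks can be approximated with a quick sanity pass over the endpoint string before making any request. The sketch below is one possible approach; `endpointWarnings` is a hypothetical helper, and its loopback warning reflects the common case where AxonFlow runs in a container and `localhost` refers to that container rather than the machine running Ollama.

```typescript
// Hypothetical helper: returns human-readable warnings for an
// OLLAMA_ENDPOINT value. An empty array means nothing looked suspicious.
function endpointWarnings(endpoint: string): string[] {
  let url: URL;
  try {
    url = new URL(endpoint);
  } catch {
    return ['not a valid URL: ' + endpoint];
  }

  const warnings: string[] = [];
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    warnings.push('expected an http:// or https:// scheme');
  }
  // URL.port is empty when no port is given (or when it equals the
  // scheme default); Ollama listens on 11434 by default.
  if (url.port === '') {
    warnings.push('no port set; Ollama listens on 11434 by default');
  }
  if (url.hostname === 'localhost' || url.hostname === '127.0.0.1') {
    warnings.push(
      'loopback host: only reachable if AxonFlow runs on the same host and network namespace',
    );
  }
  return warnings;
}
```

For example, `endpointWarnings('http://ollama')` flags the missing port, while `endpointWarnings('http://ollama:11434')` returns an empty array.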

Slow responses

  • Start with a smaller local model.
  • Check whether inference is CPU-only or GPU-backed.
  • Measure end-to-end latency in your application path before assuming the governance layer is the bottleneck.
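
To separate model time from everything else, the timing fields on an Ollama response can be compared with the latency your application observed. Ollama reports `total_duration`, `load_duration`, `prompt_eval_duration`, and `eval_duration` in nanoseconds. The sketch below is one way to turn them into a breakdown; `latencyBreakdown` and its field names are hypothetical.

```typescript
// Timing fields from an Ollama chat or generate response (nanoseconds).
interface OllamaTimings {
  total_duration?: number;
  load_duration?: number;
  prompt_eval_duration?: number;
  eval_duration?: number;
  eval_count?: number; // number of generated tokens
}

interface LatencyBreakdown {
  totalMs: number; // time Ollama spent on the request
  loadMs: number; // time spent loading the model
  promptEvalMs: number; // time spent evaluating the prompt
  generationMs: number; // time spent generating tokens
  tokensPerSecond: number; // generation throughput
  outsideOllamaMs: number; // network, application, and governance time
}

function nsToMs(ns: number | undefined): number {
  return (ns ?? 0) / 1e6;
}

// endToEndMs is the wall-clock latency measured by the caller.
function latencyBreakdown(
  t: OllamaTimings,
  endToEndMs: number,
): LatencyBreakdown {
  const totalMs = nsToMs(t.total_duration);
  const generationMs = nsToMs(t.eval_duration);
  const tokens = t.eval_count ?? 0;
  return {
    totalMs,
    loadMs: nsToMs(t.load_duration),
    promptEvalMs: nsToMs(t.prompt_eval_duration),
    generationMs,
    tokensPerSecond: generationMs > 0 ? tokens / (generationMs / 1000) : 0,
    outsideOllamaMs: endToEndMs - totalMs,
  };
}
```

A large `loadMs` points at cold model loading, a low `tokensPerSecond` often points at CPU-only inference, and only `outsideOllamaMs` can include time spent in the governance layer.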

Next Steps

Provider Rollout Checklist

Before this provider path becomes production traffic, connect it to the runtime operating model: