Ollama Setup
Ollama is available in AxonFlow Community and Enterprise and is the default choice for local, self-hosted, and air-gapped model execution.
Runtime Defaults
AxonFlow's Ollama configuration defaults to:
- Endpoint: http://localhost:11434
- Model: llama3.2:latest
Quick Start
Start Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2:latest
Configure AxonFlow
export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llama3.2:latest
export OLLAMA_TIMEOUT_SECONDS=300 # Request timeout in seconds (default: 300)
The default timeout for Ollama is 300 seconds (5 minutes), intentionally higher than the 120-second default for cloud providers. Local models can be slower to load on first inference.
YAML Configuration
version: "1.0"
llm_providers:
  ollama:
    enabled: true
    config:
      endpoint: ${OLLAMA_ENDPOINT:-http://localhost:11434}
      model: ${OLLAMA_MODEL:-llama3.2:latest}
Good Fits
- Local development without external provider keys
- On-prem or air-gapped deployments
- Teams that want predictable spend and infrastructure-level control
- Enterprises that need data sovereignty, private AI infrastructure, or internal developer platforms built on local models
Proxy Mode
import { AxonFlow } from '@axonflow/sdk';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

// The context block routes this call to the local Ollama provider and model.
const response = await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: 'Explain the design tradeoffs in this distributed system.',
  requestType: 'chat',
  context: {
    provider: 'ollama',
    model: 'llama3.2:latest',
  },
});

console.log(response.data);
Operational Notes
- Ollama is a strong option for community deployments that cannot send prompts to a third-party cloud provider.
- Proxy Mode is usually the simplest starting point.
- Gateway Mode is also possible if your application already uses the Ollama HTTP API directly and you want AxonFlow only for pre-check and audit.
Gateway Mode Example
import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const axonflow = new AxonFlow({
  endpoint: process.env.AXONFLOW_ENDPOINT,
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const prompt = 'Explain the design tradeoffs in this distributed system.';

// 1. Pre-check: ask AxonFlow whether this request is allowed before calling the model
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: prompt,
});
if (!ctx.approved) throw new Error(ctx.blockReason);

// 2. Your direct Ollama call
const start = Date.now();
const response = await ollama.chat({
  model: 'llama3.2:latest',
  messages: [{ role: 'user', content: prompt }],
});

// 3. Audit: Ollama reports token counts as prompt_eval_count / eval_count on
// non-streaming responses; fall back to 0 if they are absent.
const promptTokens = response.prompt_eval_count ?? 0;
const completionTokens = response.eval_count ?? 0;

await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.message.content.substring(0, 100),
  provider: 'ollama',
  model: 'llama3.2:latest',
  tokenUsage: { promptTokens, completionTokens, totalTokens: promptTokens + completionTokens },
  latencyMs: Date.now() - start,
});
Health Check
You can verify the local Ollama instance directly:
curl http://localhost:11434/api/tags
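If you prefer to script this check, the sketch below uses Node's built-in fetch (Node 18+) to call the same /api/tags endpoint and print the locally available models. The endpoint fallback mirrors the documented default.

const endpoint = process.env.OLLAMA_ENDPOINT ?? 'http://localhost:11434';

// Query Ollama's /api/tags endpoint, which lists the models pulled on this host.
const res = await fetch(`${endpoint}/api/tags`);
if (!res.ok) {
  throw new Error(`Ollama health check failed: HTTP ${res.status}`);
}

const { models } = (await res.json()) as { models: Array<{ name: string }> };
console.log('Available models:', models.map((m) => m.name).join(', '));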
Troubleshooting
Connection errors
- Confirm Ollama is listening on the same host or network as AxonFlow.
- Verify that the OLLAMA_ENDPOINT value includes the correct host and port.
Slow responses
- Start with a smaller local model.
- Check whether inference is CPU-only or GPU-backed.
- Measure end-to-end latency in your application path before assuming the governance layer is the bottleneck; the sketch below shows one way to compare direct and proxied calls.
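One way to isolate the governance overhead is to time the same prompt twice: once directly against Ollama and once through AxonFlow's Proxy Mode. This is a rough sketch that reuses the ollama npm package and the SDK client shown earlier; the prompt and model are placeholders.

import { AxonFlow } from '@axonflow/sdk';
import ollama from 'ollama';

const prompt = 'Summarize the last deployment incident in two sentences.';

// Direct call to Ollama.
// Note: the first call also pays the model load cost, so run each path more
// than once before drawing conclusions.
const directStart = Date.now();
await ollama.chat({ model: 'llama3.2:latest', messages: [{ role: 'user', content: prompt }] });
const directMs = Date.now() - directStart;

// Same prompt routed through AxonFlow's Proxy Mode.
const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  clientId: process.env.AXONFLOW_CLIENT_ID,
  clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});

const proxiedStart = Date.now();
await axonflow.proxyLLMCall({
  userToken: 'user-123',
  query: prompt,
  requestType: 'chat',
  context: { provider: 'ollama', model: 'llama3.2:latest' },
});
const proxiedMs = Date.now() - proxiedStart;

console.log(`Direct: ${directMs} ms, via AxonFlow: ${proxiedMs} ms`);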
