Grafana Dashboard
AxonFlow Community includes a pre-provisioned Grafana dashboard so engineers can see what the runtime is doing while they build and debug governed AI systems.
That makes it useful for more than demos. It is usually the fastest way to answer questions like whether policies are actually blocking, which connector types are getting slower, and whether token or cost curves changed after a routing or model update.
Access
docker compose up -d
open http://localhost:3000
Default local credentials:
- username: admin
- password: grafana_localdev456
What the Dashboard Covers
The community dashboard is driven by real Prometheus metrics from the local stack and focuses on the signals most teams need first.
Request and Policy Panels
- Total Requests -- shows axonflow_agent_requests_total as a time series, giving you request volume over time.
- Blocked Requests -- tracks axonflow_agent_blocked_requests_total, so you can see how often policies are blocking traffic and whether that rate changes after policy updates.
- Policy Evaluations -- shows axonflow_agent_policy_evaluations_total, which counts how many times the policy engine ran. This is distinct from blocked requests because a policy evaluation can result in either "allow" or "block".
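For ad-hoc investigation outside the dashboard, the block rate can be computed directly from these counters. A PromQL sketch, assuming the metric names above and a 5-minute rate window:

```promql
# Fraction of agent requests blocked by policy over the last 5 minutes
sum(rate(axonflow_agent_blocked_requests_total[5m]))
/
sum(rate(axonflow_agent_requests_total[5m]))
```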
Latency Panels
- Agent Request Latency -- histogram from axonflow_agent_request_duration_milliseconds_bucket, showing p50/p95/p99 latency for the full agent request path.
- Orchestrator Latency -- histogram from axonflow_orchestrator_request_duration_milliseconds_bucket, showing how long the orchestrator takes to route and execute LLM calls.
- Gateway Pre-check Latency -- histogram from axonflow_gateway_precheck_duration_milliseconds_bucket, showing policy evaluation time for gateway pre-check calls.
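The p50/p95/p99 series in these panels come from PromQL's histogram_quantile() applied to the _bucket metrics. Its within-bucket linear interpolation can be sketched in plain Python (illustrative only, not AxonFlow code; the bucket bounds and counts below are made up):

```python
import math

def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets, mirroring
    the linear interpolation PromQL's histogram_quantile() performs.

    buckets: (upper_bound_ms, cumulative_count) pairs sorted by bound,
    ending with a float('inf') bucket like Prometheus's +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    target = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= target:
            if math.isinf(bound):
                # Prometheus falls back to the highest finite bound
                return prev_bound
            # Interpolate linearly inside the bucket that crosses the target
            return prev_bound + (bound - prev_bound) * (target - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# Illustrative data: 60 requests <= 50ms, 90 <= 100ms, 99 <= 250ms, 100 total
buckets = [(50.0, 60), (100.0, 90), (250.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)  # ~183.3 ms
```

The takeaway for reading the panels: quantiles are estimated from bucket boundaries, so their accuracy depends on how the histogram buckets were configured, not on the raw request samples.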
Token and Cost Panels
- Token Usage -- tracks axonflow_gateway_llm_tokens_total to show token consumption over time.
- Estimated Cost -- tracks axonflow_gateway_llm_cost_usd_total to show estimated spend over time.
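The same counter can answer quick spend questions outside the panel. A PromQL sketch, assuming the counter is denominated in USD as its name suggests:

```promql
# Estimated USD spent per hour, extrapolated from the last hour's rate
sum(rate(axonflow_gateway_llm_cost_usd_total[1h])) * 3600
```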
MCP Connector Panels
- Connector Calls -- shows axonflow_connector_calls_total broken down by connector type.
- Connector Latency -- histogram from axonflow_connector_duration_milliseconds_bucket.
- Connector Errors -- tracks axonflow_connector_errors_total to surface connector failures that might otherwise be hidden behind successful LLM responses.
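A per-connector error ratio is often more actionable than raw error counts. A PromQL sketch, assuming the calls and errors counters share a label identifying the connector (the label name connector_type here is an assumption; check the exported labels):

```promql
# Error ratio per connector over the last 5 minutes
sum by (connector_type) (rate(axonflow_connector_errors_total[5m]))
/
sum by (connector_type) (rate(axonflow_connector_calls_total[5m]))
```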
Dashboard Source
The bundled dashboard lives at:
config/grafana/dashboards/axonflow-community.json
Grafana provisioning is already wired through:
config/grafana/provisioning/datasources
config/grafana/provisioning/dashboards
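For orientation, a Grafana datasource provisioning file typically looks like the minimal sketch below. The exact contents ship in the repo paths above; the URL assumes Prometheus is reachable at its compose service name:

```yaml
# Illustrative sketch of a Grafana datasource provisioning file
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```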
Metrics Used by the Dashboard
Key metrics include:
- axonflow_agent_requests_total
- axonflow_agent_blocked_requests_total
- axonflow_agent_policy_evaluations_total
- axonflow_agent_request_duration_milliseconds_bucket
- axonflow_gateway_precheck_requests_total
- axonflow_gateway_precheck_duration_milliseconds_bucket
- axonflow_gateway_llm_tokens_total
- axonflow_gateway_llm_cost_usd_total
- axonflow_orchestrator_llm_calls_total
- axonflow_orchestrator_request_duration_milliseconds_bucket
- axonflow_connector_calls_total
- axonflow_connector_duration_milliseconds_bucket
- axonflow_connector_errors_total
Manual Import
If you are running Grafana outside the default compose stack:
- open Grafana
- go to Dashboards -> Import
- import config/grafana/dashboards/axonflow-community.json
- select the Prometheus datasource
Troubleshooting
docker compose ps grafana prometheus
docker compose logs grafana
docker compose logs prometheus
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:8080/prometheus | head
If the dashboard shows no data, the usual causes are:
- Prometheus is not scraping the /prometheus endpoint
- the services are healthy but have not received traffic yet
- Grafana provisioning did not load on startup
Why This Matters
For serious AI application teams, dashboards are not decoration. They are how you prove:
- policies are firing
- latency remains acceptable
- connectors are healthy
- token and cost behavior are visible
That observability story is one of the reasons teams move from lightweight wrappers to a real control plane.
