
AxonFlow v7.2.0 Release Notes

AxonFlow v7.2.0 is the Bug Bash Bonanza — a MINOR release focused on hardening. The new surface area is small and additive (two operator-tunable knobs, a Customer Portal request-body size cap, and a per-feature HTTP example set), but the fixes run across the full Customer Portal HTTP surface, the Multi-Agent Planning execution path, tenant-scope enforcement on every read-and-action endpoint, and a login-endpoint change that closes an organization-enumeration leak.

No breaking changes. Every v7.1.x caller keeps working without code changes.

The short version:

  • MAP plans now complete end-to-end against the server's 300-second budget instead of getting truncated at 60s or at the proxy boundary
  • Tenant-scoped endpoints fail closed with a structured 401 TENANT_REQUIRED response when called without X-Tenant-ID, rather than silently scoping SQL to an empty-string tenant
  • require_approval is a valid override action now, so operators can temporarily route a specific policy's decisions into the HITL queue without editing the policy itself
  • Java SDK v5.5.0 ships the mapTimeout client-config field so Java callers can match the new 300s server cap
  • The Customer Portal got a pass on tenant isolation, onboarding, and compliance flows, plus two dozen smaller bugs that surfaced once customers started using it for production work

If you run Multi-Agent Planning or have a Customer Portal deployment, upgrade. The fixes materially change what works end-to-end.

New operator knobs

AXONFLOW_MAP_MAX_TIMEOUT_SECONDS

The orchestrator's MAP plan-execution budget used to cap at 60 seconds, which is well under what a typical 5-step plan (≈15s per LLM call) actually needs. v7.2.0 raises the default to 300 seconds and exposes the cap as an orchestrator environment variable:

AXONFLOW_MAP_MAX_TIMEOUT_SECONDS=300  # default; clamped to 60..1800

The effective value is logged at startup when non-default. If you front the orchestrator with a reverse proxy or load balancer, set its idle / read timeout to at least this value — otherwise long plans get cut off at the proxy before the orchestrator finishes the work it was prepared to do.

SDK note: the TypeScript SDK's mapTimeout defaults to 120 seconds. Clients relying on the default will still cut off at 120s before the server cap takes effect. Pass mapTimeout: 300000 on the SDK client config to match the server. Java SDK v5.5.0 adds the same mapTimeout field so Java callers can do the same.

AlbIdleTimeoutSeconds CloudFormation parameter

For Enterprise deployments on AWS Marketplace, v7.2.0 adds an AlbIdleTimeoutSeconds parameter to the CloudFormation template so the ALB idle timeout can be paired with the orchestrator cap without a template edit. Default is 300 to match the server. See Self-hosted deployment for the broader operator-knob context.

Customer Portal MaxBodyBytesMiddleware

The Enterprise Customer Portal now caps every POST / PUT / PATCH request body at 1 MiB by default via a new exported middleware.MaxBodyBytesMiddleware. A variant, MaxBodyBytesMiddlewareWithLimit(n int64), is available for routes that legitimately need a larger ceiling (SAML metadata today, future file uploads). GET / HEAD are not wrapped.

New structured error — TENANT_REQUIRED

Before v7.2.0, several tenant-scoped endpoints fell back to an empty-string tenant when neither X-Tenant-ID nor a session-stored tenant was present. Downstream SQL ran with WHERE tenant_id = '', which returned zero rows in practice but silently burned daily-quota slots for an empty bucket and would have leaked data the moment a downstream query stopped filtering on tenant. v7.2.0 makes every such endpoint fail closed with a new structured error:

{
  "error": {
    "code": "TENANT_REQUIRED",
    "message": "Missing tenant context: caller must provide X-Tenant-ID header"
  }
}

Endpoints that adopted the structured 401 TENANT_REQUIRED response:

  • POST /api/v1/evidence/export
  • GET /api/v1/evidence/summary
  • POST /api/v1/policies/simulate
  • POST /api/v1/policies/{id}/impact-report
  • POST /api/v1/cost/estimate
  • GET /api/v1/plans/{id}/cost
  • GET /api/v1/audit/tenant/{tenant_id} (header-required check tightened; URL-vs-header mismatch still returns 403)

See the API Error Codes page for the full catalog.

Overrides now accept require_approval

The override validator's allow-list was hand-written as {block, redact, warn, log} before v7.2.0, which silently dropped require_approval as a valid override action even though the rest of the stack accepted it end-to-end. Operators who wanted to temporarily route a specific policy's decisions into the HITL queue had to edit the policy itself.

v7.2.0 standardises on a single canonical list of terminal actions across policy authoring and overrides:

Context | Valid actions
--- | ---
Policy authoring | alert, block, log, modify_risk, redact, require_approval, route, warn
Override endpoint (terminal-action subset) | block, require_approval, redact, warn, log

Authoring-only actions (alert, modify_risk, route) are deliberately excluded from the override endpoint — they have no terminal-action meaning and the agent's override repository would reject them anyway. See Overrides API for the full request shape and HITL Approval Gates for the approval surface that require_approval feeds.
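To keep the two lists from drifting apart, you can derive the override subset from a single declaration. The Go sketch below shows one way to do that. The variable and function names are hypothetical. Only the action strings and the split between authoring and override sets come from the table above.

```go
package main

import "fmt"

// policyActions is the canonical authoring list from the table above.
var policyActions = []string{
	"alert", "block", "log", "modify_risk", "redact", "require_approval", "route", "warn",
}

// authoringOnly holds actions with no terminal-action meaning.
var authoringOnly = []string{"alert", "modify_risk", "route"}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// overrideActions derives the terminal-action subset from the canonical
// list, so a hand-written allow-list cannot silently omit an action.
func overrideActions() []string {
	var out []string
	for _, a := range policyActions {
		if !contains(authoringOnly, a) {
			out = append(out, a)
		}
	}
	return out
}

// validOverride reports whether the override endpoint accepts action.
func validOverride(action string) bool {
	return contains(overrideActions(), action)
}

func main() {
	fmt.Println(overrideActions())
	fmt.Println(validOverride("require_approval"), validOverride("route"))
}
```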

Agent surface fixes

Three Community-tier fixes worth calling out because they affect what you see when you query the Agent from any tier:

  • /api/v1/euaiact/* routes are now proxied to the orchestrator. The single-entry-point mux listed /rbi, /sebi, and /masfeat alongside the rest of the compliance family but omitted /euaiact. As a result, every EU AI Act call that landed on the front-door ALB returned 404 page not found, and the Portal's Compliance page reported the module as "not enabled for this tenant" even though peer modules rendered fine. The fix adds the prefix to both the router and the proxy allow-list.
  • Canonical /api/v1/policy-overrides alias on the agent. The Portal's overrides handler proxies to this path, matching the policy-categories / static-policies / dynamic-policies naming pattern. The agent previously only exposed the tenant override list under /api/v1/static-policies/overrides, so callers using the canonical path hit 404 and the Portal's Policies → Overrides tab rendered empty. Same handler, new path, auth unchanged.
  • Agent /health includes tier. The validated license tier (Community / Evaluation / Professional / Enterprise / starting) is now surfaced on the health response. Operators querying curl /health | jq .tier used to get "unknown" because the field was not present.

Policy surface fixes

context_aware policies accepted on update

Three seeded system policies ship with policy_type=context_aware — Tenant Isolation, Debug Mode Restriction, and Sensitive Data Control — but any update via PUT /api/v1/policies/{id} returned 400 because the orchestrator's allowlist was missing that type. Now accepted.

Legacy snake-case policy IDs accepted

The policy-ID validator before v7.2.0 only accepted UUIDs and the sys_* prefix, so seeded policies like sensitive_data_control and tenant_isolation failed every per-policy action (Test / Edit / Delete / Versions) with "Invalid policy ID format". The allowlist now also accepts the legacy snake-case form. This restores those flows on the Portal's Policies page.
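Here is a Go sketch of a validator that accepts the three shapes. The exact regular expressions are assumptions: the notes say that UUIDs, the sys_* prefix, and legacy snake-case IDs are accepted, but not which characters each form may contain.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed patterns for the three accepted policy-ID shapes.
var (
	uuidRe  = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
	sysRe   = regexp.MustCompile(`^sys_[a-z0-9_]+$`)
	snakeRe = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)
)

// validPolicyID accepts UUIDs, sys_* IDs, and legacy snake_case IDs such
// as sensitive_data_control; anything else is an invalid ID format.
func validPolicyID(id string) bool {
	return uuidRe.MatchString(id) || sysRe.MatchString(id) || snakeRe.MatchString(id)
}

func main() {
	for _, id := range []string{"tenant_isolation", "sys_example", "not a valid id"} {
		fmt.Println(id, validPolicyID(id))
	}
}
```

Anchored patterns (`^...$`) matter here: the ID ends up in URLs and queries, so partial matches would let arbitrary trailing characters through.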

tier and category query params honoured

GET /api/v1/policies used to drop the tier and category query parameters at the handler boundary even though the repository supported them. Every Tier / Category dropdown in the Customer Portal's policies page therefore returned the full unfiltered list. Fixed.

Customer Portal Evidence Export has a download button now

The Compliance page showed per-type record counts (audit logs, workflow steps, HITL approvals) but had no way to actually pull the bundle. v7.2.0 adds a Download Evidence button that streams the JSON bundle as a blob with a 30-day default window (the backend still caps by tier) and saves it as axonflow-evidence-<start>-to-<end>.json. The button is disabled, with a tooltip, when the counts are zero. Any backend error (tier, license, quota) appears as an inline alert instead of the click silently doing nothing. See Evidence Export for the tier limits.

Security fixes

Customer Portal login no longer leaks org existence or auth mode

POST /api/v1/auth/login used to return three distinguishable failure responses that together let an unauthenticated caller enumerate which org IDs existed and classify each one by auth mode:

  • Unknown org → 401 "Invalid credentials" (no bcrypt work)
  • Known org with no password set (SSO-only) → 401 "Password authentication not enabled for this organization" (no bcrypt work)
  • Known org, bad password → 401 "Invalid credentials" (full bcrypt compare)

The distinct no-password body outed which orgs existed and which were password-backed. Even with a uniform body, the missing bcrypt work on the first two branches leaked the same bit through response timing. v7.2.0 closes both channels: every failure now returns "Invalid credentials", and the no-password branch runs a throwaway bcrypt compare against a fixed placeholder hash so the timing profile matches a real check. SSO-only orgs cannot log in through this path — they are simply indistinguishable from wrong-password attempts to an external caller.

Internal-service auth fallback narrowed

Before v7.2.0, when the internal-service auth secret was unset, the agent fell through to a literal-string compare against a compiled-in constant. Any caller who knew that constant could supply internal-service headers, impersonate the orchestrator, and inject arbitrary X-Tenant-ID / X-Org-ID values for cross-tenant access. v7.2.0 restricts the fallback path to the Community and Community-SaaS deployment modes. In every other mode, the agent logs a one-time security warning at startup and the fallback no longer authenticates any request. The HMAC and legacy plain-secret paths are unchanged.

Orchestrator audit handler fails closed

The audit handler previously skipped proxy-auth validation entirely when the token validator was nil — the same misconfiguration shape that enabled the fallback bypass above. An attacker reaching the orchestrator directly could spoof X-Org-ID for cross-tenant audit attribution. The handler now returns 403 if the validator is nil and the deployment mode is not Community.

Decision-explain cross-tenant lookup closed

GET /api/v1/decisions/{id}/explain used to look up the audit entry by decision_id alone and check the tenant only after the fetch. That post-check short-circuited on user email: it failed open whenever the user-email column was NULL. It could also be bypassed by an attacker in another tenant who happened to share an email address with the decision owner. v7.2.0 tightens this:

  • X-Tenant-ID is required
  • The SELECT binds on (decision_id, caller_tenant)
  • Cross-tenant requests return 404 (not 403) so the response shape does not leak whether the decision ID exists in a different tenant
  • The post-fetch tenant comparison is kept as defense-in-depth

Legacy V1 HMAC license format purged

The V2 Ed25519 license format has been the only accepted key format for months. v7.2.0 removes the stale V1 HMAC code paths from the active license validator. The rejection path that returns "V1 license format no longer supported" is kept, so an old key surfacing in a clean shell gets a clear error instead of silent acceptance. See Licensing for current key generation.

Marketplace CloudFormation fixes

Two fixes that only surface on AWS Marketplace stacks with non-default environment names:

  • Agent security group can reach the Customer Portal. Per the single-entry-point architecture, every public request funnels through the Agent, which proxies /api/v1/auth/*, /api/v1/portal/*, /api/v1/code-governance/*, and /api/v1/git-providers/* to the Portal over Cloud Map. The security group allowed ALB → Portal and Portal → Agent but nothing allowed Agent → Portal; auth calls on the raw stack domain timed out after 30 seconds and fell back to 503 Backend service unavailable. Fixed; applies via update-stack without recycling ECS tasks.
  • Orchestrator connector secrets resolve under the per-stack environment name. Fourteen connector secret paths used the wrong path component; non-default stacks came up with every connector holding empty credentials. TaskExecutionRole IAM grants now match the per-stack path, with an AllowedPattern on EnvironmentName so typos fail at CreateChangeSet time rather than at runtime.

Enterprise fixes worth calling out

RBI FREE-AI registration + board reports

The RBI FREE-AI compliance module had two 500-path bugs that blocked Portal flows end-to-end:

  • AI-system registration: the repository's INSERT listed board_approval_required, but that column is a stored-generated column computed from risk_category. PostgreSQL rejected every write. Removed from the INSERT and UPDATE statements; the Go struct field is still populated at read time.
  • Board-report generation: the service layer set generation_method = "automated" but the database check constraint only accepts 'automatic' or 'manual'. Fixed the literal.

Usage rollup no longer returns zeros

The daily rollup was defined but never invoked — no scheduler, no goroutine, no on-demand call — so the rollup table stayed empty forever and the Portal's Usage summary returned zeros even when the underlying event rows had real data. v7.2.0 makes the aggregator idempotent (re-running an overlapping window recomputes the bucket instead of adding to it) and calls it on-demand from the Usage handlers before they query the rollup. Self-healing, no scheduler required. A related latent bug that only surfaced once the rollup populated (COALESCE(AVG()) returning numeric into an int scan target) is fixed in the same change.

/api/v1/export/usage no longer 500s

The handler queried columns that didn't exist, built INTERVAL '$2 days', which is not valid PostgreSQL parameterization (a $2 inside a string literal is treated as literal characters), and ignored the start / end query params the UI sends. It has been rewritten against the per-request metering table with correct columns and proper date-range handling for RFC3339 timestamps or YYYY-MM-DD dates. It also surfaces DB errors in the server log instead of swallowing them behind a generic "Database error" response.

Admin org list + detail work against the real heartbeat schema

Both the admin org list and detail endpoints joined agent_heartbeats on column names that don't exist (organization_id, last_heartbeat_at); the real columns are org_id and last_heartbeat. Every org detail page 500'd with pq: column "organization_id" does not exist. Fixed.

Customer Portal Docker image builds with the Enterprise Go tag

The Portal Dockerfile previously hard-coded a Community build, so license.GenerateLicenseKey always returned the Community stub and admin onboarding 500'd on every call, leaving the orgs table empty and the SaaS deployment with no orgs to log into. Fixed.

Java SDK v5.5.0 companion release

Java SDK v5.5.0 ships alongside platform v7.2.0 with one user-facing addition:

  • mapTimeout on AxonFlowConfig — Java callers can now match the platform's 300-second MAP cap. Minor per-semver (additive field), consistent with the MAP-timeout additions Python, TypeScript, and Go landed in v7.1.x.

See Java SDK Version Compatibility for the full matrix and Java Getting Started for setup.

Upgrade checklist

  1. Upgrade the platform to v7.2.0.
  2. If you run the Java SDK, bump to v5.5.0 and set mapTimeout on your client config.
  3. If you run any SDK and execute long MAP plans, set the client-side timeout to at least 300 seconds (TS: mapTimeout: 300000; Python: map_timeout=300; Go: MapTimeout: 300 * time.Second; Java: .mapTimeout(Duration.ofSeconds(300))).
  4. If you front AxonFlow with a reverse proxy or load balancer, set its idle timeout to at least AXONFLOW_MAP_MAX_TIMEOUT_SECONDS (default 300s).
  5. If your callers hit any of the endpoints that now return the structured 401, make sure every request sets X-Tenant-ID. Treat TENANT_REQUIRED as a client-fixable error.
  6. If you use session overrides, require_approval is now a valid action_override value — operators can route decisions into the HITL queue without editing the underlying policy.
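On the client side, the two habits from step 5 can be sketched in Go: always send the header, and recognise TENANT_REQUIRED as a caller bug rather than a transient failure. The helpers `isTenantRequired` and `newTenantRequest` are hypothetical, and so is the example host. The body shape is the structured error documented earlier in these notes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type errorEnvelope struct {
	Error struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// isTenantRequired reports whether a response is the structured
// TENANT_REQUIRED error. Do not retry it; fix the request instead.
func isTenantRequired(status int, body []byte) bool {
	if status != http.StatusUnauthorized {
		return false
	}
	var env errorEnvelope
	return json.Unmarshal(body, &env) == nil && env.Error.Code == "TENANT_REQUIRED"
}

// newTenantRequest sets X-Tenant-ID on every outgoing request.
func newTenantRequest(method, url, tenant string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Tenant-ID", tenant)
	return req, nil
}

func main() {
	body := []byte(`{"error":{"code":"TENANT_REQUIRED","message":"Missing tenant context"}}`)
	fmt.Println(isTenantRequired(http.StatusUnauthorized, body))
	req, _ := newTenantRequest(http.MethodGet, "https://axonflow.example/api/v1/evidence/summary", "acme")
	fmt.Println(req.Header.Get("X-Tenant-ID"))
}
```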

Next release

  • v7.2.1 — patch that surfaces HITL approver identity (approved_by, approved_at) on workflow step responses and fixes a Customer Portal /approvals render crash.
  • v7.3.0 — first-class retry semantics and idempotency keys on the Workflow Control Plane.