
v8.0.0 Self-Hosted Upgrade Guide

Audience

This guide is for operators of self-hosted and in-VPC AxonFlow enterprise deployments upgrading from v8.x to v8.0.0 (the platform release that delivers the FORCE Row-Level Security + identity-model rework formerly planned as v9.0.0). Script names, migration files, and log lines below that still say v9 date from that earlier plan and refer to the same release. The guide covers what to do before the upgrade, how to run it, how to verify success, and how to roll back if needed.

Community single-tenant self-hosted users should use the v7 → v8 Migration Guide instead; its simple-path community upgrade covers the common case without the enterprise detail below.

If you are a customer of the AxonFlow-hosted Community SaaS (try.getaxonflow.com), no action is needed — AxonFlow handles the upgrade for you. See the v7 → v8 migration guide for the customer-API surface.

Deployment-shape decision table (do you need this guide?)

If your deployment matches "Community, single-tenant self-hosted" below, the simple-path upgrade in the public migration guide is sufficient — you do not need this page.

| Deployment shape | Need this guide? | AXONFLOW_DB_APP_ROLE_URL | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| --- | --- | --- | --- |
| Community, single-tenant self-hosted | No (use public guide) | Yes (or DATABASE_URL fallback in dev) | No |
| Enterprise self-hosted with customer-portal admin | Yes | Yes | Yes |
| Enterprise self-hosted with multi-node enforcement | Yes | Yes | Yes |
| Enterprise self-hosted with AWS Marketplace metering | Yes | Yes | Yes |
| Community-SaaS mode (sweep / recovery / tenant-delete) | Yes | Yes | Yes |
| Single-tenant enterprise without any of the above | Marginal (read, but skip the multi-tenant sections) | Yes | No |

Quick summary

| Topic | Answer for v8.0.0 |
| --- | --- |
| Is there a breaking customer-API change? | No. JSON field names + Basic Auth credentials + X-Tenant-ID header all continue to work. |
| Is there a breaking operator-API change? | Two: (a) AXONFLOW_DB_USE_APP_ROLE default flips from false to true in v8.0.0: fresh deploys connect as the non-owner axonflow_app_role and rely on FORCE RLS for tenant isolation. You can preserve the legacy semantics by setting AXONFLOW_DB_USE_APP_ROLE=false explicitly. (b) If you run direct SQL queries against the AxonFlow application database under the app role, those queries return zero rows unless they first set the app.current_org_id session variable. SDK, plugin, and HTTP API callers are unaffected; only direct-DB tooling is. |
| Are any source-fork APIs removed? | Yes. middleware.RLSMiddleware, SetRLSContextForSession, ResetRLSContext, and WithRLS (pool-scope GUC helpers in ee/platform/customer-portal/middleware/rls.go) were removed. Self-hosted source forks calling these directly must migrate to api.withRequestOrgScope(r, h.db, fn). Diagnostic helpers (GetCurrentOrgID, VerifyRLSActive, GetRLSStats, RLSHealthCheck) are retained. |
| Do I need to regenerate my license? | No. V2 licenses continue to validate. |
| Do I need to set new env vars? | If you have not set ORG_ID to your real customer identifier, do so before upgrading. CFN-deployed stacks already set DEPLOYMENT_KIND=production automatically; self-hosted docker-compose operators running production must set it manually. To run as the app role (the v8.0.0 default), also set AXONFLOW_DB_APP_ROLE_URL and, if you run cross-org workers, AXONFLOW_DB_PLATFORM_ADMIN_URL. |
| Do I need to take a snapshot before upgrading? | Yes, strongly recommended. v8.0.0 schema migrations are forward-only; the rollback contract is snapshot-restore. |
| Does my v8.x SDK/plugin still work? | Yes. The platform accepts v7.x and v8.x SDKs/plugins through the entire v8.0.0 lifecycle. v10 removes the X-Tenant-ID deprecated alias. |

Six-guard pre-upgrade checklist

Before running the preflight script (step 4 below), verify these six refuse-to-boot guards are addressed for your deployment. The v8.0.0 binaries refuse to boot when AXONFLOW_DB_USE_APP_ROLE=true (the default) and the relevant worker has no admin DSN — silent fallback to the request-traffic pool would defeat FORCE RLS.

| # | Guard prefix | Triggered when | What to set |
| --- | --- | --- | --- |
| 1 | [Marketplace] | AWS Marketplace metering is enabled | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 2 | [NodeMonitor] | ENABLE_NODE_MONITOR=true | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 3 | [CSAAS-SWEEP] | Community-SaaS sweep enabled | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 4 | [CSAAS-RECOVERY] | Community-SaaS recovery handler enabled | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 5 | [CSAAS-DELETE] | tenant-delete (GDPR right-to-erasure) endpoints exposed | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 6 | [customer-portal] | customer-portal admin handlers in scope | AXONFLOW_DB_PLATFORM_ADMIN_URL + ADMIN_API_KEY |

If your deployment triggers any of these, populate AXONFLOW_DB_PLATFORM_ADMIN_URL per step 4.5 below. If your deployment triggers none of these, you can run with AXONFLOW_DB_USE_APP_ROLE=true and just the app-role DSN — but you must still complete the upgrade steps below.
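The pairing rule above can be checked locally before any container starts. The following is a minimal sketch and not the shipped guard: the function name check_admin_dsn_pairing and its yes/no argument are hypothetical, and it treats only the literal value false as the opt-out, since that is the one spelling this guide documents.

```shell
# Hypothetical helper: mirrors the documented rule, not the binaries' own code.
# $1 = "yes" if any of the six guards applies to this deployment, else "no".
check_admin_dsn_pairing() {
  needs_admin=$1
  # An unset AXONFLOW_DB_USE_APP_ROLE means true in v8.0.0.
  use_app_role=${AXONFLOW_DB_USE_APP_ROLE:-true}

  if [ "$use_app_role" = "false" ]; then
    echo "OK: legacy posture, guards are no-ops"
    return 0
  fi
  if [ "$needs_admin" = "yes" ] && [ -z "${AXONFLOW_DB_PLATFORM_ADMIN_URL:-}" ]; then
    echo "FAIL: set AXONFLOW_DB_PLATFORM_ADMIN_URL or pin AXONFLOW_DB_USE_APP_ROLE=false"
    return 1
  fi
  echo "OK: app role in effect and admin DSN requirement satisfied"
  return 0
}
```

Call it as check_admin_dsn_pairing yes in the same shell (or CI job) that holds the environment you plan to deploy; a non-zero exit means a v8.0.0 worker would refuse to boot.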

The public v7 → v8 Migration Guide covers:

  • Decision tree (deployment shape → env vars)
  • Simple-path 3-step community upgrade
  • Identity-model conceptual rework
  • Wire-level deprecation timeline (X-Tenant-ID → X-Client-ID)
  • Smoke tests

This page is the stage-by-stage upgrade companion. For per-feature env-var combinations, customer-portal admin DSN format, and source-fork audit recipe, see also the v8.0.0 Enterprise Migration Guide.

Pre-upgrade checklist

Before pulling the v8.0.0 image, work through this checklist in order. Each step is also enforced by the preflight script in step 4.

1. Take a fresh database snapshot

The rollback contract for v8.0.0 is snapshot restore + image revert. Per-batch _down.sql migration files exist (e.g., 099_v9_rls_b1_sparse_tables_down.sql) but they revert one batch, not the full identity-model rollback. For a full v8.x rollback, you need a snapshot taken immediately before the upgrade.

  • AWS RDS: trigger a manual snapshot via the AWS console or aws rds create-db-snapshot --db-instance-identifier <id> --db-snapshot-identifier <name>. Wait for status available before proceeding.
  • Self-managed Postgres: use your existing backup tool (pgbackrest, barman, pg_dump to S3, etc.). Confirm the backup completed successfully.
  • Containerised Postgres in your own VPC: take an EBS volume snapshot of the underlying disk, OR run a pg_dump to a separate object store.

Record the snapshot identifier. You'll need it for rollback (see Rollback path below).
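For the AWS RDS case, creating the snapshot and waiting for it can be scripted. This is a sketch under assumptions: the function names and the -pre-v8- naming scheme are hypothetical, while aws rds create-db-snapshot and aws rds wait db-snapshot-available are standard AWS CLI commands.

```shell
# Build a snapshot identifier from the instance name and a timestamp.
# The "-pre-v8-" infix is an illustrative convention, not an AxonFlow requirement.
snapshot_id() {
  # $1 = DB instance identifier, $2 = timestamp such as 20250101T000000
  printf '%s-pre-v8-%s' "$1" "$2"
}

# Create a manual snapshot and block until it reaches "available".
take_pre_upgrade_snapshot() {
  instance=$1
  snap=$(snapshot_id "$instance" "$(date -u +%Y%m%dT%H%M%S)")
  aws rds create-db-snapshot \
    --db-instance-identifier "$instance" \
    --db-snapshot-identifier "$snap" >/dev/null || return 1
  aws rds wait db-snapshot-available --db-snapshot-identifier "$snap" || return 1
  # Print the identifier so it can be recorded for rollback.
  echo "$snap"
}
```

Running take_pre_upgrade_snapshot axonflow-acme-prod-db prints the identifier once the snapshot is available; store that value alongside your change record.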

2. Confirm ORG_ID env on the agent task

ORG_ID is the deployment's customer/account identifier. It is the value the agent stamps onto every historical empty-org_id row during migration 094's Pass-2 backfill. It MUST be set to your real customer identifier (not the literal string local-dev-org, which is the dev-only sentinel) on any production deployment.

How to check:

| Hosting | Where ORG_ID lives |
| --- | --- |
| AWS Marketplace / CFN (cloudformation-ecs-fargate.yaml) | OrganizationID CFN parameter on the stack → flows into the agent task def's ORG_ID env |
| Docker Compose (self-hosted enterprise) | ORG_ID line in docker-compose.enterprise.yml or an external .env file |
| Custom Kubernetes / EC2 / on-prem | ORG_ID env var on the agent container/process |

If ORG_ID is unset on a production deployment, the agent's deployment-org getter falls back to local-dev-org, and migration 094 will refuse to run (the production-safety guard added in v9). Set ORG_ID and redeploy before continuing.

How to find your existing ORG_ID (CFN-deployed stacks):

# The CFN stack's OrganizationID parameter is the source of truth.
# Replace <your-stack-name> with your actual stack name (e.g., axonflow-acme-prod).
aws cloudformation describe-stacks \
  --stack-name <your-stack-name> \
  --query 'Stacks[0].Parameters[?ParameterKey==`OrganizationID`].ParameterValue' \
  --output text

3. Confirm DEPLOYMENT_KIND env on the agent task

DEPLOYMENT_KIND distinguishes a real production deployment from a local docker-compose / community-mode install. Migration 094 uses it as a defence-in-depth signal alongside ORG_ID.

  • CFN-deployed stacks: DEPLOYMENT_KIND=production is hardcoded in cloudformation-ecs-fargate.yaml (agent + orchestrator task defs). No action needed.
  • Docker Compose: defaults to dev. Override by setting DEPLOYMENT_KIND=production in your shell or .env file before docker-compose up -d IF this docker-compose is your production deployment shape (rare but supported). Leave it as dev for local development.
  • Custom hosting: set DEPLOYMENT_KIND=production on the agent task.
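Steps 2 and 3 combine into one rule that can be checked before the image is pulled. The sketch below is a hypothetical local pre-check, not the migration's own guard; it assumes an unset DEPLOYMENT_KIND behaves like dev, which this guide states only for Docker Compose.

```shell
# Hypothetical pre-check mirroring the documented migration 094 guard.
check_identity_env() {
  # Assumption: unset DEPLOYMENT_KIND is treated as dev.
  kind=${DEPLOYMENT_KIND:-dev}
  org=${ORG_ID:-}

  if [ "$kind" != "production" ]; then
    echo "OK: DEPLOYMENT_KIND=$kind, local-dev-org fallback is accepted"
    return 0
  fi
  # On production, an unset ORG_ID or the dev sentinel must block the upgrade.
  if [ -z "$org" ] || [ "$org" = "local-dev-org" ]; then
    echo "FAIL: production deployment needs ORG_ID set to the real customer identifier"
    return 1
  fi
  echo "OK: DEPLOYMENT_KIND=production, ORG_ID=$org"
}
```

Run check_identity_env with the same environment the agent task will receive. The preflight script in step 4 remains the authoritative check.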

4. Run the preflight script

The preflight script lives at scripts/deployment/v9_self_hosted_preflight.sh in the getaxonflow/axonflow-enterprise repository (BSL 1.1, source-available). It is dependency-free (bash + psql + optionally aws CLI for ECS discovery). Download it directly without cloning the repo:

curl -fsSLO https://raw.githubusercontent.com/getaxonflow/axonflow-enterprise/main/scripts/deployment/v9_self_hosted_preflight.sh
chmod +x v9_self_hosted_preflight.sh

For your DATABASE_URL, the Marketplace CFN template stores the RDS master password in AWS Secrets Manager. Construct the URL like this:

STACK_NAME=<your-stack-name>          # e.g., axonflow-acme-prod
DB_ENDPOINT=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
  --query 'Stacks[0].Outputs[?OutputKey==`DatabaseEndpoint`].OutputValue' --output text)
DB_SECRET_ARN=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
  --query 'Stacks[0].Outputs[?OutputKey==`DatabaseSecretArn`].OutputValue' --output text)
DB_PASSWORD=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_ARN" \
  --query SecretString --output text | jq -r .password)
export DATABASE_URL="postgres://axonflow:${DB_PASSWORD}@${DB_ENDPOINT}:5432/axonflow?sslmode=require"

(Adjust the --query keys if your stack's outputs are named differently — Marketplace template uses DatabaseEndpoint + DatabaseSecretArn; older templates may use different names.)
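If the stored password contains characters that are reserved in a URL (for example @, :, / or #), it has to be percent-encoded before it goes into a postgres:// DSN. The helpers below are a sketch: urlencode and build_dsn are hypothetical names, and the encoder assumes ASCII input.

```shell
# Percent-encode every character outside the RFC 3986 unreserved set.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# Assemble a DSN in the shape used throughout this guide.
build_dsn() {
  # $1 = role, $2 = raw password, $3 = host, $4 = database
  printf 'postgres://%s:%s@%s:5432/%s?sslmode=require' \
    "$1" "$(urlencode "$2")" "$3" "$4"
}
```

For example, export DATABASE_URL="$(build_dsn axonflow "$DB_PASSWORD" "$DB_ENDPOINT" axonflow)" yields the same URL as the manual construction when the password has no reserved characters.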

Now run the preflight:

# Minimum: just DATABASE_URL
./v9_self_hosted_preflight.sh

# Recommended: also supply the env discovery hints + your RDS identifier so
# the script can verify automated backups are enabled
DEPLOYMENT_KIND=production \
ORG_ID=acme-corp \
RDS_INSTANCE_IDENTIFIER=axonflow-acme-prod-db \
./v9_self_hosted_preflight.sh

# For ECS-Fargate: have the script discover env vars from the live task def
ECS_CLUSTER=axonflow-acme-cluster \
ECS_AGENT_SERVICE=axonflow-agent-service \
RDS_INSTANCE_IDENTIFIER=axonflow-acme-prod-db \
./v9_self_hosted_preflight.sh

The script exits 0 on PASS, 1 on FAIL.

The script reports PASS/WARN/FAIL across eight checks:

  1. Postgres version ≥ 14
  2. schema_migrations table state (all v8.x baseline migrations applied; no failed migrations)
  3. Empty/NULL org_id row scan (preview of what migration 094 will stamp with your ORG_ID)
  4. DEPLOYMENT_KIND + ORG_ID env vars
  5. Postgres roles (axonflow_app_role + axonflow_platform_admin) — these are CREATED by v9 migration 098, so absence on a pre-v9 stack is expected and reports PASS ("will be created"). The check fails only if the roles exist with the wrong BYPASSRLS configuration.
  6. Backup/snapshot policy
  7. local-dev-org default preservation
  8. AXONFLOW_DB_USE_APP_ROLE + AXONFLOW_DB_PLATFORM_ADMIN_URL env pairing — FAIL if AXONFLOW_DB_USE_APP_ROLE=true (the v8.0.0 default, also active when the env is unset) and AXONFLOW_DB_PLATFORM_ADMIN_URL is not set. The v8.0.0 binaries refuse to boot under that combination — see RLS write paths below. Also emits a WARN-level customized-handler audit advisory whenever app-role is effective, because the preflight script cannot scan a fork from outside.

Do not proceed if the script returns a FAIL. Read the remediation line printed under each FAIL, address it, and re-run.
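In an automated pipeline, the documented exit codes (0 on PASS, 1 on FAIL) can gate the next step. A minimal sketch follows; gate_on_preflight is a hypothetical wrapper, and how the script exits when it reports only WARN is not specified here, so the wrapper relies on the exit status alone.

```shell
# Run the given preflight command and stop on a non-zero exit.
gate_on_preflight() {
  if "$@"; then
    echo "preflight PASS: safe to continue to step 4.5"
    return 0
  fi
  echo "preflight FAIL: fix the reported remediation lines and re-run" >&2
  return 1
}
```

Typical use is gate_on_preflight ./v9_self_hosted_preflight.sh, with the environment variables from the examples above already exported.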

4.5. Provision axonflow_app_role and axonflow_platform_admin

v8.0.0 defaults AXONFLOW_DB_USE_APP_ROLE=true. The agent + customer-portal connect as axonflow_app_role, which has NOBYPASSRLS, so FORCE RLS policies actually filter every query. Cross-org workers (Community-SaaS sweep, node monitor, recovery handler) use the separate axonflow_platform_admin role (BYPASSRLS).

Migration 098 creates both roles with the correct NOBYPASSRLS / BYPASSRLS attributes, but a role without a password cannot log in. v8.0.0 ships scripts/operators/provision-app-role.sh, which sets passwords on both roles via ALTER ROLE ... WITH PASSWORD against your live Postgres and verifies connectivity as each role. Passwords are supplied by the operator (env vars APP_ROLE_PASSWORD + PLATFORM_ADMIN_PASSWORD); the script does not generate or print them.

curl -fsSLO https://raw.githubusercontent.com/getaxonflow/axonflow-enterprise/main/scripts/operators/provision-app-role.sh
chmod +x provision-app-role.sh

# Generate strong passwords yourself (the script does not generate them):
export APP_ROLE_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
export PLATFORM_ADMIN_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"

# Stash both passwords in your secrets manager BEFORE running. Provide
# DATABASE_URL (or PGHOST/PGUSER/PGDATABASE) pointing at the RDS master,
# using the same construction shown in step 4 above:
DATABASE_URL="$DATABASE_URL" ./provision-app-role.sh

The script is idempotent — re-running with the same passwords is a no-op. Re-running with mismatched passwords exits 1 unless you set FORCE_RESET=1 to rotate. Stash both passwords in your secrets manager BEFORE running, otherwise you'll need FORCE_RESET=1 to recover the role.

Then construct the two DSN env vars to add to your task definitions:

AXONFLOW_DB_APP_ROLE_URL=postgres://axonflow_app_role:$APP_ROLE_PASSWORD@<host>:5432/axonflow?sslmode=require
AXONFLOW_DB_PLATFORM_ADMIN_URL=postgres://axonflow_platform_admin:$PLATFORM_ADMIN_PASSWORD@<host>:5432/axonflow?sslmode=require

Store both in your secrets manager (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, etc.) and reference them from your agent + customer-portal task definitions:

  • CFN-deployed stacks: store as two new secrets axonflow/<stack>/app-role-url and axonflow/<stack>/platform-admin-url; reference via Secrets: in the task def YAML.
  • Docker Compose: load via .env file or --env-file flag.
  • Kubernetes: mount as Secret-backed env vars.

If you want to defer the role flip (run v8.0.0 binary on v8.x runtime semantics), skip this step and add AXONFLOW_DB_USE_APP_ROLE=false to your task definitions instead. The same task defs will accept the app role later when you populate the DSN secrets and remove the false pin.
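Once both DSNs exist, you can confirm that each one authenticates as a role with the expected RLS attribute. The sketch below uses hypothetical function names; pg_roles.rolbypassrls is a standard PostgreSQL catalog column, and psql accepts a connection URI in place of a database name.

```shell
# Ask Postgres whether the connected role can bypass RLS ("t" or "f").
role_bypassrls() {
  psql "$1" -Atc "SELECT rolbypassrls FROM pg_roles WHERE rolname = current_user"
}

# $1 = DSN, $2 = expected value: f for axonflow_app_role, t for axonflow_platform_admin
verify_role() {
  actual=$(role_bypassrls "$1") || return 2
  if [ "$actual" = "$2" ]; then
    echo "OK: rolbypassrls=$actual"
  else
    echo "MISMATCH: rolbypassrls=$actual, expected $2"
    return 1
  fi
}
```

Run verify_role "$AXONFLOW_DB_APP_ROLE_URL" f and verify_role "$AXONFLOW_DB_PLATFORM_ADMIN_URL" t. Note that on a stack where migration 098 has not yet run, the roles do not exist and the connection itself fails (exit status 2).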

5. Review the v9 SDK/plugin compatibility matrix

v9 platform accepts existing v7.x and v8.x SDKs/plugins. You can upgrade clients at your own pace. See the v7 → v8 migration guide for the full compatibility matrix.

Recommended SDK floors when v8.0.0 ships:

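| SDK | Recommended | Minimum |
| --- | --- | --- |
| Go SDK | v8.1.0 | v8.0.0 |
| Python SDK | v8.1.0 | v8.0.0 |
| TypeScript SDK | v8.1.0 | v8.0.0 |
| Java SDK | v8.1.0 | v8.0.0 |
| Rust SDK | v0.3.1 | v0.2.0 |
| Plugin (claude/cursor/codex) | v1.5.0 | v1.4.0 |
| openclaw | v2.5.0 | v2.4.0 |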

The platform's /health endpoint advertises the current floor — client SDKs older than the minimum emit a warning but continue to work through the entire v8.0.0 lifecycle.
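To check a client version against a floor from the table above, a small comparison helper is enough. This is a sketch: version_ge is a hypothetical name, it assumes three-part numeric versions with an optional leading v, and it does not parse the /health response, whose exact shape is not described here.

```shell
# True when version $1 is at least version $2.
# Assumes MAJOR.MINOR.PATCH with an optional leading "v".
version_ge() {
  a=${1#v}
  b=${2#v}
  # Sort the pair numerically field by field; the first line is the lower version.
  lowest=$(printf '%s\n%s\n' "$a" "$b" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
  [ "$lowest" = "$b" ]
}
```

For instance, version_ge v8.1.0 v8.0.0 succeeds, while version_ge v0.2.0 v0.3.1 fails and would flag a Rust SDK below the recommended version.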

Step-by-step upgrade

Stage A — pull the v8.0.0 image

For CFN-deployed stacks (AWS Marketplace ECS-Fargate template), the recommended path is the update-stack.yml GitHub Actions workflow in getaxonflow/axonflow-enterprise, which handles the parameter-preservation contract for you:

# From the axonflow-enterprise repo (or trigger via the GitHub Actions UI):
gh workflow run update-stack.yml \
  -f stack_name=axonflow-acme-prod \
  -f agent_image_tag=v8.0.0 \
  -f orchestrator_image_tag=v8.0.0

If you must run aws cloudformation update-stack directly, every CFN parameter you do not explicitly update must be passed with UsePreviousValue=true — omitting that flag causes CFN to reset the parameter to its template default, which on a long-lived stack will silently clobber OrganizationID, DatabaseUrl, LicenseKey, etc. The full pattern:

aws cloudformation update-stack \
  --stack-name axonflow-acme-prod \
  --use-previous-template \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --parameters \
    ParameterKey=AgentImageTag,ParameterValue=v8.0.0 \
    ParameterKey=OrchestratorImageTag,ParameterValue=v8.0.0 \
    ParameterKey=OrganizationID,UsePreviousValue=true \
    ParameterKey=DatabaseUrl,UsePreviousValue=true \
    ParameterKey=LicenseKey,UsePreviousValue=true \
    # ... ALL other stack parameters, each with UsePreviousValue=true ...

Enumerate your stack's existing parameters first to make sure you list them all:

aws cloudformation describe-stacks \
  --stack-name axonflow-acme-prod \
  --query 'Stacks[0].Parameters[].ParameterKey' \
  --output text

The GitHub Actions workflow path is strongly preferred over direct aws CLI because it derives the parameter list automatically.
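If you do take the direct CLI path, the parameter list can be generated from the enumerated keys rather than typed by hand. The helper below is a sketch with a hypothetical name, build_update_params. It reads the existing keys on stdin, takes Key=Value overrides as arguments, and assumes override values contain no commas or whitespace.

```shell
# Emit one --parameters entry per existing stack parameter.
# stdin: existing parameter keys, whitespace-separated (as --output text prints them)
# args:  Key=Value overrides; every other key gets UsePreviousValue=true
build_update_params() {
  for key in $(cat); do
    found=no
    value=
    for kv in "$@"; do
      if [ "${kv%%=*}" = "$key" ]; then
        value=${kv#*=}
        found=yes
      fi
    done
    if [ "$found" = yes ]; then
      printf 'ParameterKey=%s,ParameterValue=%s\n' "$key" "$value"
    else
      printf 'ParameterKey=%s,UsePreviousValue=true\n' "$key"
    fi
  done
}
```

Pipe the describe-stacks enumeration into build_update_params AgentImageTag=v8.0.0 OrchestratorImageTag=v8.0.0, review the output, and pass the lines to --parameters. Every key the stack already has is listed, so none falls back to its template default.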

For Docker Compose deployments:

# Update your image tag pin in docker-compose.enterprise.yml or .env
# Then:
docker compose pull
docker compose up -d

For custom Kubernetes / EC2 / on-prem hosting: update the image tag through your normal deployment path.

Stage B — wait for migrations

The agent runs migrations 088 through 103 on first boot. Migration 094 is the only one that performs row-level changes; the others are additive schema-only or role-creation migrations.

Tail the agent logs to confirm migrations complete:

# Docker
docker logs -f axonflow-agent | grep -E "Migration|RAISE NOTICE"

# ECS-Fargate (CloudWatch Logs)
aws logs tail /ecs/axonflow-agent --follow --since 5m | grep Migration

# Kubernetes
kubectl logs -f deploy/axonflow-agent | grep Migration

You should see log lines like (exact wording may vary slightly by image version):

✅ Set app.deployment_org_id=acme-corp session variable for migration 094 backfill
✅ Set app.deployment_kind=production session variable for migration 094 prod-safety precondition
Migration 094 Pass-1 PREP: deployment org=acme-corp organizations row ensured (1 new)
Migration 094 Pass-2: audit_logs org_id=acme-corp set on N rows
...
Migration 094 verify: <table>.org_id rows with empty/shared org_id = 0
Migration 094 complete — v9 org_id backfill

"Pass 1" applies to AxonFlow-hosted SaaS customers only and is a no-op on in-VPC stacks (it operates only on cs_*-prefixed rows that exist only in try.getaxonflow.com). "Pass 2" is the in-VPC backfill that stamps your ORG_ID onto any historical empty-org_id rows.

If you see Migration 094 prod-safety abort instead, the production-safety guard fired. The error message names the failure mode (ORG_ID env is unset on a production stack, etc.) and the remediation. Re-run the preflight script, fix the cause, and restart the agent — the migration will retry on next boot.
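The two outcomes described above can be detected from captured log output. This sketch uses a hypothetical function name and matches the log phrases quoted in this stage; since the exact wording may vary by image version, treat a pending result as "check the logs by hand" rather than as proof that nothing ran.

```shell
# Classify agent log text read from stdin as aborted, complete, or pending.
migration_094_status() {
  log=$(cat)
  case $log in
    *"Migration 094 prod-safety abort"*) echo aborted ;;
    *"Migration 094 complete"*) echo complete ;;
    *) echo pending ;;
  esac
}
```

For example, docker logs axonflow-agent 2>&1 | migration_094_status prints a single word that a deployment script can branch on.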

Stage C — post-deploy verification

Run these checks after the agent has settled (2-3 minutes after the last container restart):

# 1. /health advertises v9 platform version + SDK/plugin floors
curl -fsS https://<your-endpoint>/health | jq '.platform_version, .sdk_floor, .plugin_floor'

# Expected: "9.0.0" (or higher v9.x), v8.1.0 SDK floor, v1.5.0 plugin floor (claude/cursor/codex) + v2.5.0 (openclaw)

# 2. Migration 094 logged Pass-2 row counts for your deployment
docker logs <agent-container> 2>&1 | grep "Migration 094 Pass-2" | head -10

# 3. Empty-org_id rows are zero across customer-data tables (modulo deferred tables)
PGPASSWORD=... psql -h <db-host> -U axonflow -d axonflow <<'SQL'
SELECT 'audit_logs' AS t, COUNT(*) FROM audit_logs WHERE org_id IS NULL OR org_id = ''
UNION ALL SELECT 'mcp_query_audits', COUNT(*) FROM mcp_query_audits WHERE org_id IS NULL OR org_id = ''
UNION ALL SELECT 'static_policies', COUNT(*) FROM static_policies WHERE (org_id IS NULL OR org_id = '') AND tenant_id <> 'global'
UNION ALL SELECT 'dynamic_policies', COUNT(*) FROM dynamic_policies WHERE (org_id IS NULL OR org_id = '') AND tenant_id <> 'global'
UNION ALL SELECT 'service_identities', COUNT(*) FROM service_identities WHERE org_id IS NULL OR org_id = ''
UNION ALL SELECT 'execution_history', COUNT(*) FROM execution_history WHERE org_id IS NULL OR org_id = '';
SQL

# Expected: 0 across all rows. A few rows in agent_audit_logs / llm_call_audits
# may remain non-zero — those tables have their FORCE RLS rollout scheduled
# for a later phase and do not block v9 ship.

# 4. Smoke a real authenticated request.
# Your client_id/client_secret pair was issued when you provisioned the stack.
# For CFN-deployed stacks, the LicenseKey CFN parameter is your client_secret;
# the client_id is the OrganizationID CFN parameter (or a separately-issued
# API key if you have multiple credentials). To retrieve them:
# aws cloudformation describe-stacks --stack-name <your-stack> \
# --query 'Stacks[0].Parameters[?ParameterKey==`OrganizationID`||ParameterKey==`LicenseKey`].[ParameterKey,ParameterValue]'
curl -X POST https://<your-endpoint>/api/v1/process \
  -H "Authorization: Basic $(printf '%s' 'your-client-id:your-client-secret' | base64 | tr -d '\n')" \
  -H "Content-Type: application/json" \
  -d '{"query": "ping", "request_type": "chat"}'

# Expected: HTTP 200 + the normal response payload
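Check 3 above can be turned into a pass/fail signal by post-processing the query output. The sketch below assumes you run the same query with psql -At, which prints unaligned name|count rows; nonzero_tables is a hypothetical helper name.

```shell
# Read "table|count" rows on stdin and print the tables whose count is above zero.
nonzero_tables() {
  awk -F'|' '$2 + 0 > 0 { print $1 }'
}
```

An empty result means every listed table is clean. Any table name printed still has empty-org_id rows and should be investigated before the soak stage.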

Stage D — soak

Leave the stack in steady state for 24-72 hours before considering the upgrade complete. Watch for:

  • Auth failures in agent logs (would indicate an identity-plumbing issue)
  • RLS policy violation errors (would indicate a WithOrgScope gap)
  • Migration retries (would indicate a transient boot issue)
  • /health not advertising v9

If everything is steady, your stack is on v9.

Rollback path

If you need to roll back during or after the upgrade:

Rollback before migration 094 has stamped historical rows

If you catch the issue while the agent is still in its boot loop (migration 094 prod-safety branch fired and refused to stamp anything), simply:

  1. Revert the image tag to the prior v8.x release.
  2. Restart the agent + orchestrator services.
  3. No DB rollback needed — migration 094 either ran cleanly or aborted; there is no partial-state in between.

Rollback after migration 094 has run successfully

This is the snapshot-restore path. The schema additions are not destructive (new columns + new roles), but migration 094 stamped historical rows with your ORG_ID value. Reverting the platform image will not reverse those stamps; you need to restore from the snapshot taken in pre-upgrade step 1.

  1. Stop the agent + orchestrator services (or scale to 0).
  2. Restore the pre-upgrade RDS snapshot:
    • AWS RDS — important DNS gotcha: a restored DB instance cannot reuse the original DNS endpoint until the original is renamed or deleted. Two options:
      • Rename-then-restore (recommended): rename the live instance (aws rds modify-db-instance --db-instance-identifier <live> --new-db-instance-identifier <live>-broken --apply-immediately; without --apply-immediately the rename waits for the next maintenance window), then restore the snapshot with the original identifier (aws rds restore-db-instance-from-db-snapshot --db-instance-identifier <live> --db-snapshot-identifier <snapshot>). The agent's existing DATABASE_URL resolves to the restored instance.
      • Restore-with-new-name + Route 53: restore the snapshot with a new identifier, then update the agent's DATABASE_URL (CFN parameter) to the new endpoint, or use Route 53 CNAME to repoint.
    • Self-managed Postgres: stop the live instance, restore from your backup tool, restart.
  3. Revert the agent + orchestrator image tags to the prior v8.x release.
  4. Restart services.

Per-batch rollback for FORCE RLS

Each FORCE RLS migration (099/101/102/103) ships with a _down.sql pair (e.g., 099_v9_rls_b1_sparse_tables_down.sql). These revert FORCE on one batch without reverting the broader v9 schema. Operators do not normally need these; the snapshot-restore path is the contract.

RLS write paths

The v8.0.0 default AXONFLOW_DB_USE_APP_ROLE=true makes FORCE Row-Level Security actually fire on every covered table. Under the legacy v8.x posture the agent connected as the RDS master role, so FORCE RLS was decorative — the master role bypassed every policy via table ownership. Flipping the gate changes the semantics of every INSERT/UPDATE/DELETE that touches a covered table: writes now hit the policy's WITH CHECK predicate (typically org_id = current_setting('app.current_org_id', true)), and a write that runs outside an org-scoped transaction fails with pq: new row violates row-level security policy.

For operators running stock v9, this is invisible: every hot-path write in the shipped codebase is already wrapped against this contract (the agent uses WithOrgScope for per-request writes, SECURITY DEFINER helpers for the auth-bootstrap window, and the axonflow_platform_admin pool for cross-org workers). For operators running forks or in-tree customizations — custom connectors, custom auth shims, in-tree extensions to handlers — the flip is gated on a write-path audit.
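The same contract applies to direct-DB tooling that connects as the app role: a query sees rows only after app.current_org_id is set in its transaction. The sketch below generates such a transaction; scoped_sql is a hypothetical helper, and set_config with its third argument true is the standard PostgreSQL way to set a variable for the current transaction only.

```shell
# Emit a transaction that sets app.current_org_id before running one statement.
# $1 = org id, $2 = SQL statement without a trailing semicolon
scoped_sql() {
  # Double any single quotes so the org id is a safe SQL string literal.
  org=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "BEGIN;\nSELECT set_config('app.current_org_id', '%s', true);\n%s;\nCOMMIT;\n" "$org" "$2"
}
```

Pipe the result into psql, for example scoped_sql acme-corp 'SELECT COUNT(*) FROM audit_logs' | psql "$AXONFLOW_DB_APP_ROLE_URL" -v ON_ERROR_STOP=1. Tooling that must read across orgs should use the axonflow_platform_admin DSN instead.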

Refuse-to-boot guard

The v8.0.0 agent, orchestrator, and customer-portal binaries refuse to boot when AXONFLOW_DB_USE_APP_ROLE=true (the default — also active when the env is unset) and AXONFLOW_DB_PLATFORM_ADMIN_URL is not set. The binary exits with a FATAL log line naming both env vars; the previous behavior was a silent fallback to the request-traffic pool, which under FORCE RLS caused cross-org workers (marketplace metering, community-saas sweep / recovery, node monitor, customer-portal admin handlers) to quietly return zero rows.

The FATAL log looks like:

[Marketplace] FATAL: AXONFLOW_DB_PLATFORM_ADMIN_URL is required when AXONFLOW_DB_USE_APP_ROLE=true (silent fallback to a non-BYPASSRLS pool would defeat FORCE RLS — cross-org metering/sweep/recovery/monitoring would silently return 0 rows or undercount). Set AXONFLOW_DB_PLATFORM_ADMIN_URL to a DSN authenticating as axonflow_platform_admin, or set AXONFLOW_DB_USE_APP_ROLE=false to opt out of the v8.0.0 default and run under the legacy v8.x posture.

The bracketed prefix identifies which worker triggered the guard. The six prefixes, matching the six-guard pre-upgrade checklist, are Marketplace, NodeMonitor, CSAAS-SWEEP, CSAAS-RECOVERY, CSAAS-DELETE, and customer-portal.

If you see this FATAL on first boot, the fix is either:

  • Recommended: populate AXONFLOW_DB_PLATFORM_ADMIN_URL with a DSN that authenticates as axonflow_platform_admin (mirror of AXONFLOW_DB_APP_ROLE_URL — different role, same host/db). See step 4.5 above for the canonical wiring.
  • Phased-rollout: set AXONFLOW_DB_USE_APP_ROLE=false explicitly on the agent + orchestrator + customer-portal task definitions to opt out of the v8.0.0 default and run under the legacy v8.x posture. The guard is a no-op under that flag.

Customized-handler audit (forks only)

If you maintain a fork or have in-tree customizations to handlers that INSERT/UPDATE into application tables, audit those write paths before flipping AXONFLOW_DB_USE_APP_ROLE=true. The rule: every customized write into a Row-Level-Security-enabled table must go through one of three patterns.

| Pattern | When to use | Wrapper |
| --- | --- | --- |
| Per-request org-scoped transaction | Most application writes: the request carries an org_id in its auth context | Wrap the write in WithOrgScope(ctx, db, orgID, func(tx *sql.Tx) error { … }) (agent) or withRequestOrgScope(r, h.db, fn) (customer-portal). Sets app.current_org_id inside the same transaction. |
| SECURITY DEFINER helper | Auth-bootstrap or pre-org-context writes (the org is being minted in this request: registration, signup, tenant-create) | Call a SECURITY DEFINER function created by a migration. Examples shipped in v8.0.0: auth_lookup_api_key(), auth_touch_api_key(). Add your own via a new migration if your customization needs one. |
| Admin (BYPASSRLS) pool | Cross-org workers (sweeps, mirrors, aggregators) that genuinely iterate across orgs | Open the pool via OpenPlatformAdminConnection. Requires AXONFLOW_DB_PLATFORM_ADMIN_URL to be set (the refuse-to-boot guard ensures this). |

To find your customized write sites, grep the fork:

# In your fork's root:
grep -rnE 'db\.(Exec|QueryRow|Query)|tx\.(Exec|QueryRow|Query)' --include='*.go' \
  -- platform/ ee/ | grep -E 'INSERT|UPDATE|DELETE'

For each match, check the surrounding function:

  • Is the call inside a WithOrgScope / withRequestOrgScope closure? → safe.
  • Is the call on a *sql.DB opened via OpenPlatformAdminConnection? → safe.
  • Is the call a SELECT-only that goes through a SECURITY DEFINER helper? → safe.
  • Otherwise: this is a customized write that needs a wrapper before you flip AXONFLOW_DB_USE_APP_ROLE=true.

The migrations that enable Row-Level Security ship as part of the v8.0.0 image (the 018 template enables RLS across a base set of customer-data tables; later FORCE batches add FORCE to specific tables). Run psql -c '\d <table>' on any candidate table to see its RLS posture — Row security: enabled means writes are gated when not connecting as the table owner.

Staged rollout pattern

  1. Audit your fork (above). Fix every customized write that isn't wrapped.
  2. Stage with AXONFLOW_DB_USE_APP_ROLE=true on a staging stack carrying production-shaped traffic for at least one full diurnal cycle.
  3. Watch the agent + orchestrator logs for pq: new row violates row-level security policy. Each such line names the table the customized handler tried to write outside an org-scoped transaction.
  4. Roll back via the env flip if you surface violations: set AXONFLOW_DB_USE_APP_ROLE=false and re-run under the legacy posture while you fix the handler. The env override is intended as a temporary phased-rollout lever, not a permanent posture.
  5. Re-flip to true once staging is clean for one full diurnal cycle.
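Step 3 of this pattern can be summarised from captured logs. The sketch below assumes the error carries PostgreSQL's standard suffix, for table "<name>", after the phrase quoted above; rls_violation_tables is a hypothetical helper name.

```shell
# Read log lines on stdin and list each table named in an RLS write violation once.
rls_violation_tables() {
  sed -n 's/.*new row violates row-level security policy.* for table "\([^"]*\)".*/\1/p' | sort -u
}
```

Feed it the staging logs for the soak window, for example kubectl logs deploy/axonflow-agent | rls_violation_tables. Each table it prints points at a customized handler that still writes outside an org-scoped transaction.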

Operators running stock v9 can flip immediately after a successful preflight + standard staging soak — Check 8 will already have surfaced the env-pair requirement.

FAQ

What's new in v9 (in plain language)?

Three things, only the third of which is operator-visible:

  1. The codebase now uses three distinct identifiers (org_id for the customer, client_id for the credential, deployment_id for the license) instead of overloading tenant_id for all three. This is invisible on the wire: JSON field names + headers + Basic Auth all keep working.
  2. A new client_id column was added to several audit/policy tables. Existing rows have it backfilled to the same value as tenant_id. SDKs ≥v8.1.0 emit a new X-Client-ID header that fills it explicitly.
  3. Two new Postgres roles (axonflow_app_role for request traffic, axonflow_platform_admin for cross-org workers) were created. FORCE ROW LEVEL SECURITY is enabled on a handful of tables. This affects only direct SQL on a shared DB — see the v7 → v8 migration guide § Change 1.

What's deprecated in v9?

  • The X-Tenant-ID HTTP header (alias for X-Client-ID). Continues to work through all of v9; planned removal in v10.
  • The tenant_id JSON field in some API responses (also an alias for client_id). Same compatibility window.
  • The license payload's V2 org_id field (alias for V3 deployment_id). Continues to work; planned removal in v10.

Do I need to upgrade my SDKs?

No. v7.x and v8.x SDKs continue to authenticate via X-Tenant-ID through the entire v8.0.0 lifecycle. We recommend upgrading at your normal cadence, but it is not a v9 ship requirement.

Does v9 change my license format?

No. V3 license payloads add a deployment_id field but retain org_id as a back-compat alias. Existing V2 licenses continue to validate. You do not need to regenerate licenses for v9.

What happens if I forget to set ORG_ID before upgrading?

If DEPLOYMENT_KIND=production (the CFN default on real stacks), migration 094 will refuse to run with an explicit production-safety abort message. The agent boot-loops harmlessly until you set ORG_ID and restart. Historical rows are not touched. This guard was added specifically to prevent silent stamping with the dev sentinel.

If DEPLOYMENT_KIND=dev (the docker-compose default), migration 094 will accept the local-dev-org fallback and stamp historical rows with that value. This is the intended path for local development.

What does axonflow_app_role do?

It is a non-owner Postgres role used by the agent + orchestrator when AXONFLOW_DB_USE_APP_ROLE=true. Pre-v9 (v8.x default), the agent connects as the RDS master role, which bypasses RLS. Connecting as axonflow_app_role makes FORCE RLS effective — queries respect the app.current_org_id session variable.

v8.0.0 defaults AXONFLOW_DB_USE_APP_ROLE=true (CHANGELOG). Fresh deploys connect as the app role and rely on FORCE RLS for tenant isolation. If you want to upgrade the platform image to v8.0.0 without immediately flipping the runtime role (phased rollout), set AXONFLOW_DB_USE_APP_ROLE=false explicitly on your agent + customer-portal task definitions. This preserves v8.x semantics until you finish provisioning the role passwords + DSN secrets.

Use scripts/operators/provision-app-role.sh (referenced in step 4.5 above) to set the role passwords. You then construct the two DSN env vars (AXONFLOW_DB_APP_ROLE_URL, AXONFLOW_DB_PLATFORM_ADMIN_URL) from the passwords you supplied and add them to your task definitions.

What does axonflow_platform_admin do?

It is a BYPASSRLS Postgres role for legitimate cross-org workers (sweep workers, mirror Lambdas, aggregators, support tooling). These workers iterate across all orgs and cannot use the app role (which scopes to one org per transaction). The role is created by migration 098 but is not used until you wire your own cross-org tooling against it (or, for the AxonFlow-shipped sweep / node-monitor / recovery handler, set AXONFLOW_DB_PLATFORM_ADMIN_URL).

What about API key authentication — doesn't auth happen before app.current_org_id is set?

Yes, and v8.0.0 ships SECURITY DEFINER helper functions to handle this chicken-and-egg. The in-VPC enterprise auth path queries api_keys, customers, and pricing_tiers BEFORE app.current_org_id is established. Migration 117 ships two SECURITY DEFINER functions — auth_lookup_api_key() and auth_touch_api_key() — that run with elevated privileges for the narrow auth-bootstrap window only, then traffic drops back to the app role for the rest of the request. api_keys and customers are FORCE-RLS-protected for all non-auth traffic. No operator action required; the migration runs automatically on first v8.0.0 boot.

Will my customer-portal continue to work?

Yes. The first-party ee/platform/customer-portal-ui shipped in this repo is fully wired for v9 — nodes.go, export.go, connectors.go, sso.go, and auth/saml/service.go wrap every read/write of FORCE-RLS-protected tables in api.withRequestOrgScope(r, h.db, fn). The portal's existing handlers are RLS-correct out of the box.

Self-hosted source forks that called the removed middleware.RLSMiddleware / SetRLSContextForSession / ResetRLSContext / WithRLS family (pool-scope GUC helpers) will not compile against v8.0.0. Migrate to api.withRequestOrgScope(r, h.db, fn) — a request-scoped helper that opens a single *sql.Conn from the pool, sets app.current_org_id on that connection, runs the handler closure, and releases the connection. The pool-scope approach was unsafe under FORCE RLS because the GUC landed on one connection while the next handler statement might run on a different one. Diagnostic helpers in middleware/rls.go (GetCurrentOrgID, VerifyRLSActive, GetRLSStats, RLSHealthCheck) are retained.

References

  • v7 → v8 Migration Guide — customer-API surface (JSON fields, headers, SDKs)
  • AxonFlow source-available repo (BSL 1.1) — for the source code referenced in this guide (e.g., scripts/deployment/v9_self_hosted_preflight.sh, scripts/operators/provision-app-role.sh, and the agent / customer-portal Go source)

Enterprise Rollout Checklist

Use this page as part of the protected enterprise operating model: