v8.0.0 Self-Hosted Upgrade Guide
This guide is for operators of self-hosted and in-VPC AxonFlow enterprise deployments upgrading from v8.x to v8.0.0 (the platform release that delivers the FORCE Row-Level Security + identity-model rework formerly planned as v9.0.0). It covers what to do before the upgrade, how to run it, how to verify success, and how to roll back if needed.
If you are a customer of the AxonFlow-hosted Community SaaS (try.getaxonflow.com), no action is needed — AxonFlow handles the upgrade for you. See the v7 → v8 migration guide for the customer-API surface.
Deployment-shape decision table (do you need this guide?)
If your deployment matches "Community, single-tenant self-hosted" below, the simple-path upgrade in the public migration guide is sufficient — you do not need this page.
| Deployment shape | Need this guide? | AXONFLOW_DB_APP_ROLE_URL | AXONFLOW_DB_PLATFORM_ADMIN_URL |
|---|---|---|---|
| Community, single-tenant self-hosted | No (use public guide) | Yes (or DATABASE_URL fallback in dev) | No |
| Enterprise self-hosted with customer-portal admin | Yes | Yes | Yes |
| Enterprise self-hosted with multi-node enforcement | Yes | Yes | Yes |
| Enterprise self-hosted with AWS Marketplace metering | Yes | Yes | Yes |
| Community-SaaS mode (sweep / recovery / tenant-delete) | Yes | Yes | Yes |
| Single-tenant enterprise without any of the above | Marginal — read but skip the multi-tenant sections | Yes | No |
Quick summary
| Topic | Answer for v8.0.0 |
|---|---|
| Is there a breaking customer-API change? | No. JSON field names + Basic Auth credentials + X-Tenant-ID header all continue to work. |
| Is there a breaking operator-API change? | Two: (a) AXONFLOW_DB_USE_APP_ROLE default flips from false to true in v8.0.0 — fresh deploys connect as the non-owner axonflow_app_role and rely on FORCE RLS for tenant isolation. You can preserve the legacy semantics by setting AXONFLOW_DB_USE_APP_ROLE=false explicitly. (b) If you run direct SQL queries against the AxonFlow application database under the app role, those queries return zero rows unless they first set the app.current_org_id session variable. SDK, plugin, and HTTP API callers are unaffected — only direct-DB tooling. |
| Are any source-fork APIs removed? | Yes — middleware.RLSMiddleware, SetRLSContextForSession, ResetRLSContext, and WithRLS (pool-scope GUC helpers in ee/platform/customer-portal/middleware/rls.go) were removed. Self-hosted source forks calling these directly must migrate to api.withRequestOrgScope(r, h.db, fn). Diagnostic helpers (GetCurrentOrgID, VerifyRLSActive, GetRLSStats, RLSHealthCheck) are retained. |
| Do I need to regenerate my license? | No. V2 licenses continue to validate. |
| Do I need to set new env vars? | If you have not set ORG_ID to your real customer identifier, do so before upgrading. CFN-deployed stacks already set DEPLOYMENT_KIND=production automatically; self-hosted docker-compose operators running production must set it manually. To run as the app role (the v8.0.0 default), also set AXONFLOW_DB_APP_ROLE_URL and — if you run cross-org workers — AXONFLOW_DB_PLATFORM_ADMIN_URL. |
| Do I need to take a snapshot before upgrading? | Yes — strongly recommended. v8.0.0 schema migrations are forward-only; the rollback contract is snapshot-restore. |
| Does my v8.x SDK/plugin still work? | Yes. The platform accepts v7.x and v8.x SDKs/plugins through the entire v8.0.0 lifecycle. v10 removes the X-Tenant-ID deprecated alias. |
Six-guard pre-upgrade checklist
Before running the preflight script (step 4 below), verify these six refuse-to-boot guards are addressed for your deployment. The v8.0.0 binaries refuse to boot when AXONFLOW_DB_USE_APP_ROLE=true (the default) and the relevant worker has no admin DSN — silent fallback to the request-traffic pool would defeat FORCE RLS.
| # | Guard prefix | Triggered when | What to set |
|---|---|---|---|
| 1 | [Marketplace] | AWS Marketplace metering is enabled | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 2 | [NodeMonitor] | ENABLE_NODE_MONITOR=true | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 3 | [CSAAS-SWEEP] | Community-SaaS sweep enabled | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 4 | [CSAAS-RECOVERY] | Community-SaaS recovery handler enabled | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 5 | [CSAAS-DELETE] | tenant-delete (GDPR right-to-erasure) endpoints exposed | AXONFLOW_DB_PLATFORM_ADMIN_URL |
| 6 | [customer-portal] | customer-portal admin handlers in scope | AXONFLOW_DB_PLATFORM_ADMIN_URL + ADMIN_API_KEY |
If your deployment triggers any of these, populate AXONFLOW_DB_PLATFORM_ADMIN_URL per step 4.5 below. If your deployment triggers none of these, you can run with AXONFLOW_DB_USE_APP_ROLE=true and just the app-role DSN — but you must still complete the upgrade steps below.
Related public guide
The public v7 → v8 Migration Guide covers:
- Decision tree (deployment shape → env vars)
- Simple-path 3-step community upgrade
- Identity-model conceptual rework
- Wire-level deprecation timeline (X-Tenant-ID → X-Client-ID)
- Smoke tests
This page is the stage-by-stage upgrade companion. For per-feature env-var combinations, customer-portal admin DSN format, and source-fork audit recipe, see also the v8.0.0 Enterprise Migration Guide.
Pre-upgrade checklist
Before pulling the v8.0.0 image, work through this checklist in order. Each step is also enforced by the preflight script in step 4.
1. Take a fresh database snapshot
The rollback contract for v8.0.0 is snapshot restore + image revert. Per-batch _down.sql migration files exist (e.g., 099_v9_rls_b1_sparse_tables_down.sql) but they revert one batch, not the full identity-model rollback. For a full v8.x rollback, you need a snapshot taken immediately before the upgrade.
- AWS RDS: trigger a manual snapshot via the AWS console or aws rds create-db-snapshot --db-instance-identifier <id> --db-snapshot-identifier <name>. Wait for status available before proceeding.
- Self-managed Postgres: use your existing backup tool (pgbackrest, barman, pg_dump to S3, etc.). Confirm the backup completed successfully.
- Containerised Postgres in your own VPC: take an EBS volume snapshot of the underlying disk, OR run a pg_dump to a separate object store.
Record the snapshot identifier. You'll need it for rollback (Stage E below).
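For the RDS case, the snapshot-and-wait step can be scripted. The following is a minimal sketch, assuming the aws CLI is already configured for the account that owns the instance. The function names and the pre-v8-upgrade naming scheme are illustrative and not part of the AxonFlow tooling.

```shell
# Sketch only: helper names and the naming scheme are assumptions.

# Build an identifier such as axonflow-acme-prod-db-pre-v8-upgrade-20250101-120000.
snapshot_name() {
  printf '%s-pre-v8-upgrade-%s\n' "$1" "$(date -u +%Y%m%d-%H%M%S)"
}

# Create a manual snapshot, block until its status is "available",
# then print the identifier so it can be recorded for rollback.
take_rds_snapshot() (
  instance_id="$1"
  name="$(snapshot_name "$instance_id")"
  aws rds create-db-snapshot \
    --db-instance-identifier "$instance_id" \
    --db-snapshot-identifier "$name" >/dev/null || exit 1
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$name" || exit 1
  printf '%s\n' "$name"
)
```

Running take_rds_snapshot axonflow-acme-prod-db prints the identifier to keep for Stage E.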
2. Confirm ORG_ID env on the agent task
ORG_ID is the deployment's customer/account identifier. It is the value the agent stamps onto every historical empty-org_id row during migration 094's Pass-2 backfill. It MUST be set to your real customer identifier (not the literal string local-dev-org, which is the dev-only sentinel) on any production deployment.
How to check:
| Hosting | Where ORG_ID lives |
|---|---|
| AWS Marketplace / CFN cloudformation-ecs-fargate.yaml | OrganizationID CFN parameter on the stack → flows into the agent task def's ORG_ID env |
| Docker Compose (self-hosted enterprise) | ORG_ID line in docker-compose.enterprise.yml or an external .env file |
| Custom Kubernetes / EC2 / on-prem | ORG_ID env var on the agent container/process |
If ORG_ID is unset on a production deployment, the agent's deployment-org getter falls back to local-dev-org, and migration 094 will refuse to run (the production-safety guard added in v9). Set ORG_ID and redeploy before continuing.
How to find your existing ORG_ID (CFN-deployed stacks):
# The CFN stack's OrganizationID parameter is the source of truth.
# Replace <your-stack-name> with your actual stack name (e.g., axonflow-acme-prod).
aws cloudformation describe-stacks \
--stack-name <your-stack-name> \
--query 'Stacks[0].Parameters[?ParameterKey==`OrganizationID`].ParameterValue' \
--output text
3. Confirm DEPLOYMENT_KIND env on the agent task
DEPLOYMENT_KIND distinguishes a real production deployment from a local docker-compose / community-mode install. Migration 094 uses it as a defence-in-depth signal alongside ORG_ID.
- CFN-deployed stacks: DEPLOYMENT_KIND=production is hardcoded in cloudformation-ecs-fargate.yaml (agent + orchestrator task defs). No action needed.
- Docker Compose: defaults to dev. Override by setting DEPLOYMENT_KIND=production in your shell or .env file before docker-compose up -d if this Docker Compose stack is your production deployment shape (rare but supported). Leave it as dev for local development.
- Custom hosting: set DEPLOYMENT_KIND=production on the agent task.
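The ORG_ID and DEPLOYMENT_KIND rules from steps 2 and 3 can be checked locally before a deploy. This is a rough sketch of that rule, not the logic inside migration 094; the function name check_org_env is hypothetical, and treating an unset DEPLOYMENT_KIND as dev is an assumption based on the Docker Compose default.

```shell
# Sketch only: mirrors the documented rule, not the migration's actual code.
# usage: check_org_env DEPLOYMENT_KIND ORG_ID
check_org_env() (
  kind="${1:-dev}"   # assumption: unset behaves like the docker-compose default
  org_id="$2"
  if [ "$kind" != "production" ]; then
    echo "ok: non-production deployment (DEPLOYMENT_KIND=$kind)"
    exit 0
  fi
  # On production, an empty ORG_ID or the dev sentinel must be rejected.
  if [ -z "$org_id" ] || [ "$org_id" = "local-dev-org" ]; then
    echo "fail: production deployment needs a real ORG_ID" >&2
    exit 1
  fi
  echo "ok: ORG_ID=$org_id"
)
```

A typical call would be check_org_env "$DEPLOYMENT_KIND" "$ORG_ID" in the shell that holds the values you plan to deploy.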
4. Run the preflight script
The preflight script lives at scripts/deployment/v9_self_hosted_preflight.sh in the getaxonflow/axonflow-enterprise repository (BSL 1.1, source-available). It needs only bash and psql, plus, optionally, the aws CLI for ECS discovery. Download it directly without cloning the repo:
curl -fsSLO https://raw.githubusercontent.com/getaxonflow/axonflow-enterprise/main/scripts/deployment/v9_self_hosted_preflight.sh
chmod +x v9_self_hosted_preflight.sh
For your DATABASE_URL, the Marketplace CFN template stores the RDS master password in AWS Secrets Manager. Construct the URL like this:
STACK_NAME=<your-stack-name> # e.g., axonflow-acme-prod
DB_ENDPOINT=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
--query 'Stacks[0].Outputs[?OutputKey==`DatabaseEndpoint`].OutputValue' --output text)
DB_SECRET_ARN=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
--query 'Stacks[0].Outputs[?OutputKey==`DatabaseSecretArn`].OutputValue' --output text)
DB_PASSWORD=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_ARN" \
--query SecretString --output text | jq -r .password)
export DATABASE_URL="postgres://axonflow:${DB_PASSWORD}@${DB_ENDPOINT}:5432/axonflow?sslmode=require"
(Adjust the --query keys if your stack's outputs are named differently — Marketplace template uses DatabaseEndpoint + DatabaseSecretArn; older templates may use different names.)
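The DATABASE_URL construction above interpolates the password verbatim. A connection URI has to be percent-encoded when a component contains reserved characters such as @, / or #, so a password with those characters would produce a URL that does not parse. The sketch below shows one way to encode the password first; urlencode and build_dsn are hypothetical helper names, and the encoder assumes an ASCII password.

```shell
# Sketch only: helper names are assumptions; ASCII input assumed.

# Percent-encode every character outside the RFC 3986 unreserved set.
urlencode() (
  LC_ALL=C
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"          # everything after the first character
    c="${s%"$rest"}"       # the first character itself
    case "$c" in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
)

# usage: build_dsn USER PASSWORD HOST DATABASE
build_dsn() {
  printf 'postgres://%s:%s@%s:5432/%s?sslmode=require\n' \
    "$1" "$(urlencode "$2")" "$3" "$4"
}
```

With this in place, the export becomes export DATABASE_URL="$(build_dsn axonflow "$DB_PASSWORD" "$DB_ENDPOINT" axonflow)".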
Now run the preflight:
# Minimum: just DATABASE_URL
./v9_self_hosted_preflight.sh
# Recommended: also supply the env discovery hints + your RDS identifier so
# the script can verify automated backups are enabled
DEPLOYMENT_KIND=production \
ORG_ID=acme-corp \
RDS_INSTANCE_IDENTIFIER=axonflow-acme-prod-db \
./v9_self_hosted_preflight.sh
# For ECS-Fargate: have the script discover env vars from the live task def
ECS_CLUSTER=axonflow-acme-cluster \
ECS_AGENT_SERVICE=axonflow-agent-service \
RDS_INSTANCE_IDENTIFIER=axonflow-acme-prod-db \
./v9_self_hosted_preflight.sh
The script exits 0 on PASS, 1 on FAIL.
The script reports PASS/WARN/FAIL across eight checks:
1. Postgres version ≥ 14
2. schema_migrations table state (all v8.x baseline migrations applied; no failed migrations)
3. Empty/NULL org_id row scan (preview of what migration 094 will stamp with your ORG_ID)
4. DEPLOYMENT_KIND + ORG_ID env vars
5. Postgres roles (axonflow_app_role + axonflow_platform_admin): these are CREATED by v9 migration 098, so absence on a pre-v9 stack is expected and reports PASS ("will be created"). The check fails only if the roles exist with the wrong BYPASSRLS configuration.
6. Backup/snapshot policy
7. local-dev-org default preservation
8. AXONFLOW_DB_USE_APP_ROLE + AXONFLOW_DB_PLATFORM_ADMIN_URL env pairing: FAIL if AXONFLOW_DB_USE_APP_ROLE=true (the v8.0.0 default, also active when the env is unset) and AXONFLOW_DB_PLATFORM_ADMIN_URL is not set. The v8.0.0 binaries refuse to boot under that combination; see RLS write paths below. Also emits a WARN-level customized-handler audit advisory whenever app-role is effective, because the preflight script cannot scan a fork from outside.
Do not proceed if the script returns a FAIL. Read the remediation line printed under each FAIL, address it, and re-run.
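Because the script exits 0 on PASS and 1 on FAIL, it can gate a deploy pipeline directly. A minimal sketch follows; run_preflight_gate is a hypothetical wrapper name, and it simply propagates the exit status of whatever command it is given.

```shell
# Sketch only: wraps any command and stops the pipeline on a non-zero exit.
run_preflight_gate() {
  if "$@"; then
    echo "preflight PASS: safe to continue with the upgrade"
  else
    status=$?
    echo "preflight FAIL (exit $status): fix the reported checks and re-run" >&2
    return "$status"
  fi
}
```

In CI this would be called as run_preflight_gate ./v9_self_hosted_preflight.sh, with the image pull placed after it.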
4.5. Provision axonflow_app_role and axonflow_platform_admin
v8.0.0 defaults AXONFLOW_DB_USE_APP_ROLE=true. The agent + customer-portal connect as axonflow_app_role, which has NOBYPASSRLS, so FORCE RLS policies actually filter every query. Cross-org workers (Community-SaaS sweep, node monitor, recovery handler) use the separate axonflow_platform_admin role (BYPASSRLS).
Migration 098 creates both roles with the correct NOBYPASSRLS / BYPASSRLS attributes, but a role without a password cannot log in. v8.0.0 ships scripts/operators/provision-app-role.sh, which sets passwords on both roles via ALTER ROLE ... WITH PASSWORD against your live Postgres and verifies connectivity as each role. Passwords are supplied by the operator (env vars APP_ROLE_PASSWORD + PLATFORM_ADMIN_PASSWORD); the script does not generate or print them.
curl -fsSLO https://raw.githubusercontent.com/getaxonflow/axonflow-enterprise/main/scripts/operators/provision-app-role.sh
chmod +x provision-app-role.sh
# Generate strong passwords yourself (the script does not generate them):
export APP_ROLE_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
export PLATFORM_ADMIN_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
# Stash both passwords in your secrets manager BEFORE running. Provide
# DATABASE_URL (or PGHOST/PGUSER/PGDATABASE) pointing at the RDS master,
# using the same construction shown in step 4 above:
DATABASE_URL="$DATABASE_URL" ./provision-app-role.sh
The script is idempotent — re-running with the same passwords is a no-op. Re-running with mismatched passwords exits 1 unless you set FORCE_RESET=1 to rotate. Stash both passwords in your secrets manager BEFORE running, otherwise you'll need FORCE_RESET=1 to recover the role.
Then construct the two DSN env vars to add to your task definitions:
AXONFLOW_DB_APP_ROLE_URL=postgres://axonflow_app_role:$APP_ROLE_PASSWORD@<host>:5432/axonflow?sslmode=require
AXONFLOW_DB_PLATFORM_ADMIN_URL=postgres://axonflow_platform_admin:$PLATFORM_ADMIN_PASSWORD@<host>:5432/axonflow?sslmode=require
Store both in your secrets manager (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, etc.) and reference them from your agent + customer-portal task definitions:
- CFN-deployed stacks: store as two new secrets axonflow/<stack>/app-role-url and axonflow/<stack>/platform-admin-url; reference via Secrets: in the task def YAML.
- Docker Compose: load via .env file or --env-file flag.
- Kubernetes: mount as Secret-backed env vars.
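For the CFN case, the two secrets can be created with the aws CLI. This is a sketch under the assumption that the secrets do not exist yet (create-secret fails on an existing name; put-secret-value is the call for updating one). The helper names are illustrative.

```shell
# Sketch only: helper names are assumptions; secret paths follow the
# axonflow/<stack>/<kind> layout described above.

# usage: dsn_secret_name STACK KIND   (KIND: app-role-url | platform-admin-url)
dsn_secret_name() {
  printf 'axonflow/%s/%s\n' "$1" "$2"
}

# usage: store_dsn_secret STACK KIND DSN
store_dsn_secret() {
  aws secretsmanager create-secret \
    --name "$(dsn_secret_name "$1" "$2")" \
    --secret-string "$3"
}
```

For example, store_dsn_secret axonflow-acme-prod app-role-url "$AXONFLOW_DB_APP_ROLE_URL" stores the app-role DSN under axonflow/axonflow-acme-prod/app-role-url.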
If you want to defer the role flip (run v8.0.0 binary on v8.x runtime semantics), skip this step and add AXONFLOW_DB_USE_APP_ROLE=false to your task definitions instead. The same task defs will accept the app role later when you populate the DSN secrets and remove the false pin.
5. Review the v9 SDK/plugin compatibility matrix
v9 platform accepts existing v7.x and v8.x SDKs/plugins. You can upgrade clients at your own pace. See the v7 → v8 migration guide for the full compatibility matrix.
Recommended SDK floors when v8.0.0 ships:
| SDK | Recommended | Minimum |
|---|---|---|
| Go SDK | v8.1.0 | v8.0.0 |
| Python SDK | v8.1.0 | v8.0.0 |
| TypeScript SDK | v8.1.0 | v8.0.0 |
| Java SDK | v8.1.0 | v8.0.0 |
| Rust SDK | v0.3.1 | v0.2.0 |
| Plugin (claude/cursor/codex) | v1.5.0 | v1.4.0 |
| openclaw | v2.5.0 | v2.4.0 |
The platform's /health endpoint advertises the current floor — client SDKs older than the minimum emit a warning but continue to work through the entire v8.0.0 lifecycle.
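To check a client version against the floors in the table, a small comparison helper is enough. The sketch below assumes plain MAJOR.MINOR.PATCH versions with an optional leading v and no pre-release suffix; version_ge is a hypothetical name.

```shell
# Sketch only: handles numeric MAJOR.MINOR.PATCH, optional leading "v".
# usage: version_ge HAVE NEED   (exit 0 when HAVE >= NEED)
version_ge() (
  have="${1#v}"
  need="${2#v}"
  IFS=.
  set -f
  set -- $have
  h1="${1:-0}" h2="${2:-0}" h3="${3:-0}"
  set -- $need
  n1="${1:-0}" n2="${2:-0}" n3="${3:-0}"
  [ "$h1" -gt "$n1" ] && exit 0
  [ "$h1" -lt "$n1" ] && exit 1
  [ "$h2" -gt "$n2" ] && exit 0
  [ "$h2" -lt "$n2" ] && exit 1
  [ "$h3" -ge "$n3" ]
)
```

For instance, version_ge v8.1.0 v8.0.0 succeeds, while version_ge v0.2.0 v0.3.1 fails, which flags a Rust SDK below the recommended version.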
Step-by-step upgrade
Stage A — pull the v8.0.0 image
For CFN-deployed stacks (AWS Marketplace ECS-Fargate template), the recommended path is the update-stack.yml GitHub Actions workflow in getaxonflow/axonflow-enterprise, which handles the parameter-preservation contract for you:
# From the axonflow-enterprise repo (or trigger via the GitHub Actions UI):
gh workflow run update-stack.yml \
-f stack_name=axonflow-acme-prod \
-f agent_image_tag=v8.0.0 \
-f orchestrator_image_tag=v8.0.0
If you must run aws cloudformation update-stack directly, every CFN parameter you do not explicitly update must be passed with UsePreviousValue=true — omitting that flag causes CFN to reset the parameter to its template default, which on a long-lived stack will silently clobber OrganizationID, DatabaseUrl, LicenseKey, etc. The full pattern:
aws cloudformation update-stack \
--stack-name axonflow-acme-prod \
--use-previous-template \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--parameters \
ParameterKey=AgentImageTag,ParameterValue=v8.0.0 \
ParameterKey=OrchestratorImageTag,ParameterValue=v8.0.0 \
ParameterKey=OrganizationID,UsePreviousValue=true \
ParameterKey=DatabaseUrl,UsePreviousValue=true \
ParameterKey=LicenseKey,UsePreviousValue=true \
# ... ALL other stack parameters, each with UsePreviousValue=true ...
Enumerate your stack's existing parameters first to make sure you list them all:
aws cloudformation describe-stacks \
--stack-name axonflow-acme-prod \
--query 'Stacks[0].Parameters[].ParameterKey' \
--output text
The GitHub Actions workflow path is strongly preferred over direct aws CLI because it derives the parameter list automatically.
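If you do take the direct CLI path, the UsePreviousValue list can be generated from the enumerated keys instead of typed by hand. The following sketch is not what update-stack.yml does internally; build_cfn_params is a hypothetical helper, and it assumes parameter values contain no whitespace.

```shell
# Sketch only: emits one --parameters entry per existing stack parameter.
# usage: build_cfn_params "KEY1 KEY2 ..." [Key=Value ...]
# Keys with a matching Key=Value override get ParameterValue; every other
# key gets UsePreviousValue=true. Overrides for unknown keys are ignored.
build_cfn_params() (
  keys="$1"
  shift
  for key in $keys; do
    value_arg="UsePreviousValue=true"
    for override in "$@"; do
      if [ "${override%%=*}" = "$key" ]; then
        value_arg="ParameterValue=${override#*=}"
      fi
    done
    printf 'ParameterKey=%s,%s\n' "$key" "$value_arg"
  done
)
```

Feed it the output of the describe-stacks command above as the first argument, then pass the result to --parameters, for example --parameters $(build_cfn_params "$KEYS" AgentImageTag=v8.0.0 OrchestratorImageTag=v8.0.0).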
For Docker Compose deployments:
# Update your image tag pin in docker-compose.enterprise.yml or .env
# Then:
docker compose pull
docker compose up -d
For custom Kubernetes / EC2 / on-prem hosting: update the image tag through your normal deployment path.
Stage B — wait for migrations
The agent runs migrations 088 through 103 on first boot. Migration 094 is the only one that performs row-level changes; the others are additive schema-only or role-creation migrations.
Tail the agent logs to confirm migrations complete:
# Docker
docker logs -f axonflow-agent 2>&1 | grep -E "Migration|RAISE NOTICE"
# ECS-Fargate (CloudWatch Logs)
aws logs tail /ecs/axonflow-agent --follow --since 5m | grep Migration
# Kubernetes
kubectl logs -f deploy/axonflow-agent | grep Migration
You should see log lines like (exact wording may vary slightly by image version):
✅ Set app.deployment_org_id=acme-corp session variable for migration 094 backfill
✅ Set app.deployment_kind=production session variable for migration 094 prod-safety precondition
Migration 094 Pass-1 PREP: deployment org=acme-corp organizations row ensured (1 new)
Migration 094 Pass-2: audit_logs org_id=acme-corp set on N rows
...
Migration 094 verify: <table>.org_id rows with empty/shared org_id = 0
Migration 094 complete — v9 org_id backfill
"Pass 1" applies to AxonFlow-hosted SaaS customers only and is a no-op on in-VPC stacks (it operates only on cs_*-prefixed rows that exist only in try.getaxonflow.com). "Pass 2" is the in-VPC backfill that stamps your ORG_ID onto any historical empty-org_id rows.
If you see Migration 094 prod-safety abort instead, the production-safety guard fired. The error message names the failure mode (ORG_ID env is unset on a production stack, etc.) and the remediation. Re-run the preflight script, fix the cause, and restart the agent — the migration will retry on next boot.
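The two outcomes above can be told apart mechanically from the log stream. This sketch matches on the log phrases quoted in this section, so it inherits the caveat that exact wording may vary by image version; migration_094_status is a hypothetical name.

```shell
# Sketch only: classifies agent log text read from stdin.
# Prints one of: complete | aborted | pending
migration_094_status() (
  log="$(cat)"
  case "$log" in
    # A completed run wins, since an aborted migration retries on next boot.
    *"Migration 094 complete"*) echo complete ;;
    *"Migration 094 prod-safety abort"*) echo aborted ;;
    *) echo pending ;;
  esac
)
```

It reads from a pipe, for example docker logs axonflow-agent 2>&1 | migration_094_status.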
Stage C — post-deploy verification
Run these checks after the agent has settled (2-3 minutes after the last container restart):
# 1. /health advertises v9 platform version + SDK/plugin floors
curl -fsS https://<your-endpoint>/health | jq '.platform_version, .sdk_floor, .plugin_floor'
# Expected: "9.0.0" (or higher v9.x), v8.1.0 SDK floor, v1.5.0 plugin floor (claude/cursor/codex) + v2.5.0 (openclaw)
# 2. Migration 094 logged Pass-2 row counts for your deployment
docker logs <agent-container> 2>&1 | grep "Migration 094 Pass-2" | head -10
# 3. Empty-org_id rows are zero across customer-data tables (modulo deferred tables)
PGPASSWORD=... psql -h <db-host> -U axonflow -d axonflow <<'SQL'
SELECT 'audit_logs' AS t, COUNT(*) FROM audit_logs WHERE org_id IS NULL OR org_id = ''
UNION ALL SELECT 'mcp_query_audits', COUNT(*) FROM mcp_query_audits WHERE org_id IS NULL OR org_id = ''
UNION ALL SELECT 'static_policies', COUNT(*) FROM static_policies WHERE (org_id IS NULL OR org_id = '') AND tenant_id <> 'global'
UNION ALL SELECT 'dynamic_policies', COUNT(*) FROM dynamic_policies WHERE (org_id IS NULL OR org_id = '') AND tenant_id <> 'global'
UNION ALL SELECT 'service_identities', COUNT(*) FROM service_identities WHERE org_id IS NULL OR org_id = ''
UNION ALL SELECT 'execution_history', COUNT(*) FROM execution_history WHERE org_id IS NULL OR org_id = '';
SQL
# Expected: 0 across all rows. A few rows in agent_audit_logs / llm_call_audits
# may remain non-zero — those tables have their FORCE RLS rollout scheduled
# for a later phase and do not block v9 ship.
# 4. Smoke a real authenticated request.
# Your client_id/client_secret pair was issued when you provisioned the stack.
# For CFN-deployed stacks, the LicenseKey CFN parameter is your client_secret;
# the client_id is the OrganizationID CFN parameter (or a separately-issued
# API key if you have multiple credentials). To retrieve them:
# aws cloudformation describe-stacks --stack-name <your-stack> \
# --query 'Stacks[0].Parameters[?ParameterKey==`OrganizationID`||ParameterKey==`LicenseKey`].[ParameterKey,ParameterValue]'
curl -X POST https://<your-endpoint>/api/v1/process \
-u 'your-client-id:your-client-secret' \
-H "Content-Type: application/json" \
-d '{"query": "ping", "request_type": "chat"}'
# Expected: HTTP 200 + the normal response payload
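The empty-org_id query in check 3 can be turned into a pass/fail gate. The sketch below assumes the query is run with psql -At, so each output row has the form table|count; both helper names are hypothetical.

```shell
# Sketch only: consumes "table|count" rows (psql -At output) on stdin.

# Print the name of every table whose count is greater than zero.
nonzero_org_id_tables() {
  awk -F'|' '$2 + 0 > 0 { print $1 }'
}

# Exit 0 when every count is zero, 1 otherwise.
check_backfill() (
  offenders="$(nonzero_org_id_tables)"
  if [ -n "$offenders" ]; then
    echo "tables with empty org_id rows:" >&2
    echo "$offenders" >&2
    exit 1
  fi
  echo "backfill verified: no empty org_id rows"
)
```

Pipe the query result into check_backfill and treat a non-zero exit as a reason to pause before the soak stage.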
Stage D — soak
Leave the stack in steady state for 24-72 hours before considering the upgrade complete. Watch for:
- Auth failures in agent logs (would indicate an identity-plumbing issue)
- RLS policy violation errors (would indicate a WithOrgScope gap)
- Migration retries (would indicate a transient boot issue)
- /health not advertising v9
If everything is steady, your stack is on v9.
Rollback path
If you need to roll back during or after the upgrade:
Rollback before migration 094 has stamped historical rows
If you catch the issue while the agent is still in its boot loop (migration 094 prod-safety branch fired and refused to stamp anything), simply:
- Revert the image tag to the prior v8.x release.
- Restart the agent + orchestrator services.
- No DB rollback needed — migration 094 either ran cleanly or aborted; there is no partial-state in between.
Rollback after migration 094 has run successfully
This is the snapshot-restore path. The schema additions are not destructive (new columns + new roles), but migration 094 stamped historical rows with your ORG_ID value. Reverting the platform image will not reverse those stamps; you need to restore from the snapshot taken in pre-upgrade step 1.
- Stop the agent + orchestrator services (or scale to 0).
- Restore the pre-upgrade RDS snapshot:
  - AWS RDS (important DNS gotcha): a restored DB instance cannot reuse the original DNS endpoint until the original is renamed or deleted. Two options:
    - Rename-then-restore (recommended): rename the live instance (aws rds modify-db-instance --db-instance-identifier <live> --new-db-instance-identifier <live>-broken --apply-immediately; without --apply-immediately the rename waits for the next maintenance window), then restore the snapshot with the original identifier (aws rds restore-db-instance-from-db-snapshot --db-instance-identifier <live> --db-snapshot-identifier <snapshot>). The agent's existing DATABASE_URL resolves to the restored instance.
    - Restore-with-new-name + Route 53: restore the snapshot with a new identifier, then update the agent's DATABASE_URL (CFN parameter) to the new endpoint, or use a Route 53 CNAME to repoint.
  - Self-managed Postgres: stop the live instance, restore from your backup tool, restart.
- Revert the agent + orchestrator image tags to the prior v8.x release.
- Restart services.
Per-batch rollback for FORCE RLS
Each FORCE RLS migration (099/101/102/103) ships with a _down.sql pair (e.g., 099_v9_rls_b1_sparse_tables_down.sql). These revert FORCE on one batch without reverting the broader v9 schema. Operators do not normally need these; the snapshot-restore path is the contract.
RLS write paths
The v8.0.0 default AXONFLOW_DB_USE_APP_ROLE=true makes FORCE Row-Level Security actually fire on every covered table. Under the legacy v8.x posture the agent connected as the RDS master role, so FORCE RLS was decorative — the master role bypassed every policy via table ownership. Flipping the gate changes the semantics of every INSERT/UPDATE/DELETE that touches a covered table: writes now hit the policy's WITH CHECK predicate (typically org_id = current_setting('app.current_org_id', true)), and a write that runs outside an org-scoped transaction fails with pq: new row violates row-level security policy.
For operators running stock v9, this is invisible: every hot-path write in the shipped codebase is already wrapped against this contract (the agent uses WithOrgScope for per-request writes, SECURITY DEFINER helpers for the auth-bootstrap window, and the axonflow_platform_admin pool for cross-org workers). For operators running forks or in-tree customizations — custom connectors, custom auth shims, in-tree extensions to handlers — the flip is gated on a write-path audit.
Refuse-to-boot guard
The v8.0.0 agent, orchestrator, and customer-portal binaries refuse to boot when AXONFLOW_DB_USE_APP_ROLE=true (the default — also active when the env is unset) and AXONFLOW_DB_PLATFORM_ADMIN_URL is not set. The binary exits with a FATAL log line naming both env vars; the previous behavior was a silent fallback to the request-traffic pool, which under FORCE RLS caused cross-org workers (marketplace metering, community-saas sweep / recovery, node monitor, customer-portal admin handlers) to quietly return zero rows.
The FATAL log looks like:
[Marketplace] FATAL: AXONFLOW_DB_PLATFORM_ADMIN_URL is required when AXONFLOW_DB_USE_APP_ROLE=true (silent fallback to a non-BYPASSRLS pool would defeat FORCE RLS — cross-org metering/sweep/recovery/monitoring would silently return 0 rows or undercount). Set AXONFLOW_DB_PLATFORM_ADMIN_URL to a DSN authenticating as axonflow_platform_admin, or set AXONFLOW_DB_USE_APP_ROLE=false to opt out of the v8.0.0 default and run under the legacy v8.x posture.
The bracketed prefix identifies which worker triggered the guard. The five worker prefixes are Marketplace, NodeMonitor, CSAAS-SWEEP, CSAAS-RECOVERY, and customer-portal; the six-guard checklist above additionally lists CSAAS-DELETE for the tenant-delete endpoints.
If you see this FATAL on first boot, the fix is either:
- Recommended: populate AXONFLOW_DB_PLATFORM_ADMIN_URL with a DSN that authenticates as axonflow_platform_admin (mirror of AXONFLOW_DB_APP_ROLE_URL: different role, same host/db). See step 4.5 above for the canonical wiring.
- Phased-rollout: set AXONFLOW_DB_USE_APP_ROLE=false explicitly on the agent + orchestrator + customer-portal task definitions to opt out of the v8.0.0 default and run under the legacy v8.x posture. The guard is a no-op under that flag.
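The env pairing that triggers this guard can be checked against a task definition's values before the first boot. This is a sketch of the rule as described in this section, not the binary's actual check; check_role_env_pairing is a hypothetical name, and treating every value other than the literal false as enabled is an assumption.

```shell
# Sketch only: mirrors the documented pairing rule.
# usage: check_role_env_pairing USE_APP_ROLE PLATFORM_ADMIN_URL
check_role_env_pairing() (
  use_app_role="${1:-true}"   # unset behaves like true in v8.0.0
  admin_url="$2"
  if [ "$use_app_role" = "false" ]; then
    echo "ok: legacy posture, guard is a no-op"
    exit 0
  fi
  if [ -z "$admin_url" ]; then
    echo "fail: AXONFLOW_DB_PLATFORM_ADMIN_URL is required when AXONFLOW_DB_USE_APP_ROLE=true" >&2
    exit 1
  fi
  echo "ok: app role enabled and admin DSN present"
)
```

A call such as check_role_env_pairing "${AXONFLOW_DB_USE_APP_ROLE:-}" "${AXONFLOW_DB_PLATFORM_ADMIN_URL:-}" reports the same failure the FATAL line would, without starting a container.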
Customized-handler audit (forks only)
If you maintain a fork or have in-tree customizations to handlers that INSERT/UPDATE into application tables, audit those write paths before flipping AXONFLOW_DB_USE_APP_ROLE=true. The rule: every customized write into a Row-Level-Security-enabled table must go through one of three patterns.
| Pattern | When to use | Wrapper |
|---|---|---|
| Per-request org-scoped transaction | Most application writes — the request carries an org_id in its auth context | Wrap the write in WithOrgScope(ctx, db, orgID, func(tx *sql.Tx) error { … }) (agent) or withRequestOrgScope(r, h.db, fn) (customer-portal). Sets app.current_org_id inside the same transaction. |
| SECURITY DEFINER helper | Auth-bootstrap or pre-org-context writes (the org is being minted in this request — registration, signup, tenant-create) | Call a SECURITY DEFINER function created by a migration. Examples shipped in v8.0.0: auth_lookup_api_key(), auth_touch_api_key(). Add your own via a new migration if your customization needs one. |
| Admin (BYPASSRLS) pool | Cross-org workers (sweeps, mirrors, aggregators) that genuinely iterate across orgs | Open the pool via OpenPlatformAdminConnection. Requires AXONFLOW_DB_PLATFORM_ADMIN_URL to be set (the refuse-to-boot guard ensures this). |
To find your customized write sites, grep the fork:
# In your fork's root:
grep -rnE 'db\.(Exec|QueryRow|Query)|tx\.(Exec|QueryRow|Query)' --include='*.go' \
-- platform/ ee/ | grep -E 'INSERT|UPDATE|DELETE'
For each match, check the surrounding function:
- Is the call inside a WithOrgScope / withRequestOrgScope closure? → safe.
- Is the call on a *sql.DB opened via OpenPlatformAdminConnection? → safe.
- Is the call a SELECT-only that goes through a SECURITY DEFINER helper? → safe.
- Otherwise: this is a customized write that needs a wrapper before you flip AXONFLOW_DB_USE_APP_ROLE=true.
The migrations that enable Row-Level Security ship as part of the v8.0.0 image (the 018 template enables RLS across a base set of customer-data tables; later FORCE batches add FORCE to specific tables). Run psql -c '\d <table>' on any candidate table to see its RLS posture — Row security: enabled means writes are gated when not connecting as the table owner.
Staged rollout pattern
- Audit your fork (above). Fix every customized write that isn't wrapped.
- Stage with AXONFLOW_DB_USE_APP_ROLE=true on a staging stack carrying production-shaped traffic for at least one full diurnal cycle.
- Watch the agent + orchestrator logs for pq: new row violates row-level security policy. Each such line names the table the customized handler tried to write outside an org-scoped transaction.
- Roll back via the env flip if you surface violations: set AXONFLOW_DB_USE_APP_ROLE=false and re-run under the legacy posture while you fix the handler. The env override is intended as a temporary phased-rollout lever, not a permanent posture.
- Re-flip to true once staging is clean for one full diurnal cycle.
Operators running stock v9 can flip immediately after a successful preflight + standard staging soak — Check 8 will already have surfaced the env-pair requirement.
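During the staging soak, the violation lines can be summarised per table to show which handlers still need a wrapper. The sketch below relies on the Postgres message ending in for table "<name>"; the function name is hypothetical.

```shell
# Sketch only: reads log text on stdin and prints "table count" pairs,
# most frequent first.
rls_violations_by_table() {
  sed -n 's/.*new row violates row-level security policy.*for table "\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}
```

For example, docker logs axonflow-agent 2>&1 | rls_violations_by_table prints one line per offending table, and no output means no violations were logged.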
FAQ
What's new in v9 (in plain language)?
Three things, only the third of which is operator-visible:
- The codebase now uses three distinct identifiers (org_id for the customer, client_id for the credential, deployment_id for the license) instead of overloading tenant_id for all three. This is invisible on the wire: JSON field names + headers + Basic Auth all keep working.
- A new client_id column was added to several audit/policy tables. Existing rows have it backfilled to the same value as tenant_id. SDKs ≥v8.1.0 emit a new X-Client-ID header that fills it explicitly.
- Two new Postgres roles (axonflow_app_role for request traffic, axonflow_platform_admin for cross-org workers) were created. FORCE ROW LEVEL SECURITY is enabled on a handful of tables. This affects only direct SQL on a shared DB; see the v7 → v8 migration guide § Change 1.
What's deprecated in v9?
- The X-Tenant-ID HTTP header (alias for X-Client-ID). Continues to work through all of v9; planned removal in v10.
- The tenant_id JSON field in some API responses (also an alias for client_id). Same compatibility window.
- The license payload's V2 org_id field (alias for V3 deployment_id). Continues to work; planned removal in v10.
Do I need to upgrade my SDKs?
No. v7.x and v8.x SDKs continue to authenticate via X-Tenant-ID through the entire v8.0.0 lifecycle. We recommend upgrading at your normal cadence, but it is not a v9 ship requirement.
Does v9 change my license format?
No. V3 license payloads add a deployment_id field but retain org_id as a back-compat alias. Existing V2 licenses continue to validate. You do not need to regenerate licenses for v9.
What happens if I forget to set ORG_ID before upgrading?
If DEPLOYMENT_KIND=production (the CFN default on real stacks), migration 094 will refuse to run with an explicit production-safety abort message. The agent boot-loops harmlessly until you set ORG_ID and restart. Historical rows are not touched. This guard was added specifically to prevent silent stamping with the dev sentinel.
If DEPLOYMENT_KIND=dev (the docker-compose default), migration 094 will accept the local-dev-org fallback and stamp historical rows with that value. This is the intended path for local development.
What does axonflow_app_role do?
It is a non-owner Postgres role used by the agent + orchestrator when AXONFLOW_DB_USE_APP_ROLE=true. Pre-v9 (v8.x default), the agent connects as the RDS master role, which bypasses RLS. Connecting as axonflow_app_role makes FORCE RLS effective — queries respect the app.current_org_id session variable.
v8.0.0 defaults AXONFLOW_DB_USE_APP_ROLE=true (CHANGELOG). Fresh deploys connect as the app role and rely on FORCE RLS for tenant isolation. If you want to upgrade the platform image to v8.0.0 without immediately flipping the runtime role (phased rollout), set AXONFLOW_DB_USE_APP_ROLE=false explicitly on your agent + customer-portal task definitions. This preserves v8.x semantics until you finish provisioning the role passwords + DSN secrets.
Use scripts/operators/provision-app-role.sh (referenced in step 4.5 above) to set the role passwords. You then construct the two DSN env vars (AXONFLOW_DB_APP_ROLE_URL, AXONFLOW_DB_PLATFORM_ADMIN_URL) from the passwords you supplied and add them to your task definitions.
What does axonflow_platform_admin do?
It is a BYPASSRLS Postgres role for legitimate cross-org workers (sweep workers, mirror Lambdas, aggregators, support tooling). These workers iterate across all orgs and cannot use the app role (which scopes to one org per transaction). The role is created by migration 098 but is not used until you wire your own cross-org tooling against it (or, for the AxonFlow-shipped sweep / node-monitor / recovery handler, set AXONFLOW_DB_PLATFORM_ADMIN_URL).
What about API key authentication — doesn't auth happen before app.current_org_id is set?
Yes, and v8.0.0 ships SECURITY DEFINER helper functions to handle this chicken-and-egg problem. The in-VPC enterprise auth path queries api_keys, customers, and pricing_tiers before app.current_org_id is established. Migration 117 ships two SECURITY DEFINER functions, auth_lookup_api_key() and auth_touch_api_key(), that run with elevated privileges for the narrow auth-bootstrap window only; traffic then drops back to the app role for the rest of the request. api_keys and customers are FORCE-RLS-protected for all non-auth traffic. No operator action is required; the migration runs automatically on first v8.0.0 boot.
Will my customer-portal continue to work?
Yes. The first-party ee/platform/customer-portal-ui shipped in this repo is fully wired for v9 — nodes.go, export.go, connectors.go, sso.go, and auth/saml/service.go wrap every read/write of FORCE-RLS-protected tables in api.withRequestOrgScope(r, h.db, fn). The portal's existing handlers are RLS-correct out of the box.
Self-hosted source forks that called the removed middleware.RLSMiddleware / SetRLSContextForSession / ResetRLSContext / WithRLS family (pool-scope GUC helpers) will not compile against v8.0.0. Migrate to api.withRequestOrgScope(r, h.db, fn) — a request-scoped helper that opens a single *sql.Conn from the pool, sets app.current_org_id on that connection, runs the handler closure, and releases the connection. The pool-scope approach was unsafe under FORCE RLS because the GUC landed on one connection while the next handler statement might run on a different one. Diagnostic helpers in middleware/rls.go (GetCurrentOrgID, VerifyRLSActive, GetRLSStats, RLSHealthCheck) are retained.
References
- v7 → v8 Migration Guide: customer-API surface (JSON fields, headers, SDKs)
- AxonFlow source-available repo (BSL 1.1): the source code referenced in this guide (e.g., scripts/deployment/v9_self_hosted_preflight.sh, scripts/operators/provision-app-role.sh, and the agent / customer-portal Go source)
Enterprise Rollout Checklist
Use this page as part of the protected enterprise operating model:
- confirm the deployment shape in Deployment Operations
- check identity and access requirements in Authentication, SSO Configuration, and SCIM Overview
- connect governance workflows to Policy Management, Approvals Queue, and Audit Logging Guide
- use Support Escalation when the rollout needs escalation paths, incident context, or production-readiness review
