
v8.0.0 Enterprise Migration Guide

This guide is the detailed enterprise companion to the v7 to v8 Migration Guide. It covers multi-tenant DSN secrets wiring, customer-portal admin handlers, NodeMonitor, Marketplace and community-saas-mode env combinations, the customer-portal API consumer breaking change, and the source-fork audit recipe.

If you are running a single-tenant self-hosted Community deployment, the simple-path community upgrade in the public guide is sufficient — you do not need this page.

Audience

This guide is for operators of:

  • In-VPC Enterprise deployments running customer-portal admin handlers, multi-node enforcement, or AWS Marketplace metering.
  • Community-SaaS mode deployments running cross-org workers (sweep, recovery, tenant-delete).
  • Self-hosted source forks with customized handlers writing to FORCE-RLS-enabled tables.

Deployment-shape decision recap

From the public guide, the env-var requirement matrix:

| Deployment shape | AXONFLOW_DB_APP_ROLE_URL | AXONFLOW_DB_PLATFORM_ADMIN_URL |
|---|---|---|
| Community, single-tenant self-hosted | Yes (or DATABASE_URL fallback in dev) | No |
| Enterprise self-hosted with customer-portal | Yes | Yes |
| Enterprise self-hosted with multi-node enforcement | Yes | Yes |
| Enterprise self-hosted with AWS Marketplace metering | Yes | Yes |
| Community-SaaS mode (sweep / recovery / tenant-delete) | Yes | Yes |
| Single-tenant enterprise without any of the above | Yes | No |

If your deployment matches any row that requires AXONFLOW_DB_PLATFORM_ADMIN_URL, the rest of this guide applies.

Step-by-step enterprise upgrade

1. Take a fresh database snapshot

The rollback contract for v8.0.0 is snapshot restore + image revert. Per-batch _down.sql migration files exist (099_v9_rls_b1_sparse_tables_down.sql, 103_v9_rls_b9_identity_force_down.sql, etc.) but they revert one batch, not the full identity-model rollback. For a full rollback, you need a snapshot taken immediately before the upgrade.

  • AWS RDS: trigger a manual snapshot via the AWS console or aws rds create-db-snapshot --db-instance-identifier <id> --db-snapshot-identifier <name>. Wait for status available before proceeding.
  • Self-managed Postgres: use your existing backup tool. Confirm the backup completed successfully.
  • Containerised Postgres in your own VPC: take an EBS volume snapshot of the underlying disk, OR run a pg_dump to a separate object store.

Record the snapshot identifier. You'll need it for rollback.
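
For the RDS case, the snapshot-and-wait sequence can be scripted. The following is a minimal sketch, not part of the AxonFlow tooling: the helper names (run, snapshot_name, take_snapshot) and the identifier format are assumptions made for illustration, while the two aws rds subcommands are standard AWS CLI calls. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the `aws rds` subcommands are real CLI calls.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build a snapshot identifier from the instance id and a date stamp.
snapshot_name() {
  printf '%s-pre-v8-%s\n' "$1" "$2"
}

# Create a manual snapshot, block until it is available, then print its
# identifier on stdout (command output and dry-run echoes go to stderr).
take_snapshot() {
  local instance="$1" stamp="${2:-$(date +%Y%m%d%H%M)}" name
  name="$(snapshot_name "$instance" "$stamp")"
  run aws rds create-db-snapshot \
    --db-instance-identifier "$instance" \
    --db-snapshot-identifier "$name" >&2 || return 1
  run aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$name" >&2 || return 1
  echo "$name"
}
```

A call such as SNAPSHOT_ID="$(take_snapshot axonflow-acme-prod-db)" captures the identifier to record for rollback.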

2. Confirm ORG_ID and DEPLOYMENT_KIND env on the agent task

ORG_ID is the deployment's customer/account identifier. Migration 094's Pass-2 backfill stamps this onto every historical empty-org_id row.

| Hosting | Where ORG_ID lives |
|---|---|
| AWS Marketplace / CFN cloudformation-ecs-fargate.yaml | OrganizationID CFN parameter → agent task def's ORG_ID env |
| Docker Compose (self-hosted enterprise) | ORG_ID line in docker-compose.enterprise.yml or .env |
| Custom Kubernetes / EC2 / on-prem | ORG_ID env var on the agent container/process |

If ORG_ID is unset on a production deployment, the agent's deployment-org getter falls back to local-dev-org, and migration 094 will refuse to run (the production-safety guard). Set ORG_ID and redeploy before continuing.

DEPLOYMENT_KIND distinguishes a real production deployment from a local docker-compose / community-mode install. Migration 094 uses it as a defence-in-depth signal alongside ORG_ID.

  • CFN-deployed stacks: DEPLOYMENT_KIND=production is hardcoded in cloudformation-ecs-fargate.yaml. No action needed.
  • Docker Compose: defaults to dev. If this Compose setup is your production deployment, set DEPLOYMENT_KIND=production in your shell or .env file before running docker-compose up -d.
  • Custom hosting: set DEPLOYMENT_KIND=production on the agent task.
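
Both values can be checked locally before the agent is redeployed. The function below is a hypothetical guard written for this guide, not the logic migration 094 itself runs; it only mirrors the two conditions described above (ORG_ID set and not the local-dev-org fallback, DEPLOYMENT_KIND equal to production).

```shell
#!/usr/bin/env bash
# Hypothetical pre-upgrade guard for a production deployment.
# Usage: check_upgrade_env "${ORG_ID:-}" "${DEPLOYMENT_KIND:-}"
check_upgrade_env() {
  local org_id="$1" kind="$2"
  if [ -z "$org_id" ] || [ "$org_id" = "local-dev-org" ]; then
    echo "FAIL: ORG_ID is unset or still the local-dev-org fallback" >&2
    return 1
  fi
  if [ "$kind" != "production" ]; then
    echo "FAIL: DEPLOYMENT_KIND is '$kind', expected 'production'" >&2
    return 1
  fi
  echo "PASS: ORG_ID=$org_id DEPLOYMENT_KIND=$kind"
}
```

Run it with the same environment the agent task receives, since a value exported only in your own shell says nothing about the task definition.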

3. Run the preflight script

The preflight script lives at scripts/deployment/v9_self_hosted_preflight.sh in the getaxonflow/axonflow-enterprise repository (BSL 1.1, source-available). It is dependency-free (bash + psql + optionally aws CLI for ECS discovery). Download it without cloning the repo:

curl -fsSLO https://raw.githubusercontent.com/getaxonflow/axonflow-enterprise/main/scripts/deployment/v9_self_hosted_preflight.sh
chmod +x v9_self_hosted_preflight.sh

To build DATABASE_URL on a Marketplace CFN stack, read the RDS master password from AWS Secrets Manager, where the template stores it (the snippet below uses jq to extract the password field):

STACK_NAME=<your-stack-name>
DB_ENDPOINT=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
  --query 'Stacks[0].Outputs[?OutputKey==`DatabaseEndpoint`].OutputValue' --output text)
DB_SECRET_ARN=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
  --query 'Stacks[0].Outputs[?OutputKey==`DatabaseSecretArn`].OutputValue' --output text)
DB_PASSWORD=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_ARN" \
  --query SecretString --output text | jq -r .password)
export DATABASE_URL="postgres://axonflow:${DB_PASSWORD}@${DB_ENDPOINT}:5432/axonflow?sslmode=require"

Run the preflight:

DEPLOYMENT_KIND=production \
ORG_ID=acme-corp \
RDS_INSTANCE_IDENTIFIER=axonflow-acme-prod-db \
./v9_self_hosted_preflight.sh

The script reports PASS/WARN/FAIL across eight checks. Do not proceed if the script returns a FAIL.

4. Provision axonflow_app_role and axonflow_platform_admin

v8.0.0 ships scripts/operators/provision-app-role.sh:

curl -fsSLO https://raw.githubusercontent.com/getaxonflow/axonflow-enterprise/main/scripts/operators/provision-app-role.sh
chmod +x provision-app-role.sh

# Operator supplies the passwords — the script does not generate or print them:
export APP_ROLE_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
export PLATFORM_ADMIN_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"

# Stash both passwords in your secrets manager BEFORE running:
DATABASE_URL="$DATABASE_URL" ./provision-app-role.sh

The script is idempotent. Re-running with mismatched passwords exits 1 unless you set FORCE_RESET=1 to rotate.

Construct the two DSN env vars:

AXONFLOW_DB_APP_ROLE_URL=postgres://axonflow_app_role:$APP_ROLE_PASSWORD@<host>:5432/axonflow?sslmode=require
AXONFLOW_DB_PLATFORM_ADMIN_URL=postgres://axonflow_platform_admin:$PLATFORM_ADMIN_PASSWORD@<host>:5432/axonflow?sslmode=require

Store both in your secrets manager and reference them from your agent + customer-portal + orchestrator task definitions:

  • CFN-deployed stacks: store as two secrets axonflow/<stack>/app-role-url and axonflow/<stack>/platform-admin-url; reference via Secrets: in the task def YAML.
  • Docker Compose: load via .env file or --env-file flag.
  • Kubernetes: mount as Secret-backed env vars.
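
For the CFN case, composing and storing the two DSNs might look like the sketch below. The function names are hypothetical; the secret names follow the axonflow/<stack>/... convention from the list above, and aws secretsmanager create-secret is the standard CLI call. The sketch assumes the password contains no URL-reserved characters, which holds for passwords generated with the openssl pipeline shown earlier; anything else would need percent-encoding first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building and storing the two role DSNs.

# Compose a Postgres DSN for one role. Assumes port 5432, database
# "axonflow", and a password free of URL-reserved characters.
build_dsn() {
  local role="$1" password="$2" host="$3"
  printf 'postgres://%s:%s@%s:5432/axonflow?sslmode=require\n' \
    "$role" "$password" "$host"
}

# Store a DSN as a Secrets Manager secret named axonflow/<stack>/<suffix>.
store_dsn_secret() {
  local stack="$1" suffix="$2" dsn="$3"
  aws secretsmanager create-secret \
    --name "axonflow/${stack}/${suffix}" \
    --secret-string "$dsn"
}

# Example (not executed here):
#   store_dsn_secret my-stack app-role-url \
#     "$(build_dsn axonflow_app_role "$APP_ROLE_PASSWORD" "$DB_ENDPOINT")"
#   store_dsn_secret my-stack platform-admin-url \
#     "$(build_dsn axonflow_platform_admin "$PLATFORM_ADMIN_PASSWORD" "$DB_ENDPOINT")"
```

Passing the DSN as a command-line argument exposes it briefly in the local process list, so run this from a trusted operator host.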

5. Per-feature env-var combinations

Required env vars by deployment feature:

| Feature | AXONFLOW_DB_APP_ROLE_URL | AXONFLOW_DB_PLATFORM_ADMIN_URL | Additional env |
|---|---|---|---|
| Stock in-VPC enterprise (no portal, no marketplace, no multi-node) | required | not required | |
| Customer-portal admin handlers | required | required | ADMIN_API_KEY (from secrets manager) |
| Multi-node enforcement | required | required | ENABLE_NODE_MONITOR=true |
| AWS Marketplace metering | required | required | Marketplace CFN parameters (auto-wired) |
| Community-SaaS mode (sweep + recovery + tenant-delete) | required | required | COMMUNITY_SAAS_SWEEP_ENABLED=true, COMMUNITY_SAAS_REGISTRATION_ENABLED=true |
| GDPR right-to-erasure endpoints (/api/v1/tenant/{id}/delete-*) | required | required | |

Setting AXONFLOW_DB_PLATFORM_ADMIN_URL to an empty string is treated the same as unset — the refuse-to-boot guard fires.
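
The empty-versus-unset rule can be caught before deploy rather than at boot. The check below is a hypothetical sketch derived from the table above; it covers only the toggles this guide names as environment variables (ENABLE_NODE_MONITOR, COMMUNITY_SAAS_SWEEP_ENABLED, ADMIN_API_KEY) and is not the agent's own guard.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy check. Reads feature toggles from the environment
# and fails when a cross-org feature is on but the platform-admin DSN is
# unset or empty (the two are treated the same).
check_platform_admin_dsn() {
  local needed=""
  [ "${ENABLE_NODE_MONITOR:-}" = "true" ] && needed="multi-node enforcement"
  [ "${COMMUNITY_SAAS_SWEEP_ENABLED:-}" = "true" ] && needed="community-SaaS sweep"
  [ -n "${ADMIN_API_KEY:-}" ] && needed="customer-portal admin handlers"
  if [ -n "$needed" ] && [ -z "${AXONFLOW_DB_PLATFORM_ADMIN_URL:-}" ]; then
    echo "FAIL: $needed requires AXONFLOW_DB_PLATFORM_ADMIN_URL" >&2
    return 1
  fi
  return 0
}
```

The ${VAR:-} expansion is what makes an empty string and an unset variable behave identically here.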

Customer-portal API consumer breaking change (v8.0.0)

Affects: direct HTTP API consumers of the customer-portal role APIs. In v8+ the current session-authenticated routes are /api/v1/roles, /api/v1/roles/{id}, /api/v1/users/{email}/roles, and /api/v1/users/{email}/roles/{roleId}. Older internal clients and notes may refer to the pre-v8 customer-portal role paths.

Not affected: the bundled customer-portal UI (ee/platform/customer-portal-ui), which consumes its own backend and updates atomically with the schema rename.

The custom_roles.tenant_id and role_assignments.tenant_id columns were renamed to org_id (migration 111 in the v8.0.0 series). The JSON field in the API response payloads was renamed to match:

  • Pre-v8.0.0: {"tenant_id": "acme-corp", "role_name": "...", ...}
  • v8.0.0+: {"org_id": "acme-corp", "role_name": "...", ...}

If you wrote a tool that consumes the customer-portal HTTP API for roles or role-assignments directly (your own admin scripts, an SSO provisioning integration, a compliance audit pipeline that polls the role list), update your consumer to read org_id instead of tenant_id. The semantic meaning is unchanged — same row, same value, renamed JSON field.

If you write to these endpoints, the request payload accepts both tenant_id and org_id for one compatibility window; the new canonical field is org_id.
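
A consumer that has to keep working on both sides of the upgrade can read whichever field is present. The helper below is an illustrative sketch using jq; the function name is hypothetical, and the payload shape is the one shown in the two examples above.

```shell
#!/usr/bin/env bash
# Read the org identifier from a role payload on stdin, preferring the
# v8.0.0 field and falling back to the pre-v8 name. Requires jq.
# Prints nothing when neither field is present.
role_org_id() {
  jq -r '.org_id // .tenant_id // empty'
}

# Example (not executed here):
#   echo '{"tenant_id": "acme-corp", "role_name": "auditor"}' | role_org_id
```

Once every environment runs v8.0.0 or later, the fallback to tenant_id can be dropped.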

Find your consumers

To find code in your own repos that reads these endpoints or older customer-portal role paths:

grep -rnE '(customer-portal/(roles|role-assignments)|/api/v1/(roles|users/.*/roles))' --include='*.py' --include='*.js' --include='*.ts' --include='*.go' .

For each match, check the response handler for tenant_id references in the role / role-assignment payload — those need to become org_id.

Source-fork audit recipe (customized handlers must wrap FORCE-RLS writes)

This is the full audit recipe, broader in scope than the brief mention in the public guide.

Step A — grep for direct write sites

In your fork's root:

grep -rnE 'db\.(Exec|QueryRow|Query)|tx\.(Exec|QueryRow|Query)' --include='*.go' \
-- platform/ ee/ | grep -E 'INSERT|UPDATE|DELETE'

Step B — check the surrounding function for each match

| Pattern | Verdict |
|---|---|
| Call is inside a WithOrgScope(ctx, db, orgID, func(tx *sql.Tx) error { ... }) closure | Safe |
| Call is inside a withRequestOrgScope(r, h.db, fn) closure (customer-portal) | Safe |
| Call is on a *sql.DB opened via OpenPlatformAdminConnection() | Safe (admin pool, BYPASSRLS) |
| Call goes through a SECURITY DEFINER helper (auth_lookup_api_key(), etc.) | Safe |
| None of the above | Needs a wrapper before flipping AXONFLOW_DB_USE_APP_ROLE=true |

Step C — identify which RLS-enabled tables are touched

Run psql -c '\d <table>' on each candidate table. Output includes Row security: enabled (and FORCE if FORCE RLS is on). FORCE-RLS-enabled tables in v8.0.0 include (non-exhaustive):

  • organizations, tenants
  • audit_logs, mcp_query_audits, audit_archive, audit_retention_config, decision_chain
  • dynamic_policies, policy_overrides
  • connector_configs, connectors, agent_heartbeats, node_violations
  • community_saas_registrations
  • saml_configurations
  • usage_events, deployment_upgrades
  • api_keys, customers

Step D — staged rollout

  1. Stage 1: in staging, set AXONFLOW_DB_USE_APP_ROLE=true + provision both DSNs. Carry production-shaped traffic through at least one full diurnal cycle.
  2. Stage 2: watch agent + orchestrator + customer-portal logs for pq: new row violates row-level security policy. Each line names the table the customized handler tried to write outside an org-scoped transaction.
  3. Stage 3: also watch for silent zero-row writes. DELETE under FORCE RLS evaluates USING (not WITH CHECK), so a DELETE outside org scope is silently filtered to zero rows without an error. The failure signature is "row still exists after a 'successful' response", not an HTTP 500 or SQLSTATE 42501. Audit your customized DELETE call sites accordingly.
  4. Stage 4: if violations surface, set AXONFLOW_DB_USE_APP_ROLE=false in staging to keep traffic flowing while you fix the handler. Re-flip once the violation queue is clean.
  5. Stage 5: production flip after staging is clean for one diurnal cycle.
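
The Stage 2 log watch can be reduced to a list of offending tables. The filter below is a minimal sketch: it assumes the service logs carry the standard Postgres message text (new row violates row-level security policy for table "<name>") with unescaped double quotes. JSON-formatted logs that escape the quotes would need the pattern adjusted. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read log lines on stdin and print the distinct table names that appear
# in Postgres RLS violation messages, one per line, sorted.
rls_violation_tables() {
  grep -o 'new row violates row-level security policy for table "[^"]*"' \
    | sed 's/.*for table "\([^"]*\)"/\1/' \
    | sort -u
}

# Example (not executed here; the log file name is a placeholder):
#   rls_violation_tables < agent.log
```

An empty result means no violations were logged in the input. It says nothing about the silent zero-row DELETE case from Stage 3, which never produces an error line.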

Customer-portal admin handlers — DSN format detail

The customer-portal admin API key + DSN wiring landed in v8.0.0:

  • The AdminAPIKeySecret CFN resource generates a strong API key, stored in AWS Secrets Manager.
  • The customer-portal task definition references the secret as ADMIN_API_KEY env var.
  • Customer-portal admin handlers compare incoming X-Admin-API-Key header against ADMIN_API_KEY — anonymous requests get HTTP 401.
  • Admin handlers open AXONFLOW_DB_PLATFORM_ADMIN_URL for cross-org operations (org create / list / quota update / deletion).

The seed workflow that creates the first org on a fresh stack passes the API key via masked X-Admin-API-Key header.

NodeMonitor + Marketplace + customer-portal admin DSN combinations

If your stack runs multiple cross-org features simultaneously:

# Agent task def env block (cloudformation-ecs-fargate.yaml):
- Name: AXONFLOW_DB_USE_APP_ROLE
  Value: "true"
- Name: AXONFLOW_DB_APP_ROLE_URL
  ValueFrom: !Ref AppRoleDSNSecretArn
- Name: AXONFLOW_DB_PLATFORM_ADMIN_URL
  ValueFrom: !Ref PlatformAdminDSNSecretArn
- Name: ENABLE_NODE_MONITOR
  Value: "true"
- Name: COMMUNITY_SAAS_SWEEP_ENABLED
  Value: "false" # only true on csaas-mode stacks
- Name: ENABLE_AWS_MARKETPLACE_METERING
  Value: "true"

# Customer-portal task def env block:
- Name: AXONFLOW_DB_USE_APP_ROLE
  Value: "true"
- Name: AXONFLOW_DB_APP_ROLE_URL
  ValueFrom: !Ref AppRoleDSNSecretArn
- Name: AXONFLOW_DB_PLATFORM_ADMIN_URL
  ValueFrom: !Ref PlatformAdminDSNSecretArn
- Name: ADMIN_API_KEY
  ValueFrom: !Ref AdminAPIKeySecretArn
- Name: ENVIRONMENT
  Value: "production" # gates HandleForgotPassword to HTTP 501 in production

Rollback path

Before migration 094 has stamped historical rows

If you catch the issue while the agent is still in its boot loop (migration 094 prod-safety branch fired and refused to stamp anything):

  1. Revert the image tag to the prior v7.x release.
  2. Restart the agent + orchestrator services.
  3. No DB rollback needed — migration 094 either ran cleanly or aborted; no partial-state in between.

After migration 094 has run successfully

Snapshot-restore path. The schema additions are not destructive (new columns + new roles), but migration 094 stamped historical rows with your ORG_ID value. Reverting the platform image will not reverse those stamps.

  1. Stop the agent + orchestrator services (or scale to 0).
  2. Restore the pre-upgrade RDS snapshot:
    • AWS RDS rename-then-restore (recommended): rename the live instance, then restore the snapshot with the original identifier. The agent's existing DATABASE_URL resolves to the restored instance.

      aws rds modify-db-instance --db-instance-identifier <live> --new-db-instance-identifier <live>-broken --apply-immediately
      aws rds restore-db-instance-from-db-snapshot --db-instance-identifier <live> --db-snapshot-identifier <snapshot>
    • Restore-with-new-name + Route 53: restore the snapshot with a new identifier, then update DATABASE_URL (CFN parameter) to the new endpoint, or use Route 53 CNAME to repoint.

    • Self-managed Postgres: stop the live instance, restore from your backup tool, restart.

  3. Revert the agent + orchestrator + customer-portal image tags to the prior v7.x release.
  4. Restart services.
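
The rename-then-restore path from step 2 can be sketched as one function. This is an illustration under assumptions, not a supported script: the function names are hypothetical, and it relies on the restored instance picking up suitable networking defaults. In practice restore-db-instance-from-db-snapshot usually needs the subnet group and security groups passed explicitly, so check the original instance's settings first. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Rename the live instance out of the way, then restore the snapshot under
# the original identifier so the existing DATABASE_URL keeps resolving.
rename_then_restore() {
  local live="$1" snapshot="$2"
  # Without --apply-immediately the rename waits for the maintenance window.
  run aws rds modify-db-instance \
    --db-instance-identifier "$live" \
    --new-db-instance-identifier "${live}-broken" \
    --apply-immediately || return 1
  # The rename is asynchronous; wait for the renamed instance to settle.
  run aws rds wait db-instance-available \
    --db-instance-identifier "${live}-broken" || return 1
  run aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier "$live" \
    --db-snapshot-identifier "$snapshot" || return 1
  run aws rds wait db-instance-available \
    --db-instance-identifier "$live"
}
```

The first waiter can fail if it polls before the new identifier is visible; retrying it after a short pause is a reasonable fallback. Run the function only after the agent and orchestrator services are stopped, as in step 1.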

Per-batch rollback for FORCE RLS

Each FORCE RLS migration ships with a _down.sql pair. These revert FORCE on one batch without reverting the broader v8.0.0 schema. Operators do not normally need these; the snapshot-restore path is the contract.

| SDK / Plugin | Recommended | Minimum |
|---|---|---|
| Go SDK | v8.1.0 | v8.0.0 |
| Python SDK | v8.1.0 | v8.0.0 |
| TypeScript SDK | v8.1.0 | v8.0.0 |
| Java SDK | v8.1.0 | v8.0.0 |
| Rust SDK | v0.3.1 | v0.2.0 |
| Plugin (claude / cursor / codex) | v1.5.0 | v1.4.0 |
| openclaw | v2.5.0 | v2.4.0 |

The platform's /health endpoint advertises the current floor — client SDKs older than the minimum emit a warning but continue to work through the entire v8.0.0 lifecycle (deprecated-alias compatibility window).

Additional v8.0.0 guarantees in this area

  • Agent agent_heartbeats UPSERT under FORCE RLS — the agent's sendHeartbeat path wraps the UPSERT in an org-scoped transaction so the heartbeat write succeeds under axonflow_app_role against the FORCE-RLS-enabled table (migration 107). Operators staging the role flip do not need to watch for heartbeat-related RLS violations as a special case — the stock handler is RLS-correct.
  • AST audit walker for write-path coverage — a build-time guard preventing customized writes from bypassing the three wrap patterns is in place; CI fails on any unwrapped INSERT / UPDATE / DELETE into an RLS-gated table. The source-fork audit recipe in Step A–D above is CI-enforced on the stock codebase — operators maintaining forks inherit the same guard when they pick up the v8.0.0 codebase and customize on top. Step A's manual grep stays useful for pre-merge local checks; Step D's staged rollout is backstopped by the build-time guarantee that no new bypass crept in.

See also

Enterprise Rollout Checklist

Use this page as part of the protected enterprise operating model: