Evaluation Rollout Guide

The Evaluation tier exists for the moment when Community is no longer enough to answer the real question. The real question is not "does AxonFlow compile and run?" It is "can this become part of a production-grade operating model?"

This guide is for the engineer or platform owner running that next step.

What Evaluation Is For

Evaluation is best used to validate the controls that usually matter just before production:

  • real approval queues
  • policy simulation before rollout
  • evidence export for governance review
  • larger policy, execution, and provider limits
  • more realistic workload and operator behavior

The current evaluation tier gives you:

  • 50 tenant policies
  • 5 organization policies
  • 5 custom policy connectors
  • 3 LLM providers
  • 100 pending approvals
  • 14-day audit retention
  • policy simulation
  • evidence export: up to 5,000 records per export, 3 exports per day
  • 24-hour approval expiry
  • policy simulation cap of 300 runs per day
  • impact report limit of 50 inputs per run

See Community vs Evaluation vs Enterprise for the complete limit profile.

That is enough to run a meaningful internal pilot rather than only a developer proof of concept.
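Those limits translate into hard daily ceilings worth checking before you scope the pilot. The sketch below is a hypothetical planning helper, not part of any AxonFlow SDK; it only does arithmetic on the numbers listed above (for example, 5,000 records per export times 3 exports per day bounds how much audit evidence you can extract daily).

```python
# Hypothetical planning helper (not AxonFlow code): sanity-check a pilot
# plan against the Evaluation tier limits listed above.

EVAL_LIMITS = {
    "tenant_policies": 50,
    "org_policies": 5,
    "llm_providers": 3,
    "pending_approvals": 100,
    "export_records_per_run": 5000,
    "exports_per_day": 3,
    "simulations_per_day": 300,
}

def exportable_records_per_day(limits=EVAL_LIMITS):
    """Max audit records extractable per day via evidence export."""
    return limits["export_records_per_run"] * limits["exports_per_day"]

def fits_evaluation(plan, limits=EVAL_LIMITS):
    """Return the limit names a pilot plan would exceed (empty = fits)."""
    exceeded = []
    if plan.get("daily_audit_events", 0) > exportable_records_per_day(limits):
        exceeded.append("evidence export throughput")
    if plan.get("policies", 0) > limits["tenant_policies"]:
        exceeded.append("tenant policies")
    if plan.get("daily_simulations", 0) > limits["simulations_per_day"]:
        exceeded.append("policy simulation cap")
    return exceeded

# A pilot generating 20,000 auditable events/day outruns evidence export:
print(fits_evaluation(
    {"daily_audit_events": 20000, "policies": 12, "daily_simulations": 40}
))
```

If a check like this fails for the pilot you actually intend to run, that is an early signal the Enterprise conversation belongs in the evaluation plan, not after it.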

Ready to start? Request an Evaluation License

Pick The Right Evaluation Scope

A strong evaluation scope usually has all three of these:

  1. one real application or workflow that matters
  2. one workflow path with meaningful governance or approval risk
  3. one stakeholder beyond the core engineering team

Weak evaluations are usually too small. They prove that the platform starts, but they do not prove that the organization can operate it.

Good examples:

  • a customer-support assistant with governed connector access
  • an internal research assistant with redaction and evidence requirements
  • a multi-step workflow that requires review before execution of risky actions

Phase 1: Prove Technical Fit

Use the first phase to answer:

  • does the SDK integration fit the app architecture?
  • do the right connectors and providers exist?
  • do policies catch the right classes of risky behavior?

This is where Community To Enterprise Migration and Deployment Mode Matrix are most useful.
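To make the third question concrete, it helps to write down which action classes a policy should deny outright versus route to approval. The sketch below is a generic illustration of that kind of check; every name in it is hypothetical and it does not represent the AxonFlow policy engine or its schema.

```python
# Illustrative only: a toy policy check that denies some actions outright
# and routes others to an approval queue. All names are hypothetical.

def evaluate(action, policies):
    """Return the first matching policy decision; default is allow."""
    for policy in policies:
        if action["name"] in policy["deny_actions"]:
            return {"decision": "deny", "policy": policy["id"]}
        if action["name"] in policy.get("require_approval", set()):
            return {"decision": "needs_approval", "policy": policy["id"]}
    return {"decision": "allow", "policy": None}

policies = [
    {
        "id": "p-risky-writes",
        "deny_actions": {"bulk_export"},
        "require_approval": {"delete_record"},
    },
]

# Risky writes land in review instead of executing immediately:
print(evaluate({"name": "delete_record"}, policies))
```

The useful test in Phase 1 is not whether a policy mechanism exists, but whether your deny and require-approval sets actually cover the risky behavior your pilot application can produce.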

Phase 2: Prove Operational Fit

This is the real evaluation phase. Validate:

  • approval queues with real reviewer behavior
  • policy simulation before policy rollouts
  • execution visibility and incident handling
  • evidence export for internal governance review
  • whether the current limits are enough for the intended pilot

This is where pages like Human-in-the-Loop and Execution Viewer matter.
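One operational detail worth rehearsing with real reviewers is the 24-hour approval expiry: requests that sit in the queue longer than a day lapse, so reviewer response time becomes a hard constraint. The sketch below is plain datetime arithmetic, not AxonFlow code, assuming each queue item records when it was requested.

```python
# Hedged sketch of the 24-hour approval expiry from the limits above.
# Plain datetime arithmetic; queue item shape is assumed, not AxonFlow's.

from datetime import datetime, timedelta

APPROVAL_TTL = timedelta(hours=24)  # Evaluation-tier approval expiry

def expired(requested_at, now):
    """True once a pending approval has aged past the 24-hour TTL."""
    return now - requested_at >= APPROVAL_TTL

def triage(queue, now):
    """Split a pending-approval queue into live and expired items."""
    live = [a for a in queue if not expired(a["requested_at"], now)]
    dead = [a for a in queue if expired(a["requested_at"], now)]
    return live, dead

now = datetime(2025, 6, 2, 9, 0)
queue = [
    {"id": "a1", "requested_at": now - timedelta(hours=2)},
    {"id": "a2", "requested_at": now - timedelta(hours=30)},
]
live, dead = triage(queue, now)
print([a["id"] for a in live], [a["id"] for a in dead])  # ['a1'] ['a2']
```

If your pilot routinely produces expired approvals, that is a finding about reviewer staffing and escalation paths, which is exactly the operational evidence Phase 2 exists to surface.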

Phase 3: Prove Organizational Fit

Use the final phase to answer:

  • would security sign off on this rollout model?
  • can reviewers and operators use it without engineering babysitting every step?
  • does the pilot already point toward identity, portal workflows, or enterprise connectors?

That is the phase where the enterprise decision usually becomes obvious.

Exit Criteria For A Good Evaluation

Before you call the evaluation successful, you should have answers to:

  1. which workflows deserve approval gates?
  2. which policies need simulation before rollout?
  3. how will operators inspect, replay, and export executions?
  4. what are the first scale or governance limits you are likely to hit?
  5. is Evaluation enough for the intended production pilot, or is Enterprise the realistic landing zone?

If those answers are still fuzzy, the evaluation probably measured developer excitement more than platform fit.

Signals That Evaluation Should Turn Into Enterprise

The strongest signals are:

  • several teams want to share the platform
  • non-engineers need approval or portal workflows
  • SSO or SCIM becomes mandatory
  • security, procurement, or compliance wants stronger operational evidence
  • enterprise connectors or provider management become part of the plan

That is when Enterprise Overview and Enterprise Rollout Checklist become more relevant than one more pilot iteration.