How AxonFlow Detection Works

Concept page. This explains how detection decides: what the detector tiers are, which detectors evaluate which tools, and how to read the result. For the PII detectors themselves, see PII Detection; for per-org enforcement actions, see Detection Posture; for unblocking a false positive, see Session Overrides.

Every governed request and response is evaluated against the policies enabled for your organization. The outcome is a decision (allowed, blocked, redacted, needs_approval, or error), which is recorded together with the policy that produced it when one matched. Behind every decision sit two questions:

  1. Does this content match something a policy is looking for? — the detection tiers.
  2. Should this detector evaluate this tool at all? — capability scoping.

This page explains both, so that when a decision surprises you, you can tell which kind of detector fired and why it was looking at your request in the first place.

The three tiers of detection

Detectors are not all built the same way, and the differences matter when you are reading a decision. Broadly, a detector's confidence comes from one of three sources: the internal structure of the match, the context around the match, or the syntax of an operation.

| Tier | What it checks | Examples | Signal strength |
| --- | --- | --- | --- |
| Structural / checksum | The match satisfies the identifier's internal structure | Credit cards, IBANs, bank routing numbers, Indonesian NIK / NPWP | Strongest: the identifier's internal structure has to check out |
| Shape with context | The match has the right shape and the surrounding text supports it | Emails, phone numbers, IP addresses | Strong when context agrees; context can also rule a match out |
| Keyword + operation syntax | The match resembles the syntax of a risky operation | SQL injection, dangerous shell commands, dynamic code execution, config-file writes | Depends on where it runs, which is what capability scoping addresses |

Tier 1: structural and checksum validation

Many high-value identifiers have internal structure, and AxonFlow validates that structure before a match counts — or, for some identifiers, before it is scored at full confidence:

  • Credit card numbers must pass the Luhn checksum. A random sixteen-digit number that fails Luhn is not treated as a card number.
  • IBANs must pass the standard MOD-97 check.
  • US bank account references are validated against the ABA routing checksum.
  • Indonesian NIK numbers encode a region and a date of birth. A sixteen-digit number whose embedded date fields are out of range is not a NIK, and is not treated as one. NPWP tax numbers are checked against their expected formats — the punctuated legacy form and the current sixteen-digit form — before a match is scored at full confidence.
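
To make the checksum idea concrete, here is a minimal Python sketch of the three checks named in the list above. These are the standard public algorithms (Luhn, the ISO 13616 MOD-97 check, and the ABA 3-7-1 weighting), not AxonFlow's implementation, and the function names are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Luhn: double every second digit from the right, sum, require sum % 10 == 0."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """MOD-97: move the first four characters to the end, map A-Z to 10-35, require remainder 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def aba_valid(routing: str) -> bool:
    """ABA routing number: nine digits weighted 3,7,1 repeating, require sum % 10 == 0."""
    if len(routing) != 9 or not (routing.isascii() and routing.isdigit()):
        return False
    weights = (3, 7, 1) * 3
    return sum(int(d) * w for d, w in zip(routing, weights)) % 10 == 0
```

Under this sketch, a sixteen-digit string such as 4111111111111112 fails `luhn_valid`, so it would not count as a card number in this tier.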

This tier produces the most reliable matches: the identifier proves itself. It is also why a "why wasn't this detected?" question about a national ID is often answered by the test data — a structurally invalid NIK is correctly not detected, because it could not be issued to a real person.
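
A structural NIK check might look like the sketch below. It assumes the commonly described layout (six region digits, a DDMMYY date of birth with 40 added to the day for women, then a four-digit serial); that layout and the function names are assumptions for illustration, not a statement of how AxonFlow validates NIKs. Region codes are not checked here.

```python
from datetime import date


def _is_real_date(year: int, month: int, day: int) -> bool:
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def nik_structurally_valid(nik: str) -> bool:
    """Structural check only: length, digits, and an in-range embedded date."""
    if len(nik) != 16 or not (nik.isascii() and nik.isdigit()):
        return False
    # Assumed layout: digits 1-6 region, 7-12 date of birth as DDMMYY, 13-16 serial.
    day, month, yy = int(nik[6:8]), int(nik[8:10]), int(nik[10:12])
    if day > 40:  # assumed convention: 40 is added to the day for female holders
        day -= 40
    # The two-digit year is ambiguous, so accept the date if it exists in either century.
    return any(_is_real_date(century + yy, month, day) for century in (1900, 2000))
```

Test data such as 1234567890123456 fails this check because its embedded date fields are out of range, which is the situation described above.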

Tier 2: shape with context

Emails, phone numbers, and IP addresses have a recognizable shape, but shape alone over-matches ordinary technical text. In this tier, a candidate match is weighed against its surrounding context — and for some types, context alone can rule a match out:

  • A dotted number immediately preceded by a version or firmware label — firmware 1.2.3.4, "ver": "10.20.30.40" — is a version string, not an IP address, and is rejected.
  • Private-network addresses (10.x.x.x, 192.168.x.x) are scored lower than routable addresses, because they rarely identify a person (a lower score, not an exemption: a valid match can still be flagged). Never-routable and special-purpose addresses (loopback, 0.0.0.0, link-local, carrier-grade NAT, multicast, the reserved documentation blocks) are rejected outright, because they cannot identify a person.
  • A phone-shaped digit string that repeats a single digit, or falls outside plausible phone-number lengths, is rejected outright. Context words shift the score in both directions: "call" or "mobile" nearby raises it, while "zip", "amount", or "price" lowers it.
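
The IP-address rules above can be sketched with Python's standard `ipaddress` module. The regular expressions, the 20-character look-behind window, and the verdict labels are assumptions chosen for illustration; the real detector's patterns and scores are not documented here.

```python
import ipaddress
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# A version-style label directly before the candidate, e.g. firmware 1.2.3.4
VERSION_LABEL = re.compile(r'\b(?:firmware|version|ver)\b["\s:=]*$', re.IGNORECASE)
NEVER_PERSONAL = [ipaddress.ip_network(n) for n in (
    "100.64.0.0/10",                                       # carrier-grade NAT
    "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24",   # documentation blocks
)]


def classify_ipv4(text: str) -> list[tuple[str, str]]:
    """Return (candidate, verdict) pairs for every IPv4-shaped match in text."""
    results = []
    for m in IPV4.finditer(text):
        try:
            addr = ipaddress.IPv4Address(m.group())
        except ValueError:
            continue  # right shape, but not a valid address (e.g. 999.1.1.1)
        before = text[max(0, m.start() - 20):m.start()]
        if VERSION_LABEL.search(before):
            verdict = "rejected"          # context says this is a version string
        elif (addr.is_loopback or addr.is_unspecified or addr.is_link_local
              or addr.is_multicast or any(addr in net for net in NEVER_PERSONAL)):
            verdict = "rejected"          # cannot identify a person
        elif addr.is_private:
            verdict = "lower-score"       # still a match, just scored lower
        else:
            verdict = "normal-score"
        results.append((m.group(), verdict))
    return results
```

Note the ordering: the never-routable checks run before `is_private`, because Python's `is_private` is also true for loopback and documentation ranges.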

The principle: in this tier, a match must not only look like the thing — the text around it has to be consistent with it being the thing.
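
For phone numbers, the same principle can be sketched as hard rejections followed by a context adjustment. The length bounds, the base score, and the size of each adjustment are invented for this example; only the context words come from the text above.

```python
import re

RAISING_WORDS = {"call", "mobile"}
LOWERING_WORDS = {"zip", "amount", "price"}


def phone_score(candidate: str, context: str) -> float:
    """Return 0.0 for a rejected candidate, otherwise a context-adjusted score."""
    digits = re.sub(r"\D", "", candidate)
    if not 7 <= len(digits) <= 15:   # assumed plausible length range
        return 0.0
    if len(set(digits)) == 1:        # a single repeated digit, e.g. 0000000000
        return 0.0
    words = set(re.findall(r"[a-z]+", context.lower()))
    score = 0.5                      # hypothetical base score for a shape-only match
    if words & RAISING_WORDS:
        score += 0.3
    if words & LOWERING_WORDS:
        score -= 0.3
    return score
```

With these numbers, the same digit string scores higher next to "call me on my mobile" than on its own, and lower next to "total price".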

Tier 3: keyword and operation syntax

Detectors for SQL injection, dangerous shell commands (including reverse shells and risky downloads), credential-file access, internal-network and cloud-metadata fetches, path traversal, package installation, dynamic code execution, and configuration-file writes look for the syntax of an operation: a UNION SELECT clause, a stacked query, a shell pipeline that downloads and executes, a code-evaluation call, a write to an agent's config file.

These detectors exist because the operations they model are genuinely dangerous where they can execute. But ordinary prose can resemble operation syntax — a sentence about revoking someone's access, a markdown table divider, a filename mentioned in a runbook. That makes this tier the widest surface for false positives on documentation text — and it is exactly the tier that capability scoping addresses.
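
A deliberately naive sketch shows why prose collides with operation syntax. The patterns below are hypothetical and far simpler than a production detector; they exist only to show a real payload and two harmless documentation snippets matching the same kind of rule.

```python
import re

# Hypothetical, intentionally simplistic operation-syntax patterns.
PATTERNS = {
    "union_select": re.compile(r"\bunion\s+(?:all\s+)?select\b", re.IGNORECASE),
    "stacked_query": re.compile(r";\s*(?:drop|delete|insert|update)\b", re.IGNORECASE),
    "privilege_change": re.compile(r"\b(?:grant|revoke)\b.+\b(?:on|from|to)\b", re.IGNORECASE),
    "sql_comment": re.compile(r"--"),
}


def matched_patterns(text: str) -> list[str]:
    """Names of every pattern that matches somewhere in text, sorted."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


payload = "id=1 UNION SELECT password FROM users"
runbook = "Please revoke access from the contractor by Friday."
table = "| Name | Role |\n|---|---|"

print(matched_patterns(payload))   # a genuine injection attempt
print(matched_patterns(runbook))   # ordinary prose that resembles a REVOKE statement
print(matched_patterns(table))     # a markdown divider that resembles a SQL comment
```

All three inputs match something. Whether the match matters depends on whether the receiving tool could execute it.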

Capability scoping: detectors run where the capability exists

A detector models a threat. Execution-class detectors — the tier-3 families above — model one specific threat: "this input is about to be executed by the governed tool." That model is right for a shell tool, a database connector, or a file-write tool. It is wrong for a tool whose only capability is writing prose into a SaaS document, such as a Jira issue or a Confluence page: there is no executor behind the tool, so the input is documentation, not something that will be executed.

Capability scoping applies the threat model to the tool's actual capabilities. Tools are classified into capability classes:

| Capability class | Meaning | Execution-class detectors |
| --- | --- | --- |
| text-document | Writes prose to a SaaS document API. No shell, no query execution, no local file access, no arbitrary network fetch. | Skipped |
| shell-exec | Runs shell commands. | Evaluated |
| db-query | Executes a query language against a datastore. | Evaluated |
| file-write | Writes the local or agent filesystem. | Evaluated |
| network | Issues arbitrary outbound requests. | Evaluated |
| unknown | Anything not positively classified. | Evaluated (fail-closed) |

Only the positive text-document classification changes behavior. Every other class — including anything unknown — receives full evaluation.

In practice: a tool classified as text-document is not checked for SQL injection, because it cannot execute SQL. A shell tool is. A database connector is. And the same payload that passes through a document tool is still evaluated when it is sent to a tool that could execute it — and blocked wherever the policy's action is block.
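
The scoping rule reduces to a single condition, sketched here in Python. The class names mirror the capability classes above; the enum and the function are hypothetical and only illustrate the decision, not the platform's code.

```python
from enum import Enum


class Capability(Enum):
    TEXT_DOCUMENT = "text-document"
    SHELL_EXEC = "shell-exec"
    DB_QUERY = "db-query"
    FILE_WRITE = "file-write"
    NETWORK = "network"
    UNKNOWN = "unknown"


def detector_evaluates(capability: Capability, detector_is_execution_class: bool) -> bool:
    """A detector is skipped only when BOTH conditions hold:
    the tool is positively text-document AND the detector is execution-class."""
    skipped = capability is Capability.TEXT_DOCUMENT and detector_is_execution_class
    return not skipped
```

Every other combination, including an unknown tool, returns True.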

What always runs, everywhere

Capability scoping only ever skips detectors that are positively classified as execution-class. The content-borne families — detectors that care about what the text contains, not what it would do if executed — evaluate every tool, including text-document tools:

  • PII detection, in every jurisdiction. Content is content: a national ID or an email address written into a Jira ticket is still a data leak, whatever the tool's capabilities. PII detection staying universal is the single most important invariant of capability scoping.
  • Prompt-injection guards. An injection string written into a document is stored prompt injection — it re-enters a model's context the next time the document is read back. So these guards evaluate documentation tools by design.
  • Secrets and sensitive-data detection, and the compliance policy families.

Fail-closed by default

Scoping is designed so that classification can only relax evaluation for tools positively classified as text-document. A missing classification never opens a gap:

  1. An unknown tool gets everything. A tool identity that is empty or not positively classified is evaluated by every enabled detector.
  2. An unclassified policy always runs. A detector is skipped only when it is positively classified as execution-class. New policies — including policies your own team authors — evaluate everywhere until they are explicitly classified.
  3. Classification is never caller-asserted. The classification happens server-side, against AxonFlow's own registry of known text-document tools. There is no request field through which a client can claim "I am text-only." (The tool's name is reported by the same client that enforces the decision — the same trust anchor as the rest of the integration; what no client can do is assert a capability class directly.)

Tools that AxonFlow itself executes against managed connectors are never scoped at all: a connector's registered name is free-form text chosen at registration, not a capability statement, so a datastore connector merely named like a document tool keeps full SQL enforcement.
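
The fail-closed rules can be sketched as a lookup that defaults to full evaluation. The registry contents, tool names, and function signatures below are hypothetical; the point is that every missing or unrecognized input falls through to "evaluate".

```python
from typing import Optional

# Hypothetical server-side registry of known text-document tools.
TEXT_DOCUMENT_TOOLS = frozenset({"jira.create_issue", "confluence.update_page"})


def capability_class(tool_name: Optional[str], is_managed_connector: bool = False) -> str:
    """Classify server-side; the caller supplies only a name, never a capability class."""
    if is_managed_connector:
        return "unknown"            # managed connectors are never scoped
    if tool_name and tool_name in TEXT_DOCUMENT_TOOLS:
        return "text-document"
    return "unknown"                # empty or unrecognized names get everything


def detector_runs(tool_name: Optional[str], policy_class: Optional[str],
                  is_managed_connector: bool = False) -> bool:
    """policy_class is None for a policy that has not been classified yet."""
    if policy_class != "execution":
        return True                 # unclassified and content-borne policies always run
    return capability_class(tool_name, is_managed_connector) != "text-document"
```

In this sketch a datastore connector registered under a document-like name still gets SQL enforcement, because the managed-connector flag is checked before the name.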

One honest boundary: capability scoping is descriptive, not a guarantee. It removes a class of mismatched evaluation — it does not mean documentation content is never flagged. Content-borne detectors still evaluate documentation tools by design, and execution-class detectors still evaluate every tool that is not positively classified as text-document.

Availability

Capability scoping is available on platform v9.4.0+ (see the release notes for your version). On v9.3.1 and earlier, there is no capability scoping: execution-class detectors evaluate every tool. Everything else on this page — the detection tiers, the universal content-borne families, reading a decision, and the override path — applies to those versions as-is.

Reading a decision

Every decision AxonFlow makes is recorded with a decision ID, a verdict (allowed, blocked, redacted, needs_approval, or error) and, when a policy matched, the policy that produced it. When a decision surprises you, three read surfaces reconstruct it:

  • The Decision Record lists recent decisions for your tenant — the "what just got blocked?" view.
  • Decision Explainability explains a single decision: which policy fired, at which version, with what risk level, and whether an override could unblock it.
  • Audit Logging ties the decision into the full governance record — user, tenant, tool, and workflow context.

The policy's category in the decision tells you which kind of detector fired. A PII category (pii-…, or the legacy pii_detection) means a tier-1 or tier-2 content match; a SQL-injection or dangerous-operation category means a tier-3 operation-syntax match — and for those, the governed tool's capability is part of understanding the decision.
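
As a reading aid, the category-to-tier mapping can be sketched as a small helper. The PII prefix and the legacy `pii_detection` name come from the text above; the execution-class category names are placeholders, since the exact strings depend on your policy set.

```python
# Placeholder names: substitute the category strings your policies actually use.
EXECUTION_CATEGORIES = {"sql-injection", "dangerous-command", "code-execution", "config-file-write"}


def detector_kind(category: str) -> str:
    """Rough hint about which kind of detector produced a decision."""
    c = category.strip().lower()
    if c == "pii_detection" or c.startswith("pii-"):
        return "content match (tier 1 or 2)"
    if c in EXECUTION_CATEGORIES:
        return "operation syntax (tier 3): check the tool's capability"
    return "other: see the policy definition"
```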

If the decision was wrong for your context, Session Overrides give you a governed path: a time-bounded, audit-logged override with a mandatory justification. Policies marked critical cannot be overridden — that restriction is enforced server-side and is deliberate.
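
The constraints on an override (time-bounded, justified, never for critical policies) can be sketched as a validation step. The field names, the one-hour ceiling, and the messages are invented for this example; the real limits are enforced server-side by the platform.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

MAX_WINDOW = timedelta(hours=1)  # hypothetical ceiling for illustration


@dataclass(frozen=True)
class OverrideRequest:
    policy_is_critical: bool
    justification: str
    window: timedelta


def rejection_reason(req: OverrideRequest) -> Optional[str]:
    """Return why the override would be refused, or None if it is acceptable."""
    if req.policy_is_critical:
        return "critical policies cannot be overridden"
    if not req.justification.strip():
        return "a justification is mandatory"
    if not timedelta(0) < req.window <= MAX_WINDOW:
        return "the override window must be bounded"
    return None
```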

If you see an unexpected block

  1. Read the decision first. Get the decision ID from the block message or the Decision Record, and look at the policy name and category. That tells you which detector tier fired.
  2. Consider the tool's capability. If an execution-class category (SQL injection, dangerous command, code execution, config-file write) fired on a tool that only writes documents, that is a capability-scoping question — report it to your admin with the decision ID and the tool name, so the tool's classification can be reviewed (on platform v9.4.0+).
  3. Use the override path if you need to move now. For policies that allow it, a session override unblocks you for a bounded window, with your justification on the audit record. This is the governed alternative to working around the platform.
  4. Report what you found. A decision ID lets an admin reconstruct the event from the audit log — include it in any report, internal or to AxonFlow support.