Framework Flow
A run moves from a release question through domain selection, target integration, seeded behavior coverage, and trace judging to evidence. The flow stays the same across domains; only the behavior language each domain supplies changes.
Question → Domain → Swarm → Trace → Judge → Evidence
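The run flow can be sketched as a minimal pipeline. All names below (`Trace`, `run`, the dict-shaped evidence) are illustrative assumptions, not the framework's actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the flow: question -> swarm -> traces -> judge -> evidence.
# Class, function, and field names are illustrative, not the framework's real schema.

@dataclass
class Trace:
    actor: str      # seeded user or task that exercised the target
    prompt: str     # input sent to the target
    response: str   # recorded target output

def run(question: str, swarm: list[str], target, judge) -> dict:
    """Drive each seeded actor against the target, judge the traces, emit evidence."""
    traces = [Trace(actor, f"task:{actor}", target(actor)) for actor in swarm]
    scores = [judge(t) for t in traces]
    return {
        "question": question,
        "traces": [t.__dict__ for t in traces],
        "scores": scores,
        "pass_rate": sum(scores) / len(scores),
    }
```

With a stub `target` and `judge`, the same swarm can be replayed against each release and the resulting evidence dicts compared.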
Core Terms
| Term | Meaning |
|---|---|
| Release question | The behavior risk the team wants evidence for before launch. |
| Swarm | A repeatable set of users, queries, tasks, journeys, or scenarios used to exercise the target. |
| Target | The AI system under test: a service, callable, agent graph, or protocol endpoint. |
| Trace | The recorded interaction between a seeded actor/task and the target. |
| Judge | The domain-owned scorer that interprets completed traces. |
| Evidence | Human-readable and machine-readable artifacts used for release review. |
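To make the terms concrete, here is a hedged sketch of how a judged trace and a machine-readable evidence artifact might be shaped; the field names and JSON layout are assumptions, not the framework's real schema:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical shapes for the core terms; real schemas belong to the framework.

@dataclass
class JudgedTrace:
    trace_id: str
    actor: str
    verdict: str   # domain-owned label, e.g. "safe" / "policy_violation"
    score: float

def to_evidence(release_question: str, judged: list[JudgedTrace]) -> str:
    """Serialize judged traces into a machine-readable evidence artifact."""
    return json.dumps({
        "release_question": release_question,
        "results": [asdict(j) for j in judged],
    }, indent=2)
```

The same structure can back the human-readable review view, since each result keeps the domain's own verdict language.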
What Makes It Different
- The run is replayable enough to compare releases, not a one-off prompt review.
- The judge is domain-shaped, so the evidence uses the right failure language.
- The artifacts preserve inputs, outputs, traces, manifests, and compare decisions.
- Generation is an optional coverage layer, not the source of truth for scoring.
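Because runs are replayable and artifacts keep manifests, two releases can be compared mechanically. A minimal sketch of such a compare decision, assuming a manifest with `run_id` and `pass_rate` fields and an illustrative regression threshold:

```python
# Illustrative release-over-release compare using run manifests.
# The manifest layout and the 0.05 threshold are assumptions, not the framework's format.

def compare_runs(baseline: dict, candidate: dict, regression_threshold: float = 0.05) -> dict:
    """Compare pass rates of two replayed runs and record the decision."""
    delta = candidate["pass_rate"] - baseline["pass_rate"]
    return {
        "baseline": baseline["run_id"],
        "candidate": candidate["run_id"],
        "delta": delta,
        "decision": "regression" if delta < -regression_threshold else "ok",
    }
```

Storing this decision alongside the traces and manifests gives release review a single artifact to inspect.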