Swarm Testing Model

The core Evidpath model: release questions become seeded behavior swarms, target interactions, judged traces, and launch evidence.

Framework Flow

A run moves from release question to domain selection, target integration, seeded behavior coverage, trace judging, and evidence. The flow stays stable while each domain changes the behavior language.

  1. Question
  2. Domain
  3. Swarm
  4. Trace
  5. Judge
  6. Evidence
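The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_swarm`, `run_swarm`, `judge`, `evidence`, and `toy_target` are hypothetical names invented for this sketch, not Evidpath's API, and the judging rule is a stand-in for whatever a real domain would define.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trace:
    """Recorded interaction between one seeded task and the target."""
    task: str
    output: str


def build_swarm(seed: int, size: int) -> list[str]:
    """Swarm step: the same seed always yields the same list of tasks."""
    rng = random.Random(seed)
    topics = ["refund", "shipping", "warranty"]  # hypothetical domain vocabulary
    return [f"Ask about {rng.choice(topics)} policy #{i}" for i in range(size)]


def run_swarm(target: Callable[[str], str], swarm: list[str]) -> list[Trace]:
    """Trace step: call the target once per task and record the exchange."""
    return [Trace(task=task, output=target(task)) for task in swarm]


def judge(trace: Trace) -> bool:
    """Judge step (assumed rule): the output must mention the topic asked about."""
    topic = trace.task.split()[2]
    return topic in trace.output


def evidence(traces: list[Trace]) -> dict:
    """Evidence step: summarize verdicts over completed traces."""
    verdicts = [judge(trace) for trace in traces]
    return {"total": len(verdicts), "passed": sum(verdicts)}


def toy_target(task: str) -> str:
    """Stand-in for the AI system under test; it just echoes the task."""
    return f"Echo: {task}"


swarm = build_swarm(seed=7, size=5)
report = evidence(run_swarm(toy_target, swarm))
```

Because the judge only reads completed traces, the swarm builder and the target can be swapped without changing how verdicts are produced.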

Core Terms

Term              Meaning
Release question  The behavior risk the team wants evidence for before launch.
Swarm             A repeatable set of users, queries, tasks, journeys, or scenarios used to exercise the target.
Target            The AI system under test: a service, callable, agent graph, or protocol endpoint.
Trace             The recorded interaction between a seeded actor/task and the target.
Judge             The domain-owned scorer that interprets completed traces.
Evidence          Human-readable and machine-readable artifacts used for release review.
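One way to picture how Target, Trace, and Evidence relate is a small data model. The sketch below is an assumption-laden illustration: the `Target` protocol, `CallableTarget` adapter, `record` helper, and the `Trace` field names are all hypothetical and are not taken from Evidpath itself.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Protocol


class Target(Protocol):
    """Hypothetical interface: anything that can answer a task counts as a target."""
    def respond(self, task: str) -> str: ...


@dataclass
class CallableTarget:
    """Adapter that lets a plain function stand in as a target."""
    fn: Callable[[str], str]

    def respond(self, task: str) -> str:
        return self.fn(task)


@dataclass(frozen=True)
class Trace:
    """Recorded interaction between a seeded task and the target."""
    seed: int
    task: str
    output: str


def record(target: Target, seed: int, task: str) -> Trace:
    """Run one task against the target and keep the input next to the output."""
    return Trace(seed=seed, task=task, output=target.respond(task))


def shout(task: str) -> str:
    """Toy system under test: upper-cases the task."""
    return task.upper()


trace = record(CallableTarget(shout), seed=1, task="find a quiet hotel")

# Machine-readable artifact: stable JSON that tooling can diff between runs.
machine_readable = json.dumps(asdict(trace), sort_keys=True)

# Human-readable artifact: a one-line summary for a release reviewer.
human_readable = f"[seed {trace.seed}] {trace.task!r} -> {trace.output!r}"
```

A service or protocol endpoint would need its own adapter with the same `respond` method; the trace and evidence code would not change.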

What Makes It Different

  • The run is replayable enough to compare one release against another; it is not a one-off prompt review.
  • The judge is domain-shaped, so the evidence uses the right failure language.
  • The artifacts preserve inputs, outputs, traces, manifests, and compare decisions.
  • Generation is an optional coverage layer, not the source of truth for scoring.
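The replay-and-compare idea in the list above can be sketched as follows. Everything here is hypothetical: the query pool, the `pass_rate` judging rule (a non-empty answer passes), the two toy releases, and the shape of the compare decision are assumptions made for illustration, not Evidpath behavior.

```python
import random
from typing import Callable

# Hypothetical fixed pool the seeded swarm is drawn from (3 refund queries, 7 others).
QUERY_POOL = [
    "refund for late order", "refund status", "refund policy",
    "track my package", "change address", "cancel order",
    "gift wrap options", "store hours", "update email", "reset password",
]


def build_swarm(seed: int, size: int) -> list[str]:
    """Same seed, same queries: this is what makes the run replayable."""
    return random.Random(seed).sample(QUERY_POOL, size)


def pass_rate(target: Callable[[str], str], swarm: list[str]) -> float:
    """Assumed judge rule: a trace passes when the target returns a non-empty answer."""
    passed = sum(1 for query in swarm if target(query).strip())
    return passed / len(swarm)


def baseline_release(query: str) -> str:
    return f"Answer: {query}"


def candidate_release(query: str) -> str:
    # Simulated regression: the candidate returns nothing for refund queries.
    return "" if "refund" in query else f"Answer: {query}"


def compare(seed: int, size: int, baseline, candidate, tolerance: float = 0.0) -> dict:
    """Score both releases on the identical swarm and record a compare decision."""
    swarm = build_swarm(seed, size)
    base = pass_rate(baseline, swarm)
    cand = pass_rate(candidate, swarm)
    decision = "pass" if cand >= base - tolerance else "regression"
    return {"seed": seed, "baseline": base, "candidate": cand, "decision": decision}


# Sampling 8 of 10 queries always includes at least one refund query.
first = compare(seed=42, size=8, baseline=baseline_release, candidate=candidate_release)
second = compare(seed=42, size=8, baseline=baseline_release, candidate=candidate_release)
```

Both releases see identical inputs, so a difference in pass rate comes from the targets rather than from the sample. Rerunning with the same seed reproduces the same decision.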