Evidpath Docs

Evidpath is a swarm testing and evidence system for AI products. The platform runs repeatable domain swarms, judges traces, and writes launch evidence for recommender, search, and agent systems.

What Evidpath Does

Evidpath turns a release question into a repeatable run: choose a domain product, run seeded behavior coverage against a target, judge the traces, and review the evidence bundle before launch. A minimal sketch of that loop follows the list below.

  • Use one swarm engine for seeded runs, trace capture, judging, reports, manifests, and compare workflows.
  • Use domain products for the behavior language that matters to the system under test.
  • Evaluate recommender systems, search rankers, and agent trajectories through public package workflows.
  • Keep generated swarm coverage scoped honestly: it is currently strongest for recommenders and expanding across domains.
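Reduced to plain Python, the run loop looks roughly like the sketch below. This is not the Evidpath API: the seeded generator, the stand-in target, the judge, and the toy verdict logic are all illustrative assumptions; only the pipeline stages and the output file names come from this doc.

```python
import json
import random

SEED = 42  # a fixed seed makes the generated coverage repeatable

def generate_scenarios(seed: int, n: int) -> list:
    """Seeded scenario generation: the same seed yields the same scenarios."""
    rng = random.Random(seed)
    behaviors = ["cold_start", "repetition", "novelty", "abandonment"]
    return [{"id": i, "behavior": rng.choice(behaviors)} for i in range(n)]

def run_target(scenario: dict) -> dict:
    """Stand-in for the system under test; a real run would call your service."""
    return {"scenario": scenario, "response": f"slate-for-{scenario['behavior']}"}

def judge(trace: dict) -> dict:
    """Stand-in judge: attach a verdict to each captured trace."""
    verdict = "pass" if trace["response"].startswith("slate") else "fail"
    return {**trace, "verdict": verdict}

scenarios = generate_scenarios(SEED, n=10)
traces = [judge(run_target(s)) for s in scenarios]

# Write a toy evidence bundle; the file names match the packet described below.
with open("traces.jsonl", "w") as f:
    f.writelines(json.dumps(t) + "\n" for t in traces)
with open("results.json", "w") as f:
    pass_rate = sum(t["verdict"] == "pass" for t in traces) / len(traces)
    json.dump({"seed": SEED, "scenarios": len(traces), "pass_rate": pass_rate}, f, indent=2)
```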

Platform And Products

| Layer | What it owns |
| --- | --- |
| Evidpath platform | Planning, deterministic seeds, execution, trace ledger, reports, manifests, and compare artifacts. |
| Domain product | Target contract, scenario grammar, simulated actors or tasks, judge, metrics, and report language. |
| Integration path | Native HTTP, schema-mapped HTTP, Python callable, or agent protocol driver, depending on the domain. |
| Evidence packet | report.md, results.json, traces.jsonl, run_manifest.json, and compare outputs. |
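Of the integration paths above, the Python callable is conceptually a function the swarm invokes once per scenario. The sketch below is an assumption about what such a callable could look like; the request/response keys (user_id, k, items) and the MyRanker stub are hypothetical, since the actual contract is owned by each domain product.

```python
from typing import Any

class MyRanker:
    """Stand-in for the real recommender under test; entirely hypothetical."""
    def rank(self, user_id: str, k: int) -> list:
        return [f"item-{i}-for-{user_id}" for i in range(k)]

ranker = MyRanker()

def recommend_target(request: dict) -> dict:
    # Called once per generated scenario; the keys (user_id, k, items)
    # are illustrative, since the real contract is set by the domain product.
    items = ranker.rank(user_id=request.get("user_id", "anon"), k=request.get("k", 10))
    return {"items": items}

print(recommend_target({"user_id": "u-1", "k": 5}))  # one scenario's worth of output
```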
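Because the evidence packet is plain files, it can be inspected with nothing but the standard library. A minimal reader, assuming the bundle sits in the working directory; the field names inside the files (such as seed) are assumptions, since the schemas are owned by the domain product.

```python
import json

# Assumes the packet files sit in the working directory.
with open("run_manifest.json") as f:
    manifest = json.load(f)
with open("traces.jsonl") as f:
    traces = [json.loads(line) for line in f if line.strip()]
with open("results.json") as f:
    results = json.load(f)

# "seed" is an assumed manifest field; substitute whatever the real schema names it.
print(f"seed: {manifest.get('seed')}, traces captured: {len(traces)}")
print(f"result keys: {sorted(results)}")
```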

Public Domains

| Domain | Use it for | Current notes |
| --- | --- | --- |
| recommender | Recommendation slates, novelty, repetition, cold start, trust collapse, and abandonment. | Audit, compare, generated scenarios/populations, run-swarm, native/schema-mapped HTTP, Python, and optional adapters. |
| search | Ranked results, relevance, freshness, ambiguity, typo recovery, zero-result behavior, and personalization. | Audit and compare with native HTTP, schema-mapped HTTP, Python, and reference runs. |
| agents | Tool use, grounding, refusal, multi-turn state, unsafe requests, and latency. | Audit and compare with Python/LangGraph, OpenAI-compatible, Anthropic, MCP stdio, HTTP session, and reference runs. |
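All three domains list compare in their current notes. Since compare inputs are ordinary JSON artifacts, a rough baseline-versus-candidate diff can be sketched outside the platform; the per-run directory layout and the assumption that results.json carries numeric top-level metrics are both hypothetical.

```python
import json

def load_metrics(path: str) -> dict:
    """Keep only numeric top-level fields; the schema itself is assumed."""
    with open(path) as f:
        data = json.load(f)
    return {k: v for k, v in data.items() if isinstance(v, (int, float))}

baseline = load_metrics("baseline/results.json")    # hypothetical layout:
candidate = load_metrics("candidate/results.json")  # one packet per run

# Per-metric delta for every metric the two runs share.
for name in sorted(baseline.keys() & candidate.keys()):
    print(f"{name}: {baseline[name]:.3f} -> {candidate[name]:.3f} "
          f"({candidate[name] - baseline[name]:+.3f})")
```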