What Evidpath Does
Evidpath turns a release question into a repeatable run: choose a domain product, run seeded behavior coverage against a target, judge the traces, and review the evidence bundle before launch.
- Use one swarm engine for seeded runs, trace capture, judging, reports, manifests, and compare workflows.
- Use domain products for the behavior language that matters to the system under test.
- Evaluate recommender systems, search rankers, and agent trajectories through public package workflows.
- Keep generated swarm coverage scoped honestly: it is currently strongest for recommenders and is still expanding to the other domains.
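
The idea of a seeded, repeatable run can be illustrated with a small standalone sketch. It is not Evidpath's API: `derive_seed` and `simulate_run` are hypothetical names, and the scenario label `cold_start` is only an example. It shows one common way to derive a stable per-scenario seed from a base seed so that the same inputs always reproduce the same trace.

```python
import hashlib
import random


def derive_seed(base_seed: int, scenario: str, index: int) -> int:
    """Derive a stable per-run seed from a base seed, scenario name, and index.

    Hypothetical helper for illustration; not part of any Evidpath API.
    """
    payload = f"{base_seed}:{scenario}:{index}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first 8 bytes of the hash as an unsigned 64-bit seed.
    return int.from_bytes(digest[:8], "big")


def simulate_run(seed: int, steps: int = 5) -> list[int]:
    """Stand-in for a seeded behavior run: returns a pseudo-random action trace."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return [rng.randrange(10) for _ in range(steps)]


if __name__ == "__main__":
    seed = derive_seed(42, "cold_start", 0)
    # The same seed reproduces the same trace.
    print(simulate_run(seed) == simulate_run(seed))  # prints True
```

Hashing the base seed together with the scenario name keeps each scenario's random stream independent, so adding or reordering scenarios does not change the traces of the others.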
Platform And Products
| Layer | What it owns |
|---|---|
| Evidpath platform | Planning, deterministic seeds, execution, trace ledger, reports, manifests, and compare artifacts. |
| Domain product | Target contract, scenario grammar, simulated actors or tasks, judge, metrics, and report language. |
| Integration path | Native HTTP, schema-mapped HTTP, Python callable, or agent protocol driver depending on the domain. |
| Evidence packet | report.md, results.json, traces.jsonl, run_manifest.json, and compare outputs. |
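
Because the evidence packet is made of plain JSON, JSON Lines, and Markdown files, it can be inspected with standard tooling. The sketch below assumes the three machine-readable files sit together in one run directory; that layout and the helper name `load_packet` are assumptions for illustration, and no particular field names inside the files are assumed.

```python
import json
from pathlib import Path


def load_packet(run_dir: Path) -> dict:
    """Load the JSON artifacts of one run directory into plain Python objects.

    Hypothetical helper: assumes run_manifest.json, results.json, and
    traces.jsonl all live directly inside run_dir.
    """
    manifest = json.loads((run_dir / "run_manifest.json").read_text(encoding="utf-8"))
    results = json.loads((run_dir / "results.json").read_text(encoding="utf-8"))

    traces = []
    with (run_dir / "traces.jsonl").open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                traces.append(json.loads(line))

    return {"manifest": manifest, "results": results, "traces": traces}
```

A call such as `load_packet(Path("runs/example"))` (a made-up path) would return a dictionary whose `traces` entry is a list with one parsed object per line of `traces.jsonl`.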
Public Domains
| Domain | Use it for | Current notes |
|---|---|---|
| recommender | Recommendation slates, novelty, repetition, cold start, trust collapse, and abandonment. | Audit, compare, generated scenarios/populations, run-swarm, native/schema-mapped HTTP, Python, and optional adapters. |
| search | Ranked results, relevance, freshness, ambiguity, typo recovery, zero-result behavior, and personalization. | Audit and compare with native HTTP, schema-mapped HTTP, Python, and reference runs. |
| agents | Tool use, grounding, refusal, multi-turn state, unsafe requests, and latency. | Audit and compare with Python/LangGraph, OpenAI-compatible, Anthropic, MCP stdio, HTTP session, and reference runs. |
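
To make the recommender row more concrete, here is a minimal sketch of what a Python callable target and a repetition check could look like. Everything in it is hypothetical: the `Recommender` signature, `toy_recommender`, and `repetition_rate` are illustrative stand-ins, not Evidpath's target contract or its judge metrics.

```python
from typing import Callable, Sequence

# Hypothetical target shape: takes a user id and a step index,
# returns a slate of item ids. Not Evidpath's actual contract.
Recommender = Callable[[str, int], Sequence[str]]


def toy_recommender(user_id: str, step: int) -> list[str]:
    """Toy target: a 4-item window that slides by 2 over a fixed catalog."""
    catalog = [f"item-{i}" for i in range(10)]
    start = (step * 2) % len(catalog)
    return [catalog[(start + k) % len(catalog)] for k in range(4)]


def repetition_rate(target: Recommender, user_id: str, steps: int) -> float:
    """Share of recommended items the same user was already shown earlier."""
    seen: set[str] = set()
    repeated = 0
    total = 0
    for step in range(steps):
        slate = target(user_id, step)
        repeated += sum(1 for item in slate if item in seen)
        total += len(slate)
        seen.update(slate)
    return repeated / total if total else 0.0


if __name__ == "__main__":
    # Three slates of 4 items; 2 items repeat in each of the last two slates.
    print(round(repetition_rate(toy_recommender, "user-1", 3), 3))  # prints 0.333
```

Keeping the target behind a plain callable makes it easy to swap a real model in for the toy one without changing the measurement code.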