Agent Trajectory Domain

Use Evidpath for Agents to evaluate task trajectories, tool use, grounding, refusal, multi-turn state, unsafe requests, and latency.

What It Tests

  • Current-information tasks that should use a search tool and cite grounded evidence.
  • Support tasks that should call a knowledge-base tool with useful arguments.
  • Multi-turn follow-ups where state must persist across conversation turns.
  • Unsafe account-access tasks where refusal calibration matters.
  • Latency-sensitive direct answers where unnecessary tool use can be a regression.
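As a rough illustration of the first bullet, the sketch below shows one way a current-information trajectory could be judged: a search tool call must come before a final answer that cites evidence. The `Step` type, the `search` tool name, and the `citations` field are hypothetical and are not Evidpath's actual data model; the contract in Source Links defines the real format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a hypothetical agent trajectory (not Evidpath's real schema)."""
    kind: str                 # "tool_call" or "answer"
    tool_name: str = ""       # set when kind == "tool_call"
    text: str = ""            # set when kind == "answer"
    citations: list = field(default_factory=list)


def used_search_and_cited(steps: list) -> bool:
    """Return True if a search call precedes the first answer and that answer cites evidence."""
    searched = False
    for step in steps:
        if step.kind == "tool_call" and step.tool_name == "search":
            searched = True
        elif step.kind == "answer":
            # The answer only counts when a search happened first and it cites something.
            return searched and len(step.citations) > 0
    return False  # no answer step was produced


grounded = [
    Step(kind="tool_call", tool_name="search"),
    Step(kind="answer", text="Version 3.2 was released today.",
         citations=["https://example.com/release"]),
]
ungrounded = [Step(kind="answer", text="Version 3.2 was released today.")]

print(used_search_and_cited(grounded))    # True
print(used_search_and_cited(ungrounded))  # False
```

The same pattern extends to the other bullets, for example by inverting the check for latency-sensitive tasks, where any tool call before the answer would count against the agent.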

Supported Drivers

Driver                     Use it for
in_process                 Python callable, class, class instance, or LangGraph-style object.
openai_chat_completions    OpenAI-compatible Chat Completions endpoint.
anthropic_messages         Anthropic Messages API endpoint.
mcp_stdio                  Local MCP server over stdio.
http_session               Deployed agent service with session lifecycle endpoints.
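The `in_process` driver accepts a Python callable. The sketch below shows what such a callable could look like. Its signature (a message string and an optional history list in, a dict with `answer` and `tool_calls` out) is an assumption made for illustration, not the Evidpath contract; the contract document in Source Links defines the real input and output shape.

```python
from typing import Optional


def toy_agent(message: str, history: Optional[list] = None) -> dict:
    """Hypothetical agent callable; the input/output shape is assumed, not Evidpath's contract."""
    history = history or []
    tool_calls = []
    lowered = message.lower()
    if "password" in lowered and "someone else" in lowered:
        # Unsafe account-access request: refuse without calling tools.
        answer = "I can't help with accessing another person's account."
    elif "latest" in lowered or "today" in lowered:
        # Current-information request: record a (pretend) search call.
        tool_calls.append({"name": "search", "arguments": {"query": message}})
        answer = "Here is what the search returned."
    else:
        # Direct answer: skip tools to keep latency low.
        answer = f"Direct answer after {len(history)} earlier turns."
    return {"answer": answer, "tool_calls": tool_calls}


result = toy_agent("What is the latest release?")
print(result["tool_calls"][0]["name"])  # search
```

Keeping the three branches separate mirrors the behaviors this domain evaluates: tool use for current information, refusal for unsafe requests, and tool-free direct answers when latency matters.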

Useful Commands

evidpath audit --domain agents \
  --scenario current-info-tool-use \
  --driver-config-path ./driver_config.json \
  --seed 7

evidpath compare --domain agents \
  --baseline-driver-config-path ./baseline.json \
  --candidate-driver-config-path ./candidate.json \
  --scenario multi-turn-support-follow-up \
  --rerun-count 2
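
To repeat the audit over several seeds, the command line can be assembled in a script. The sketch below builds the argument list using only the flags shown in the audit example above. The `audit_command` helper is hypothetical, and the extra seed values are arbitrary.

```python
import shlex


def audit_command(scenario: str, driver_config_path: str, seed: int) -> list:
    """Build the argv for `evidpath audit` using only the flags shown above."""
    return [
        "evidpath", "audit",
        "--domain", "agents",
        "--scenario", scenario,
        "--driver-config-path", driver_config_path,
        "--seed", str(seed),
    ]


# Same scenario and config as the audit example, repeated over three seeds.
commands = [
    audit_command("current-info-tool-use", "./driver_config.json", seed)
    for seed in (7, 8, 9)
]
for argv in commands:
    print(shlex.join(argv))
```

To execute the commands rather than print them, pass each list to `subprocess.run(argv, check=True)`, assuming `evidpath` is on your PATH.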

Source Links

  • Contract: https://github.com/NDETERMINA/limitation/blob/main/products/evidpath/EXTERNAL_TARGET_CONTRACT_AGENTS.md
  • Python callable example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_in_process_python_api
  • HTTP session example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_http_session
  • MCP stdio example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_mcp_stdio
  • LangGraph-style example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_langgraph_in_process