What It Tests
- Current-information tasks that should use a search tool and cite grounded evidence.
- Support tasks that should call a knowledge-base tool with useful arguments.
- Multi-turn follow-ups where state must persist across conversation turns.
- Unsafe account-access tasks where refusal calibration matters.
- Latency-sensitive direct answers where unnecessary tool use can be a regression.
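The five behaviors above can be made concrete with a toy target. The following is a minimal sketch of a Python callable that exhibits each one. It is not Evidpath's API: `AgentReply`, `ToyAgent`, `fake_search`, and `fake_kb_lookup` are hypothetical names invented for illustration, the keyword routing is deliberately naive, and the real input/output shape a target must follow is defined by the contract document, not by this example.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentReply:
    """Hypothetical reply shape (an assumption, not Evidpath's contract)."""
    text: str
    tool_calls: list = field(default_factory=list)  # (tool_name, arguments) pairs
    citations: list = field(default_factory=list)   # evidence URLs backing the text
    refused: bool = False


def fake_search(query: str) -> dict:
    """Stand-in for a real search tool; returns canned evidence."""
    return {"url": "https://example.com/result", "snippet": f"Result for: {query}"}


def fake_kb_lookup(topic: str, order_id: Optional[str]) -> str:
    """Stand-in for a real knowledge-base tool."""
    return f"KB article for {topic} (order {order_id})"


class ToyAgent:
    """Naive keyword router that shows one branch per tested behavior."""

    def __init__(self) -> None:
        self.memory: dict = {}  # state carried across turns

    def __call__(self, message: str) -> AgentReply:
        lowered = message.lower()

        # Unsafe account access: refuse before doing anything else.
        if "password" in lowered and "someone else" in lowered:
            return AgentReply(
                text="I can't help with accessing another person's account.",
                refused=True,
            )

        # Current information: call the search tool and cite what it returned.
        if "latest" in lowered or "today" in lowered:
            hit = fake_search(message)
            return AgentReply(
                text=hit["snippet"],
                tool_calls=[("search", {"query": message})],
                citations=[hit["url"]],
            )

        # Multi-turn state: remember an order number for later follow-ups.
        match = re.search(r"order (\d+)", lowered)
        if match:
            self.memory["order_id"] = match.group(1)

        # Support: call the knowledge-base tool with specific arguments.
        if "order" in lowered or "refund" in lowered:
            args = {
                "topic": "refund" if "refund" in lowered else "order_status",
                "order_id": self.memory.get("order_id"),
            }
            return AgentReply(
                text=fake_kb_lookup(**args),
                tool_calls=[("kb_lookup", args)],
            )

        # Latency-sensitive direct answer: no tool call at all.
        return AgentReply(text=f"Direct answer to: {message}")
```

Calling one `ToyAgent` instance with "Where is order 1234?" and then "Can I get a refund for it?" shows the multi-turn case: the second reply still passes `order_id="1234"` to the knowledge-base tool even though the follow-up never mentions it.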
Supported Drivers
| Driver | Use it for |
|---|---|
| in_process | Python callable, class, class instance, or LangGraph-style object. |
| openai_chat_completions | OpenAI-compatible Chat Completions endpoint. |
| anthropic_messages | Anthropic Messages API endpoint. |
| mcp_stdio | Local MCP server over stdio. |
| http_session | Deployed agent service with session lifecycle endpoints. |
Useful Commands
evidpath audit --domain agents \
  --scenario current-info-tool-use \
  --driver-config-path ./driver_config.json \
  --seed 7

evidpath compare --domain agents \
  --baseline-driver-config-path ./baseline.json \
  --candidate-driver-config-path ./candidate.json \
  --scenario multi-turn-support-follow-up \
  --rerun-count 2
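When these commands are run from a script or CI job, it can help to assemble the argument list in code. The sketch below builds the same two invocations as Python lists. The helper names `build_audit_command` and `build_compare_command` are hypothetical, and only the flags that appear in the commands above are used. It assumes `evidpath` is installed and on `PATH` if you go on to execute the result.

```python
import shlex


def build_audit_command(scenario, driver_config_path, seed, domain="agents"):
    """Return the argv list for `evidpath audit` (hypothetical helper)."""
    return [
        "evidpath", "audit",
        "--domain", domain,
        "--scenario", scenario,
        "--driver-config-path", driver_config_path,
        "--seed", str(seed),
    ]


def build_compare_command(scenario, baseline_path, candidate_path,
                          rerun_count, domain="agents"):
    """Return the argv list for `evidpath compare` (hypothetical helper)."""
    return [
        "evidpath", "compare",
        "--domain", domain,
        "--baseline-driver-config-path", baseline_path,
        "--candidate-driver-config-path", candidate_path,
        "--scenario", scenario,
        "--rerun-count", str(rerun_count),
    ]


if __name__ == "__main__":
    audit = build_audit_command("current-info-tool-use", "./driver_config.json", 7)
    compare = build_compare_command(
        "multi-turn-support-follow-up", "./baseline.json", "./candidate.json", 2
    )
    # Print shell-safe command lines instead of running them.
    print(shlex.join(audit))
    print(shlex.join(compare))
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, which avoids quoting problems in config paths.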
Source Links
- Contract: https://github.com/NDETERMINA/limitation/blob/main/products/evidpath/EXTERNAL_TARGET_CONTRACT_AGENTS.md
- Python callable example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_in_process_python_api
- HTTP session example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_http_session
- MCP stdio example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_mcp_stdio
- LangGraph-style example: https://github.com/NDETERMINA/limitation/tree/main/products/evidpath/examples/agent_langgraph_in_process