# eval-cli

Headless CLI binary for running Zed's agent in evaluation/benchmark
environments. Designed to work inside containerized environments like
[Harbor](https://harborframework.com/) where the repository is already
checked out and API keys are provided via environment variables.

Uses the same `NativeAgent` + `AcpThread` pipeline as the production Zed
editor: the full agentic loop with tool calls, subagents, and retries,
just without a GUI.

## Building

### Native (for local testing on the same OS)

```
cargo build --release -p eval_cli
```

### Cross-compile for Linux x86_64 (from macOS or other hosts)

Harbor containers run Linux x86_64. Use the Docker-based build script:

```
crates/eval_cli/script/build-linux
```

This produces `target/eval-cli` (an x86_64 Linux ELF binary). You can
also specify a custom output path:

```
crates/eval_cli/script/build-linux --output ~/bin/eval-cli-linux
```
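Before uploading a cross-compiled binary into a container, it can help to confirm it really is an x86-64 ELF and not a host-native build. Running `file target/eval-cli` answers this on most systems; the Python sketch below does the same check by reading the ELF header directly. It is illustrative only and not part of eval_cli: `is_x86_64_elf` and `check_binary` are hypothetical helpers. It checks the ELF magic bytes, the 64-bit class, little-endian encoding, and the `e_machine` field (62, `EM_X86_64`). It deliberately skips the OS ABI byte, because Linux binaries commonly leave it as 0 (System V).

```python
from pathlib import Path

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2   # e_ident[EI_CLASS]: 64-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA]: little-endian
EM_X86_64 = 62   # e_machine value for AMD x86-64


def is_x86_64_elf(header: bytes) -> bool:
    """Return True if the first 20 bytes describe a 64-bit little-endian x86-64 ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    if header[4] != ELFCLASS64 or header[5] != ELFDATA2LSB:
        return False
    # e_machine is a 2-byte field at offset 18: after e_ident (16 bytes) and e_type (2 bytes).
    return int.from_bytes(header[18:20], "little") == EM_X86_64


def check_binary(path: Path) -> bool:
    """Read just enough of the file to inspect its ELF header (hypothetical helper)."""
    with path.open("rb") as f:
        return is_x86_64_elf(f.read(20))
```

For example, `check_binary(Path("target/eval-cli"))` should return `True` after a successful `build-linux` run, and `False` for a macOS (Mach-O) build of the same crate.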

## Standalone usage

```
eval-cli \
  --workdir /testbed \
  --model anthropic/claude-sonnet-4-6-latest \
  --instruction "Fix the bug described in..." \
  --timeout 600 \
  --output-dir /logs/agent
```

Reads API keys from environment variables (`ANTHROPIC_API_KEY`,
`OPENAI_API_KEY`, etc.). Writes `result.json`, `thread.md`, and
`thread.json` to the output directory.
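This README names the three output files but not their contents. The sketch below therefore only checks that they exist and parses them generically: JSON for the two `.json` files, plain text for `thread.md`. `missing_outputs` and `load_outputs` are hypothetical helpers, not part of eval-cli. Adapt them once you know which fields `result.json` actually contains.

```python
import json
from pathlib import Path

# File names listed in this README. Their schemas are not documented here,
# so nothing below assumes particular fields.
EXPECTED_OUTPUTS = ("result.json", "thread.md", "thread.json")


def missing_outputs(output_dir: Path) -> list[str]:
    """Return the expected output files that are absent from output_dir."""
    return [name for name in EXPECTED_OUTPUTS if not (output_dir / name).is_file()]


def load_outputs(output_dir: Path) -> dict:
    """Load all three outputs, raising FileNotFoundError if any is missing."""
    missing = missing_outputs(output_dir)
    if missing:
        raise FileNotFoundError(f"missing in {output_dir}: {', '.join(missing)}")
    return {
        "result": json.loads((output_dir / "result.json").read_text()),
        "thread": json.loads((output_dir / "thread.json").read_text()),
        "transcript": (output_dir / "thread.md").read_text(),
    }
```

Checking for missing files first produces one error that lists everything absent, instead of failing on whichever file happens to be read first.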

### Exit codes

| Code | Meaning                            |
| ---- | ---------------------------------- |
| 0    | Agent finished                     |
| 1    | Error (model/auth/runtime failure) |
| 2    | Timeout                            |
| 3    | Interrupted (SIGTERM/SIGINT)       |
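When eval-cli is driven from a script, the exit code is the quickest way to classify a run. Below is a minimal Python sketch that assumes you pass in the path to the binary. `describe_exit_code` and `run_eval_cli` are hypothetical helpers, not part of eval-cli or zed_eval, and the flags mirror the standalone example above. Python's `subprocess` reports a child killed by signal N as return code -N. That covers a process terminated before it could exit on its own, for example by SIGKILL, which cannot be handled and so never produces exit code 3.

```python
import subprocess

# Hypothetical helpers, not part of eval-cli or zed_eval.
# Labels follow the exit-code table in this README.
EXIT_CODE_MEANINGS = {
    0: "finished",
    1: "error",
    2: "timeout",
    3: "interrupted",
}


def describe_exit_code(code: int) -> str:
    """Map an eval-cli exit code to a short status label."""
    if code < 0:
        # subprocess reports "killed by signal N" as -N.
        return f"killed by signal {-code}"
    return EXIT_CODE_MEANINGS.get(code, f"unknown exit code {code}")


def run_eval_cli(binary: str, instruction: str, model: str,
                 workdir: str = "/testbed",
                 output_dir: str = "/logs/agent",
                 timeout_secs: int = 600) -> str:
    """Run eval-cli with the flags from the standalone example and classify the result."""
    completed = subprocess.run(
        [
            binary,
            "--workdir", workdir,
            "--model", model,
            "--instruction", instruction,
            "--timeout", str(timeout_secs),
            "--output-dir", output_dir,
        ],
        check=False,  # a non-zero exit is a status to report, not an exception
    )
    return describe_exit_code(completed.returncode)
```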

## Harbor integration

The `zed_eval` Python package (installed from `crates/eval_cli/harbor/`)
implements Harbor's `BaseInstalledAgent` interface, allowing eval-cli to
be used with `--agent-import-path` without modifying Harbor's source code.

### Setup

```
pip install -e crates/eval_cli/harbor/
```

### Running with a local binary

Build for Linux first, then pass the binary path:

```
crates/eval_cli/script/build-linux

harbor run -d "swebench_verified@latest" \
  --agent-import-path zed_eval.agent:ZedAgent \
  --ak binary_path=target/eval-cli \
  -m anthropic/claude-sonnet-4-6-latest
```

The agent uploads the binary into the container during setup, so no
download URL is needed during local iteration.

### Running with a download URL

For CI, or when the binary is hosted elsewhere:

```
harbor run -d "swebench_verified@latest" \
  --agent-import-path zed_eval.agent:ZedAgent \
  --ak download_url=https://example.com/eval-cli \
  -m anthropic/claude-sonnet-4-6-latest
```

### Setting a timeout

Pass `EVAL_CLI_TIMEOUT` via `--ae`:

```
harbor run -d "swebench_verified@latest" \
  --agent-import-path zed_eval.agent:ZedAgent \
  --ak binary_path=target/eval-cli \
  --ae EVAL_CLI_TIMEOUT=600 \
  -m anthropic/claude-sonnet-4-6-latest
```
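This README does not spell out how `EVAL_CLI_TIMEOUT` reaches the binary. One plausible wiring, assumed here rather than taken from the `zed_eval` source, is for the agent wrapper to translate it into eval-cli's `--timeout` flag. The sketch treats the value as whole seconds, assuming the same unit as the `--timeout 600` standalone example. `timeout_args` is a hypothetical helper.

```python
import os
from collections.abc import Mapping


def timeout_args(env: Mapping[str, str] = os.environ) -> list[str]:
    """Translate EVAL_CLI_TIMEOUT into eval-cli's --timeout flag.

    Hypothetical: zed_eval may wire this differently. Assumes the value is
    a whole number of seconds.
    """
    raw = env.get("EVAL_CLI_TIMEOUT")
    if raw is None or raw.strip() == "":
        return []  # variable unset or empty: omit the flag entirely
    seconds = int(raw)  # raises ValueError on non-numeric input
    if seconds <= 0:
        raise ValueError(f"EVAL_CLI_TIMEOUT must be positive, got {seconds}")
    return ["--timeout", str(seconds)]
```

Returning an empty list when the variable is unset leaves `--timeout` off the command line instead of inventing a default value in the wrapper.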