# eval-cli

Headless CLI binary for running Zed's agent in evaluation and benchmark
environments. It is designed to run inside containerized environments such as
[Harbor](https://harborframework.com/), where the repository is already
checked out and API keys are provided through environment variables.

It uses the same `NativeAgent` + `AcpThread` pipeline as the production Zed
editor: the full agentic loop with tool calls, subagents, and retries, just
without a GUI.

## Building

### Native (for local testing on the same OS)

```
cargo build --release -p eval_cli
```

### Cross-compile for Linux x86_64 (from macOS or other hosts)

Harbor containers run Linux x86_64. Use the Docker-based build script:

```
crates/eval_cli/script/build-linux
```

This produces `target/eval-cli` (an x86_64 Linux ELF binary). You can
also specify a custom output path:

```
crates/eval_cli/script/build-linux --output ~/bin/eval-cli-linux
```
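
Before uploading the binary to Harbor, a quick look at its ELF header can confirm that the cross-compile actually targeted x86_64. This is a minimal sketch using only the Python standard library; `is_x86_64_elf` is a hypothetical helper, not part of eval-cli. It checks the ELF magic, the 64-bit class, little-endian encoding, and `e_machine == 62` (EM_X86_64). That identifies the architecture, not the target OS.

```python
import struct
from pathlib import Path

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2    # e_ident[EI_CLASS] for 64-bit objects
ELFDATA2LSB = 1   # e_ident[EI_DATA] for little-endian encoding
EM_X86_64 = 62    # e_machine value for AMD x86-64


def is_x86_64_elf(path: str) -> bool:
    """Return True if `path` starts with a 64-bit little-endian x86-64 ELF header."""
    with Path(path).open("rb") as f:
        header = f.read(20)  # e_ident (16 bytes) + e_type (2) + e_machine (2)
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    if header[4] != ELFCLASS64 or header[5] != ELFDATA2LSB:
        return False
    # e_machine is a 16-bit field at offset 18, little-endian per EI_DATA.
    (machine,) = struct.unpack_from("<H", header, 18)
    return machine == EM_X86_64
```

After a successful `build-linux` run, `is_x86_64_elf("target/eval-cli")` should return `True`; the standard `file` utility gives the same information interactively.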

## Standalone usage

```
eval-cli \
  --workdir /testbed \
  --model anthropic/claude-sonnet-4-6-latest \
  --instruction "Fix the bug described in..." \
  --timeout 600 \
  --output-dir /logs/agent
```

It reads API keys from environment variables (`ANTHROPIC_API_KEY`,
`OPENAI_API_KEY`, etc.) and writes `result.json`, `thread.md`, and
`thread.json` to the directory given by `--output-dir`.
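
A wrapper script can fail fast on a missing API key before a run and confirm afterwards that all three artifacts were written. This is a minimal sketch under two assumptions: the provider is the part of `--model` before the `/`, and only the two key variables named above are mapped. `missing_api_key` and `missing_outputs` are hypothetical helpers, not part of eval-cli.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Assumption: the provider prefix of --model selects the key variable.
# Only the two variables named in this README are listed.
PROVIDER_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}

# Files eval-cli writes to --output-dir.
EXPECTED_OUTPUTS = ("result.json", "thread.md", "thread.json")


def missing_api_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the unset key variable for the model's provider, or None if set or unmapped."""
    provider = model.split("/", 1)[0]
    var = PROVIDER_KEY_VARS.get(provider)
    if var is not None and not env.get(var):
        return var
    return None


def missing_outputs(output_dir: str) -> List[str]:
    """List the expected artifacts that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```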

### Exit codes

| Code | Meaning                            |
| ---- | ---------------------------------- |
| 0    | Agent finished                     |
| 1    | Error (model/auth/runtime failure) |
| 2    | Timeout                            |
| 3    | Interrupted (SIGTERM/SIGINT)       |
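
A driver script can branch on these exit codes. This is a minimal sketch that assumes `eval-cli` is on `PATH` (the `binary` parameter overrides it); `describe_exit` and `run_eval_cli` are illustrative names, not part of eval-cli.

```python
import subprocess
from typing import Sequence

# Exit codes documented in the table above.
EXIT_MEANINGS = {
    0: "finished",
    1: "error",
    2: "timeout",
    3: "interrupted",
}


def describe_exit(code: int) -> str:
    """Map an eval-cli exit status to its documented outcome."""
    # On POSIX, a negative code means the child was killed by a signal
    # before it could exit on its own; it falls through to "unknown".
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def run_eval_cli(args: Sequence[str], binary: str = "eval-cli") -> str:
    """Run eval-cli with the given flags and return the documented outcome."""
    completed = subprocess.run([binary, *args], check=False)
    return describe_exit(completed.returncode)
```

For example, `run_eval_cli(["--workdir", "/testbed", "--output-dir", "/logs/agent", ...])` returns `"timeout"` when the run exits with code 2.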

## Harbor integration

The `crates/eval_cli/harbor/` directory contains a Python package,
`zed_eval`, that implements Harbor's `BaseInstalledAgent` interface. This
lets eval-cli be used with `--agent-import-path` without modifying Harbor's
source code.

### Setup

```
pip install -e crates/eval_cli/harbor/
```

### Running with a local binary

Build for Linux first, then pass the binary path as an agent kwarg:

```
crates/eval_cli/script/build-linux

harbor run -d "swebench_verified@latest" \
  --agent-import-path zed_eval.agent:ZedAgent \
  --ak binary_path=target/eval-cli \
  -m anthropic/claude-sonnet-4-6-latest
```

The agent uploads the binary into the container during setup, so no
download URL is needed during local iteration.

### Running with a download URL

For CI, or when the binary is hosted at a URL, pass `download_url` instead:

```
harbor run -d "swebench_verified@latest" \
  --agent-import-path zed_eval.agent:ZedAgent \
  --ak download_url=https://example.com/eval-cli \
  -m anthropic/claude-sonnet-4-6-latest
```

### Setting a timeout

Pass the timeout to eval-cli as the `EVAL_CLI_TIMEOUT` environment variable
via `--ae`:

```
harbor run -d "swebench_verified@latest" \
  --agent-import-path zed_eval.agent:ZedAgent \
  --ak binary_path=target/eval-cli \
  --ae EVAL_CLI_TIMEOUT=600 \
  -m anthropic/claude-sonnet-4-6-latest
```
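
When launching many Harbor runs from a script, the variants above differ only in their `--ak` and `--ae` pairs. This is a minimal sketch that assembles the argument list. `harbor_run_argv` is a hypothetical helper, and requiring exactly one of `binary_path` or `download_url` is an assumption of this sketch, not documented ZedAgent behavior. Following the download-URL and timeout examples, it passes `binary_path` and `download_url` as agent kwargs (`--ak`) and `EVAL_CLI_TIMEOUT` through `--ae`.

```python
from typing import List, Optional


def harbor_run_argv(
    model: str,
    dataset: str = "swebench_verified@latest",
    binary_path: Optional[str] = None,
    download_url: Optional[str] = None,
    timeout: Optional[int] = None,
) -> List[str]:
    """Assemble a `harbor run` command line for the ZedAgent adapter."""
    # Assumption of this sketch: the binary comes from exactly one source.
    if (binary_path is None) == (download_url is None):
        raise ValueError("pass exactly one of binary_path or download_url")
    argv = ["harbor", "run", "-d", dataset,
            "--agent-import-path", "zed_eval.agent:ZedAgent"]
    # Agent kwargs (--ak) tell ZedAgent where to get the eval-cli binary.
    if binary_path is not None:
        argv += ["--ak", f"binary_path={binary_path}"]
    else:
        argv += ["--ak", f"download_url={download_url}"]
    # The timeout travels as an environment variable set via --ae.
    if timeout is not None:
        argv += ["--ae", f"EVAL_CLI_TIMEOUT={timeout}"]
    argv += ["-m", model]
    return argv
```

The list can be passed straight to `subprocess.run(..., check=True)`, and `shlex.join` renders it as a shell string for logging.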