1# Visual Test Runner Improvements
2
3This document describes improvements to make the visual test runner in `crates/zed/src/visual_test_runner.rs` more deterministic, faster, and less hacky.
4
5## Background
6
7The visual test runner captures screenshots of Zed's UI and compares them against baseline images. It currently works but has several issues:
8
91. **Non-deterministic timing**: Uses `timer().await` calls scattered throughout
102. **Real filesystem I/O**: Uses `tempfile` and `RealFs` instead of `FakeFs`
113. **Dead code**: Unused variables and counters from removed print statements
124. **Code duplication**: Similar setup code repeated across tests
135. **Limited production code coverage**: Some areas use stubs where real code could run
14
15## How to Run the Visual Tests
16
17```bash
18# Run the visual tests (compare against baselines)
19cargo run -p zed --bin zed_visual_test_runner --features visual-tests
20
21# Update baseline images (when UI intentionally changes)
22UPDATE_BASELINE=1 cargo run -p zed --bin zed_visual_test_runner --features visual-tests
23```
24
25The test runner is a separate binary, not a `#[test]` function. It uses `Application::new().run()` to create a real GPUI application context.
26
27---
28
29## Improvement 1: Replace Timer-Based Waits with `run_until_parked()`
30
31### Problem
32
33The code is littered with timing-based waits like:
34
35```rust
36cx.background_executor()
37 .timer(std::time::Duration::from_millis(500))
38 .await;
39```
40
41These appear ~15 times in the file. They are:
42- **Slow**: Adds up to several seconds of unnecessary waiting
43- **Non-deterministic**: Could flake on slow CI machines
44- **Arbitrary**: The durations (100ms, 200ms, 300ms, 500ms, 2s) were chosen by trial and error
45
46### Solution
47
48Use `run_until_parked()` which runs all pending async tasks until there's nothing left to do. This is:
49- **Instant**: Returns immediately when work is complete
50- **Deterministic**: Waits exactly as long as needed
51- **Standard**: Used throughout Zed's test suite
52
53### How to Implement
54
55The challenge is that the visual test runner uses `AsyncApp` (from `cx.spawn()`), not `TestAppContext`. The `run_until_parked()` method is on `BackgroundExecutor` which is accessible via `cx.background_executor()`.
56
57However, `run_until_parked()` is a **blocking** call that runs the executor synchronously, while the visual tests are currently written as async code. You'll need to restructure the code.
58
59**Option A: Keep async structure, call run_until_parked between awaits**
60
61```rust
62// Before:
63cx.background_executor()
64 .timer(std::time::Duration::from_millis(500))
65 .await;
66
67// After - run all pending work synchronously:
68cx.background_executor().run_until_parked();
69```
70
71But this won't work directly in async context because `run_until_parked()` blocks.
72
73**Option B: Restructure to use synchronous test pattern**
74
75The standard Zed test pattern uses `#[gpui::test]` with `TestAppContext`:
76
77```rust
78#[gpui::test]
79async fn test_something(cx: &mut TestAppContext) {
80 cx.run_until_parked(); // This works!
81}
82```
83
84For the visual test runner, you'd need to convert from `Application::new().run()` to the test harness. This is a bigger change but would be more idiomatic.
85
86**Option C: Use cx.refresh() + small delay for rendering**
87
88For purely rendering-related waits, `cx.refresh()` forces a repaint. A single small delay after refresh may be acceptable for GPU readback timing:
89
90```rust
91cx.refresh().ok();
92// Minimal delay just for GPU work, not async task completion
93cx.background_executor()
94 .timer(std::time::Duration::from_millis(16)) // ~1 frame
95 .await;
96```
97
98### Locations to Change
99
100Search for `timer(std::time::Duration` in the file. Each occurrence should be evaluated:
101
102| Line | Current Duration | Purpose | Replacement |
103|------|-----------------|---------|-------------|
104| 160-162 | 500ms | Wait for worktree | `run_until_parked()` or await the task |
105| 190-192 | 500ms | Wait for panel add | `run_until_parked()` |
106| 205-207 | 500ms | Wait for panel render | `cx.refresh()` |
107| 248-250 | 500ms | Wait for item activation | `run_until_parked()` |
108| 258-260 | 2000ms | Wait for UI to stabilize | `cx.refresh()` + minimal delay |
109| 294-296 | 500ms | Wait for panel close | `run_until_parked()` |
110| 752-754 | 100ms | Wait for worktree scan | `run_until_parked()` |
111| 860-862 | 100ms | Wait for workspace init | `run_until_parked()` |
112| 881-883 | 200ms | Wait for panel ready | `run_until_parked()` |
113| 893-895 | 200ms | Wait for thread view | `run_until_parked()` |
114| 912-914 | 500ms | Wait for response | `run_until_parked()` |
115| 937-939 | 300ms | Wait for refresh | `cx.refresh()` |
116| 956-958 | 300ms | Wait for UI update | `run_until_parked()` |
117| 968-970 | 300ms | Wait for refresh | `cx.refresh()` |
118
119### How to Verify
120
121After making changes:
1221. Run the tests: `cargo run -p zed --bin zed_visual_test_runner --features visual-tests`
1232. They should pass with the same baseline images
1243. They should run faster (measure before/after)
125
126---
127
128## Improvement 2: Use `FakeFs` Instead of Real Filesystem
129
130### Problem
131
132The code currently uses:
133```rust
134let fs = Arc::new(RealFs::new(None, cx.background_executor().clone()));
135let temp_dir = tempfile::tempdir().expect("Failed to create temp directory");
136```
137
138This is:
139- **Slow**: Real I/O is slower than in-memory operations
140- **Non-deterministic**: Filesystem timing varies
141- **Messy**: Leaves temp directories on failure
142
143### Solution
144
145Use `FakeFs` which is an in-memory filesystem used throughout Zed's tests:
146
147```rust
148use fs::FakeFs;
149
150let fs = FakeFs::new(cx.background_executor().clone());
151fs.insert_tree(
152 "/project",
153 json!({
154 "src": {
155 "main.rs": "fn main() { println!(\"Hello\"); }"
156 },
157 "Cargo.toml": "[package]\nname = \"test\""
158 }),
159).await;
160```
161
162### How to Implement
163
1641. **Add the dependency** in `crates/zed/Cargo.toml` if not present:
165 ```toml
166 fs = { workspace = true, features = ["test-support"] }
167 ```
168
1692. **Replace RealFs creation** in `init_app_state()` (around line 655):
170
171 ```rust
172 // Before:
173 let fs = Arc::new(RealFs::new(None, cx.background_executor().clone()));
174
175 // After:
176 let fs = FakeFs::new(cx.background_executor().clone());
177 ```
178
1793. **Replace tempdir + file creation** (around lines 66-71 and 518-643):
180
181 ```rust
182 // Before:
183 let temp_dir = tempfile::tempdir().expect("...");
184 let project_path = temp_dir.path().join("project");
185 std::fs::create_dir_all(&project_path).expect("...");
186 create_test_files(&project_path);
187
188 // After:
189 let fs = FakeFs::new(cx.background_executor().clone());
190 fs.insert_tree("/project", json!({
191 "src": {
192 "main.rs": include_str!("test_content/main.rs"),
193 "lib.rs": include_str!("test_content/lib.rs"),
194 },
195 "Cargo.toml": include_str!("test_content/Cargo.toml"),
196 "README.md": include_str!("test_content/README.md"),
197 })).await;
198 let project_path = Path::new("/project");
199 ```
200
2014. **For the test image** (around line 726):
202
203 ```rust
204 // Before:
205 let temp_dir = tempfile::tempdir()?;
206 let project_path = temp_dir.path().join("project");
207 std::fs::create_dir_all(&project_path)?;
208 let image_path = project_path.join("test-image.png");
209 std::fs::write(&image_path, EMBEDDED_TEST_IMAGE)?;
210
211 // After:
212 fs.insert_file("/project/test-image.png", EMBEDDED_TEST_IMAGE.to_vec()).await;
213 let project_path = Path::new("/project");
214 ```
215
2165. **Update `init_app_state`** to accept `fs` as a parameter instead of creating it internally.
217
218### Reference Example
219
220See `crates/project_panel/src/project_panel_tests.rs` lines 17-62 for a complete example of using `FakeFs` with `insert_tree()`.
221
222### How to Verify
223
2241. Run the tests - they should produce identical screenshots
2252. Verify no temp directories are created in `/tmp` or equivalent
2263. Check that tests run faster
227
228---
229
230## Improvement 3: Remove Dead Code
231
232### Problem
233
234After removing print statements, there's leftover dead code:
235
236```rust
237let _ = update_baseline; // Line 63 - was used in removed print
238
239let mut passed = 0; // Lines 263-265
240let mut failed = 0;
241let mut updated = 0;
242// ... counters incremented but never read
243
244let _ = (passed, updated); // Line 327 - silences warning
245```
246
247### Solution
248
249Remove the unused code and restructure to not need counters.
250
251### How to Implement
252
2531. **Remove the `let _ = update_baseline;`** on line 63 - `update_baseline` is already used later
254
2552. **Simplify test result handling**:
256
257 ```rust
258 // Before:
259 let mut passed = 0;
260 let mut failed = 0;
261 let mut updated = 0;
262
263 match test_result {
264 Ok(TestResult::Passed) => passed += 1,
265 Ok(TestResult::BaselineUpdated(_)) => updated += 1,
266 Err(_) => failed += 1,
267 }
268 // ... repeat for each test ...
269
270 let _ = (passed, updated);
271
272 if failed > 0 {
273 std::process::exit(1);
274 }
275
276 // After:
277 let mut any_failed = false;
278
279 if run_visual_test("project_panel", ...).await.is_err() {
280 any_failed = true;
281 }
282
283 if run_visual_test("workspace_with_editor", ...).await.is_err() {
284 any_failed = true;
285 }
286
287 if run_agent_thread_view_test(...).await.is_err() {
288 any_failed = true;
289 }
290
291 if any_failed {
292 std::process::exit(1);
293 }
294 ```
295
2963. **Or collect results into a Vec**:
297
298 ```rust
299 let results = vec![
300 run_visual_test("project_panel", ...).await,
301 run_visual_test("workspace_with_editor", ...).await,
302 run_agent_thread_view_test(...).await,
303 ];
304
305 if results.iter().any(|r| r.is_err()) {
306 std::process::exit(1);
307 }
308 ```
309
310### How to Verify
311
3121. Run `cargo clippy -p zed --features visual-tests` - should have no warnings about unused variables
3132. Run the tests - behavior should be unchanged
314
315---
316
317## Improvement 4: Consolidate Test Setup Code
318
319### Problem
320
321There's significant duplication between:
322- Main workspace test setup (lines 106-260)
323- Agent panel test setup (lines 713-900)
324
325Both create:
326- Projects with `Project::local()`
327- Worktrees with `find_or_create_worktree()`
328- Windows with `cx.open_window()`
329- Wait for things to settle
330
331### Solution
332
333Extract common setup into helper functions.
334
335### How to Implement
336
3371. **Create a `TestWorkspace` struct**:
338
339 ```rust
340 struct TestWorkspace {
341 window: WindowHandle<Workspace>,
342 project: Entity<Project>,
343 }
344
345 impl TestWorkspace {
346 async fn new(
347 app_state: Arc<AppState>,
348 size: Size<Pixels>,
349 project_path: &Path,
350 cx: &mut AsyncApp,
351 ) -> Result<Self> {
352 let project = cx.update(|cx| {
353 Project::local(
354 app_state.client.clone(),
355 app_state.node_runtime.clone(),
356 app_state.user_store.clone(),
357 app_state.languages.clone(),
358 app_state.fs.clone(),
359 None,
360 false,
361 cx,
362 )
363 })?;
364
365 // Add worktree
366 let add_task = project.update(cx, |project, cx| {
367 project.find_or_create_worktree(project_path, true, cx)
368 })?;
369 add_task.await?;
370
371 // Create window
372 let bounds = Bounds {
373 origin: point(px(0.0), px(0.0)),
374 size,
375 };
376
377 let window = cx.update(|cx| {
378 cx.open_window(
379 WindowOptions {
380 window_bounds: Some(WindowBounds::Windowed(bounds)),
381 focus: false,
382 show: false,
383 ..Default::default()
384 },
385 |window, cx| {
386 cx.new(|cx| {
387 Workspace::new(None, project.clone(), app_state.clone(), window, cx)
388 })
389 },
390 )
391 })??;
392
393 cx.background_executor().run_until_parked();
394
395 Ok(Self { window, project })
396 }
397 }
398 ```
399
4002. **Create a `setup_project_panel` helper**:
401
402 ```rust
403 async fn setup_project_panel(
404 workspace: &TestWorkspace,
405 cx: &mut AsyncApp,
406 ) -> Result<Entity<ProjectPanel>> {
407 let panel_task = workspace.window.update(cx, |_ws, window, cx| {
408 let weak = cx.weak_entity();
409 window.spawn(cx, async move |cx| ProjectPanel::load(weak, cx).await)
410 })?;
411
412 let panel = panel_task.await?;
413
414 workspace.window.update(cx, |ws, window, cx| {
415 ws.add_panel(panel.clone(), window, cx);
416 ws.open_panel::<ProjectPanel>(window, cx);
417 })?;
418
419 cx.background_executor().run_until_parked();
420
421 Ok(panel)
422 }
423 ```
424
4253. **Use helpers in tests**:
426
427 ```rust
428 // Test 1
429 let workspace = TestWorkspace::new(
430 app_state.clone(),
431 size(px(1280.0), px(800.0)),
432 &project_path,
433 &mut cx,
434 ).await?;
435
436 setup_project_panel(&workspace, &mut cx).await?;
437 open_file(&workspace, "src/main.rs", &mut cx).await?;
438
439 run_visual_test("project_panel", workspace.window.into(), &mut cx, update_baseline).await?;
440 ```
441
442### How to Verify
443
4441. Tests should produce identical screenshots
4452. Code should be shorter and more readable
4463. Adding new tests should be easier
447
448---
449
450## Improvement 5: Exercise More Production Code
451
452### Problem
453
454Some tests use minimal stubs where real production code could run deterministically.
455
456### Current State (Good)
457
458The agent thread test already runs the **real** `ReadFileTool`:
459```rust
460let tool = Arc::new(agent::ReadFileTool::new(...));
461tool.clone().run(input, event_stream, cx).await?;
462```
463
464This is great! It exercises real tool execution.
465
466### Opportunities for More Coverage
467
4681. **Syntax highlighting**: Register real language grammars so the editor shows colored code
469
470 ```rust
471 // Currently just:
472 let languages = Arc::new(LanguageRegistry::new(cx.background_executor().clone()));
473
474 // Could add:
475 languages.register_native_grammars([
476 tree_sitter_rust::LANGUAGE.into(),
477 tree_sitter_markdown::LANGUAGE.into(),
478 ]);
479 ```
480
4812. **File icons**: The project panel could show real file icons by registering file types
482
4833. **Theme loading**: Currently uses `LoadThemes::JustBase`. Could load a full theme:
484 ```rust
485 theme::init(theme::LoadThemes::All, cx);
486 ```
487 (But this might make tests slower - evaluate trade-off)
488
4894. **More tool types**: Test other tools like `ListFilesTool`, `GrepTool` that don't need network
490
491### How to Implement Syntax Highlighting
492
4931. Add language dependencies to `Cargo.toml`:
494 ```toml
495 tree-sitter-rust = { workspace = true }
496 tree-sitter-markdown = { workspace = true }
497 ```
498
4992. In `init_app_state` or test setup:
500 ```rust
501 let languages = Arc::new(LanguageRegistry::new(cx.background_executor().clone()));
502
503 // Register Rust grammar
504 languages.register_native_grammars([
505 ("rust", tree_sitter_rust::LANGUAGE.into()),
506 ]);
507
508 // Register the Rust language config
509 languages.register_available_language(
510 LanguageConfig {
511 name: "Rust".into(),
512 grammar: Some("rust".into()),
513 matcher: LanguageMatcher {
514 path_suffixes: vec!["rs".into()],
515 ..Default::default()
516 },
517 ..Default::default()
518 },
519 tree_sitter_rust::LANGUAGE.into(),
520 vec![], // No LSP adapters needed for visual tests
521 );
522 ```
523
524### How to Verify
525
5261. Update baselines after adding syntax highlighting
5272. Screenshots should show colored code instead of plain text
5283. Tests should still be deterministic (same colors every time)
529
530---
531
532## Improvement 6: Better Test Organization
533
534### Problem
535
536Tests are numbered in comments but the structure is ad-hoc:
537```rust
538// Run Test 1: Project Panel
539// Run Test 2: Workspace with Editor
540// Run Test 3: Agent Thread View
541```
542
543### Solution
544
545Create a test registry or use a more structured approach.
546
547### How to Implement
548
5491. **Define tests as structs**:
550
551 ```rust
552 trait VisualTest {
553 fn name(&self) -> &'static str;
554 async fn setup(&self, cx: &mut AsyncApp) -> Result<AnyWindowHandle>;
555 async fn run(&self, window: AnyWindowHandle, cx: &mut AsyncApp, update_baseline: bool) -> Result<TestResult>;
556 }
557
558 struct ProjectPanelTest;
559 struct WorkspaceEditorTest;
560 struct AgentThreadTest;
561
562 impl VisualTest for ProjectPanelTest {
563 fn name(&self) -> &'static str { "project_panel" }
564 // ...
565 }
566 ```
567
5682. **Run all tests from a registry**:
569
570 ```rust
571 let tests: Vec<Box<dyn VisualTest>> = vec![
572 Box::new(ProjectPanelTest),
573 Box::new(WorkspaceEditorTest),
574 Box::new(AgentThreadTest),
575 ];
576
577 let mut failed = false;
578 for test in tests {
579 match test.run(...).await {
580 Ok(_) => {},
581 Err(_) => failed = true,
582 }
583 }
584 ```
585
586This makes it easy to:
587- Add new tests (just add to the vec)
588- Run specific tests (filter by name)
589- See all tests in one place
590
591---
592
593## Improvement 7: Constants and Configuration
594
595### Problem
596
597Magic numbers scattered throughout:
598```rust
599size(px(1280.0), px(800.0)) // Window size for workspace tests
600size(px(500.0), px(900.0)) // Window size for agent panel
601Duration::from_millis(500) // Various delays
602const MATCH_THRESHOLD: f64 = 0.99; // Already a constant, good!
603```
604
605### Solution
606
607Move all configuration to constants at the top of the file.
608
609### How to Implement
610
611```rust
612// Window sizes
613const WORKSPACE_WINDOW_SIZE: Size<Pixels> = size(px(1280.0), px(800.0));
614const AGENT_PANEL_WINDOW_SIZE: Size<Pixels> = size(px(500.0), px(900.0));
615
616// Timing (if any delays are still needed after Improvement 1)
617const FRAME_DELAY: Duration = Duration::from_millis(16);
618
619// Image comparison
620const MATCH_THRESHOLD: f64 = 0.99;
621const PIXEL_TOLERANCE: i16 = 2; // Currently hardcoded in pixels_are_similar()
622```
623
624---
625
626## Summary: Recommended Order of Implementation
627
6281. **Remove dead code** (Improvement 3) - Quick win, low risk
6292. **Add constants** (Improvement 7) - Quick win, improves readability
6303. **Use FakeFs** (Improvement 2) - Medium effort, big determinism win
6314. **Replace timers** (Improvement 1) - Medium effort, biggest determinism win
6325. **Consolidate setup** (Improvement 4) - Medium effort, maintainability win
6336. **Exercise more production code** (Improvement 5) - Lower priority, nice to have
6347. **Better test organization** (Improvement 6) - Lower priority, nice to have for many tests
635
636## Testing Your Changes
637
638After each change:
639
640```bash
641# Build to check for compile errors
642cargo build -p zed --features visual-tests
643
644# Run clippy for warnings
645cargo clippy -p zed --features visual-tests
646
647# Run the tests
648cargo run -p zed --bin zed_visual_test_runner --features visual-tests
649
650# If screenshots changed intentionally, update baselines
651UPDATE_BASELINE=1 cargo run -p zed --bin zed_visual_test_runner --features visual-tests
652```
653
654Baseline images are stored in `crates/zed/test_fixtures/visual_tests/`.
655
656## Questions?
657
658If you get stuck:
6591. Look at existing tests in `crates/project_panel/src/project_panel_tests.rs` for examples of `FakeFs` and `run_until_parked()`
6602. Look at `crates/gpui/src/app/test_context.rs` for `VisualTestContext` documentation
6613. The CLAUDE.md file at the repo root has Rust coding guidelines