visual-test-improvements.md

  1# Visual Test Runner Improvements
  2
  3This document describes improvements to make the visual test runner in `crates/zed/src/visual_test_runner.rs` more deterministic, faster, and less hacky.
  4
  5## Background
  6
  7The visual test runner captures screenshots of Zed's UI and compares them against baseline images. It currently works but has several issues:
  8
  91. **Non-deterministic timing**: Uses `timer().await` calls scattered throughout
 102. **Real filesystem I/O**: Uses `tempfile` and `RealFs` instead of `FakeFs`
 113. **Dead code**: Unused variables and counters from removed print statements
 124. **Code duplication**: Similar setup code repeated across tests
 135. **Limited production code coverage**: Some areas use stubs where real code could run
 14
 15## How to Run the Visual Tests
 16
 17```bash
 18# Run the visual tests (compare against baselines)
 19cargo run -p zed --bin zed_visual_test_runner --features visual-tests
 20
 21# Update baseline images (when UI intentionally changes)
 22UPDATE_BASELINE=1 cargo run -p zed --bin zed_visual_test_runner --features visual-tests
 23```
 24
 25The test runner is a separate binary, not a `#[test]` function. It uses `Application::new().run()` to create a real GPUI application context.
 26
 27---
 28
 29## Improvement 1: Replace Timer-Based Waits with `run_until_parked()`
 30
 31### Problem
 32
 33The code is littered with timing-based waits like:
 34
 35```rust
 36cx.background_executor()
 37    .timer(std::time::Duration::from_millis(500))
 38    .await;
 39```
 40
 41These appear ~15 times in the file. They are:
 42- **Slow**: Adds up to several seconds of unnecessary waiting
 43- **Non-deterministic**: Could flake on slow CI machines
 44- **Arbitrary**: The durations (100ms, 200ms, 300ms, 500ms, 2s) were chosen by trial and error
 45
 46### Solution
 47
 48Use `run_until_parked()` which runs all pending async tasks until there's nothing left to do. This is:
 49- **Instant**: Returns immediately when work is complete
 50- **Deterministic**: Waits exactly as long as needed
 51- **Standard**: Used throughout Zed's test suite
 52
 53### How to Implement
 54
 55The challenge is that the visual test runner uses `AsyncApp` (from `cx.spawn()`), not `TestAppContext`. The `run_until_parked()` method is on `BackgroundExecutor` which is accessible via `cx.background_executor()`.
 56
 57However, `run_until_parked()` is a **blocking** call that runs the executor synchronously, while the visual tests are currently written as async code. You'll need to restructure the code.
 58
 59**Option A: Keep async structure, call run_until_parked between awaits**
 60
 61```rust
 62// Before:
 63cx.background_executor()
 64    .timer(std::time::Duration::from_millis(500))
 65    .await;
 66
 67// After - run all pending work synchronously:
 68cx.background_executor().run_until_parked();
 69```
 70
 71But this won't work directly in async context because `run_until_parked()` blocks.
 72
 73**Option B: Restructure to use synchronous test pattern**
 74
 75The standard Zed test pattern uses `#[gpui::test]` with `TestAppContext`:
 76
 77```rust
 78#[gpui::test]
 79async fn test_something(cx: &mut TestAppContext) {
 80    cx.run_until_parked();  // This works!
 81}
 82```
 83
 84For the visual test runner, you'd need to convert from `Application::new().run()` to the test harness. This is a bigger change but would be more idiomatic.
 85
 86**Option C: Use cx.refresh() + small delay for rendering**
 87
 88For purely rendering-related waits, `cx.refresh()` forces a repaint. A single small delay after refresh may be acceptable for GPU readback timing:
 89
 90```rust
 91cx.refresh().ok();
 92// Minimal delay just for GPU work, not async task completion
 93cx.background_executor()
 94    .timer(std::time::Duration::from_millis(16))  // ~1 frame
 95    .await;
 96```
 97
 98### Locations to Change
 99
100Search for `timer(std::time::Duration` in the file. Each occurrence should be evaluated:
101
102| Line | Current Duration | Purpose | Replacement |
103|------|-----------------|---------|-------------|
104| 160-162 | 500ms | Wait for worktree | `run_until_parked()` or await the task |
105| 190-192 | 500ms | Wait for panel add | `run_until_parked()` |
106| 205-207 | 500ms | Wait for panel render | `cx.refresh()` |
107| 248-250 | 500ms | Wait for item activation | `run_until_parked()` |
108| 258-260 | 2000ms | Wait for UI to stabilize | `cx.refresh()` + minimal delay |
109| 294-296 | 500ms | Wait for panel close | `run_until_parked()` |
110| 752-754 | 100ms | Wait for worktree scan | `run_until_parked()` |
111| 860-862 | 100ms | Wait for workspace init | `run_until_parked()` |
112| 881-883 | 200ms | Wait for panel ready | `run_until_parked()` |
113| 893-895 | 200ms | Wait for thread view | `run_until_parked()` |
114| 912-914 | 500ms | Wait for response | `run_until_parked()` |
115| 937-939 | 300ms | Wait for refresh | `cx.refresh()` |
116| 956-958 | 300ms | Wait for UI update | `run_until_parked()` |
117| 968-970 | 300ms | Wait for refresh | `cx.refresh()` |
118
119### How to Verify
120
121After making changes:
1221. Run the tests: `cargo run -p zed --bin zed_visual_test_runner --features visual-tests`
1232. They should pass with the same baseline images
1243. They should run faster (measure before/after)
125
126---
127
128## Improvement 2: Use `FakeFs` Instead of Real Filesystem
129
130### Problem
131
132The code currently uses:
133```rust
134let fs = Arc::new(RealFs::new(None, cx.background_executor().clone()));
135let temp_dir = tempfile::tempdir().expect("Failed to create temp directory");
136```
137
138This is:
139- **Slow**: Real I/O is slower than in-memory operations
140- **Non-deterministic**: Filesystem timing varies
141- **Messy**: Leaves temp directories on failure
142
143### Solution
144
145Use `FakeFs` which is an in-memory filesystem used throughout Zed's tests:
146
147```rust
148use fs::FakeFs;
149
150let fs = FakeFs::new(cx.background_executor().clone());
151fs.insert_tree(
152    "/project",
153    json!({
154        "src": {
155            "main.rs": "fn main() { println!(\"Hello\"); }"
156        },
157        "Cargo.toml": "[package]\nname = \"test\""
158    }),
159).await;
160```
161
162### How to Implement
163
1641. **Add the dependency** in `crates/zed/Cargo.toml` if not present:
165   ```toml
166   fs = { workspace = true, features = ["test-support"] }
167   ```
168
1692. **Replace RealFs creation** in `init_app_state()` (around line 655):
170
171   ```rust
172   // Before:
173   let fs = Arc::new(RealFs::new(None, cx.background_executor().clone()));
174   
175   // After:
176   let fs = FakeFs::new(cx.background_executor().clone());
177   ```
178
1793. **Replace tempdir + file creation** (around lines 66-71 and 518-643):
180
181   ```rust
182   // Before:
183   let temp_dir = tempfile::tempdir().expect("...");
184   let project_path = temp_dir.path().join("project");
185   std::fs::create_dir_all(&project_path).expect("...");
186   create_test_files(&project_path);
187   
188   // After:
189   let fs = FakeFs::new(cx.background_executor().clone());
190   fs.insert_tree("/project", json!({
191       "src": {
192           "main.rs": include_str!("test_content/main.rs"),
193           "lib.rs": include_str!("test_content/lib.rs"),
194       },
195       "Cargo.toml": include_str!("test_content/Cargo.toml"),
196       "README.md": include_str!("test_content/README.md"),
197   })).await;
198   let project_path = Path::new("/project");
199   ```
200
2014. **For the test image** (around line 726):
202
203   ```rust
204   // Before:
205   let temp_dir = tempfile::tempdir()?;
206   let project_path = temp_dir.path().join("project");
207   std::fs::create_dir_all(&project_path)?;
208   let image_path = project_path.join("test-image.png");
209   std::fs::write(&image_path, EMBEDDED_TEST_IMAGE)?;
210   
211   // After:
212   fs.insert_file("/project/test-image.png", EMBEDDED_TEST_IMAGE.to_vec()).await;
213   let project_path = Path::new("/project");
214   ```
215
2165. **Update `init_app_state`** to accept `fs` as a parameter instead of creating it internally.
217
218### Reference Example
219
220See `crates/project_panel/src/project_panel_tests.rs` lines 17-62 for a complete example of using `FakeFs` with `insert_tree()`.
221
222### How to Verify
223
2241. Run the tests - they should produce identical screenshots
2252. Verify no temp directories are created in `/tmp` or equivalent
2263. Check that tests run faster
227
228---
229
230## Improvement 3: Remove Dead Code
231
232### Problem
233
234After removing print statements, there's leftover dead code:
235
236```rust
237let _ = update_baseline;  // Line 63 - was used in removed print
238
239let mut passed = 0;       // Lines 263-265
240let mut failed = 0;
241let mut updated = 0;
242// ... counters incremented but never read
243
244let _ = (passed, updated);  // Line 327 - silences warning
245```
246
247### Solution
248
249Remove the unused code and restructure to not need counters.
250
251### How to Implement
252
2531. **Remove the `let _ = update_baseline;`** on line 63 - `update_baseline` is already used later
254
2552. **Simplify test result handling**:
256
257   ```rust
258   // Before:
259   let mut passed = 0;
260   let mut failed = 0;
261   let mut updated = 0;
262   
263   match test_result {
264       Ok(TestResult::Passed) => passed += 1,
265       Ok(TestResult::BaselineUpdated(_)) => updated += 1,
266       Err(_) => failed += 1,
267   }
268   // ... repeat for each test ...
269   
270   let _ = (passed, updated);
271   
272   if failed > 0 {
273       std::process::exit(1);
274   }
275   
276   // After:
277   let mut any_failed = false;
278   
279   if run_visual_test("project_panel", ...).await.is_err() {
280       any_failed = true;
281   }
282   
283   if run_visual_test("workspace_with_editor", ...).await.is_err() {
284       any_failed = true;
285   }
286   
287   if run_agent_thread_view_test(...).await.is_err() {
288       any_failed = true;
289   }
290   
291   if any_failed {
292       std::process::exit(1);
293   }
294   ```
295
2963. **Or collect results into a Vec**:
297
298   ```rust
299   let results = vec![
300       run_visual_test("project_panel", ...).await,
301       run_visual_test("workspace_with_editor", ...).await,
302       run_agent_thread_view_test(...).await,
303   ];
304   
305   if results.iter().any(|r| r.is_err()) {
306       std::process::exit(1);
307   }
308   ```
309
310### How to Verify
311
3121. Run `cargo clippy -p zed --features visual-tests` - should have no warnings about unused variables
3132. Run the tests - behavior should be unchanged
314
315---
316
317## Improvement 4: Consolidate Test Setup Code
318
319### Problem
320
321There's significant duplication between:
322- Main workspace test setup (lines 106-260)
323- Agent panel test setup (lines 713-900)
324
325Both create:
326- Projects with `Project::local()`
327- Worktrees with `find_or_create_worktree()`
328- Windows with `cx.open_window()`
329- Wait for things to settle
330
331### Solution
332
333Extract common setup into helper functions.
334
335### How to Implement
336
3371. **Create a `TestWorkspace` struct**:
338
339   ```rust
340   struct TestWorkspace {
341       window: WindowHandle<Workspace>,
342       project: Entity<Project>,
343   }
344   
345   impl TestWorkspace {
346       async fn new(
347           app_state: Arc<AppState>,
348           size: Size<Pixels>,
349           project_path: &Path,
350           cx: &mut AsyncApp,
351       ) -> Result<Self> {
352           let project = cx.update(|cx| {
353               Project::local(
354                   app_state.client.clone(),
355                   app_state.node_runtime.clone(),
356                   app_state.user_store.clone(),
357                   app_state.languages.clone(),
358                   app_state.fs.clone(),
359                   None,
360                   false,
361                   cx,
362               )
363           })?;
364           
365           // Add worktree
366           let add_task = project.update(cx, |project, cx| {
367               project.find_or_create_worktree(project_path, true, cx)
368           })?;
369           add_task.await?;
370           
371           // Create window
372           let bounds = Bounds {
373               origin: point(px(0.0), px(0.0)),
374               size,
375           };
376           
377           let window = cx.update(|cx| {
378               cx.open_window(
379                   WindowOptions {
380                       window_bounds: Some(WindowBounds::Windowed(bounds)),
381                       focus: false,
382                       show: false,
383                       ..Default::default()
384                   },
385                   |window, cx| {
386                       cx.new(|cx| {
387                           Workspace::new(None, project.clone(), app_state.clone(), window, cx)
388                       })
389                   },
390               )
391           })??;
392           
393           cx.background_executor().run_until_parked();
394           
395           Ok(Self { window, project })
396       }
397   }
398   ```
399
4002. **Create a `setup_project_panel` helper**:
401
402   ```rust
403   async fn setup_project_panel(
404       workspace: &TestWorkspace,
405       cx: &mut AsyncApp,
406   ) -> Result<Entity<ProjectPanel>> {
407       let panel_task = workspace.window.update(cx, |_ws, window, cx| {
408           let weak = cx.weak_entity();
409           window.spawn(cx, async move |cx| ProjectPanel::load(weak, cx).await)
410       })?;
411       
412       let panel = panel_task.await?;
413       
414       workspace.window.update(cx, |ws, window, cx| {
415           ws.add_panel(panel.clone(), window, cx);
416           ws.open_panel::<ProjectPanel>(window, cx);
417       })?;
418       
419       cx.background_executor().run_until_parked();
420       
421       Ok(panel)
422   }
423   ```
424
4253. **Use helpers in tests**:
426
427   ```rust
428   // Test 1
429   let workspace = TestWorkspace::new(
430       app_state.clone(),
431       size(px(1280.0), px(800.0)),
432       &project_path,
433       &mut cx,
434   ).await?;
435   
436   setup_project_panel(&workspace, &mut cx).await?;
437   open_file(&workspace, "src/main.rs", &mut cx).await?;
438   
439   run_visual_test("project_panel", workspace.window.into(), &mut cx, update_baseline).await?;
440   ```
441
442### How to Verify
443
4441. Tests should produce identical screenshots
4452. Code should be shorter and more readable
4463. Adding new tests should be easier
447
448---
449
450## Improvement 5: Exercise More Production Code
451
452### Problem
453
454Some tests use minimal stubs where real production code could run deterministically.
455
456### Current State (Good)
457
458The agent thread test already runs the **real** `ReadFileTool`:
459```rust
460let tool = Arc::new(agent::ReadFileTool::new(...));
461tool.clone().run(input, event_stream, cx).await?;
462```
463
464This is great! It exercises real tool execution.
465
466### Opportunities for More Coverage
467
4681. **Syntax highlighting**: Register real language grammars so the editor shows colored code
469
470   ```rust
471   // Currently just:
472   let languages = Arc::new(LanguageRegistry::new(cx.background_executor().clone()));
473   
474   // Could add:
475   languages.register_native_grammars([
476       tree_sitter_rust::LANGUAGE.into(),
477       tree_sitter_markdown::LANGUAGE.into(),
478   ]);
479   ```
480
4812. **File icons**: The project panel could show real file icons by registering file types
482
4833. **Theme loading**: Currently uses `LoadThemes::JustBase`. Could load a full theme:
484   ```rust
485   theme::init(theme::LoadThemes::All, cx);
486   ```
487   (But this might make tests slower - evaluate trade-off)
488
4894. **More tool types**: Test other tools like `ListFilesTool`, `GrepTool` that don't need network
490
491### How to Implement Syntax Highlighting
492
4931. Add language dependencies to `Cargo.toml`:
494   ```toml
495   tree-sitter-rust = { workspace = true }
496   tree-sitter-markdown = { workspace = true }
497   ```
498
4992. In `init_app_state` or test setup:
500   ```rust
501   let languages = Arc::new(LanguageRegistry::new(cx.background_executor().clone()));
502   
503   // Register Rust grammar
504   languages.register_native_grammars([
505       ("rust", tree_sitter_rust::LANGUAGE.into()),
506   ]);
507   
508   // Register the Rust language config
509   languages.register_available_language(
510       LanguageConfig {
511           name: "Rust".into(),
512           grammar: Some("rust".into()),
513           matcher: LanguageMatcher {
514               path_suffixes: vec!["rs".into()],
515               ..Default::default()
516           },
517           ..Default::default()
518       },
519       tree_sitter_rust::LANGUAGE.into(),
520       vec![],  // No LSP adapters needed for visual tests
521   );
522   ```
523
524### How to Verify
525
5261. Update baselines after adding syntax highlighting
5272. Screenshots should show colored code instead of plain text
5283. Tests should still be deterministic (same colors every time)
529
530---
531
532## Improvement 6: Better Test Organization
533
534### Problem
535
536Tests are numbered in comments but the structure is ad-hoc:
537```rust
538// Run Test 1: Project Panel
539// Run Test 2: Workspace with Editor  
540// Run Test 3: Agent Thread View
541```
542
543### Solution
544
545Create a test registry or use a more structured approach.
546
547### How to Implement
548
5491. **Define tests as structs**:
550
551   ```rust
552   trait VisualTest {
553       fn name(&self) -> &'static str;
554       async fn setup(&self, cx: &mut AsyncApp) -> Result<AnyWindowHandle>;
555       async fn run(&self, window: AnyWindowHandle, cx: &mut AsyncApp, update_baseline: bool) -> Result<TestResult>;
556   }
557   
558   struct ProjectPanelTest;
559   struct WorkspaceEditorTest;
560   struct AgentThreadTest;
561   
562   impl VisualTest for ProjectPanelTest {
563       fn name(&self) -> &'static str { "project_panel" }
564       // ...
565   }
566   ```
567
5682. **Run all tests from a registry**:
569
570   ```rust
571   let tests: Vec<Box<dyn VisualTest>> = vec![
572       Box::new(ProjectPanelTest),
573       Box::new(WorkspaceEditorTest),
574       Box::new(AgentThreadTest),
575   ];
576   
577   let mut failed = false;
578   for test in tests {
579       match test.run(...).await {
580           Ok(_) => {},
581           Err(_) => failed = true,
582       }
583   }
584   ```
585
586This makes it easy to:
587- Add new tests (just add to the vec)
588- Run specific tests (filter by name)
589- See all tests in one place
590
591---
592
593## Improvement 7: Constants and Configuration
594
595### Problem
596
597Magic numbers scattered throughout:
598```rust
599size(px(1280.0), px(800.0))  // Window size for workspace tests
600size(px(500.0), px(900.0))   // Window size for agent panel
601Duration::from_millis(500)   // Various delays
602const MATCH_THRESHOLD: f64 = 0.99;  // Already a constant, good!
603```
604
605### Solution
606
607Move all configuration to constants at the top of the file.
608
609### How to Implement
610
611```rust
612// Window sizes
613const WORKSPACE_WINDOW_SIZE: Size<Pixels> = size(px(1280.0), px(800.0));
614const AGENT_PANEL_WINDOW_SIZE: Size<Pixels> = size(px(500.0), px(900.0));
615
616// Timing (if any delays are still needed after Improvement 1)
617const FRAME_DELAY: Duration = Duration::from_millis(16);
618
619// Image comparison
620const MATCH_THRESHOLD: f64 = 0.99;
621const PIXEL_TOLERANCE: i16 = 2;  // Currently hardcoded in pixels_are_similar()
622```
623
624---
625
626## Summary: Recommended Order of Implementation
627
6281. **Remove dead code** (Improvement 3) - Quick win, low risk
6292. **Add constants** (Improvement 7) - Quick win, improves readability
6303. **Use FakeFs** (Improvement 2) - Medium effort, big determinism win
6314. **Replace timers** (Improvement 1) - Medium effort, biggest determinism win
6325. **Consolidate setup** (Improvement 4) - Medium effort, maintainability win
6336. **Exercise more production code** (Improvement 5) - Lower priority, nice to have
6347. **Better test organization** (Improvement 6) - Lower priority, nice to have for many tests
635
636## Testing Your Changes
637
638After each change:
639
640```bash
641# Build to check for compile errors
642cargo build -p zed --features visual-tests
643
644# Run clippy for warnings
645cargo clippy -p zed --features visual-tests
646
647# Run the tests
648cargo run -p zed --bin zed_visual_test_runner --features visual-tests
649
650# If screenshots changed intentionally, update baselines
651UPDATE_BASELINE=1 cargo run -p zed --bin zed_visual_test_runner --features visual-tests
652```
653
654Baseline images are stored in `crates/zed/test_fixtures/visual_tests/`.
655
656## Questions?
657
658If you get stuck:
6591. Look at existing tests in `crates/project_panel/src/project_panel_tests.rs` for examples of `FakeFs` and `run_until_parked()`
6602. Look at `crates/gpui/src/app/test_context.rs` for `VisualTestContext` documentation
6613. The CLAUDE.md file at the repo root has Rust coding guidelines