# Visual Test Runner Improvements

This document describes improvements to make the visual test runner in `crates/zed/src/visual_test_runner.rs` more deterministic, faster, and less hacky.

## Background
The visual test runner captures screenshots of Zed's UI and compares them against baseline images. It currently works but has several issues:
- **Non-deterministic timing**: Uses `timer().await` calls scattered throughout
- **Real filesystem I/O**: Uses `tempfile` and `RealFs` instead of `FakeFs`
- **Dead code**: Unused variables and counters from removed print statements
- **Code duplication**: Similar setup code repeated across tests
- **Limited production code coverage**: Some areas use stubs where real code could run
## How to Run the Visual Tests

```bash
# Run the visual tests (compare against baselines)
cargo run -p zed --bin zed_visual_test_runner --features visual-tests

# Update baseline images (when UI intentionally changes)
UPDATE_BASELINE=1 cargo run -p zed --bin zed_visual_test_runner --features visual-tests
```

The test runner is a separate binary, not a `#[test]` function. It uses `Application::new().run()` to create a real GPUI application context.
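For orientation, here is a minimal sketch of that overall shape, assuming the current gpui `Application` API; `run_all_tests` is hypothetical shorthand for the per-test functions described below:

```rust
use gpui::Application;

fn main() {
    // A plain binary, not a #[test]: create a real GPUI app and drive the
    // tests from an async task on the foreground executor.
    Application::new().run(|cx| {
        let update_baseline = std::env::var("UPDATE_BASELINE").is_ok();
        cx.spawn(async move |cx| {
            // run_all_tests is hypothetical shorthand for the individual
            // run_visual_test(...) calls described in this document.
            let failed = run_all_tests(cx, update_baseline).await.is_err();
            let _ = cx.update(|cx| cx.quit());
            if failed {
                std::process::exit(1);
            }
        })
        .detach();
    });
}
```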
## Improvement 1: Replace Timer-Based Waits with `run_until_parked()`

### Problem

The code is littered with timing-based waits like:

```rust
cx.background_executor()
    .timer(std::time::Duration::from_millis(500))
    .await;
```
These appear ~15 times in the file. They are:
- **Slow**: Adds up to several seconds of unnecessary waiting
- **Non-deterministic**: Could flake on slow CI machines
- **Arbitrary**: The durations (100ms, 200ms, 300ms, 500ms, 2s) were chosen by trial and error
### Solution

Use `run_until_parked()`, which runs all pending async tasks until there's nothing left to do. This is:

- **Instant**: Returns immediately when work is complete
- **Deterministic**: Waits exactly as long as needed
- **Standard**: Used throughout Zed's test suite
### How to Implement

The challenge is that the visual test runner uses `AsyncApp` (from `cx.spawn()`), not `TestAppContext`. The `run_until_parked()` method is on `BackgroundExecutor`, which is accessible via `cx.background_executor()`.

However, `run_until_parked()` is a blocking call that runs the executor synchronously, while the visual tests are currently written as async code. You'll need to restructure the code.
#### Option A: Keep the async structure, call `run_until_parked()` between awaits

```rust
// Before:
cx.background_executor()
    .timer(std::time::Duration::from_millis(500))
    .await;

// After - run all pending work synchronously:
cx.background_executor().run_until_parked();
```

But this won't work directly in an async context, because `run_until_parked()` blocks.
#### Option B: Restructure to use the synchronous test pattern

The standard Zed test pattern uses `#[gpui::test]` with `TestAppContext`:

```rust
#[gpui::test]
async fn test_something(cx: &mut TestAppContext) {
    cx.run_until_parked(); // This works!
}
```

For the visual test runner, you'd need to convert from `Application::new().run()` to the test harness. This is a bigger change but would be more idiomatic.
#### Option C: Use `cx.refresh()` + a small delay for rendering

For purely rendering-related waits, `cx.refresh()` forces a repaint. A single small delay after the refresh may be acceptable for GPU readback timing:

```rust
cx.refresh().ok();
// Minimal delay just for GPU work, not async task completion
cx.background_executor()
    .timer(std::time::Duration::from_millis(16)) // ~1 frame
    .await;
```
### Locations to Change

Search for `timer(std::time::Duration` in the file. Each occurrence should be evaluated:
| Lines | Current Duration | Purpose | Replacement |
|---|---|---|---|
| 160-162 | 500ms | Wait for worktree | `run_until_parked()` or await the task |
| 190-192 | 500ms | Wait for panel add | `run_until_parked()` |
| 205-207 | 500ms | Wait for panel render | `cx.refresh()` |
| 248-250 | 500ms | Wait for item activation | `run_until_parked()` |
| 258-260 | 2000ms | Wait for UI to stabilize | `cx.refresh()` + minimal delay |
| 294-296 | 500ms | Wait for panel close | `run_until_parked()` |
| 752-754 | 100ms | Wait for worktree scan | `run_until_parked()` |
| 860-862 | 100ms | Wait for workspace init | `run_until_parked()` |
| 881-883 | 200ms | Wait for panel ready | `run_until_parked()` |
| 893-895 | 200ms | Wait for thread view | `run_until_parked()` |
| 912-914 | 500ms | Wait for response | `run_until_parked()` |
| 937-939 | 300ms | Wait for refresh | `cx.refresh()` |
| 956-958 | 300ms | Wait for UI update | `run_until_parked()` |
| 968-970 | 300ms | Wait for refresh | `cx.refresh()` |
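Where the table says "await the task", the idea is to await the `Task` the production API returns instead of sleeping. A minimal sketch for the worktree case (variable names assumed from the surrounding code; the same pattern appears in the `TestWorkspace` helper under Improvement 4):

```rust
// Await worktree creation explicitly instead of sleeping 500ms
// and hoping the scan finished.
let worktree_task = project.update(cx, |project, cx| {
    project.find_or_create_worktree(project_path, true, cx)
})?;
worktree_task.await?;
```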
### How to Verify
After making changes:
- Run the tests: `cargo run -p zed --bin zed_visual_test_runner --features visual-tests`
- They should pass with the same baseline images
- They should run faster (measure before/after)
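To quantify "faster", the shell's built-in `time` is enough for a before/after comparison:

```bash
time cargo run -p zed --bin zed_visual_test_runner --features visual-tests
```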
## Improvement 2: Use `FakeFs` Instead of the Real Filesystem

### Problem
The code currently uses:
```rust
let fs = Arc::new(RealFs::new(None, cx.background_executor().clone()));
let temp_dir = tempfile::tempdir().expect("Failed to create temp directory");
```
This is:
- **Slow**: Real I/O is slower than in-memory operations
- **Non-deterministic**: Filesystem timing varies
- **Messy**: Leaves temp directories behind on failure
### Solution

Use `FakeFs`, the in-memory filesystem used throughout Zed's tests:

```rust
use fs::FakeFs;
use serde_json::json;

let fs = FakeFs::new(cx.background_executor().clone());
fs.insert_tree(
    "/project",
    json!({
        "src": {
            "main.rs": "fn main() { println!(\"Hello\"); }"
        },
        "Cargo.toml": "[package]\nname = \"test\""
    }),
)
.await;
```
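One portability note: Zed tests usually wrap absolute fake paths in the `util::path!` macro so they also work on Windows. Whether the visual test runner needs this is a judgment call, but the pattern looks like:

```rust
use serde_json::json;
use util::path;

// path!("/project") expands to a platform-appropriate absolute path
// (a drive-prefixed path on Windows).
fs.insert_tree(
    path!("/project"),
    json!({
        "Cargo.toml": "[package]\nname = \"test\"",
    }),
)
.await;
```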
### How to Implement

1. Add the dependency in `crates/zed/Cargo.toml` if not present:

   ```toml
   fs = { workspace = true, features = ["test-support"] }
   ```

2. Replace the `RealFs` creation in `init_app_state()` (around line 655):

   ```rust
   // Before:
   let fs = Arc::new(RealFs::new(None, cx.background_executor().clone()));

   // After:
   let fs = FakeFs::new(cx.background_executor().clone());
   ```

3. Replace the tempdir + file creation (around lines 66-71 and 518-643):

   ```rust
   // Before:
   let temp_dir = tempfile::tempdir().expect("...");
   let project_path = temp_dir.path().join("project");
   std::fs::create_dir_all(&project_path).expect("...");
   create_test_files(&project_path);

   // After:
   let fs = FakeFs::new(cx.background_executor().clone());
   fs.insert_tree("/project", json!({
       "src": {
           "main.rs": include_str!("test_content/main.rs"),
           "lib.rs": include_str!("test_content/lib.rs"),
       },
       "Cargo.toml": include_str!("test_content/Cargo.toml"),
       "README.md": include_str!("test_content/README.md"),
   })).await;
   let project_path = Path::new("/project");
   ```

4. For the test image (around line 726):

   ```rust
   // Before:
   let temp_dir = tempfile::tempdir()?;
   let project_path = temp_dir.path().join("project");
   std::fs::create_dir_all(&project_path)?;
   let image_path = project_path.join("test-image.png");
   std::fs::write(&image_path, EMBEDDED_TEST_IMAGE)?;

   // After:
   fs.insert_file("/project/test-image.png", EMBEDDED_TEST_IMAGE.to_vec()).await;
   let project_path = Path::new("/project");
   ```

5. Update `init_app_state` to accept `fs` as a parameter instead of creating it internally.
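For step 5, a sketch of how the call site changes once `init_app_state` takes the filesystem as a parameter (the exact signature is an assumption):

```rust
// The caller owns the FakeFs and injects it, rather than init_app_state
// constructing RealFs internally.
let fs = FakeFs::new(cx.background_executor().clone());
let app_state = init_app_state(fs.clone(), cx);
```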
### Reference Example

See `crates/project_panel/src/project_panel_tests.rs` (lines 17-62) for a complete example of using `FakeFs` with `insert_tree()`.
### How to Verify

- Run the tests - they should produce identical screenshots
- Verify no temp directories are created in `/tmp` or equivalent
- Check that tests run faster
## Improvement 3: Remove Dead Code

### Problem

After removing print statements, there's leftover dead code:

```rust
let _ = update_baseline; // Line 63 - was used in a removed print

let mut passed = 0; // Lines 263-265
let mut failed = 0;
let mut updated = 0;
// ... counters incremented but never read ...

let _ = (passed, updated); // Line 327 - silences the warning
```
### Solution

Remove the unused code and restructure so the counters aren't needed.

### How to Implement
1. Remove the `let _ = update_baseline;` on line 63 - `update_baseline` is already used later.

2. Simplify the test result handling:

   ```rust
   // Before:
   let mut passed = 0;
   let mut failed = 0;
   let mut updated = 0;
   match test_result {
       Ok(TestResult::Passed) => passed += 1,
       Ok(TestResult::BaselineUpdated(_)) => updated += 1,
       Err(_) => failed += 1,
   }
   // ... repeat for each test ...
   let _ = (passed, updated);
   if failed > 0 {
       std::process::exit(1);
   }

   // After:
   let mut any_failed = false;
   if run_visual_test("project_panel", ...).await.is_err() {
       any_failed = true;
   }
   if run_visual_test("workspace_with_editor", ...).await.is_err() {
       any_failed = true;
   }
   if run_agent_thread_view_test(...).await.is_err() {
       any_failed = true;
   }
   if any_failed {
       std::process::exit(1);
   }
   ```

3. Or collect the results into a `Vec`:

   ```rust
   let results = vec![
       run_visual_test("project_panel", ...).await,
       run_visual_test("workspace_with_editor", ...).await,
       run_agent_thread_view_test(...).await,
   ];
   if results.iter().any(|r| r.is_err()) {
       std::process::exit(1);
   }
   ```
### How to Verify

- Run `cargo clippy -p zed --features visual-tests` - there should be no warnings about unused variables
- Run the tests - behavior should be unchanged
## Improvement 4: Consolidate Test Setup Code

### Problem
There's significant duplication between:
- Main workspace test setup (lines 106-260)
- Agent panel test setup (lines 713-900)
Both create:
- Projects with `Project::local()`
- Worktrees with `find_or_create_worktree()`
- Windows with `cx.open_window()`
- Waits for things to settle
### Solution

Extract the common setup into helper functions.

### How to Implement
1. Create a `TestWorkspace` struct:

   ```rust
   struct TestWorkspace {
       window: WindowHandle<Workspace>,
       project: Entity<Project>,
   }

   impl TestWorkspace {
       async fn new(
           app_state: Arc<AppState>,
           size: Size<Pixels>,
           project_path: &Path,
           cx: &mut AsyncApp,
       ) -> Result<Self> {
           let project = cx.update(|cx| {
               Project::local(
                   app_state.client.clone(),
                   app_state.node_runtime.clone(),
                   app_state.user_store.clone(),
                   app_state.languages.clone(),
                   app_state.fs.clone(),
                   None,
                   false,
                   cx,
               )
           })?;

           // Add a worktree
           let add_task = project.update(cx, |project, cx| {
               project.find_or_create_worktree(project_path, true, cx)
           })?;
           add_task.await?;

           // Create the window
           let bounds = Bounds {
               origin: point(px(0.0), px(0.0)),
               size,
           };
           let window = cx.update(|cx| {
               cx.open_window(
                   WindowOptions {
                       window_bounds: Some(WindowBounds::Windowed(bounds)),
                       focus: false,
                       show: false,
                       ..Default::default()
                   },
                   |window, cx| {
                       cx.new(|cx| {
                           Workspace::new(None, project.clone(), app_state.clone(), window, cx)
                       })
                   },
               )
           })??;

           cx.background_executor().run_until_parked();

           Ok(Self { window, project })
       }
   }
   ```

2. Create a `setup_project_panel` helper:

   ```rust
   async fn setup_project_panel(
       workspace: &TestWorkspace,
       cx: &mut AsyncApp,
   ) -> Result<Entity<ProjectPanel>> {
       let panel_task = workspace.window.update(cx, |_ws, window, cx| {
           let weak = cx.weak_entity();
           window.spawn(cx, async move |cx| ProjectPanel::load(weak, cx).await)
       })?;
       let panel = panel_task.await?;
       workspace.window.update(cx, |ws, window, cx| {
           ws.add_panel(panel.clone(), window, cx);
           ws.open_panel::<ProjectPanel>(window, cx);
       })?;
       cx.background_executor().run_until_parked();
       Ok(panel)
   }
   ```

3. Use the helpers in the tests:

   ```rust
   // Test 1
   let workspace = TestWorkspace::new(
       app_state.clone(),
       size(px(1280.0), px(800.0)),
       &project_path,
       &mut cx,
   )
   .await?;
   setup_project_panel(&workspace, &mut cx).await?;
   open_file(&workspace, "src/main.rs", &mut cx).await?;
   run_visual_test("project_panel", workspace.window.into(), &mut cx, update_baseline).await?;
   ```
### How to Verify
- Tests should produce identical screenshots
- Code should be shorter and more readable
- Adding new tests should be easier
## Improvement 5: Exercise More Production Code

### Problem
Some tests use minimal stubs where real production code could run deterministically.
### Current State (Good)

The agent thread test already runs the real `ReadFileTool`:

```rust
let tool = Arc::new(agent::ReadFileTool::new(...));
tool.clone().run(input, event_stream, cx).await?;
```
This is great! It exercises real tool execution.
### Opportunities for More Coverage

1. **Syntax highlighting**: Register real language grammars so the editor shows colored code:

   ```rust
   // Currently just:
   let languages = Arc::new(LanguageRegistry::new(cx.background_executor().clone()));

   // Could add:
   languages.register_native_grammars([
       tree_sitter_rust::LANGUAGE.into(),
       tree_sitter_markdown::LANGUAGE.into(),
   ]);
   ```

2. **File icons**: The project panel could show real file icons by registering file types.

3. **Theme loading**: Currently uses `LoadThemes::JustBase`. Could load a full theme with `theme::init(theme::LoadThemes::All, cx);` (but this might make tests slower - evaluate the trade-off).

4. **More tool types**: Test other tools that don't need the network, like `ListFilesTool` and `GrepTool`; see the sketch below.
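For point 4, invoking another tool could mirror the existing `ReadFileTool` call shown above; the constructor arguments and input schema below are assumptions, not the actual `GrepTool` API:

```rust
// Hypothetical sketch following the ReadFileTool pattern above.
let tool = Arc::new(agent::GrepTool::new(/* project, etc. */));
let input = serde_json::json!({ "regex": "fn main" }); // assumed input shape
tool.clone().run(input, event_stream.clone(), cx).await?;
```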
### How to Implement Syntax Highlighting

1. Add the language dependencies to `Cargo.toml`:

   ```toml
   tree-sitter-rust = { workspace = true }
   tree-sitter-markdown = { workspace = true }
   ```

2. In `init_app_state` or the test setup:

   ```rust
   let languages = Arc::new(LanguageRegistry::new(cx.background_executor().clone()));

   // Register the Rust grammar
   languages.register_native_grammars([
       ("rust", tree_sitter_rust::LANGUAGE.into()),
   ]);

   // Register the Rust language config
   languages.register_available_language(
       LanguageConfig {
           name: "Rust".into(),
           grammar: Some("rust".into()),
           matcher: LanguageMatcher {
               path_suffixes: vec!["rs".into()],
               ..Default::default()
           },
           ..Default::default()
       },
       tree_sitter_rust::LANGUAGE.into(),
       vec![], // No LSP adapters needed for visual tests
   );
   ```
### How to Verify
- Update baselines after adding syntax highlighting
- Screenshots should show colored code instead of plain text
- Tests should still be deterministic (same colors every time)
## Improvement 6: Better Test Organization

### Problem

Tests are numbered in comments, but the structure is ad-hoc:

```rust
// Run Test 1: Project Panel
// Run Test 2: Workspace with Editor
// Run Test 3: Agent Thread View
```
### Solution

Create a test registry or use a more structured approach.

### How to Implement
1. Define tests as structs. Note that `async fn` in a trait is not object-safe, so boxing the tests requires something like the `async_trait` crate:

   ```rust
   #[async_trait::async_trait(?Send)]
   trait VisualTest {
       fn name(&self) -> &'static str;
       async fn setup(&self, cx: &mut AsyncApp) -> Result<AnyWindowHandle>;
       async fn run(
           &self,
           window: AnyWindowHandle,
           cx: &mut AsyncApp,
           update_baseline: bool,
       ) -> Result<TestResult>;
   }

   struct ProjectPanelTest;
   struct WorkspaceEditorTest;
   struct AgentThreadTest;

   #[async_trait::async_trait(?Send)]
   impl VisualTest for ProjectPanelTest {
       fn name(&self) -> &'static str {
           "project_panel"
       }
       // ...
   }
   ```

2. Run all tests from a registry:

   ```rust
   let tests: Vec<Box<dyn VisualTest>> = vec![
       Box::new(ProjectPanelTest),
       Box::new(WorkspaceEditorTest),
       Box::new(AgentThreadTest),
   ];
   let mut failed = false;
   for test in tests {
       match test.run(...).await {
           Ok(_) => {}
           Err(_) => failed = true,
       }
   }
   ```
This makes it easy to:
- Add new tests (just add to the vec)
- Run specific tests (filter by name; see the sketch after this list)
- See all tests in one place
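For name filtering, a minimal sketch; the `VISUAL_TEST_FILTER` variable is hypothetical, not an existing convention in the runner:

```rust
// Run only the tests whose name contains the (hypothetical) filter string.
let filter = std::env::var("VISUAL_TEST_FILTER").ok();
for test in tests {
    if let Some(filter) = filter.as_deref() {
        if !test.name().contains(filter) {
            continue;
        }
    }
    // ... set up and run the test as above ...
}
```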
## Improvement 7: Constants and Configuration

### Problem

Magic numbers are scattered throughout:

```rust
size(px(1280.0), px(800.0))        // Window size for workspace tests
size(px(500.0), px(900.0))         // Window size for agent panel
Duration::from_millis(500)         // Various delays
const MATCH_THRESHOLD: f64 = 0.99; // Already a constant, good!
```
### Solution

Move all configuration to constants at the top of the file.

### How to Implement

```rust
// Window sizes
const WORKSPACE_WINDOW_SIZE: Size<Pixels> = size(px(1280.0), px(800.0));
const AGENT_PANEL_WINDOW_SIZE: Size<Pixels> = size(px(500.0), px(900.0));

// Timing (if any delays are still needed after Improvement 1)
const FRAME_DELAY: Duration = Duration::from_millis(16);

// Image comparison
const MATCH_THRESHOLD: f64 = 0.99;
const PIXEL_TOLERANCE: i16 = 2; // Currently hardcoded in pixels_are_similar()
```
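As an illustration of the last constant, `pixels_are_similar()` could consume `PIXEL_TOLERANCE` like this; the function's actual signature in the runner may differ:

```rust
// Compare two RGBA pixels channel-by-channel against the shared tolerance.
fn pixels_are_similar(a: [u8; 4], b: [u8; 4]) -> bool {
    a.iter()
        .zip(b.iter())
        .all(|(&x, &y)| (x as i16 - y as i16).abs() <= PIXEL_TOLERANCE)
}
```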
## Summary: Recommended Order of Implementation

1. **Remove dead code** (Improvement 3) - Quick win, low risk
2. **Add constants** (Improvement 7) - Quick win, improves readability
3. **Use FakeFs** (Improvement 2) - Medium effort, big determinism win
4. **Replace timers** (Improvement 1) - Medium effort, biggest determinism win
5. **Consolidate setup** (Improvement 4) - Medium effort, maintainability win
6. **Exercise more production code** (Improvement 5) - Lower priority, nice to have
7. **Better test organization** (Improvement 6) - Lower priority, nice to have once there are many tests
## Testing Your Changes

After each change:

```bash
# Build to check for compile errors
cargo build -p zed --features visual-tests

# Run clippy for warnings
cargo clippy -p zed --features visual-tests

# Run the tests
cargo run -p zed --bin zed_visual_test_runner --features visual-tests

# If screenshots changed intentionally, update baselines
UPDATE_BASELINE=1 cargo run -p zed --bin zed_visual_test_runner --features visual-tests
```
Baseline images are stored in `crates/zed/test_fixtures/visual_tests/`.
## Questions?

If you get stuck:

- Look at the existing tests in `crates/project_panel/src/project_panel_tests.rs` for examples of `FakeFs` and `run_until_parked()`
- Look at `crates/gpui/src/app/test_context.rs` for the `VisualTestContext` documentation
- The `CLAUDE.md` file at the repo root has Rust coding guidelines