From 7d19e899889fc2e4a7999e9ce77a4bd438e3dc3b Mon Sep 17 00:00:00 2001
From: Ben Brandt
Date: Thu, 7 May 2026 12:22:52 +0200
Subject: [PATCH] Fix DirectX atlas panic after GPU device recovery (#55878)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Problem

A Sentry-reported crash on Windows (Intel Iris Xe Graphics, v1.0.1):

```
index out of bounds: the len is 1 but the index is 1
```

panicking at `DirectXAtlasState::texture` in [`crates/gpui_windows/src/directx_atlas.rs`](https://github.com/zed-industries/zed/blob/main/crates/gpui_windows/src/directx_atlas.rs):

```rust
AtlasTextureKind::Subpixel => {
    &self.subpixel_textures[id.index as usize].as_ref().unwrap()
}
```

## Root cause

After a GPU device-lost recovery, GPUI's view cache replays stale `AtlasTile` references from the previous frame's `paint_operations` via `Scene::replay`.

1. **Atlas grows past one texture.** A long enough session pushes `subpixel_textures.textures.len() ≥ 2` (easy on Iris Xe at the default 1024×1024 atlas size). Top-level views in Zed use `cached(...)`, so their `AnyViewState.paint_range` records into `rendered_frame.scene.paint_operations`, referencing both index `0` and index `1`.
2. **Device lost.** `handle_device_lost` clears every `AtlasTextureList` (`textures.len() == 0`) and `tiles_by_key`, then sets `skip_draws = true`.
3. **`WM_GPUI_FORCE_UPDATE_WINDOW` arrives.** `mark_drawable()` flips `skip_draws` back to `false` and `request_frame` runs with `force_render: true`.
4. **The cache hit.** Inside `Window::draw`, `AnyView::prepaint`'s cache check (`!dirty_views.contains(...) && !window.refreshing`) succeeds for every cached view because the recovery doesn't touch invalidator state and `force_render` doesn't propagate into `Window`.
   `AnyView::paint` calls `window.reuse_paint` → `Scene::replay` → `primitive.clone()`, which (since `SubpixelSprite`/`AtlasTile` are `Copy`) verbatim copies a `Primitive::SubpixelSprite { tile: { texture_id: { index: 1, ... }, ... } }` into `next_frame.scene`.
5. **Atlas regrows to one.** Dirty/uncached parts of the same frame (caret, animations, anything that called `cx.notify`) fall through to `paint_glyph` → `get_or_insert_with` → `push_texture`, growing `subpixel_textures.textures` from `0` to **`1`** with index `0` valid.
6. **Panic.** After `mem::swap`, `rendered_frame.scene` contains a mix of fresh `index = 0` and replayed `index = 1` sprites. `Scene::batches` emits separate batches per `texture_id`; the `index = 1` batch reaches `atlas.get_texture_view` → `subpixel_textures[1]` → panic with `len = 1, index = 1`.

The two earlier related fixes do not catch this:

- **#52389 / dbd95ea7** (`if force_render { mark_drawable }`) protects the 200 ms recovery sleep: pending `WM_PAINT`s carry `force_render = false` and so do not clear `skip_draws`. But `WM_GPUI_FORCE_UPDATE_WINDOW` carries `force_render = true`, so `mark_drawable` runs, then `Window::draw`'s `reuse_paint` still reproduces stale tiles.
- The unmerged Windows draft `2e5d890e37` (`force_render_after_recovery`) similarly only forces the forced-render branch; it doesn't bypass the view cache.

## Fix

Two parts:

**1. Bypass the view cache on a forced draw (cross-platform).** In the platform-agnostic `request_frame` closure in `Window::new`, call `window.refresh()` whenever `RequestFrameOptions::force_render` is `true`. `Window::refresh` is the documented escape hatch for cached views (per the `AnyView::cached` docs: *"The one exception is when [Window::refresh] is called, in which case caching is ignored."*).
With `refreshing = true`, every `AnyView::prepaint` cache check fails, every cached view fully repaints, and `paint_glyph` allocates fresh tiles for every glyph, so `rendered_frame.scene` ends up free of stale `AtlasTile`s.

**2. Add the `force_render_after_recovery` flag on Windows.** Mirror the Linux fix from #52389: a per-window `Cell<bool>` set after `WindowsWindowInner::handle_device_lost` succeeds and consumed at the top of `draw_window`. Together with the GPUI change above, the first frame after recovery (whether a stray `WM_PAINT` during the 200 ms recovery sleep or the explicit `WM_GPUI_FORCE_UPDATE_WINDOW`) is treated as a forced render that both clears `skip_draws` and bypasses the view cache.

## Testing

- `script/clippy -p gpui` is clean.
- I do not have a Windows toolchain available locally, so I have not cross-compiled `gpui_windows`. Reviewers with Windows access: please smoke-test on a machine where the device-lost path can be exercised (Intel iGPU, suspend/resume, or running a TDR-inducing test on a GPU driver).

## Related

- Sentry issue ID 7457971403 (DirectX subpixel atlas crash, Intel Iris Xe).
- Builds on / fixes the residual gap in #52389 (`gpui_linux: Force scene rebuild after GPU device recovery`). The GPUI change here also hardens the corresponding Linux path against the same `reuse_paint` mechanism.

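## Illustrative sketch

The sequence under "Root cause" and the effect of the two-part fix can be reproduced with a toy model. This is a minimal sketch, not GPUI code: `Tile`, `Atlas`, `Window`, `simulate`, `TILES_PER_TEXTURE`, and the `use_fix` switch are hypothetical stand-ins, and the out-of-range lookup is reported as `Err(index)` instead of panicking. Only the names `force_render_after_recovery` and `handle_device_lost` are borrowed from the patch.

```rust
use std::cell::Cell;

// Hypothetical capacity; the real atlas sizes textures in pixels.
const TILES_PER_TEXTURE: usize = 2;

/// Stand-in for `AtlasTile`: only the texture index matters here.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Tile {
    texture_index: usize,
}

/// Toy atlas: `used[i]` counts the tiles allocated in texture `i`.
#[derive(Default)]
struct Atlas {
    used: Vec<usize>,
}

impl Atlas {
    /// Allocates a tile, pushing a new texture when the last one is full.
    fn insert(&mut self) -> Tile {
        if self.used.last().map_or(true, |n| *n == TILES_PER_TEXTURE) {
            self.used.push(0);
        }
        let texture_index = self.used.len() - 1;
        self.used[texture_index] += 1;
        Tile { texture_index }
    }
}

#[derive(Default)]
struct Window {
    atlas: Atlas,
    /// Tiles recorded by a cached view during the previous frame.
    cached: Vec<Tile>,
    force_render_after_recovery: Cell<bool>,
}

impl Window {
    fn handle_device_lost(&mut self) {
        // Every texture is dropped, but the cached tiles still point at them.
        self.atlas.used.clear();
        self.force_render_after_recovery.set(true);
    }

    /// Draws one frame; returns `Err(index)` for the first stale tile.
    fn draw(&mut self, force_render: bool, use_fix: bool) -> Result<(), usize> {
        let forced = force_render || self.force_render_after_recovery.take();
        if forced && use_fix {
            // Analogue of `window.refresh()`: repaint the cached view so
            // every glyph gets a fresh tile.
            let glyphs = self.cached.len();
            let fresh: Vec<Tile> = (0..glyphs).map(|_| self.atlas.insert()).collect();
            self.cached = fresh;
        }
        // Replay the cached tiles, then paint a dirty element (the caret).
        let mut scene = self.cached.clone();
        scene.push(self.atlas.insert());
        for tile in &scene {
            if tile.texture_index >= self.atlas.used.len() {
                return Err(tile.texture_index);
            }
        }
        Ok(())
    }
}

/// Runs: cached view paints three glyphs, device is lost, one frame is drawn.
fn simulate(force_render: bool, use_fix: bool) -> Result<(), usize> {
    let mut window = Window::default();
    let tiles: Vec<Tile> = (0..3).map(|_| window.atlas.insert()).collect();
    window.cached = tiles; // references texture 0 and texture 1
    window.handle_device_lost();
    window.draw(force_render, use_fix)
}
```

In this model, `simulate(true, false)` yields `Err(1)`: the forced frame regrows the atlas to one texture while a replayed tile still names index `1`, which mirrors the reported `len is 1 but the index is 1`. With `use_fix` set, both a forced frame and an unforced frame that consumes the recovery flag complete with `Ok(())`.
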
Release Notes:

- Fixed a crash on Windows when the GPU device is lost and recovered during use (typically driver crash, suspend/resume, or display reconfiguration, most commonly on Intel iGPUs)

---
 crates/gpui/src/window.rs         | 5 +++++
 crates/gpui_windows/src/events.rs | 6 ++++++
 crates/gpui_windows/src/window.rs | 7 +++++++
 3 files changed, 18 insertions(+)

diff --git a/crates/gpui/src/window.rs b/crates/gpui/src/window.rs
index dc387c67f39817c51c4b92ba786f5ea6112a25d9..46b1ab64a188ca81fb33c60dd5f9e920be4e7fc1 100644
--- a/crates/gpui/src/window.rs
+++ b/crates/gpui/src/window.rs
@@ -1402,6 +1402,11 @@ impl Window {
                 measure("frame duration", || {
                     handle
                         .update(&mut cx, |_, window, cx| {
+                            if request_frame_options.force_render {
+                                // Bypass cached view reuse so we don't replay stale
+                                // atlas tile references after a GPU device recovery.
+                                window.refresh();
+                            }
                             let arena_clear_needed = window.draw(cx);
                             window.present();
                             arena_clear_needed.clear();
diff --git a/crates/gpui_windows/src/events.rs b/crates/gpui_windows/src/events.rs
index a4c47789191f9c1fa8a461f4510ce5c66d681fb5..77c4cde9788f7cb62e513fab5485705a7842a770 100644
--- a/crates/gpui_windows/src/events.rs
+++ b/crates/gpui_windows/src/events.rs
@@ -1174,6 +1174,11 @@ impl WindowsWindowInner {
         {
             panic!("Device lost: {err}");
         }
+        // Make sure the first `draw_window` after recovery (whether it comes
+        // from the forced WM_GPUI_FORCE_UPDATE_WINDOW or a stray WM_PAINT in
+        // between) is treated as a forced render so it both clears
+        // `skip_draws` and bypasses the view cache.
+        self.state.force_render_after_recovery.set(true);
         Some(0)
     }
 
@@ -1198,6 +1203,7 @@ impl WindowsWindowInner {
             }
         }
 
+        let force_render = force_render || self.state.force_render_after_recovery.take();
         if force_render {
             // Re-enable drawing after a device loss recovery. The forced render
             // will rebuild the scene with fresh atlas textures.
diff --git a/crates/gpui_windows/src/window.rs b/crates/gpui_windows/src/window.rs
index 130d3dd7214b2cfb939fef01e2698a61519ab3b6..178d750024fdac51e92961522da52162a6947a70 100644
--- a/crates/gpui_windows/src/window.rs
+++ b/crates/gpui_windows/src/window.rs
@@ -63,6 +63,12 @@ pub struct WindowsWindowState {
     pub direct_manipulation: DirectManipulationHandler,
 
     pub renderer: RefCell,
+    /// Set after a GPU device-lost recovery so the next `draw_window` call is
+    /// treated as a forced render. This guarantees the next frame both
+    /// re-enables drawing (via `mark_drawable`) and bypasses the GPUI view
+    /// cache, which would otherwise replay stale atlas tile references from
+    /// the previous frame and panic in `DirectXAtlasState::texture`.
+    pub force_render_after_recovery: Cell<bool>,
 
     pub click_state: ClickState,
     pub current_cursor: Cell>,
@@ -159,6 +165,7 @@ impl WindowsWindowState {
             last_reported_capslock: Cell::new(last_reported_capslock),
             hovered: Cell::new(hovered),
             renderer: RefCell::new(renderer),
+            force_render_after_recovery: Cell::new(false),
             click_state,
             current_cursor: Cell::new(current_cursor),
             cursor_visible,