Fix DirectX atlas panic after GPU device recovery (#55878)
Ben Brandt
## Problem
A Sentry-reported crash on Windows (Intel Iris Xe Graphics, v1.0.1):
```
index out of bounds: the len is 1 but the index is 1
```
panicking at `DirectXAtlasState::texture` in
[`crates/gpui_windows/src/directx_atlas.rs`](https://github.com/zed-industries/zed/blob/main/crates/gpui_windows/src/directx_atlas.rs):
```rust
AtlasTextureKind::Subpixel => {
&self.subpixel_textures[id.index as usize].as_ref().unwrap()
}
```
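The panic message is Rust's standard out-of-bounds message for slice indexing. The minimal sketch below reproduces it with a one-element `Vec<Option<_>>` indexed at `1`, the same index-then-`unwrap` shape as `DirectXAtlasState::texture`. `Texture` and `panic_message` are hypothetical names used only for illustration.

```rust
use std::panic;

// Hypothetical stand-in for the real DirectX atlas texture type.
struct Texture;

/// Indexes a one-element texture list with a stale index and returns the
/// resulting panic message.
fn panic_message() -> String {
    // After recovery the atlas has regrown to a single texture (index 0).
    let subpixel_textures: Vec<Option<Texture>> = vec![Some(Texture)];
    // A replayed tile still points at texture index 1 from before the loss.
    let stale_index: u32 = 1;

    // Silence the default hook so the expected panic is not printed.
    panic::set_hook(Box::new(|_| {}));
    let result = panic::catch_unwind(|| {
        let _texture = subpixel_textures[stale_index as usize].as_ref().unwrap();
    });
    // Restore the default panic hook.
    let _ = panic::take_hook();

    let payload = result.expect_err("indexing past the end should panic");
    payload
        .downcast_ref::<String>()
        .cloned()
        .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string()))
        .unwrap_or_default()
}

fn main() {
    // Prints: index out of bounds: the len is 1 but the index is 1
    println!("{}", panic_message());
}
```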
## Root cause
After a GPU device-lost recovery, GPUI's view cache replays stale `AtlasTile` references from the previous frame's `paint_operations` via `Scene::replay`.
1. **Atlas grows past one texture.** A long enough session pushes `subpixel_textures.textures.len() ≥ 2` (easy to reach on Iris Xe at the default 1024×1024 atlas size). Top-level views in Zed use `cached(...)`, so their `AnyViewState.paint_range` records a range of `rendered_frame.scene.paint_operations` that references both index `0` and index `1`.
2. **Device lost.** `handle_device_lost` clears every `AtlasTextureList` (`textures.len() == 0`) and `tiles_by_key`, then sets `skip_draws = true`.
3. **`WM_GPUI_FORCE_UPDATE_WINDOW` arrives.** `mark_drawable()` flips
`skip_draws` back to `false` and `request_frame` runs with
`force_render: true`.
4. **The cache hit.** Inside `Window::draw`, the cache check in `AnyView::prepaint` (`!dirty_views.contains(...) && !window.refreshing`) succeeds for every cached view, because the recovery doesn't touch invalidator state and `force_render` doesn't propagate into `Window`. `AnyView::paint` calls `window.reuse_paint` → `Scene::replay` → `primitive.clone()`, which (since `SubpixelSprite`/`AtlasTile` are `Copy`) copies a `Primitive::SubpixelSprite { tile: { texture_id: { index: 1, ... }, ... } }` verbatim into `next_frame.scene`.
5. **Atlas regrows to one.** Dirty/uncached parts of the same frame
(caret, animations, anything that called `cx.notify`) fall through to
`paint_glyph` → `get_or_insert_with` → `push_texture`, growing
`subpixel_textures.textures` from `0` to **`1`** with index `0` valid.
6. **Panic.** After `mem::swap`, `rendered_frame.scene` contains a mix of fresh `index = 0` and replayed `index = 1` sprites. `Scene::batches` emits a separate batch per `texture_id`; the `index = 1` batch reaches `atlas.get_texture_view` → `subpixel_textures[1]` → panic with `len = 1, index = 1`.
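The six steps can be replayed in a small self-contained model. Every type and function here (`TextureId`, `AtlasTile`, `Atlas`, `simulate`) is a hypothetical simplification, not GPUI's real atlas: each `push_texture` call models the atlas allocating one new texture, step 3 (clearing `skip_draws`) is omitted because it only gates whether a frame is drawn, and lookups use `Vec::get` so the sketch reports the missing index instead of panicking as `DirectXAtlasState::texture` does.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
struct TextureId {
    index: u32,
}

// Like the real `AtlasTile`, this tile is `Copy`, so replaying it copies the
// texture index verbatim, whether or not that texture still exists.
#[derive(Clone, Copy, Debug, PartialEq)]
struct AtlasTile {
    texture_id: TextureId,
}

#[derive(Default)]
struct Atlas {
    // `()` stands in for a GPU texture.
    textures: Vec<Option<()>>,
}

impl Atlas {
    // Allocates a new texture and returns a tile that points into it.
    fn push_texture(&mut self) -> AtlasTile {
        self.textures.push(Some(()));
        let index = self.textures.len() as u32 - 1;
        AtlasTile { texture_id: TextureId { index } }
    }

    // Device-lost recovery drops every texture.
    fn clear(&mut self) {
        self.textures.clear();
    }

    // Checked lookup; the real code indexes directly and panics instead.
    fn texture(&self, id: TextureId) -> Option<&()> {
        self.textures.get(id.index as usize)?.as_ref()
    }
}

/// Runs the sequence and returns, per texture batch, whether its index resolves.
fn simulate() -> Vec<(u32, bool)> {
    let mut atlas = Atlas::default();

    // Step 1: a long session grows the atlas to two textures.
    let previous_frame = vec![atlas.push_texture(), atlas.push_texture()];

    // Step 2: device lost; every texture is dropped.
    atlas.clear();

    // Step 4: cached views replay last frame's tiles, stale indices included.
    let mut next_frame: Vec<AtlasTile> = previous_frame.iter().copied().collect();

    // Step 5: uncached content allocates a fresh texture at index 0.
    next_frame.push(atlas.push_texture());

    // Step 6: one batch per texture id, each looked up in the atlas.
    let batches: BTreeMap<u32, bool> = next_frame
        .iter()
        .map(|tile| (tile.texture_id.index, atlas.texture(tile.texture_id).is_some()))
        .collect();
    batches.into_iter().collect()
}

fn main() {
    // Prints [(0, true), (1, false)]: the index 1 batch has no texture.
    println!("{:?}", simulate());
}
```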
The two earlier related fixes do not catch this:
- **#52389 / dbd95ea7** (`if force_render { mark_drawable }`) protects the 200 ms recovery sleep: pending `WM_PAINT`s carry `force_render = false` and so do not clear `skip_draws`. But `WM_GPUI_FORCE_UPDATE_WINDOW` carries `force_render = true`, so `mark_drawable` runs, and `reuse_paint` in `Window::draw` then still reproduces the stale tiles.
- The unmerged Windows draft `2e5d890e37` (`force_render_after_recovery`) likewise only routes the first post-recovery frame through the forced-render branch; it doesn't bypass the view cache.
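To make the gap concrete, here is a hypothetical model of only the #52389 draw gate (`Message`, `DrawGate`, and `drawn_frames` are illustrative names, not GPUI types). It shows that stray `WM_PAINT`s stay suppressed until the forced update arrives, but the gate says nothing about what the forced frame contains; that is decided later by the view-cache check in `Window::draw`.

```rust
// Hypothetical model of the #52389 draw gate; not GPUI's real types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Message {
    // A pending `WM_PAINT` (`force_render = false`).
    Paint,
    // `WM_GPUI_FORCE_UPDATE_WINDOW` (`force_render = true`).
    ForceUpdate,
}

impl Message {
    fn force_render(self) -> bool {
        matches!(self, Message::ForceUpdate)
    }
}

struct DrawGate {
    skip_draws: bool,
}

impl DrawGate {
    /// Returns `true` if this message results in a drawn frame.
    fn draw_window(&mut self, message: Message) -> bool {
        // `if force_render { mark_drawable }`
        if message.force_render() {
            self.skip_draws = false;
        }
        !self.skip_draws
    }
}

/// Feeds messages to a window whose device was just lost.
fn drawn_frames(messages: &[Message]) -> Vec<bool> {
    let mut gate = DrawGate { skip_draws: true };
    messages.iter().map(|&m| gate.draw_window(m)).collect()
}

fn main() {
    use Message::*;
    // Prints [false, false, true, true]
    println!("{:?}", drawn_frames(&[Paint, Paint, ForceUpdate, Paint]));
}
```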
## Fix
Two parts:
**1. Bypass the view cache on a forced draw (cross-platform).**
In the platform-agnostic `request_frame` closure in `Window::new`, call
`window.refresh()` whenever `RequestFrameOptions::force_render` is
`true`.
`Window::refresh` is the documented escape hatch for cached views (per
the
`AnyView::cached` docs: *"The one exception is when [Window::refresh] is
called, in which case caching is ignored."*). With `refreshing = true`
every `AnyView::prepaint` cache check fails, every cached view fully
repaints, and `paint_glyph` allocates fresh tiles for every glyph, so
`rendered_frame.scene` ends up free of stale `AtlasTile`s.
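A minimal sketch of the cache decision before and after this change, assuming a simplified `Window` that only tracks `dirty_views` and `refreshing`. `WindowModel`, `PaintSource`, `paint_cached_view`, and the free function `request_frame` are hypothetical stand-ins; the real closure lives in `Window::new`, and the model does not reset `refreshing` after a frame.

```rust
use std::collections::HashSet;

/// Where a cached view's primitives come from in this frame.
#[derive(Debug, PartialEq)]
enum PaintSource {
    // `reuse_paint`: last frame's primitives (and atlas tiles) copied verbatim.
    Replayed,
    // Full repaint: glyphs go through `paint_glyph` and get fresh tiles.
    Repainted,
}

// Hypothetical stand-in for the `Window` fields the cache check reads.
#[derive(Default)]
struct WindowModel {
    dirty_views: HashSet<u64>,
    refreshing: bool,
}

impl WindowModel {
    fn refresh(&mut self) {
        self.refreshing = true;
    }

    // Same condition as the `AnyView::prepaint` check described above.
    fn paint_cached_view(&self, view_id: u64) -> PaintSource {
        if !self.dirty_views.contains(&view_id) && !self.refreshing {
            PaintSource::Replayed
        } else {
            PaintSource::Repainted
        }
    }
}

// Hypothetical shape of the fix inside the `request_frame` closure.
fn request_frame(window: &mut WindowModel, force_render: bool, view_id: u64) -> PaintSource {
    if force_render {
        window.refresh();
    }
    window.paint_cached_view(view_id)
}

fn main() {
    let mut window = WindowModel::default();
    println!("{:?}", request_frame(&mut window, false, 1)); // Replayed
    println!("{:?}", request_frame(&mut window, true, 1)); // Repainted
}
```

Calling `refresh()` rather than, say, marking individual views dirty reuses the escape hatch that `AnyView::cached` already documents, so no per-view bookkeeping is needed after recovery.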
**2. Add the `force_render_after_recovery` flag on Windows.**
Mirror the Linux fix from #52389: a per-window `Cell<bool>` set after
`WindowsWindowInner::handle_device_lost` succeeds and consumed at the
top
of `draw_window`. Together with the GPUI change above, the first frame
after recovery (whether a stray `WM_PAINT` during the 200 ms recovery
sleep or the explicit `WM_GPUI_FORCE_UPDATE_WINDOW`) is treated as a
forced render that both clears `skip_draws` and bypasses the view cache.
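The flag is a one-shot `Cell<bool>`: `Cell::take` returns the current value and leaves `false` behind. The sketch below uses a hypothetical, trimmed-down `WindowState` with only `skip_draws` and the new flag. It calls `take()` before the `||`, so the flag is consumed even when the caller already requested a forced render; with the operands in the other order, `||` short-circuits when `force_render` is `true`, the flag stays set, and the following frame is forced as well.

```rust
use std::cell::Cell;

// Hypothetical, trimmed-down window state: only the fields this flow touches.
#[derive(Default)]
struct WindowState {
    skip_draws: Cell<bool>,
    force_render_after_recovery: Cell<bool>,
}

impl WindowState {
    // Successful device-lost recovery: suspend drawing and arm the flag.
    fn handle_device_lost(&self) {
        self.skip_draws.set(true);
        self.force_render_after_recovery.set(true);
    }

    /// Returns the effective `force_render` used for this draw.
    fn draw_window(&self, force_render: bool) -> bool {
        // `take()` reads the flag and resets it to `false`.
        let force_render = self.force_render_after_recovery.take() || force_render;
        if force_render {
            // `mark_drawable`
            self.skip_draws.set(false);
        }
        force_render
    }
}

fn main() {
    let state = WindowState::default();
    state.handle_device_lost();
    // A stray `WM_PAINT` arrives first and is promoted to a forced render.
    println!("{}", state.draw_window(false)); // true
    // The flag was consumed, so the next ordinary paint is not forced.
    println!("{}", state.draw_window(false)); // false
}
```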
## Testing
- `script/clippy -p gpui` is clean.
- I do not have a Windows toolchain available locally, so I have not cross-compiled `gpui_windows`. Reviewers with Windows access: please smoke-test on a machine where the device-lost path can be exercised (an Intel iGPU, suspend/resume, or a test that triggers a GPU driver TDR).
## Related
- Sentry issue ID 7457971403 (DirectX subpixel atlas crash, Intel Iris
Xe).
- Builds on #52389 (`gpui_linux: Force scene rebuild after GPU device recovery`) and fixes the residual gap it left. The GPUI change here also hardens the corresponding Linux path against the same `reuse_paint` mechanism.
Release Notes:
- Fixed a crash on Windows when the GPU device is lost and recovered during use (typically after a driver crash, suspend/resume, or display reconfiguration; most common on Intel iGPUs).
@@ -1174,6 +1174,11 @@ impl WindowsWindowInner {
{
panic!("Device lost: {err}");
}
+ // Make sure the first `draw_window` after recovery (whether it comes
+ // from the forced WM_GPUI_FORCE_UPDATE_WINDOW or a stray WM_PAINT in
+ // between) is treated as a forced render so it both clears
+ // `skip_draws` and bypasses the view cache.
+ self.state.force_render_after_recovery.set(true);
Some(0)
}
@@ -1198,6 +1203,7 @@ impl WindowsWindowInner {
}
}
+ let force_render = self.state.force_render_after_recovery.take() || force_render;
if force_render {
// Re-enable drawing after a device loss recovery. The forced render
// will rebuild the scene with fresh atlas textures.
@@ -63,6 +63,12 @@ pub struct WindowsWindowState {
pub direct_manipulation: DirectManipulationHandler,
pub renderer: RefCell<DirectXRenderer>,
+ /// Set after a GPU device-lost recovery so the next `draw_window` call is
+ /// treated as a forced render. This guarantees the next frame both
+ /// re-enables drawing (via `mark_drawable`) and bypasses the GPUI view
+ /// cache, which would otherwise replay stale atlas tile references from
+ /// the previous frame and panic in `DirectXAtlasState::texture`.
+ pub force_render_after_recovery: Cell<bool>,
pub click_state: ClickState,
pub current_cursor: Cell<Option<HCURSOR>>,
@@ -159,6 +165,7 @@ impl WindowsWindowState {
last_reported_capslock: Cell::new(last_reported_capslock),
hovered: Cell::new(hovered),
renderer: RefCell::new(renderer),
+ force_render_after_recovery: Cell::new(false),
click_state,
current_cursor: Cell::new(current_cursor),
cursor_visible,