Fix potential race-condition in DisplayLink::drop on macOS (#32116)

Conrad Irwin created

Fix a segfault in CVDisplayLink

We see 1-2 crashes a day on macOS on the `CVDisplayLink` thread.

```
Segmentation fault: 11 on thread 9325960 (CVDisplayLink)
CoreVideo	CVHWTime::reset()
CoreVideo	CVXTime::reset()
CoreVideo	CVDisplayLink::runIOThread()
libsystem_pthread.dylib	_pthread_start
libsystem_pthread.dylib	thread_start
```

With the help of the Zed AI, I dove into the crash report, which looks
like this:

```
Crashed Thread:        49  CVDisplayLink

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x00000000000001f6
Exception Codes:       0x0000000000000001, 0x00000000000001f6

Thread 49 Crashed:: CVDisplayLink
0   CoreVideo                     	       0x18c1ed994 CVHWTime::reset() + 64
1   CoreVideo                     	       0x18c1ee474 CVXTime::reset() + 52
2   CoreVideo                     	       0x18c1ee198 CVDisplayLink::runIOThread() + 176
3   libsystem_pthread.dylib       	       0x18285ac0c _pthread_start + 136
4   libsystem_pthread.dylib       	       0x182855b80 thread_start + 8

Thread 49 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000000   x1: 0x000000018c206e08   x2: 0x0000002c00001513   x3: 0x0001d4630002a433
    x4: 0x00000e2100000000   x5: 0x0001d46300000000   x6: 0x000000000000002c   x7: 0x0000000000000000
    x8: 0x000000000000002e   x9: 0x000000004d555458  x10: 0x0000000000000000  x11: 0x0000000000000000
   x12: 0x0000000000000000  x13: 0x0000000000000000  x14: 0x0000000000000000  x15: 0x0000000000000000
   x16: 0x0000000182856a9c  x17: 0x00000001f19bc540  x18: 0x0000000000000000  x19: 0x0000600003c56ed8
   x20: 0x000000000002a433  x21: 0x0000000000000000  x22: 0x0000000000000000  x23: 0x0000000000000000
   x24: 0x0000000000000000  x25: 0x0000000000000000  x26: 0x0000000000000000  x27: 0x0000000000000000
   x28: 0x0000000000000000   fp: 0x000000016b02ade0   lr: 0x000000018c1ed984
    sp: 0x000000016b02adc0   pc: 0x000000018c1ed994 cpsr: 0x80001000
   far: 0x00000000000001f6  esr: 0x92000006 (Data Abort) byte read Translation fault

Binary Images:
       0x1828c9000 -        0x182e07fff com.apple.CoreFoundation (6.9) <df489a59-b4f6-32b8-9bb4-9b832960aa52> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
```

Using lldb to disassemble `CVHWTime::reset()` (and the AI to interpret
it), the crash is caused by dereferencing the pointer at the start of
the CVHWTime struct + 0x1c8. In this case the pointer has (the clearly
nonsense) value 0x2e (and 0x2e + 0x1c8 = 0x1f6, the failing address).

As to how this could happen...

Looking at the implementation of `CVDisplayLinkRelease`, it calls
straight into `CFRelease` on the main thread; and so it is not safe to
call `CVDisplayLinkRelease` concurrently with other threads that access
the CVDisplayLink. While we already stopped the display link, it turns
out that `CVDisplayLinkStop` just sets a flag on the struct to instruct
the io-thread to exit "soon", and returns immediately. That means we
don't know when the other thread will actually exit, and so we can't
safely call `CVDisplayLinkRelease`.

So, for now, we just leak these objects. They should be created
relatively infrequently (when the app is foregrounded/backgrounded), so
I don't think this is a huge problem.

Release Notes:

- Fix a rare crash on macOS when putting the app in the background.

Change summary

crates/gpui/src/platform/mac/display_link.rs | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

Detailed changes

crates/gpui/src/platform/mac/display_link.rs 🔗

@@ -12,7 +12,7 @@ use std::ffi::c_void;
 use util::ResultExt;
 
 pub struct DisplayLink {
-    display_link: sys::DisplayLink,
+    display_link: Option<sys::DisplayLink>,
     frame_requests: dispatch_source_t,
 }
 
@@ -59,7 +59,7 @@ impl DisplayLink {
             )?;
 
             Ok(Self {
-                display_link,
+                display_link: Some(display_link),
                 frame_requests,
             })
         }
@@ -70,7 +70,7 @@ impl DisplayLink {
             dispatch_resume(crate::dispatch_sys::dispatch_object_t {
                 _ds: self.frame_requests,
             });
-            self.display_link.start()?;
+            self.display_link.as_mut().unwrap().start()?;
         }
         Ok(())
     }
@@ -80,7 +80,7 @@ impl DisplayLink {
             dispatch_suspend(crate::dispatch_sys::dispatch_object_t {
                 _ds: self.frame_requests,
             });
-            self.display_link.stop()?;
+            self.display_link.as_mut().unwrap().stop()?;
         }
         Ok(())
     }
@@ -89,6 +89,14 @@ impl DisplayLink {
 impl Drop for DisplayLink {
     fn drop(&mut self) {
         self.stop().log_err();
+        // We see occasional segfaults on the CVDisplayLink thread.
+        //
+        // It seems possible that this happens because CVDisplayLinkRelease releases the CVDisplayLink
+        // on the main thread immediately, but the background thread that CVDisplayLink uses for timers
+        // is still accessing it.
+        //
+        // We might also want to upgrade to CADisplayLink, but that requires dropping old macOS support.
+        std::mem::forget(self.display_link.take());
         unsafe {
             dispatch_source_cancel(self.frame_requests);
         }