Fix potential race-condition in DisplayLink::drop on macOS (#32116)
Conrad Irwin
created
Fix a segfault in CVDisplayLink
We see 1-2 crashes a day on macOS on the `CVDisplayLink` thread.
```
Segmentation fault: 11 on thread 9325960 (CVDisplayLink)
CoreVideo CVHWTime::reset()
CoreVideo CVXTime::reset()
CoreVideo CVDisplayLink::runIOThread()
libsystem_pthread.dylib _pthread_start
libsystem_pthread.dylib thread_start
```
With the help of the Zed AI, I dove into the crash report, which looks
like this:
```
Crashed Thread: 49 CVDisplayLink
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000000000001f6
Exception Codes: 0x0000000000000001, 0x00000000000001f6
Thread 49 Crashed:: CVDisplayLink
0 CoreVideo 0x18c1ed994 CVHWTime::reset() + 64
1 CoreVideo 0x18c1ee474 CVXTime::reset() + 52
2 CoreVideo 0x18c1ee198 CVDisplayLink::runIOThread() + 176
3 libsystem_pthread.dylib 0x18285ac0c _pthread_start + 136
4 libsystem_pthread.dylib 0x182855b80 thread_start + 8
Thread 49 crashed with ARM Thread State (64-bit):
x0: 0x0000000000000000 x1: 0x000000018c206e08 x2: 0x0000002c00001513 x3: 0x0001d4630002a433
x4: 0x00000e2100000000 x5: 0x0001d46300000000 x6: 0x000000000000002c x7: 0x0000000000000000
x8: 0x000000000000002e x9: 0x000000004d555458 x10: 0x0000000000000000 x11: 0x0000000000000000
x12: 0x0000000000000000 x13: 0x0000000000000000 x14: 0x0000000000000000 x15: 0x0000000000000000
x16: 0x0000000182856a9c x17: 0x00000001f19bc540 x18: 0x0000000000000000 x19: 0x0000600003c56ed8
x20: 0x000000000002a433 x21: 0x0000000000000000 x22: 0x0000000000000000 x23: 0x0000000000000000
x24: 0x0000000000000000 x25: 0x0000000000000000 x26: 0x0000000000000000 x27: 0x0000000000000000
x28: 0x0000000000000000 fp: 0x000000016b02ade0 lr: 0x000000018c1ed984
sp: 0x000000016b02adc0 pc: 0x000000018c1ed994 cpsr: 0x80001000
far: 0x00000000000001f6 esr: 0x92000006 (Data Abort) byte read Translation fault
Binary Images:
0x1828c9000 - 0x182e07fff com.apple.CoreFoundation (6.9) <df489a59-b4f6-32b8-9bb4-9b832960aa52> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
```
Using lldb to disassemble `CVHWTime::reset()` (and the AI to interpret
it), the crash is caused by dereferencing the pointer at the start of
the CVHWTime struct + 0x1c8. In this case the pointer has (the clearly
nonsense) value 0x2e (and 0x2e + 0x1c8 = 0x1f6, the failing address).
As to how this could happen...
Looking at the implementation of `CVDisplayLinkRelease`, it calls
straight into `CFRelease` on the main thread; and so it is not safe to
call `CVDisplayLinkRelease` concurrently with other threads that access
the CVDisplayLink. While we already stopped the display link, it turns
out that `CVDisplayLinkStop` just sets a flag on the struct to instruct
the io-thread to exit "soon", and returns immediately. That means we
don't know when the other thread will actually exit, and so we can't
safely call `CVDisplayLinkRelease`.
So, for now, we just leak these objects. They should be created
relatively infrequently (when the app is foregrounded/backgrounded), so
I don't think this is a huge problem.
Release Notes:
- Fix a rare crash on macOS when putting the app in the background.