plan.md

<!-- DO NOT CHECK IN OR DELETE THIS FILE. It is a working plan for the sandbox implementation. -->

# Sandbox implementation plan

See `README.md` for the design rationale behind each decision here.

## Phase 1: Sandbox crate extraction

Move sandbox code out of the `terminal` crate into the `sandbox` crate so
that process-tracking logic has a proper home and can be used by both the
terminal spawn path and the cleanup path.

**1.1 Move existing sandbox modules**

Move the following files from `crates/terminal/src/` to `crates/sandbox/src/`:
- `sandbox_exec.rs` → entry point for `--sandbox-exec`
- `sandbox_macos.rs` → Seatbelt SBPL generation and application
- `sandbox_linux.rs` → Landlock implementation
- `sandbox_tests.rs` → tests

Update `crates/terminal/Cargo.toml` to depend on `sandbox`, and update
`terminal.rs` to re-export or delegate to the sandbox crate.

**1.2 Move `SandboxConfig` and related types**

Move `SandboxConfig`, `ResolvedSystemPaths`, and `SandboxConfig::from_settings`
from `terminal_settings.rs` into the sandbox crate. The terminal crate
re-exports these types for backward compatibility.

**1.3 Extract shared sandbox resolution logic**

The sandbox config resolution logic is currently duplicated between
`crates/project/src/terminals.rs` and `crates/acp_thread/src/terminal.rs`.
Extract this into a shared helper on `SandboxConfig` (or a new function in the
sandbox crate) that both call sites use. This addresses code review item #5.

## Phase 2: Session fingerprint (macOS)

Implement the sandbox fingerprint mechanism so that every terminal session's
processes can be reliably identified via `sandbox_check()`.

**2.1 Add `SessionFingerprint` type**

Create a `SessionFingerprint` struct that generates and manages the per-session
UUID marker:

- `SessionFingerprint::new()`: generates a UUID, creates
  `/tmp/.zed-sandbox-<uuid>/allow/` and the parent directory (but not
  `/tmp/.zed-sandbox-<uuid>/deny/`)
- `SessionFingerprint::matches_pid(pid) -> bool`: probes the process with
  `sandbox_check()` using the two-point allow/deny test (the `allow` marker
  must be readable and the `deny` marker must not)
- `SessionFingerprint::cleanup()`: deletes the temporary directory
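
The directory handling and the two-point test can be sketched without any macOS-specific calls. This is a minimal sketch, not the planned implementation: `create`, `allow_path`, `deny_path`, and `matches` are hypothetical names, the `id` parameter stands in for the generated UUID, and the `can_read` closure stands in for a `sandbox_check()` probe of one process.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the per-session marker directory.
pub struct SessionFingerprint {
    dir: PathBuf,
}

impl SessionFingerprint {
    /// `root` is `/tmp` in the plan; `id` stands in for the generated UUID.
    pub fn create(root: &Path, id: &str) -> io::Result<Self> {
        let dir = root.join(format!(".zed-sandbox-{id}"));
        // Creates the parent directory and `allow/`; `deny/` is never created.
        fs::create_dir_all(dir.join("allow"))?;
        Ok(Self { dir })
    }

    pub fn allow_path(&self) -> PathBuf {
        self.dir.join("allow")
    }

    pub fn deny_path(&self) -> PathBuf {
        self.dir.join("deny")
    }

    /// Two-point test: the probed process must be allowed to read the
    /// `allow` marker AND be denied the `deny` marker. `can_read` is an
    /// assumption standing in for a `sandbox_check()` call on one pid.
    pub fn matches(&self, can_read: impl Fn(&Path) -> bool) -> bool {
        can_read(&self.allow_path()) && !can_read(&self.deny_path())
    }

    /// Removes the whole marker directory.
    pub fn cleanup(self) -> io::Result<()> {
        fs::remove_dir_all(&self.dir)
    }
}
```

Requiring both points is what separates a session's processes from unsandboxed processes (both paths readable) and from processes under an unrelated restrictive profile (neither path readable).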

**2.2 Add FFI bindings for `sandbox_check`**

Add `extern "C"` declarations for `sandbox_check()` and the
`SANDBOX_FILTER_PATH` constant to `sandbox_macos.rs`. These are private API:
the public `<sandbox.h>` header does not declare them, so the function
signature and the constant's value must be written out by hand.

**2.3 Embed fingerprint in SBPL profiles**

Modify `generate_sbpl_profile()` in `sandbox_macos.rs` to accept a
`SessionFingerprint` and emit the allow/deny rules for the marker paths.

**2.4 Add fingerprint-only SBPL profile**

Add a new function (e.g., `generate_fingerprint_only_profile()`) that produces
a minimal profile:

```
(version 1)
(allow default)
(deny file-read* (subpath "/tmp/.zed-sandbox-<uuid>/deny"))
(allow file-read* (subpath "/tmp/.zed-sandbox-<uuid>/allow"))
```

This is used when no sandbox restrictions are configured but process tracking
is still needed.
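
A generator for this profile could be as small as the sketch below. The `session_dir` parameter is an assumption (the plan passes a `SessionFingerprint` instead), and the sketch assumes the path contains no `"` or `\` characters that would need escaping inside an SBPL string literal.

```rust
/// Hypothetical sketch: builds the fingerprint-only profile for one session.
/// `session_dir` is the marker directory, e.g. `/tmp/.zed-sandbox-<uuid>`.
/// Assumes the path needs no escaping inside an SBPL string literal.
pub fn generate_fingerprint_only_profile(session_dir: &str) -> String {
    format!(
        "(version 1)\n\
         (allow default)\n\
         (deny file-read* (subpath \"{session_dir}/deny\"))\n\
         (allow file-read* (subpath \"{session_dir}/allow\"))\n"
    )
}
```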

**2.5 Support both profile modes in `sandbox_exec_main()`**

Modify `sandbox_exec_main()` so that it can apply either a full restrictive
profile or a fingerprint-only profile, depending on what config it receives.
The actual plumbing to always invoke the wrapper (even without sandbox
restrictions) happens in Phase 5, after Linux cgroup support is also in place.

## Phase 3: Convergent cleanup (macOS)

Replace the current `Drop` cleanup (100ms timer + `kill_child_process`) with
the convergent scan-and-kill loop.

**3.1 Add process enumeration**

Add a function that enumerates all PIDs owned by the current UID using
`sysctl` with `KERN_PROC_UID`. This returns a `Vec<pid_t>`.

**3.2 Implement the cleanup loop**

Add a `SessionFingerprint::kill_all_processes()` method that implements:

1. `killpg(pgid, SIGKILL)` (best-effort, since the group may already be gone),
   which kills the majority of descendants instantly
2. Loop: enumerate all PIDs by UID (via `sysctl` `KERN_PROC_UID`) → skip
   zombies (`kp_proc.p_stat == SZOMB`) → filter by fingerprint match →
   `SIGKILL` every match → repeat until no matches found
3. Delete the fingerprint directory

This runs on a background thread rather than as an async task, since it is a
tight loop that should complete quickly.

Note: zombie processes must be skipped because they can't be killed by any
signal (they're already dead, awaiting reaping by their parent). If
`sandbox_check` still reports the sandbox profile for zombies, failing to skip
them would cause the loop to spin. The zombie state is detectable from the
same `sysctl` data used for enumeration.
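
Step 2 of the loop can be sketched independently of the platform calls. In this minimal sketch, `ProcEntry` and `kill_until_converged` are hypothetical names, and the three closures stand in for the `sysctl` enumeration, the fingerprint probe, and `kill(pid, SIGKILL)` respectively.

```rust
/// Hypothetical view of one process-table entry from the enumeration step.
#[derive(Clone, Copy)]
pub struct ProcEntry {
    pub pid: i32,
    pub is_zombie: bool,
}

/// Repeats scan-and-kill until a full scan finds no live matching process.
/// Returns the number of kill signals sent.
pub fn kill_until_converged(
    mut enumerate: impl FnMut() -> Vec<ProcEntry>,
    mut matches: impl FnMut(i32) -> bool,
    mut kill: impl FnMut(i32),
) -> usize {
    let mut signals_sent = 0;
    loop {
        let mut found = false;
        for entry in enumerate() {
            // Zombies cannot be signalled; counting them as matches would
            // keep the loop from ever converging.
            if entry.is_zombie {
                continue;
            }
            if matches(entry.pid) {
                kill(entry.pid);
                signals_sent += 1;
                found = true;
            }
        }
        // A clean pass means nothing from the session is still alive.
        if !found {
            return signals_sent;
        }
    }
}
```

The loop only exits after a pass with zero matches, so a process that forks between enumeration and the kill is picked up on the next pass. A real implementation might also want an iteration bound or a short pause between passes; that is a design question this sketch leaves open.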

**3.3 Integrate into `Terminal::Drop`**

Replace the current `Drop` implementation. Instead of the 100ms timer +
`kill_child_process()`, spawn a background task that runs
`fingerprint.kill_all_processes()`. The fingerprint is stored alongside the
`PtyProcessInfo` in `TerminalType::Pty`.

Also update `kill_active_task()` to use the same mechanism.

Note: the cleanup task must complete even if Zed is exiting. The current `Drop`
impl uses `detach()`, which risks the task being cancelled if the executor
shuts down. Consider blocking briefly in `Drop` or using a mechanism that
guarantees completion (e.g., a dedicated cleanup thread that outlives the
executor).
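
One way to get the completion guarantee is a dedicated worker thread that is joined on shutdown. The sketch below uses only the standard library; `CleanupThread` and `submit` are hypothetical names, and where such an object would live in Zed is not decided by this plan.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical dedicated cleanup thread, independent of any async executor.
pub struct CleanupThread {
    sender: Option<Sender<Job>>,
    handle: Option<JoinHandle<()>>,
}

impl CleanupThread {
    pub fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let handle = thread::spawn(move || {
            // Iteration ends once the sender is dropped AND the queue is
            // drained, so already-submitted jobs always run.
            for job in receiver {
                job();
            }
        });
        Self { sender: Some(sender), handle: Some(handle) }
    }

    /// Queues a cleanup job, e.g. a closure that runs the kill loop.
    pub fn submit(&self, job: impl FnOnce() + Send + 'static) {
        if let Some(sender) = &self.sender {
            sender
                .send(Box::new(job))
                .expect("cleanup thread exited early");
        }
    }
}

impl Drop for CleanupThread {
    fn drop(&mut self) {
        // Closing the channel lets the worker finish after draining.
        drop(self.sender.take());
        if let Some(handle) = self.handle.take() {
            // Block until every queued job has run. A panic inside a job is
            // ignored here because panicking in `drop` would abort.
            let _ = handle.join();
        }
    }
}
```

Dropping the `CleanupThread` during application shutdown blocks until the queue is empty, which trades a short exit delay for the guarantee that no session's processes are left behind.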

**3.4 Wire fingerprint through terminal creation**

- `TerminalBuilder::new()` creates the `SessionFingerprint` and passes it to
  the sandbox wrapper.
- The fingerprint is stored in `TerminalType::Pty` alongside `info` and
  `pty_tx`.
- On drop, the fingerprint is moved into the cleanup task.

## Phase 4: cgroups v2 (Linux)

Implement cgroup-based process tracking for Linux, providing the same
always-on process-lifetime guarantee.

**4.1 Add cgroup session management**

Add a `CgroupSession` type (Linux-only) that:

- `CgroupSession::new()`: creates a new cgroup under the user's systemd
  slice (e.g.,
  `/sys/fs/cgroup/user.slice/user-<uid>.slice/user@<uid>.service/zed-terminal-<uuid>.scope`)
  by creating that directory in the cgroup filesystem
- `CgroupSession::add_process(pid)`: writes the PID to `cgroup.procs`
- `CgroupSession::kill_all()`: writes `1` to `cgroup.freeze`, then writes
  `1` to `cgroup.kill` (kernel 5.14+), which sends `SIGKILL` to every process
  in the cgroup, or falls back to reading `cgroup.procs` and killing each PID
- `CgroupSession::cleanup()`: removes the cgroup directory
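
The path layout and the `kill_all()` decision can be sketched with plain file operations. This is a minimal sketch under stated assumptions: `scope_path`, `parse_cgroup_procs`, `KillOutcome`, and `request_kill` are hypothetical names, the freeze step is omitted for brevity, and the fallback only returns the PIDs rather than signalling them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds the scope directory from the example layout in the plan.
/// `cgroup_root` is `/sys/fs/cgroup` on a real system.
pub fn scope_path(cgroup_root: &Path, uid: u32, session_id: &str) -> PathBuf {
    cgroup_root
        .join("user.slice")
        .join(format!("user-{uid}.slice"))
        .join(format!("user@{uid}.service"))
        .join(format!("zed-terminal-{session_id}.scope"))
}

/// Parses the contents of `cgroup.procs` (one PID per line).
pub fn parse_cgroup_procs(contents: &str) -> Vec<i32> {
    contents
        .lines()
        .filter_map(|line| line.trim().parse().ok())
        .collect()
}

pub enum KillOutcome {
    /// `cgroup.kill` exists and was written; the kernel kills every member.
    KernelKilled,
    /// No `cgroup.kill` (older kernel); the caller must signal these PIDs.
    Fallback(Vec<i32>),
}

/// Prefers `cgroup.kill`, falling back to listing `cgroup.procs`.
pub fn request_kill(scope: &Path) -> io::Result<KillOutcome> {
    let kill_file = scope.join("cgroup.kill");
    if kill_file.exists() {
        fs::write(&kill_file, "1")?;
        return Ok(KillOutcome::KernelKilled);
    }
    let procs = fs::read_to_string(scope.join("cgroup.procs"))?;
    Ok(KillOutcome::Fallback(parse_cgroup_procs(&procs)))
}
```

In the fallback case the caller would need to repeat the read-and-kill pass until `cgroup.procs` is empty, because members can fork between the read and the signal. Freezing the cgroup first narrows that window.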

**4.2 Integrate into sandbox exec**

Modify the `--sandbox-exec` entry point on Linux to accept a cgroup path.
Before exec-ing the real shell, the wrapper moves itself into the specified
cgroup (by writing its own PID to `cgroup.procs`). All descendants
automatically inherit cgroup membership.

**4.3 Integrate into terminal lifecycle**

Same pattern as macOS: `TerminalBuilder::new()` creates the `CgroupSession`,
passes the cgroup path to the sandbox wrapper, stores the session in
`TerminalType::Pty`, and uses it for cleanup in `Drop`.

**4.4 Fallback for old kernels**

If cgroup creation fails (old kernel, cgroups v2 not mounted, no permission),
fall back to the current `killpg` + `kill_child_process` behavior. Log a
warning so the user knows process tracking is degraded.

## Phase 5: Always-on wrapper

With both macOS fingerprinting (Phase 2) and Linux cgroups (Phase 4) in place,
wire them up so the `--sandbox-exec` wrapper runs for every terminal session,
not only when sandbox restrictions are configured.

**5.1 Decouple wrapper invocation from `SandboxConfig`**

Currently `TerminalBuilder::new()` only wraps the shell in `--sandbox-exec`
when `sandbox_config.is_some()`. Change this so the wrapper is always used on
Unix platforms. The wrapper receives either:
- A full `SandboxExecConfig` (restrictions + fingerprint/cgroup), or
- A tracking-only config (fingerprint on macOS, cgroup path on Linux, no
  filesystem restrictions)

Update `SandboxExecConfig` to have an optional restrictions payload and a
required tracking payload.
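
One possible shape for the updated type is sketched below. Only `SandboxExecConfig` is named in this plan; `Restrictions`, `Tracking`, their fields, and `is_tracking_only` are hypothetical placeholders for whatever the real payloads turn out to be.

```rust
use std::path::PathBuf;

/// Hypothetical placeholder for the existing restriction settings.
#[derive(Debug, Clone, PartialEq)]
pub struct Restrictions {
    pub writable_paths: Vec<PathBuf>,
}

/// Hypothetical tracking payload: one variant per platform mechanism.
#[derive(Debug, Clone, PartialEq)]
pub enum Tracking {
    /// macOS: marker directory used by the Seatbelt fingerprint.
    Fingerprint { session_dir: PathBuf },
    /// Linux: cgroup the wrapper moves itself into before exec.
    Cgroup { path: PathBuf },
}

#[derive(Debug, Clone, PartialEq)]
pub struct SandboxExecConfig {
    /// `None` means tracking-only mode (no filesystem restrictions).
    pub restrictions: Option<Restrictions>,
    /// Always present, so every session can be cleaned up.
    pub tracking: Tracking,
}

impl SandboxExecConfig {
    pub fn is_tracking_only(&self) -> bool {
        self.restrictions.is_none()
    }
}
```

Making `tracking` a plain field rather than an `Option` lets the type system enforce that no session is spawned without a cleanup mechanism.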

**5.2 Update both resolution sites**

Modify `crates/project/src/terminals.rs` and `crates/acp_thread/src/terminal.rs`
to always produce a tracking config. The sandbox restrictions remain gated
behind the feature flag and `enabled` setting, but the tracking config is
unconditional.

**5.3 Update `--sandbox-exec` entry point**

Modify `sandbox_exec_main()` to handle the tracking-only case:
- On macOS: apply the fingerprint-only Seatbelt profile (no restrictions)
- On Linux: move into the cgroup (no Landlock restrictions)
- Then exec the real shell as before

## Phase 6: Tests

**6.1 Fingerprint tests (macOS)**

- Test that `SessionFingerprint::matches_pid()` returns true for a process
  launched with the session's Seatbelt profile.
- Test that it returns false for an unsandboxed process.
- Test that it returns false for a process with a different session's profile.
- Test the two-point fingerprint: a process with blanket `/tmp` access does
  not match.

**6.2 Convergent cleanup tests (macOS)**

- Test that a simple child process is killed.
- Test that a process that calls `setsid()` is still found and killed.
- Test that a double-forking daemon (fork → setsid → fork → parent exits) is
  still found and killed.
- Test that the loop terminates.

**6.3 Cgroup tests (Linux)**

- Test that `CgroupSession::kill_all()` kills a child process.
- Test that a process that calls `setsid()` is still killed (it's in the
  cgroup).
- Test the fallback path when cgroups are unavailable.

**6.4 Fingerprint-only mode tests (macOS)**

- Test that a terminal spawned without sandbox restrictions still gets the
  fingerprint profile applied.
- Test that cleanup works correctly in fingerprint-only mode.
- Test that the process is not restricted (can access arbitrary paths, use
  network, etc.).

## Phase 7: Cleanup of existing code review items

With the new architecture in place, address the remaining items from the code
review that haven't been handled by earlier phases:

- **Item #1**: Change `(allow signal)` to `(allow signal (target children))`.
- **Item #4**: Change `current_exe()` fallback to propagate the error with `?`.
- **Item #6**: Replace `let _ = write!(...)` with `push_str` + `format!` or
  `.unwrap()`.
- **Items #7, #8**: Add tests for `additional_executable_paths` and
  `canonicalize_paths()` with symlinks.