diff --git a/Cargo.lock b/Cargo.lock
index 7627471392e0abcc54c5acf0846e48595bf3a686..fbe0a986b0384c15fa8d5d6a068db00ca8d08e77 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -15153,6 +15153,10 @@ dependencies = [
  "winapi-util",
 ]
 
+[[package]]
+name = "sandbox"
+version = "0.1.0"
+
 [[package]]
 name = "scc"
 version = "3.5.6"
diff --git a/Cargo.toml b/Cargo.toml
index 9541d9e45b17f5ea92029082ab715a3c068067ac..ff1f16cb8af0eb33acd60f6f0d4be0e963b94e7d 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -163,6 +163,7 @@ members = [
     "crates/rope",
     "crates/rpc",
     "crates/rules_library",
+    "crates/sandbox",
     "crates/scheduler",
     "crates/schema_generator",
     "crates/search",
diff --git a/crates/sandbox/Cargo.toml b/crates/sandbox/Cargo.toml
new file mode 100644
index 0000000000000000000000000000000000000000..64760fcc692a5427b6ea70fb2613de860434a61e
--- /dev/null
+++ b/crates/sandbox/Cargo.toml
@@ -0,0 +1,14 @@
+[package]
+name = "sandbox"
+version = "0.1.0"
+edition.workspace = true
+publish.workspace = true
+license = "GPL-3.0-or-later"
+description = "OS-level sandboxing for terminal processes in Zed"
+
+[lints]
+workspace = true
+
+[lib]
+path = "src/sandbox.rs"
+doctest = false
diff --git a/crates/sandbox/README.md b/crates/sandbox/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b2af7774c7f60fae083faf6487624890483716a9
--- /dev/null
+++ b/crates/sandbox/README.md
@@ -0,0 +1,289 @@
+# Sandbox
+
+OS-level sandboxing for terminal processes spawned by Zed — both interactive
+user terminals and agent tool invocations. The sandbox restricts filesystem
+access, network access, and other capabilities so that commands run in the
+terminal can only affect what they're explicitly permitted to.
+
+## Platform mechanisms
+
+- **macOS**: Seatbelt (SBPL profiles applied via `sandbox_init()`)
+- **Linux**: Landlock LSM for filesystem restrictions, cgroups v2 for process
+  lifetime management
+
+Both mechanisms are inherited by child processes and cannot be removed. A
+sandboxed shell and everything it spawns remain sandboxed for their entire
+lifetime.
+
+## Always-on process tracking
+
+Reliable process cleanup is valuable even when the user has not configured any
+sandbox restrictions. The standard approach of `killpg()` (kill by process
+group) is unreliable — a process can escape via `setsid()` or `setpgid()`, and
+the terminal's `Drop` impl will miss it.
+
+For this reason, **process tracking is always enabled for every terminal
+session**, regardless of whether sandbox restrictions are configured:
+
+- **macOS**: A minimal Seatbelt profile is applied containing only the session
+  fingerprint (see below) and `(allow default)` for everything else. This
+  doesn't restrict the process at all, but gives us the `sandbox_check()`
+  fingerprint needed to reliably find and kill all descendants. When full
+  sandbox restrictions are also enabled, the fingerprint is embedded in the
+  restrictive profile instead.
+
+- **Linux**: A cgroup is created for every terminal session. On cleanup, the
+  cgroup is frozen and all members are killed. This works regardless of whether
+  Landlock filesystem restrictions are also enabled.
+
+This replaces the current cleanup approach (100ms delay + `kill_child_process`)
+with a convergent, reliable mechanism on both platforms.
+
+## Process cleanup on terminal close
+
+When a terminal session ends, all processes it spawned must be killed. This is
+straightforward on Linux (cgroups v2 provides an atomic, inescapable kill), but
+requires careful handling on macOS where no equivalent kernel primitive exists.
+
+### The problem
+
+A process inside the sandbox can call `setsid()` or `setpgid()` to leave the
+shell's process group. After that, `killpg()` (which kills by process group)
+won't reach it. If the process also double-forks and the intermediate parent
+exits, the grandchild is reparented to PID 1 (launchd), severing the parent
+chain entirely. This means:
+
+- **Process group killing** misses it (different group).
+- **Parent chain walking** can't find it (parent is PID 1).
+- The process persists after the terminal closes, retaining whatever sandbox
+  permissions it was granted at spawn time.
+
+macOS Seatbelt has no operation for `setsid()` — it isn't a filterable
+operation in SBPL, so the sandbox can't prevent this. (On Linux, seccomp could
+block `setsid()`, but it would break legitimate programs like `ssh`.)
+
+### Why stale permissions matter
+
+The sandbox profile is a snapshot frozen at spawn time. If a process escapes
+cleanup, it retains the original permissions indefinitely. This is a problem
+because:
+
+- The user might later add secrets to a directory that was in the sandbox's
+  allowed paths.
+- The user might change sandbox settings for future sessions, but the escaped
+  process still has the old, more-permissive profile.
+- For agent tool use especially, the sandbox permissions are granted for a
+  specific task. An escaped process retaining those permissions after the task
+  is complete violates the principle of least privilege.
+
+### Linux: cgroups v2
+
+On Linux, the solution is to place the shell in a dedicated cgroup. All
+descendants are automatically tracked in the cgroup regardless of `setsid()`,
+`setpgid()`, or reparenting. No process can leave a cgroup without
+`CAP_SYS_ADMIN`. On terminal close:
+
+1. Freeze the cgroup (prevents new forks).
+2. Kill all processes in the cgroup.
+3. Delete the cgroup.
+
+This is a hard guarantee — the same mechanism containers use.
+
+cgroups v2 is the default on all modern Linux distributions (Ubuntu 21.10+,
+Fedora 31+, Debian 11+, Arch 2020+, RHEL 9+). No installation or
+configuration is needed. Regular (non-root) users can create child cgroups
+within their own systemd user slice, so no elevated privileges are required.
+
+### macOS: sandbox fingerprinting with convergent cleanup
+
+macOS has no public equivalent to cgroups. The approach is a convergent
+scan-and-kill loop that uses the Seatbelt sandbox profile itself as an
+unforgeable fingerprint.
+
+#### Sandbox fingerprint
+
+Each terminal session embeds a unique fingerprint in its SBPL profile: a
+per-session UUID path where one child path is allowed and a sibling is denied.
+
+```
+(allow file-read* (subpath "/tmp/.zed-sandbox-/allow"))
+;; /tmp/.zed-sandbox-/deny is implicitly denied by (deny default)
+```
+
+When the session has no sandbox restrictions (fingerprint-only mode), the
+profile uses `(allow default)` instead of `(deny default)`, but still includes
+an explicit deny for the fingerprint's deny-side path:
+
+```
+(version 1)
+(allow default)
+(deny file-read* (subpath "/tmp/.zed-sandbox-/deny"))
+(allow file-read* (subpath "/tmp/.zed-sandbox-/allow"))
+```
+
+This two-point fingerprint cannot be produced by any other sandbox profile:
+
+- A sandbox that blanket-allows `/tmp` would allow **both** paths — fails the
+  deny check.
+- A sandbox that blanket-denies `/tmp` would deny **both** paths — fails the
+  allow check.
+- An unsandboxed process allows everything — fails the deny check.
+- Only a process with our exact profile allows one and denies the other.
+
+The fingerprint is checked from outside the process using `sandbox_check()`:
+
+```c
+int allows = sandbox_check(pid, "file-read-data",
+    SANDBOX_FILTER_PATH, "/tmp/.zed-sandbox-/allow") == 0;
+int denies = sandbox_check(pid, "file-read-data",
+    SANDBOX_FILTER_PATH, "/tmp/.zed-sandbox-/deny") != 0;
+// Match requires: allows && denies
+```
+
+The fingerprint is unforgeable because the Seatbelt sandbox is a kernel-level
+invariant — no process can modify or remove its own sandbox profile.
+
+#### Convergent cleanup loop
+
+On terminal close:
+
+1. `killpg(pgid, SIGKILL)` — kill the process group. This instantly handles
+   the vast majority of descendants (everything that didn't escape the group).
+2. Enumerate all processes owned by the current UID (via `sysctl`
+   `KERN_PROC_UID`).
+3. For each process, probe with `sandbox_check` using the session fingerprint.
+4. `SIGKILL` every match.
+5. Go to step 2.
+6. When a full scan finds zero matches, every process from this session is
+   dead.
+7. Delete the fingerprint directory.
+
+**Why this terminates:** Each iteration either discovers processes (and kills
+them) or discovers none (loop exits). The total number of processes is finite,
+and the set of living fingerprinted processes shrinks monotonically.
+
+**Why this is correct:** The Seatbelt sandbox is inherited by all descendants
+and cannot be removed. Every descendant of the sandboxed shell — regardless of
+`setsid()`, `setpgid()`, double-forking, or reparenting to PID 1 — carries the
+session fingerprint. `sandbox_check` finds them by probing the kernel, not by
+walking the process tree.
+
+**Why SIGKILL on sight instead of SIGSTOP:** An earlier design froze escapees
+with `SIGSTOP` during scanning, then killed them all at the end. But `SIGSTOP`
+only stops the process you send it to, not its children — so children of a
+stopped process are still running and can fork. `SIGKILL` is equally effective:
+a dead process can't fork, and any children it already created are findable by
+fingerprint on the next scan iteration. The simpler approach is just to kill
+everything on sight and keep scanning until the scan comes back empty.
+
+**Why not process-group operations after step 1:** After `killpg` handles the
+initial process group, any remaining processes are by definition ones that
+escaped via `setsid()` or `setpgid()`. They're in different process groups (or
+their own sessions), so further `killpg` calls can't target them without
+knowing their group IDs. Worse, if a process double-forks and the intermediate
+parent exits, the grandchild is reparented to PID 1 (launchd) — there's no
+parent chain linking it back to the original shell, and its process group is
+unrelated to ours. The only reliable way to find these escapees is the
+fingerprint probe, which works regardless of process group, session, or parent
+relationship.
+
+**Residual race:** Between discovering a process (step 3) and killing it (step
+4), the process could fork. But the child inherits the fingerprint, so the next
+iteration of the loop finds it. The loop continues until no such children
+remain. The only way a process could escape is to fork a child that somehow
+doesn't inherit the sandbox — which the kernel guarantees cannot happen.
+
+### Alternatives considered and rejected
+
+#### Audit session IDs (BSM)
+
+macOS's BSM audit framework assigns each process an audit session ID
+(`ai_asid`) that is inherited by children. In principle, this could track
+descendants. Rejected because:
+
+- `getaudit_addr()` requires elevated privileges.
+- There is no "kill all processes in this audit session" syscall — you still
+  end up enumerating and killing individually.
+- macOS doesn't consistently use POSIX sessions (`ps -e -o sess` shows 0 for
+  all processes on many systems).
+
+#### Endpoint Security framework
+
+Apple's Endpoint Security framework provides kernel-level notifications for
+every fork/exec event, which would allow perfectly reliable tracking. Rejected
+because:
+
+- Requires the `com.apple.developer.endpoint-security.client` entitlement,
+  which must be approved by Apple.
+- Designed for security products (antivirus, MDM), not general-purpose apps.
+- Significantly increases the complexity and privilege requirements of Zed.
+
+#### XNU coalitions
+
+macOS has a kernel concept called "coalitions" that groups related processes for
+resource tracking and lifecycle management — essentially Apple's internal
+equivalent of cgroups. Rejected because:
+
+- The APIs (`coalition_create()`, `coalition_terminate()`) are private SPI.
+- They require entitlements not available to third-party apps.
+
+#### Temporary copy / overlay of project directory
+
+Instead of granting sandbox access to the real project directory, use a
+temporary copy or FUSE overlay, then delete it on terminal close. Rejected
+because:
+
+- Copying large projects is expensive.
+- File watching, symlinks, and build tool caching break.
+- FUSE on macOS requires macFUSE (third-party kext) or FSKit (macOS 15+).
+- Tools that embed absolute paths (compiler errors, debugger info) would show
+  wrong paths.
+
+#### Symlink indirection
+
+Grant sandbox access to a symlink path (e.g., `/tmp/.zed-link-` →
+`/real/project/`), then delete the symlink on cleanup. Rejected because:
+
+- Seatbelt resolves symlinks to canonical paths when checking access (this is
+  why `canonicalize_paths()` is called before building the profile).
+- Deleting the symlink wouldn't revoke access to the underlying real path.
+
+#### Blocking `setsid()` / `setpgid()`
+
+Prevent processes from leaving the process group in the first place. Rejected
+because:
+
+- Seatbelt has no filterable operation for these syscalls.
+- On Linux, seccomp could block them, but this breaks legitimate programs
+  (`ssh`, some build tools, process managers).
+
+#### Lightweight VM via Virtualization framework
+
+Run agent commands inside a macOS Virtualization framework VM. This would give a
+hard process-lifetime guarantee (shutting down the VM kills everything).
+Rejected (for now) because:
+
+- Massive architectural change.
+- The VM runs Linux, not macOS — macOS-specific tools wouldn't work.
+- Resource overhead (memory, CPU, startup time).
+- Overkill for the current threat model.
+
+## Signal scoping (macOS)
+
+The SBPL profile uses `(allow signal (target children))` rather than a bare
+`(allow signal)`. This prevents the sandboxed process from signaling arbitrary
+same-user processes (other Zed instances, browsers, etc.) while still allowing
+the shell to:
+
+- Manage jobs (`kill %1`, `bg`, `fg`)
+- Use the `kill` command on child processes
+- Clean up background jobs on exit (SIGHUP)
+
+Note that Ctrl+C and Ctrl+Z are sent by the kernel's TTY driver, not by the
+shell, so they work regardless of signal sandbox rules.
+
+`(target self)` was considered but rejected because it would break all job
+control and shell cleanup of background processes.
+
+In fingerprint-only mode (no sandbox restrictions), `(allow default)` already
+permits all signals, so no explicit signal rule is needed.
diff --git a/crates/sandbox/src/sandbox.rs b/crates/sandbox/src/sandbox.rs
new file mode 100644
index 0000000000000000000000000000000000000000..1f6b0ab2c5d1a64fbb13eaf4993992d63cca09ff
--- /dev/null
+++ b/crates/sandbox/src/sandbox.rs
@@ -0,0 +1,7 @@
+// Sandbox crate — OS-level sandboxing for terminal processes.
+//
+// See README.md for design context and rationale.
+//
+// The implementation currently lives in the `terminal` crate
+// (sandbox_exec.rs, sandbox_macos.rs, sandbox_linux.rs) and will
+// be migrated here.
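
The convergent cleanup loop described in the README (steps 2 through 6 of "On terminal close") can be sketched in Rust as follows. This is an illustration, not the code from the `terminal` crate. The `ProcessTable` trait, the `FakeTable` stand-in, the `convergent_cleanup` function, and the `max_scans` cap are all hypothetical names introduced here, so that the loop can be exercised without touching real processes. The in-memory table simulates the residual race: a child that appears at the moment its parent is killed inherits the parent's fingerprint and is caught by the next scan.

```rust
use std::collections::BTreeMap;

/// Hypothetical view of the process table (not part of the PR). A real macOS
/// backend would enumerate PIDs via sysctl, probe each one with
/// sandbox_check, and deliver SIGKILL.
pub trait ProcessTable {
    /// PIDs owned by the current user that are still alive.
    fn live_pids(&self) -> Vec<u32>;
    /// True when `pid` carries this session's fingerprint
    /// (allow path permitted, deny path refused).
    fn matches_fingerprint(&self, pid: u32) -> bool;
    /// Kill `pid` unconditionally.
    fn kill(&mut self, pid: u32);
}

/// Scan and kill until one full scan finds no fingerprinted process.
/// Returns the number of scans performed, or `None` if `max_scans`
/// (an assumed safety cap, not in the README) is exhausted first.
pub fn convergent_cleanup<T: ProcessTable>(table: &mut T, max_scans: usize) -> Option<usize> {
    for scan in 1..=max_scans {
        let matches: Vec<u32> = table
            .live_pids()
            .into_iter()
            .filter(|&pid| table.matches_fingerprint(pid))
            .collect();
        if matches.is_empty() {
            return Some(scan); // an empty scan means the session is fully dead
        }
        for pid in matches {
            table.kill(pid); // kill on sight; children are found by the next scan
        }
    }
    None
}

/// In-memory stand-in used to simulate the fork race.
pub struct FakeTable {
    /// pid -> whether the process carries the session fingerprint.
    pub procs: BTreeMap<u32, bool>,
    /// (parent, child) pairs: the child appears when the parent is killed.
    pub pending_forks: Vec<(u32, u32)>,
}

impl ProcessTable for FakeTable {
    fn live_pids(&self) -> Vec<u32> {
        self.procs.keys().copied().collect()
    }

    fn matches_fingerprint(&self, pid: u32) -> bool {
        self.procs.get(&pid).copied().unwrap_or(false)
    }

    fn kill(&mut self, pid: u32) {
        if let Some(fingerprinted) = self.procs.remove(&pid) {
            // Children forked just before the parent died inherit its sandbox.
            let (born, waiting): (Vec<_>, Vec<_>) = std::mem::take(&mut self.pending_forks)
                .into_iter()
                .partition(|&(parent, _)| parent == pid);
            self.pending_forks = waiting;
            for (_, child) in born {
                self.procs.insert(child, fingerprinted);
            }
        }
    }
}
```

A real backend would implement `ProcessTable` on top of the platform calls named in the README. Whether production code should cap the number of scans at all is a design choice this sketch leaves open.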