# Hooks: Future Work

This document tracks planned features and design notes for hooks that are not
yet implemented. Nothing here is part of the current contract. Treat it as a
scratchpad for what's next, not as documentation of current behavior.

> [!NOTE] This document was largely LLM-generated.

## `context_files`

**Status:** planned, not implemented.

### Motivation

Today, a hook that wants to inject reference material into the agent's context
has exactly one knob: `context` (string or array of strings). Whatever the hook
puts there is concatenated into what the model sees. That's fine for short notes
("current branch: main", "scrubbed secrets") but it scales badly:

- Dumping a whole `README.md` or `package.json` into `context` burns tokens on
  every tool call where the hook fires.
- The model sees the file contents even if it doesn't need them.
- Large files can push the turn past the context window.

`context_files` is the lazy alternative: the hook returns **paths**, not
contents. Crush tells the agent the files exist and are relevant, and the agent
decides whether to open them with its existing `view` tool.

### Proposed shape

Additive envelope field. Accepts a list of strings:

```jsonc
{
  "decision": "allow",
  "context": "Scrubbed one secret",
  "context_files": ["README.md", "docs/ARCHITECTURE.md"]
}
```

Paths are resolved relative to `CRUSH_CWD`. Non-existent paths are dropped with
a debug log (don't fail the hook over a missing file).

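A minimal Go sketch of the resolution rule above, under stated assumptions. The names `resolveContextFiles` and `fileExists` are hypothetical, not existing Crush functions. `cwd` stands in for `CRUSH_CWD`, and the existence check is injected so the behavior can be exercised without touching the disk.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"path/filepath"
)

// resolveContextFiles checks each path against cwd (standing in for CRUSH_CWD)
// and drops entries that do not exist, logging at debug level instead of
// failing the hook. Surviving entries are returned as the hook wrote them.
func resolveContextFiles(cwd string, paths []string, exists func(string) bool) []string {
	kept := make([]string, 0, len(paths))
	for _, p := range paths {
		resolved := p
		if !filepath.IsAbs(p) {
			resolved = filepath.Join(cwd, p)
		}
		if !exists(resolved) {
			slog.Debug("context_files: dropping missing path", "path", p)
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

// fileExists treats any successful Stat as "exists".
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	cwd, _ := os.Getwd()
	fmt.Println(resolveContextFiles(cwd, []string{"README.md", "no/such/file.md"}, fileExists))
}
```

Whether the surviving entries should be reported as written or as absolute paths is not settled by the text above; this sketch keeps them as written.
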
### How the agent sees it

Crush appends a short note to the turn's context along the lines of:

```
## Referenced files
- README.md
- docs/ARCHITECTURE.md
```

No file contents are inlined. The agent opens them with `view` if it decides
they're relevant. This keeps cost proportional to need.

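As a rough illustration, the note could be rendered with something like the following Go sketch. `referencedFilesNote` is a hypothetical name, and the exact wording of the note is only "along the lines of" the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// referencedFilesNote renders the paths as a short Markdown list.
// It returns an empty string when there is nothing to reference,
// so callers can skip appending the note entirely.
func referencedFilesNote(paths []string) string {
	if len(paths) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString("## Referenced files\n")
	for _, p := range paths {
		b.WriteString("- " + p + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(referencedFilesNote([]string{"README.md", "docs/ARCHITECTURE.md"}))
}
```
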
### Aggregation

Matches the existing rules for lists:

- Concatenates across matching hooks in config order.
- Deduplicates paths (same file referenced by two hooks → listed once).
- Dropped entirely if the final decision is `deny` or `halt`.

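The three rules above can be sketched in Go as follows. `hookOutput` and `mergeContextFiles` are hypothetical names, and the final decision is passed in rather than computed, since decision aggregation is defined elsewhere. Deduplication here is by exact string match; whether `./README.md` and `README.md` should count as the same file is left open.

```go
package main

import "fmt"

// hookOutput is a hypothetical, trimmed-down view of one hook's parsed envelope.
type hookOutput struct {
	ContextFiles []string
}

// mergeContextFiles concatenates context_files across hooks in config order,
// keeps the first occurrence of each path, and returns nil when the final
// aggregated decision is "deny" or "halt".
func mergeContextFiles(finalDecision string, outputs []hookOutput) []string {
	if finalDecision == "deny" || finalDecision == "halt" {
		return nil
	}
	seen := make(map[string]bool)
	var merged []string
	for _, out := range outputs {
		for _, p := range out.ContextFiles {
			if seen[p] {
				continue
			}
			seen[p] = true
			merged = append(merged, p)
		}
	}
	return merged
}

func main() {
	outputs := []hookOutput{
		{ContextFiles: []string{"README.md", "docs/ARCHITECTURE.md"}},
		{ContextFiles: []string{"README.md", "go.mod"}},
	}
	fmt.Println(mergeContextFiles("allow", outputs))
	fmt.Println(mergeContextFiles("deny", outputs))
}
```
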
### Backwards compatibility

Purely additive. Hooks that don't emit `context_files` are unaffected. Existing
envelopes keep working unchanged. No version bump required.

### Open questions

- Should `context_files` paths be constrained to `CRUSH_PROJECT_DIR`? Probably
  yes, to avoid hooks smuggling in arbitrary filesystem reads.
- Do we want a per-file line range (`"README.md:1-40"`) or keep it dead simple
  (whole-file references only)? Start simple; add ranges only if asked for.
- Should we annotate "why this file is relevant" per entry? An object form
  (`{"path": "...", "reason": "..."}`) would allow that but complicates the
  schema. Defer until there's a real user need.

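If the answer to the first question turns out to be yes, the containment check could look roughly like this Go sketch. `withinDir` is a hypothetical helper, `root` stands in for `CRUSH_PROJECT_DIR`, and the check is purely lexical: it does not resolve symlinks, which a real implementation would probably need to consider.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinDir reports whether candidate lies inside root (or is root itself),
// judged lexically after cleaning both paths. Symlinks are not resolved.
func withinDir(root, candidate string) bool {
	rel, err := filepath.Rel(root, candidate)
	if err != nil {
		return false
	}
	// A relative path that climbs out of root starts with a ".." element.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/home/user/project"
	for _, p := range []string{"docs/ARCHITECTURE.md", "../other/secret.txt"} {
		fmt.Println(p, withinDir(root, filepath.Join(root, p)))
	}
}
```
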
## Sub-agent opt-in

**Status:** not implemented.

### Background

Today hooks fire **only** on the top-level agent's tool calls. Sub-agents
(`agent` task tool, `agentic_fetch`, future delegated loops) run without hook
interception so a single delegated turn doesn't trigger the user's hook N times.

The outer sub-agent tool call itself is hooked, so blanket policy like "never
spawn sub-agents" or "rewrite prompts sent to the task agent" still works from
the coder's side. The sub-agent's inner loop is the part that's exempt.

### Why users might want the escape hatch

- Audit logging of every tool call, including delegated ones.
- Redaction hooks that want to apply uniformly regardless of who called the
  tool.
- Policy that cares about the _tool_ not the _caller_: "never fetch from this
  domain, even in `agentic_fetch`."

Until someone actually asks, don't ship this. YAGNI.

### Proposed shape

Additive, per-hook. Zero-value matches current default (skip sub-agents):

```jsonc
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "^bash$",
        "command": "./hooks/audit.sh",
        "include_sub_agents": true // default false
      }
    ]
  }
}
```

The implementation changes where `wrapToolsWithHooks` decides to skip. Instead
of a single `isSubAgent` bailout, the runner filters per-hook matches by the
hook's `include_sub_agents` flag. Hooks that opt in get wrapped into sub-agent
tool slices too; everything else stays skipped.

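A minimal Go sketch of that per-hook filter, assuming a config struct shaped like the JSON above. `hookConfig` and `hooksForCaller` are hypothetical names (the real change would live wherever `wrapToolsWithHooks` makes its decision), and the second hook in the sample config is invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hookConfig mirrors the proposed per-hook shape. The bool's zero value
// (false) reproduces today's behavior: skip sub-agents.
type hookConfig struct {
	Matcher          string `json:"matcher"`
	Command          string `json:"command"`
	IncludeSubAgents bool   `json:"include_sub_agents"`
}

// hooksForCaller returns every hook for top-level tool calls, and only the
// opted-in hooks for sub-agent tool calls.
func hooksForCaller(hooks []hookConfig, isSubAgent bool) []hookConfig {
	if !isSubAgent {
		return hooks
	}
	var kept []hookConfig
	for _, h := range hooks {
		if h.IncludeSubAgents {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	// The second entry omits include_sub_agents and therefore stays skipped.
	raw := `[
	  {"matcher": "^bash$", "command": "./hooks/audit.sh", "include_sub_agents": true},
	  {"matcher": "^edit$", "command": "./hooks/lint.sh"}
	]`
	var hooks []hookConfig
	if err := json.Unmarshal([]byte(raw), &hooks); err != nil {
		panic(err)
	}
	fmt.Println(len(hooksForCaller(hooks, false)), len(hooksForCaller(hooks, true)))
}
```
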
### Backwards compatibility

Purely additive. Hooks that don't set `include_sub_agents` get the default
(`false` = skip sub-agents). No wire format change, no version bump. The initial
transition from "hooks fire everywhere" to "hooks skip sub-agents by default"
was a one-time behavior change; adding the opt-in is pure addition.

### Side benefit: payload awareness

Extend the stdin payload with `"is_sub_agent": true|false` so hook scripts that
opt in can branch on caller type ("audit top-level and sub-agent calls
differently"). Also purely additive: hooks that don't read the field are
unaffected.

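For illustration, a hook written in Go could branch on the proposed field roughly like this. `payload` and `callerKind` are hypothetical names, only the proposed `is_sub_agent` field is modeled, and the demo decodes sample payloads instead of reading `os.Stdin` as a real hook would.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload models only the proposed field; every other stdin field is ignored.
type payload struct {
	IsSubAgent bool `json:"is_sub_agent"`
}

// callerKind classifies the caller. A payload without the field decodes to
// false, so it is treated as a top-level call.
func callerKind(stdin []byte) (string, error) {
	var p payload
	if err := json.Unmarshal(stdin, &p); err != nil {
		return "", err
	}
	if p.IsSubAgent {
		return "sub-agent", nil
	}
	return "top-level", nil
}

func main() {
	// A real hook would read these bytes from os.Stdin.
	for _, sample := range []string{`{"is_sub_agent": true}`, `{}`} {
		kind, err := callerKind([]byte(sample))
		if err != nil {
			fmt.Println("bad payload:", err)
			continue
		}
		fmt.Println(sample, "=>", kind)
	}
}
```
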
### Open questions

- Per-hook flag (above) vs a global `hooks.include_sub_agents` default? A global
  toggle is simpler but coarse-grained; per-hook is more flexible and
  composable. Start per-hook; a global default can be layered on later with
  explicit precedence ("per-hook overrides global").
- Does an opt-in hook see tool calls from _nested_ sub-agents too (a sub-agent
  that itself calls a sub-agent)? Probably yes: once you've opted in you want
  the full tree. But call it out explicitly in docs so users aren't surprised
  by N² explosions on pathological configs.

## `UserPromptSubmit` event

**Status:** not implemented.

### Motivation

Today Crush supports exactly one hook event, `PreToolUse`. That's enough to gate
and rewrite tool calls but nothing else. The next-most-useful event is
`UserPromptSubmit`, which fires after the user hits Enter but before the turn
reaches the LLM. It lets hooks inject context, rewrite prompts, or gate on
content without the mutation complexity of `PostToolUse` (output scrubbing,
error coercion, size limits: all rabbit holes).

### Use cases

- Prepend project context the user didn't think to include ("current branch:
  `feat/x`; last commit: `<sha> <title>`").
- Point at reference files via `context_files` (when that lands) so the agent
  knows where to look without being force-fed contents.
- Redact secrets out of the prompt before it leaves the machine.
- Refuse prompts matching a policy ("don't send anything mentioning
  `production.env`"), with `deny` and a reason the user sees.
- Expand shorthand (`@TODO` → "please address the TODO in …").

### Proposed shape

Stdin payload extends the common envelope with the prompt:

```jsonc
{
  "event": "UserPromptSubmit",
  "session_id": "…",
  "cwd": "/home/user/project",
  "prompt": "fix the login flow",
  "attachments": ["screenshot.png"]
}
```

Output envelope reuses common fields plus one new per-event field,
`updated_prompt`:

```jsonc
{
  "decision": "allow", // optional; deny blocks the submission entirely
  "reason": "includes a production secret", // shown to the user when denying
  "context": "Current branch: feat/login",
  "updated_prompt": "fix the login flow\n\n(from @TODO on line 42)"
}
```

`updated_prompt` is a **full replacement**, not a merge patch, because a
prompt is a single string with no natural key structure. If multiple hooks emit
`updated_prompt`, later hooks in config order win.

### Aggregation

Reuses the universal rules:

- `halt` is sticky. Halts the whole turn before the LLM is called.
- `context` concatenates in config order.
- `updated_prompt`: last writer wins.
- `decision: "deny"` blocks the submission. The user sees `reason`; the turn
  never reaches the LLM.

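A minimal Go sketch of these rules. All type and function names are hypothetical. Several details are assumptions the rules above do not pin down: the first `halt` or `deny` supplies the reason, `context` entries are joined with a newline, and `context` is still collected when the outcome is a denial.

```go
package main

import (
	"fmt"
	"strings"
)

// promptHookResult is a hypothetical view of one hook's output envelope.
type promptHookResult struct {
	Decision      string  // "", "allow", "deny", or "halt"
	Reason        string
	Context       string
	UpdatedPrompt *string // nil when the hook did not emit updated_prompt
}

// promptOutcome is the aggregate across all hooks for one submission.
type promptOutcome struct {
	Decision string
	Reason   string
	Context  string
	Prompt   string
}

// aggregatePrompt folds hook results in config order.
func aggregatePrompt(original string, results []promptHookResult) promptOutcome {
	out := promptOutcome{Decision: "allow", Prompt: original}
	var contexts []string
	for _, r := range results {
		switch {
		case r.Decision == "halt" && out.Decision != "halt":
			// halt is sticky: nothing later can downgrade it.
			out.Decision, out.Reason = "halt", r.Reason
		case r.Decision == "deny" && out.Decision == "allow":
			out.Decision, out.Reason = "deny", r.Reason
		}
		if r.Context != "" {
			contexts = append(contexts, r.Context)
		}
		if r.UpdatedPrompt != nil {
			out.Prompt = *r.UpdatedPrompt // last writer wins
		}
	}
	out.Context = strings.Join(contexts, "\n")
	return out
}

func main() {
	rewritten := "fix the login flow\n\n(from @TODO on line 42)"
	out := aggregatePrompt("fix the login flow", []promptHookResult{
		{Context: "Current branch: feat/login"},
		{UpdatedPrompt: &rewritten},
	})
	fmt.Printf("%+v\n", out)
}
```
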
### Differences from `PreToolUse`

- No `updated_input`: there are no tool inputs at this point.
- No permission-prompt bypass: there's no permission prompt for a user prompt.
- `decision: "allow"` is functionally identical to silence. It exists only for
  symmetry with `PreToolUse` and to give hook authors a consistent vocabulary.
  (Could be argued both ways; consider dropping it here.)
- Fires on every user submission, including follow-ups in the same session.
  Hooks should be fast: nothing runs per keystroke, but the per-turn overhead
  is real.

### Implementation sketch

- New event constant `EventUserPromptSubmit` in `internal/hooks/hooks.go`.
- `Runner.Run` already takes an event name; no interface change.
- A new call site in `sessionAgent.Run` (or the coordinator's Run path) that
  fires hooks after creating the user message but before the first LLM call. If
  the aggregate decision is `deny` or `halt`, abort the turn and surface
  `reason` to the user.
- If hooks return `context`, prepend it to the prompt seen by the LLM (or attach
  as a system-message-level note; decide based on how the prompt is threaded
  through fantasy).
- If hooks return `updated_prompt`, replace the prompt body before the first LLM
  call. The message row in the DB should still store the _original_ prompt so
  the user sees what they typed; only the outbound version is rewritten. (Or:
  store both, show the original, send the rewritten, mirroring how
  `updated_input` is handled today.)

### Open questions

- Store original vs rewritten prompt? Probably both, with UI showing original
  and a subtle indicator that a hook modified it.
- Do hooks fire on queued prompts too, or only when actually dispatched? If the
  user queues three prompts and the hook blocks the second, what happens to the
  third? Simplest rule: fire when dispatched; denial skips to the next queued
  prompt with a visible note.
- What about the `/commands` prefix? Does `UserPromptSubmit` fire for slash
  commands, or are those intercepted earlier? Probably earlier: hooks see only
  freeform prompts that would actually reach the LLM.

## Cross-platform shell (Windows support)

**Status:** not implemented.

### Problem

Today the hook runner uses `exec.Command("sh", "-c", hook.Command)`. On Windows
this fails without WSL or Git Bash on PATH. Even with `sh.exe` available,
Windows has no kernel shebang handling, so `./hooks/foo.sh` can't be exec'd
directly the way it can on Unix. Hooks are effectively Unix-only.

### Approach

Keep the `command` field as a string. Tokenize it shell-style, examine
`argv[0]`, and branch:

- If `argv[0]` starts with `./`, `../`, `/`, or `~/`, treat it as a **file
  invocation**. Read at most the first 128 bytes, parse a shebang if present,
  and dispatch to the named interpreter via `os/exec`. Extra args from the
  command string pass through to the interpreter.
- Otherwise, treat the whole string as **shell code** and hand it to mvdan's
  in-process interpreter (`mvdan.cc/sh`). mvdan resolves `node`, `bash`, `jq`,
  builtins, pipelines, redirects, etc. via its own exec handler.

No sentinel: a script with no shebang defaults to mvdan. A script with an
explicit shebang (`#!/bin/bash`, `#!/usr/bin/env python3`, etc.) uses the named
interpreter, which the user is responsible for having on PATH. Same contract on
every platform.

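The branch on `argv[0]` can be sketched in Go as below. `isFileInvocation` is a hypothetical name, and the demo uses `strings.Fields` as a naive stand-in for real shell tokenization, which this proposal would do with mvdan's parser instead.

```go
package main

import (
	"fmt"
	"strings"
)

// isFileInvocation reports whether argv0 carries one of the path prefixes
// that route a command to file dispatch rather than to the shell interpreter.
func isFileInvocation(argv0 string) bool {
	for _, prefix := range []string{"./", "../", "/", "~/"} {
		if strings.HasPrefix(argv0, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, command := range []string{"ls -la", "node ./script.js", "./script.sh --flag"} {
		// Naive whitespace split, for illustration only; it ignores quoting.
		argv := strings.Fields(command)
		route := "shell code"
		if isFileInvocation(argv[0]) {
			route = "file invocation"
		}
		fmt.Printf("%-20s -> %s\n", command, route)
	}
}
```
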
### Dispatch examples

| `command`                                | `argv[0]`      | Route                    |
| ---------------------------------------- | -------------- | ------------------------ |
| `ls -la`                                 | `ls`           | mvdan                    |
| `bash -c 'ls'`                           | `bash`         | mvdan (which execs bash) |
| `node ./script.js`                       | `node`         | mvdan (which execs node) |
| `./script.sh` (no shebang)               | `./script.sh`  | mvdan, fed the file      |
| `./script.sh` (`#!/bin/bash`)            | `./script.sh`  | `bash ./script.sh`       |
| `./script.py` (`#!/usr/bin/env python3`) | `./script.py`  | `python3 ./script.py`    |
| `./script.exe`                           | `./script.exe` | `os/exec` direct         |

### Contract on Windows

- Inline shell runs through mvdan natively. No external dependency.
- Shebang-dispatched scripts require the named interpreter on PATH (`bash.exe`,
  `python.exe`, `node.exe`, etc.). Crush does the dispatch that the Windows
  kernel won't.
- Shebang-less scripts run through mvdan regardless of extension. CRLF line
  endings are tolerated.

### Implementation sketch

- New function
  `dispatch(ctx, cmd string, env []string, stdin io.Reader) (stdout, stderr string, exitCode int, err error)`
  in `internal/hooks/`.
- Tokenize using mvdan's parser (already a dep) so quoting and escaping behave
  the way shell users expect.
- Path-prefix check on `argv[0]`; if path, read shebang with a bounded
  `io.LimitReader` and parse. Support:
  - `#!/absolute/interpreter args…`
  - `#!/usr/bin/env NAME` → resolve `NAME` on PATH
  - `#!/usr/bin/env -S NAME args…` → treat as above; `-S` is common enough to
    handle. Other `env` flags can error.
- Unified exit-code helper. mvdan's `interp.ExitStatus` and `os/exec`'s
  `ProcessState.ExitCode()` both become a single `int`.
- Context cancellation: mvdan's exec handler uses `exec.CommandContext` for its
  children, so a cancelled hook kills both the interpreter and any children.
  Verify with a `sleep 60` test.
- One fresh `interp.Runner` per hook invocation (parallel hooks must not share
  state).

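The shebang-parsing step could look roughly like this Go sketch, using only the standard library. `readHead` and `parseShebang` are hypothetical names. Two simplifications are assumptions rather than settled design: arguments are split on whitespace (a Unix kernel would pass them as a single argument), and an absolute interpreter path is reduced to its base name, following the dispatch table where `#!/bin/bash` routes to `bash`.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
	"strings"
)

const maxShebangBytes = 128

// readHead reads at most maxShebangBytes from r.
func readHead(r io.Reader) ([]byte, error) {
	return io.ReadAll(io.LimitReader(r, maxShebangBytes))
}

// parseShebang returns the interpreter argv named by a shebang line.
// ok is false when head does not start with a usable shebang.
func parseShebang(head []byte) (argv []string, ok bool, err error) {
	if !bytes.HasPrefix(head, []byte("#!")) {
		return nil, false, nil
	}
	line := head[2:]
	if i := bytes.IndexByte(line, '\n'); i >= 0 {
		line = line[:i]
	}
	// strings.Fields also discards the trailing \r of a CRLF line ending.
	fields := strings.Fields(string(line))
	if len(fields) == 0 {
		return nil, false, nil
	}
	if path.Base(fields[0]) != "env" {
		// Absolute interpreter: look it up by base name on PATH.
		return append([]string{path.Base(fields[0])}, fields[1:]...), true, nil
	}
	rest := fields[1:]
	if len(rest) > 0 && rest[0] == "-S" {
		rest = rest[1:]
	}
	if len(rest) == 0 {
		return nil, false, errors.New("shebang: env without a command")
	}
	if strings.HasPrefix(rest[0], "-") {
		return nil, false, fmt.Errorf("shebang: unsupported env flag %q", rest[0])
	}
	return rest, true, nil
}

func main() {
	for _, script := range []string{
		"#!/bin/bash\necho hi\n",
		"#!/usr/bin/env python3\r\nprint('hi')\r\n",
		"#!/usr/bin/env -S node --foo\n",
		"echo no shebang\n",
	} {
		head, _ := readHead(strings.NewReader(script))
		argv, ok, err := parseShebang(head)
		fmt.Printf("%q ok=%v err=%v\n", argv, ok, err)
	}
}
```

The caller would append the script path and any extra args from the command string to the returned argv before handing it to `os/exec`.
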
### Swap the call site

`Runner.runOne` in `internal/hooks/runner.go` replaces its
`exec.Command("sh", "-c", …)` with a call to `dispatch(…)`. Everything
downstream (exit-code 2 / 49 / other dispatch, stdout JSON parsing,
stderr-as-reason) stays identical.

### Tests

Cross-platform matrix:

- Inline: `echo hi`; `exit 2`; pipelines; redirections.
- File, no shebang: treated as shell source through mvdan.
- File, `#!/bin/bash` on Unix: invokes system bash.
- File, `#!/usr/bin/env python3`: invokes python if present, skips if not.
- File, `#!/usr/bin/env -S node --foo`: extra flags preserved.
- File with CRLF line endings in the shebang.
- `./missing-file`: non-blocking error, hook proceeds as "no opinion".
- Timeout: hook that sleeps past its timeout gets killed; context cancellation
  kills the interpreter and its children.
- Concurrency: 10 hooks in parallel don't leak env/cwd/state between runners.
- Windows-specific: `./script.exe` exec'd directly; bash-shebang script fails
  gracefully when bash isn't on PATH.

### Pitfalls to watch

- **Userland shebang parsing is now our problem.** Edge cases around `env`
  flags, args with spaces, CRLF, missing interpreter. Well-trodden but needs
  real tests.
- **The path-prefix heuristic is a heuristic.** `relative/path.sh` (no leading
  `./`) gets mvdan'd, not file-dispatched. This is narrower than shell
  behavior: at a bash prompt, any command name containing a slash is run as a
  path with no PATH lookup, so `relative/path.sh` does run there. Under this
  design it skips shebang dispatch, so on Windows it would likely fail to run.
  Worth documenting.
- **Kernel shebang handling is bypassed on Unix.** Today a chmod+x'd script is
  exec'd by the kernel; after this change, by our parser. Behavior should be
  byte-identical; verify with tests.
- **Two code paths.** mvdan vs direct-exec. Exit codes, stdin, signal
  propagation, env inheritance must be aligned. Discipline, not cleverness.

### Explicit non-goals

- No bundled `bash.exe` or `python.exe`. Users bring their own interpreters.
- No custom mvdan builtins (`crush_approve` etc.). Hooks stay portable and
  testable under bare `bash`.
- No `.sh`-extension filter on discovery. Hook file shape is driven by shebang,
  not filename.
- No Crush-as-script-interpreter mode (users can't write `#!/usr/bin/env crush`
  and have it mean something). If we want that later, it's an additive feature,
  not a dependency of this work.
374  and have it mean something). If we want that later, it's an additive feature,
375  not a dependency of this work.