# Hooks: Future Work

This document tracks planned features and design notes for hooks that are not
yet implemented. Nothing here is part of the current contract. Treat it as a
scratchpad for what's next, not as documentation of current behavior.

> [!NOTE]
> This document was largely LLM-generated.

## `context_files`

**Status:** planned, not implemented.

### Motivation

Today, a hook that wants to inject reference material into the agent's context
has exactly one knob: `context` (a string or an array of strings). Whatever the
hook puts there is concatenated into what the model sees. That's fine for short
notes ("current branch: main", "scrubbed secrets"), but it scales badly:

- Dumping a whole `README.md` or `package.json` into `context` burns tokens on
  every tool call where the hook fires.
- The model sees the file contents even if it doesn't need them.
- Large files can push the turn past the context window.

`context_files` is the lazy alternative: the hook returns **paths**, not
contents. Crush tells the agent the files exist and are relevant, and the agent
decides whether to open them with its existing `view` tool.

### Proposed shape

Additive envelope field. Accepts a list of strings:

```jsonc
{
  "decision": "allow",
  "context": "Scrubbed one secret",
  "context_files": ["README.md", "docs/ARCHITECTURE.md"],
}
```

Paths are resolved relative to `CRUSH_CWD`. Non-existent paths are dropped with
a debug log (don't fail the hook over a missing file).

### How the agent sees it

Crush appends a short note to the turn's context along the lines of:

```markdown
## Referenced files
- README.md
- docs/ARCHITECTURE.md
```

No file contents are inlined. The agent opens them with `view` if it decides
they're relevant. This keeps cost proportional to need.
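
Rendering that note is a few lines of string building. A sketch with a hypothetical `renderReferencedFiles` helper that reads no files and only formats the list:

```go
package main

import "strings"

// renderReferencedFiles (hypothetical) builds the note appended to the
// turn's context. It lists paths only; no file contents are read.
func renderReferencedFiles(paths []string) string {
	if len(paths) == 0 {
		return "" // nothing referenced, so add no note at all
	}
	var b strings.Builder
	b.WriteString("## Referenced files\n")
	for _, p := range paths {
		b.WriteString("- ")
		b.WriteString(p)
		b.WriteString("\n")
	}
	return b.String()
}
```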

### Aggregation

Matches the existing aggregation rules for lists:

- Concatenates across matching hooks in config order.
- Deduplicates paths (same file referenced by two hooks → listed once).
- Dropped entirely if the final decision is `deny` or `halt`.
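
A sketch of those three rules in Go. `hookResult` is a hypothetical stand-in for parsed hook output, and the sketch assumes that a single `deny` or `halt` from any hook makes that the final decision:

```go
package main

import "path/filepath"

// hookResult is a hypothetical, trimmed-down view of one hook's parsed
// output, holding only what this rule needs.
type hookResult struct {
	Decision     string   // "", "allow", "deny", or "halt"
	ContextFiles []string // raw context_files entries
}

// aggregateContextFiles concatenates entries in config order, keeps only
// the first occurrence of each cleaned path, and returns nil if any hook
// denied or halted.
func aggregateContextFiles(results []hookResult) []string {
	seen := make(map[string]bool)
	var out []string
	for _, r := range results {
		if r.Decision == "deny" || r.Decision == "halt" {
			return nil
		}
		for _, p := range r.ContextFiles {
			key := filepath.Clean(p) // "./README.md" and "README.md" are one file
			if !seen[key] {
				seen[key] = true
				out = append(out, key)
			}
		}
	}
	return out
}
```

Cleaning paths before deduplicating collapses `./README.md` and `README.md` into one entry; whether Crush should normalize this way is left open.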

### Backwards compatibility

Purely additive. Hooks that don't emit `context_files` are unaffected. Existing
envelopes keep working unchanged. No version bump required.

### Open questions

- Should `context_files` paths be constrained to `CRUSH_PROJECT_DIR`? Probably
  yes, to avoid hooks smuggling in arbitrary filesystem reads.
- Do we want per-file line ranges (`"README.md:1-40"`), or do we keep it dead
  simple (whole-file references only)? Start simple; add ranges only if asked
  for.
- Should we annotate "why this file is relevant" per entry? An object form
  (`{"path": "...", "reason": "..."}`) would allow that but complicates the
  schema. Defer until there's a real user need.
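
If the answer to the first question is yes, a lexical containment check could look like this. `withinProject` is hypothetical and assumes `projectDir` and `cwd` are absolute. It does not resolve symlinks, so a link pointing outside the project would still pass:

```go
package main

import (
	"path/filepath"
	"strings"
)

// withinProject (hypothetical) reports whether path, resolved against cwd
// when relative, stays inside projectDir. The check is purely lexical.
func withinProject(projectDir, cwd, path string) bool {
	abs := path
	if !filepath.IsAbs(abs) {
		abs = filepath.Join(cwd, path) // Join also cleans "../" segments
	}
	rel, err := filepath.Rel(projectDir, abs)
	if err != nil {
		return false
	}
	// Escaping the project shows up as a leading ".." component.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}
```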

## Sub-agent opt-in

**Status:** not implemented.

### Background

Today hooks fire **only** on the top-level agent's tool calls. Sub-agents (the
`agent` task tool, `agentic_fetch`, future delegated loops) run without hook
interception, so a single delegated turn doesn't trigger the user's hook N
times.

The outer sub-agent tool call itself is hooked, so blanket policies like "never
spawn sub-agents" or "rewrite prompts sent to the task agent" still work from
the coder's side. Only the sub-agent's inner loop is exempt.

### Why users might want the escape hatch

- Audit logging of every tool call, including delegated ones.
- Redaction hooks that should apply uniformly regardless of who called the
  tool.
- Policy that cares about the _tool_, not the _caller_: "never fetch from this
  domain, even in `agentic_fetch`."

Until someone actually asks, don't ship this. YAGNI.

### Proposed shape

Additive and per-hook. The zero value matches the current default (skip
sub-agents):

```jsonc
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "^bash$",
        "command": "./hooks/audit.sh",
        "include_sub_agents": true, // default false
      },
    ],
  },
}
```

The implementation change is in where `wrapToolsWithHooks` decides to skip.
Instead of a single `isSubAgent` bailout, the runner filters per-hook matches by
each hook's `include_sub_agents` flag. Hooks that opt in are also wrapped into
sub-agent tool slices; everything else stays skipped.
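
The per-hook filter could be as small as the following. `hookConfig` and `hooksFor` are hypothetical, and treating `matcher` as a Go regular expression is an assumption based on the `^bash$` example:

```go
package main

import "regexp"

// hookConfig is a hypothetical subset of a PreToolUse entry.
type hookConfig struct {
	Matcher          string `json:"matcher"`
	Command          string `json:"command"`
	IncludeSubAgents bool   `json:"include_sub_agents"` // zero value: skip sub-agents
}

// hooksFor returns the hooks that should wrap toolName. For the top-level
// agent every matching hook applies; inside a sub-agent only hooks that
// opted in via include_sub_agents do.
func hooksFor(hooks []hookConfig, toolName string, isSubAgent bool) ([]hookConfig, error) {
	var out []hookConfig
	for _, h := range hooks {
		if isSubAgent && !h.IncludeSubAgents {
			continue
		}
		re, err := regexp.Compile(h.Matcher)
		if err != nil {
			return nil, err
		}
		if re.MatchString(toolName) {
			out = append(out, h)
		}
	}
	return out, nil
}
```

Compiling the regex on every call keeps the sketch short; a real runner would more likely compile matchers once at config load.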

### Backwards compatibility

Purely additive. Hooks that don't set `include_sub_agents` get the default
(`false`, i.e. skip sub-agents). No wire format change, no version bump. The
initial transition from "hooks fire everywhere" to "hooks skip sub-agents by
default" was a one-time behavior change; adding the opt-in is pure addition.

### Side benefit: payload awareness

Extend the stdin payload with `"is_sub_agent": true|false` so hook scripts that
opt in can branch on caller type ("audit top-level and sub-agent calls
differently"). This is also purely additive: hooks that don't read the field
are unaffected.
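
On the hook side, a script written in Go (for illustration; any language works) might read the field like this. `callerKind` is hypothetical, and the `"event": "PreToolUse"` payload in the usage below is assumed by analogy with the `UserPromptSubmit` payload. Because `encoding/json` leaves missing fields at their zero value, a payload without `is_sub_agent` reads as top-level:

```go
package main

import "encoding/json"

// callerKind (hypothetical) decodes only the proposed is_sub_agent field
// from a hook's stdin payload; all other fields are ignored.
func callerKind(stdin []byte) (string, error) {
	var p struct {
		IsSubAgent bool `json:"is_sub_agent"`
	}
	if err := json.Unmarshal(stdin, &p); err != nil {
		return "", err
	}
	if p.IsSubAgent {
		return "sub-agent", nil
	}
	return "top-level", nil // also the result when the field is absent
}
```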

### Open questions

- Per-hook flag (above) vs a global `hooks.include_sub_agents` default? A global
  toggle is simpler but coarse-grained; per-hook is more flexible and
  composable. Start per-hook; a global default can be layered on later with
  explicit precedence ("per-hook overrides global").
- Does an opt-in hook also fire for tool calls made by _nested_ sub-agents (a
  sub-agent that itself calls a sub-agent)? Probably yes: once you've opted in,
  you want the full tree. But call it out explicitly in the docs so users aren't
  surprised by N² explosions on pathological configs.

## `UserPromptSubmit` event

**Status:** not implemented.

### Motivation

Today Crush supports exactly one hook event, `PreToolUse`. That's enough to gate
and rewrite tool calls but nothing else. The next most useful event is
`UserPromptSubmit`, which fires after the user hits Enter but before the turn
reaches the LLM. It lets hooks inject context, rewrite prompts, or gate on
content without the mutation complexity of `PostToolUse` (output scrubbing,
error coercion, size limits: all rabbit holes).

### Use cases

- Prepend project context the user didn't think to include ("current branch:
  `feat/x`; last commit: `<sha> <title>`").
- Point at reference files via `context_files` (when that lands) so the agent
  knows where to look without being force-fed contents.
- Redact secrets from the prompt before it leaves the machine.
- Refuse prompts matching a policy ("don't send anything mentioning
  `production.env`") via `deny`, with a reason the user sees.
- Expand shorthand (`@TODO` → "please address the TODO in …").

### Proposed shape

The stdin payload extends the common envelope with the prompt:

```jsonc
{
  "event": "UserPromptSubmit",
  "session_id": "…",
  "cwd": "/home/user/project",
  "prompt": "fix the login flow",
  "attachments": ["screenshot.png"],
}
```

The output envelope reuses the common fields plus one new per-event field,
`updated_prompt`:

```jsonc
{
  "decision": "allow", // optional; deny blocks the submission entirely
  "reason": "includes a production secret", // shown to the user when denying
  "context": "Current branch: feat/login",
  "updated_prompt": "fix the login flow\n\n(from @TODO on line 42)",
}
```

`updated_prompt` is a **full replacement**, not a merge patch, because a prompt
is a single string with no natural key structure. If multiple hooks emit
`updated_prompt`, later hooks in config order win.
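
When a hook's stdout is decoded in Go, a pointer field is one way to tell "no `updated_prompt`" apart from "replace the prompt with this string". A sketch with hypothetical type and function names; `context` is simplified to a single string here, although the envelope also accepts an array:

```go
package main

import "encoding/json"

// promptHookOutput is a hypothetical Go view of the UserPromptSubmit
// output envelope. UpdatedPrompt is nil when the hook didn't set it.
type promptHookOutput struct {
	Decision      string  `json:"decision,omitempty"`
	Reason        string  `json:"reason,omitempty"`
	Context       string  `json:"context,omitempty"`
	UpdatedPrompt *string `json:"updated_prompt,omitempty"`
}

// parsePromptHookOutput decodes one hook's stdout (plain JSON, no comments).
func parsePromptHookOutput(stdout []byte) (promptHookOutput, error) {
	var out promptHookOutput
	err := json.Unmarshal(stdout, &out)
	return out, err
}
```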

### Aggregation

Reuses the universal rules:

- `halt` is sticky: it stops the whole turn before the LLM is called.
- `context` concatenates in config order.
- `updated_prompt`: last writer wins.
- `decision: "deny"` blocks the submission. The user sees `reason`; the turn
  never reaches the LLM.
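
The rules above fold into one pass over the results. A sketch with a hypothetical `promptResult` type; keeping the first `deny` reason and joining `context` entries with a newline are choices made for this example, not settled behavior:

```go
package main

import "strings"

// promptResult is a hypothetical, already-parsed UserPromptSubmit output.
type promptResult struct {
	Decision      string // "", "allow", "deny", or "halt"
	Reason        string
	Context       string
	UpdatedPrompt *string // nil when the hook didn't set updated_prompt
}

// aggregatePrompt folds results (already in config order): halt is
// sticky, deny blocks, context concatenates, last updated_prompt wins.
func aggregatePrompt(results []promptResult) promptResult {
	out := promptResult{Decision: "allow"}
	var ctx []string
	for _, r := range results {
		switch r.Decision {
		case "halt":
			// Sticky: nothing after a halt can change the outcome.
			return promptResult{Decision: "halt", Reason: r.Reason}
		case "deny":
			if out.Decision != "deny" {
				out.Decision, out.Reason = "deny", r.Reason
			}
		}
		if r.Context != "" {
			ctx = append(ctx, r.Context)
		}
		if r.UpdatedPrompt != nil {
			out.UpdatedPrompt = r.UpdatedPrompt // later hooks overwrite earlier ones
		}
	}
	out.Context = strings.Join(ctx, "\n")
	return out
}
```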

### Differences from `PreToolUse`

- No `updated_input`: there are no tool inputs at this point.
- No permission-prompt bypass: there's no permission prompt for a user prompt.
- `decision: "allow"` is functionally identical to silence. It exists only for
  symmetry with `PreToolUse` and to give hook authors a consistent vocabulary.
  (This could be argued both ways; consider dropping it here.)
- Fires on every user submission, including follow-ups in the same session.
  Hooks should be fast: they don't run per keystroke, but the per-turn overhead
  is real.

### Implementation sketch

- New event constant `EventUserPromptSubmit` in `internal/hooks/hooks.go`.
- `Runner.Run` already takes an event name; no interface change.
- A new call site in `sessionAgent.Run` (or the coordinator's Run path) that
  fires hooks after creating the user message but before the first LLM call. If
  the aggregate decision is `deny` or `halt`, abort the turn and surface
  `reason` to the user.
- If hooks return `context`, prepend it to the prompt seen by the LLM (or attach
  it as a system-message-level note; decide based on how the prompt is threaded
  through fantasy).
- If hooks return `updated_prompt`, replace the prompt body before the first LLM
  call. The message row in the DB should still store the _original_ prompt so
  the user sees what they typed; only the outbound version is rewritten.
  (Alternatively, store both, show the original, and send the rewritten one,
  mirroring how `updated_input` is handled today.)
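
Applying the aggregate at the call site might look like this. It assumes the "prepend `context`" option rather than a system-level note, and the `outcome` type and `applyOutcome` name are hypothetical:

```go
package main

import "fmt"

// outcome is a hypothetical aggregate of the UserPromptSubmit hooks.
type outcome struct {
	Decision      string // "allow", "deny", or "halt"
	Reason        string
	Context       string
	UpdatedPrompt *string
}

// applyOutcome returns the prompt to store (what the user typed) and the
// prompt to send to the LLM, or an error carrying the hook's reason.
func applyOutcome(original string, o outcome) (stored, outbound string, err error) {
	if o.Decision == "deny" || o.Decision == "halt" {
		return "", "", fmt.Errorf("prompt blocked by hook (%s): %s", o.Decision, o.Reason)
	}
	outbound = original
	if o.UpdatedPrompt != nil {
		outbound = *o.UpdatedPrompt // full replacement, not a merge
	}
	if o.Context != "" {
		outbound = o.Context + "\n\n" + outbound
	}
	return original, outbound, nil
}
```

Returning both strings keeps the DB row (what the user typed) separate from what is sent to the LLM.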

### Open questions

- Store the original or the rewritten prompt? Probably both, with the UI showing
  the original and a subtle indicator that a hook modified it.
- Do hooks fire on queued prompts too, or only when they're actually dispatched?
  If the user queues three prompts and the hook blocks the second, what happens
  to the third? Simplest rule: fire when dispatched; a denial skips to the next
  queued prompt with a visible note.
- What about the `/commands` prefix? Does `UserPromptSubmit` fire for slash
  commands, or are those intercepted earlier? Probably earlier: hooks see only
  freeform prompts that would actually reach the LLM.
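
The "fire when dispatched" rule is easy to express as a loop. In this hypothetical sketch, `check` stands in for running the `UserPromptSubmit` hooks and `send` for dispatching to the LLM; how a `halt` should affect the rest of the queue is not covered:

```go
package main

import "fmt"

// dispatchQueue (hypothetical) checks each queued prompt only when it is
// about to be sent. A denied prompt is skipped with a note instead of
// stopping the rest of the queue.
func dispatchQueue(
	queue []string,
	check func(prompt string) (allowed bool, reason string),
	send func(prompt string),
) (notes []string) {
	for _, p := range queue {
		ok, reason := check(p)
		if !ok {
			notes = append(notes, fmt.Sprintf("skipped queued prompt %q: %s", p, reason))
			continue
		}
		send(p)
	}
	return notes
}
```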

## Cross-platform shell (Windows support)

**Status:** not implemented.

### Problem

Today the hook runner uses `exec.Command("sh", "-c", hook.Command)`. On Windows
this fails without WSL or Git Bash on PATH. Even with `sh.exe` available,
Windows has no kernel shebang handling: `./hooks/foo.sh` can't be exec'd
directly the way it can on Unix. Hooks are effectively Unix-only.

### Approach

Keep the `command` field as a string. Tokenize it shell-style, examine
`argv[0]`, and branch:

- If `argv[0]` starts with `./`, `../`, `/`, or `~/`, treat it as a **file
  invocation**. Read the first ≤128 bytes, parse a shebang if present, and
  dispatch to the named interpreter via `os/exec`. Extra args from the command
  string pass through to the interpreter.
- Otherwise, treat the whole string as **shell code** and hand it to mvdan's
  in-process interpreter. mvdan resolves `node`, `bash`, `jq`, builtins,
  pipelines, redirects, etc. via its own exec handler.

No sentinel: a script with no shebang defaults to mvdan. A script with an
explicit shebang (`#!/bin/bash`, `#!/usr/bin/env python3`, etc.) uses the named
interpreter, which the user is responsible for having on PATH. Same contract on
every platform.

### Dispatch examples

| `command`                                | `argv[0]`      | Route                    |
| ---------------------------------------- | -------------- | ------------------------ |
| `ls -la`                                 | `ls`           | mvdan                    |
| `bash -c 'ls'`                           | `bash`         | mvdan (which execs bash) |
| `node ./script.js`                       | `node`         | mvdan (which execs node) |
| `./script.sh` (no shebang)               | `./script.sh`  | mvdan, fed the file      |
| `./script.sh` (`#!/bin/bash`)            | `./script.sh`  | `bash ./script.sh`       |
| `./script.py` (`#!/usr/bin/env python3`) | `./script.py`  | `python3 ./script.py`    |
| `./script.exe`                           | `./script.exe` | `os/exec` direct         |
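
The routing in the table can be expressed as a small function. This sketch assumes tokenization has already produced `argv[0]` and, for path invocations, the first bytes of the file. The text doesn't say how a native binary such as `./script.exe` is recognized; checking the PE (`MZ`) and ELF magic bytes is an assumption made here, and other formats such as Mach-O are left out:

```go
package main

import (
	"bytes"
	"strings"
)

// Route names mirror the table (names are hypothetical).
const (
	routeMvdan     = "mvdan"      // inline shell code
	routeMvdanFile = "mvdan-file" // script without shebang, fed to mvdan
	routeShebang   = "shebang"    // interpreter named on the #! line
	routeExec      = "exec"       // native binary, os/exec direct
)

// route decides how to run a hook. head holds the first bytes of the
// file named by argv0 (nil when argv0 is not a path).
func route(argv0 string, head []byte) string {
	isPath := false
	for _, prefix := range []string{"./", "../", "/", "~/"} {
		if strings.HasPrefix(argv0, prefix) {
			isPath = true
			break
		}
	}
	switch {
	case !isPath:
		return routeMvdan
	case bytes.HasPrefix(head, []byte("#!")):
		return routeShebang
	case bytes.HasPrefix(head, []byte("MZ")), bytes.HasPrefix(head, []byte("\x7fELF")):
		return routeExec // assumption: magic bytes identify native binaries
	default:
		return routeMvdanFile
	}
}
```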

### Contract on Windows

- Inline shell runs through mvdan natively. No external dependency.
- Shebang-dispatched scripts require the named interpreter on PATH (`bash.exe`,
  `python.exe`, `node.exe`, etc.). Crush does the dispatch that the Windows
  kernel won't.
- Shebang-less scripts run through mvdan regardless of extension. CRLF line
  endings are tolerated.

### Implementation sketch

- New function
  `dispatch(ctx, cmd string, env []string, stdin io.Reader) (stdout, stderr string, exitCode int, err error)`
  in `internal/hooks/`.
- Tokenize using mvdan's parser (already a dependency) so quoting and escaping
  behave the way shell users expect.
- Path-prefix check on `argv[0]`; if it's a path, read the shebang through a
  bounded `io.LimitReader` and parse it. Support:
  - `#!/absolute/interpreter args…`
  - `#!/usr/bin/env NAME` → resolve `NAME` on PATH
  - `#!/usr/bin/env -S NAME args…` → treat as above; `-S` is common enough to
    handle. Other `env` flags can error.
- Unified exit-code helper. mvdan's `interp.ExitStatus` and `os/exec`'s
  `ProcessState.ExitCode()` both become a single `int`.
- Context cancellation: mvdan's exec handler uses `exec.CommandContext` for its
  children, so a cancelled hook kills both the interpreter and any children.
  Verify with a `sleep 60` test.
- One fresh `interp.Runner` per hook invocation (parallel hooks must not share
  state).

### Swap the call site

`Runner.runOne` in `internal/hooks/runner.go` replaces its
`exec.Command("sh", "-c", …)` with a call to `dispatch(…)`. Everything
downstream (branching on exit code 2, 49, or anything else; stdout JSON
parsing; stderr as the reason) stays identical.

### Tests

Cross-platform matrix:

- Inline: `echo hi`; `exit 2`; pipelines; redirections.
- File, no shebang: treated as shell source through mvdan.
- File, `#!/bin/bash` on Unix: invokes system bash.
- File, `#!/usr/bin/env python3`: invokes python if present; skipped if not.
- File, `#!/usr/bin/env -S node --foo`: extra flags preserved.
- File with CRLF line endings in the shebang.
- `./missing-file`: non-blocking error; the hook proceeds as "no opinion".
- Timeout: a hook that sleeps past its timeout gets killed; context
  cancellation kills the interpreter and its children.
- Concurrency: 10 hooks in parallel don't leak env/cwd/state between runners.
- Windows-specific: `./script.exe` is exec'd directly; a bash-shebang script
  fails gracefully when bash isn't on PATH.

### Pitfalls to watch

- **Userland shebang parsing is now our problem.** Edge cases around `env`
  flags, args with spaces, CRLF, and missing interpreters. Well-trodden ground,
  but it needs real tests.
- **The path-prefix heuristic is a heuristic.** `relative/path.sh` (no leading
  `./`) goes to mvdan, not to file dispatch. This does _not_ match POSIX shell
  behavior: a command name containing a slash is run as a path with no PATH
  lookup, so at a bash prompt `relative/path.sh` runs directly. Worth
  documenting.
- **Kernel shebang handling is bypassed on Unix.** Today a chmod+x'd script is
  exec'd by the kernel; after this change, by our parser. Behavior should be
  byte-identical; verify with tests.
- **Two code paths.** mvdan vs direct exec. Exit codes, stdin, signal
  propagation, and env inheritance must be aligned. Discipline, not cleverness.

### Explicit non-goals

- No bundled `bash.exe` or `python.exe`. Users bring their own interpreters.
- No custom mvdan builtins (`crush_approve` etc.). Hooks stay portable and
  testable under bare `bash`.
- No `.sh`-extension filter on discovery. Hook file shape is driven by shebang,
  not filename.
- No Crush-as-script-interpreter mode (users can't write `#!/usr/bin/env crush`
  and have it mean something). If we want that later, it's an additive feature,
  not a dependency of this work.