SKILL.md

  1---
  2name: model-forge
  3description: Pick the right model for a task and the right git strategy for the work. Use when about to dispatch a subagent, start a new feature, or unsure whether to use opus/sonnet/haiku/codex, or whether to branch/worktree/work direct. NOT for actually running the work - this only routes.
  4disable-model-invocation: false
  5---
  6
  7# Model Forge
  8
  9Routing reference for picking the right model and the right git strategy. Consult before any subagent dispatch or non-trivial change.
 10
 11## Critical Rule
 12
 13**ALWAYS run subagents in background (`run_in_background: true`) UNLESS the user explicitly asks for foreground.** The main thread stays unblocked, the user can keep working, and long-running dispatches don't freeze the conversation. Foreground subagents are the exception, not the default.
 14
 15## Model Capability Matrix
 16
 17### claude-opus-4-6
 18Heaviest reasoning, best at multi-file architecture and critical correctness. Slowest and most expensive claude.
 19**Speed/cost:** ~3-5x slower than sonnet, ~5x cost premium.
 20**Use for:**
 21- Deep architecture design and hard refactors
 22- Security audits requiring full mental model
 23- Complex math, proofs, algorithmic correctness
 24- Novel problems with no clear path
 25- Multi-system integration design
 26- Critical decisions where being wrong is expensive
 27
 28### claude-sonnet-4-6
 29Balanced workhorse. Default for ~90% of work and most subagent dispatches.
 30**Speed/cost:** solid throughput, mid-tier cost.
 31**Use for:**
 32- Standard feature coding and debugging
 33- Research synthesis and writing
 34- Code review for logic and readability
 35- Scaffolding new modules from spec
 36- Most multi-step subagent tasks
 37- Anything you'd default to without thinking
 38
 39### claude-haiku-4-5
 40Fastest and cheapest. Capable enough for mechanical work.
 41**Speed/cost:** near-instant, fraction of sonnet cost.
 42**Use for:**
 43- File reads, scans, classification
 44- Transcription (youtube, audio, OCR)
 45- Single tool calls with no reasoning
 46- Grep-style searches across large trees
 47- Generating structured metadata from raw input
 48- Bulk context gathering across many files
 49
 50### gpt-5.4 (via codex plugin)
 51Adversarial engineer with different priors than claude. Mechanically excellent on focused tasks, brittle on vague ones. Extremely slow.
 52**Speed/cost:** ~30x slower than sonnet (10 min vs 18s in real tests). 3-4x fewer tokens than claude on comparable work. ChatGPT Plus = 30-150 msgs/5hr window.
 53**Benchmarks:** terminal-bench 77.3% vs claude 65.4%. Debate mode (claude + codex) bug detection jumps 53% → 80%.
 54**IMPORTANT:** Always pass `--model gpt-5.4` (or `model: "gpt-5.4"` in subagent config). Default model selection is unreliable - lock it explicitly every time.
 55**Use for:**
 56- Adversarial code review (highest-ROI use case - independent priors catch what claude misses)
 57- Focused bug fixes with a clear spec and acceptance criteria
 58- Mechanical refactors to a defined pattern (85-90% success on bounded scope)
 59- Test coverage for existing logic (stays in lane, no scope creep)
 60- Maintenance batch work (deps, doc fixes, webhook changes - queue 3-5 at session start)
 61- Terminal/CLI workflows
 62- Second-opinion on completed code
 63
 64## Routing Decision Tree
 65
 661. Is the task trivial / single tool call / scan / transcription / classification? → **haiku**
 672. Standard coding / research / writing / debugging / scaffolding? → **sonnet** (default)
 683. Deep architecture / hard multi-file refactor / correctness-critical / novel? → **opus**
 694. Adversarial bug hunt / security audit / can wait 10+ min? → **codex** (background only)
 705. Time-sensitive (need result < 2 min)? → **never codex**, sonnet or haiku
 71
 72## Parallelism Rules
 73
 74**Dispatch in parallel when:**
 75- Subtasks are independent (different dirs, different topics)
 76- Multiple haiku scanning unrelated paths
 77- Multiple sonnet researching separate questions
 78- Send all dispatches in a single message, collect results
 79
 80**Dispatch in background when:**
 81- Long-running work (codex, opus refactor)
 82- User can keep working while it runs
 83- Result doesn't block immediate next step
 84
 85**Keep in main agent when:**
 86- Small fast iterative loops (write -> test -> fix)
 87- Round-trip overhead exceeds task duration
 88- You need the tool results in your own context
 89
 90**Always:**
 91- Run subagents in background (`run_in_background: true`) so the main thread stays unblocked
 92- Lock codex model explicitly to `gpt-5.4` on every invocation
 93- Send parallel dispatches in a single message, not sequential calls
 94
 95**Never:**
 96- Run parallel agents on the same files without worktrees
 97- Dispatch codex for anything you're watching live
 98- Use opus for trivial tool calls
 99- Run subagents in foreground when background works
100
101## Git Strategy
102
103### Direct on current branch
104- Change is < 50 lines, 1-2 files
105- No parallel agents, pure linear flow
106- Easily reversible
107- **Example:** fix a typo, add a small util, edit a config
108
109### Feature branch (`feat/<slug>`)
110- Diff is large enough to bundle commits
111- Single linear workstream, no overlapping agents
112- Want to test/review before merging
113- **Example:** `git checkout -b feat/auth-refresh` for new auth flow
114
115### Git worktree
116- Multiple subagents touching overlapping files
117- Want experiment isolation while main work continues
118- Long-running opus/codex shouldn't dirty working tree
119- **Example:**
120  ```bash
121  git worktree add ../project-feat-api feat/api-layer
122  # dispatch sonnet into ../project-feat-api
123  # main agent stays unblocked in original tree
124  git merge feat/api-layer
125  git worktree remove ../project-feat-api
126  ```
127
128## Quick Reference
129
130| task type | model | branch | parallel? | background? |
131|---|---|---|---|---|
132| transcription / extraction | haiku | current | yes | yes |
133| file scan / grep | haiku | current | yes | no |
134| bulk context gathering | haiku | current | yes | yes |
135| research synthesis | sonnet | current | yes | optional |
136| code review (logic/style) | sonnet | current | no | no |
137| scaffolding new module | sonnet | feat | no | no |
138| writing docs | sonnet | feat | no | no |
139| debugging (interactive) | sonnet | current | no | no |
140| large refactor | opus | worktree | no | yes |
141| novel architecture | opus | feat | no | no |
142| security design | opus | feat | no | no |
143| adversarial code review | codex | feat | no | yes |
144| bug hunt | codex | worktree | no | yes |
145| security audit | codex | feat | no | yes |
146
147## Codex-Specific Failure Modes
148
149Codex is **too literal**. It will fulfill the exact request and break adjacent things. Real example from HN: asked to "fix compiler warnings", it made a bunch of values nullable to silence the warnings - technically correct, broke data integrity downstream.
150
151**Codex will fail at (NEVER use it for):**
152- **UI work of any kind** - layouts, components, styling, UX, design judgment. Codex has zero taste and no eye for visual hierarchy. Always claude.
153- **Anything you need to discuss** - "yo i need to talk about this", brainstorming, exploration, "what do you think", trade-off conversations. Codex is a one-shot executor, not a collaborator. Always claude.
154- **Reading user intent** - "make it better" → won't make leaps, will pick the most literal interpretation
155- **Architecture/product judgment** - no sense of tradeoffs beyond what's written
156- **Obscure libraries** - hallucinates rather than admitting unknown
157- **Multi-file dependency chains** - gets stuck in circles
158- **Continuous conversation context** - context degrades fast, not built for back-and-forth
159
160**Codex will excel at:**
161- Tasks where being literal is a feature (test writing, mechanical refactors)
162- Clearly-bounded specs with acceptance criteria
163- Adversarial passes with explicit focus arguments
164
165**Prompt requirements for codex:**
166- One concern per prompt (don't bundle)
167- Explicit scope boundaries (which files, what behavior, what done looks like)
168- Structured sections: General → Autonomy → Code Implementation → Editing Constraints → Exploration → Plan Tool
169- Use the positional focus argument: `/codex:adversarial-review challenge whether this caching design was right`
170- Heavy XML structure beats prose for multi-step specs
171
172## Optimal Codex Workflow
173
174The pattern that works (claude + codex hybrid):
175
1761. **Claude builds** - architecture, exploration, back-and-forth iteration
1772. **Codex reviews** - dispatch `/codex:adversarial-review` with a specific focus after a major change. 6-10 min wait, actionable findings
1783. **Claude filters** - triage codex output: "which are real issues vs noise, which need design decisions"
1794. **Codex fixes** - confirmed bugs with clear specs get delegated back to codex
1805. **Batch maintenance** - queue codex with 3-5 P2 tasks at session start, work on real stuff while it churns
181
182What doesn't work:
183- Letting codex drive architecture decisions
184- Vague task descriptions
185- Auto-running codex on every claude response (drains usage fast)
186
187## Anti-patterns
188
1891. **Codex for time-sensitive work** - 10+ min latency kills interactive flow
1902. **Codex with vague prompts** - it will interpret literally and break things
1913. **Opus for trivial tool calls** - haiku does it at 5% cost
1924. **Worktrees for non-overlapping single-agent work** - merge overhead with no upside
1935. **Big refactors on main** - hard rollback, polluted history
1946. **Parallel agents on same files without worktrees** - concurrent write conflicts
1957. **Dispatching codex synchronously** - always background, never wait
1968. **Forgetting `--model gpt-5.4`** - default selection is unreliable
1979. **Codex for UI work** - zero taste, no visual judgment, will produce bricks
19810. **Codex when you need to discuss/brainstorm** - it's a one-shot executor, not a collaborator
199
200## Examples
201
202**Bulk context gathering:** 5 independent file reads → 5 haiku in parallel, single message dispatch.
203
204**New feature scaffolding (small):** sonnet, current branch, no subagents.
205
206**New feature scaffolding (big, multi-file):** sonnet, `feat/<slug>` branch, optional sonnet subagents for independent modules.
207
208**Security audit before launch:** codex in background on `feat/audit` worktree, you keep building on main.
209
210**Hard refactor across 20 files:** opus in worktree, you stay on main fixing other stuff, merge when done.
211
212**"Find all references to X across the codebase":** haiku, current branch, no subagent (Grep tool is faster).