1---
2name: model-forge
3description: Pick the right model for a task and the right git strategy for the work. Use when about to dispatch a subagent, start a new feature, or unsure whether to use opus/sonnet/haiku/codex, or whether to branch/worktree/work direct. NOT for actually running the work - this only routes.
4disable-model-invocation: false
5---
6
7# Model Forge
8
9Routing reference for picking the right model and the right git strategy. Consult before any subagent dispatch or non-trivial change.
10
11## Critical Rule
12
13**ALWAYS run subagents in background (`run_in_background: true`) UNLESS the user explicitly asks for foreground.** The main thread stays unblocked, the user can keep working, and long-running dispatches don't freeze the conversation. Foreground subagents are the exception, not the default.
14
15## Model Capability Matrix
16
17### claude-opus-4-6
18Heaviest reasoning, best at multi-file architecture and critical correctness. Slowest and most expensive claude.
19**Speed/cost:** ~3-5x slower than sonnet, ~5x cost premium.
20**Use for:**
21- Deep architecture design and hard refactors
22- Security audits requiring full mental model
23- Complex math, proofs, algorithmic correctness
24- Novel problems with no clear path
25- Multi-system integration design
26- Critical decisions where being wrong is expensive
27
28### claude-sonnet-4-6
29Balanced workhorse. Default for ~90% of work and most subagent dispatches.
30**Speed/cost:** solid throughput, mid-tier cost.
31**Use for:**
32- Standard feature coding and debugging
33- Research synthesis and writing
34- Code review for logic and readability
35- Scaffolding new modules from spec
36- Most multi-step subagent tasks
37- Anything you'd default to without thinking
38
39### claude-haiku-4-5
40Fastest and cheapest. Capable enough for mechanical work.
41**Speed/cost:** near-instant, fraction of sonnet cost.
42**Use for:**
43- File reads, scans, classification
44- Transcription (youtube, audio, OCR)
45- Single tool calls with no reasoning
46- Grep-style searches across large trees
47- Generating structured metadata from raw input
48- Bulk context gathering across many files
49
50### gpt-5.4 (via codex plugin)
51Adversarial engineer with different priors than claude. Mechanically excellent on focused tasks, brittle on vague ones. Extremely slow.
52**Speed/cost:** ~30x slower than sonnet (10 min vs 18s in real tests). 3-4x fewer tokens than claude on comparable work. ChatGPT Plus = 30-150 msgs/5hr window.
53**Benchmarks:** terminal-bench 77.3% vs claude 65.4%. Debate mode (claude + codex) bug detection jumps 53% → 80%.
54**IMPORTANT:** Always pass `--model gpt-5.4` (or `model: "gpt-5.4"` in subagent config). Default model selection is unreliable - lock it explicitly every time.
55**Use for:**
56- Adversarial code review (highest-ROI use case - independent priors catch what claude misses)
57- Focused bug fixes with a clear spec and acceptance criteria
58- Mechanical refactors to a defined pattern (85-90% success on bounded scope)
59- Test coverage for existing logic (stays in lane, no scope creep)
60- Maintenance batch work (deps, doc fixes, webhook changes - queue 3-5 at session start)
61- Terminal/CLI workflows
62- Second-opinion on completed code
63
64## Routing Decision Tree
65
661. Is the task trivial / single tool call / scan / transcription / classification? → **haiku**
672. Standard coding / research / writing / debugging / scaffolding? → **sonnet** (default)
683. Deep architecture / hard multi-file refactor / correctness-critical / novel? → **opus**
694. Adversarial bug hunt / security audit / can wait 10+ min? → **codex** (background only)
705. Time-sensitive (need result < 2 min)? → **never codex**, sonnet or haiku
71
72## Parallelism Rules
73
74**Dispatch in parallel when:**
75- Subtasks are independent (different dirs, different topics)
76- Multiple haiku scanning unrelated paths
77- Multiple sonnet researching separate questions
78- Send all dispatches in a single message, collect results
79
80**Dispatch in background when:**
81- Long-running work (codex, opus refactor)
82- User can keep working while it runs
83- Result doesn't block immediate next step
84
85**Keep in main agent when:**
86- Small fast iterative loops (write -> test -> fix)
87- Round-trip overhead exceeds task duration
88- You need the tool results in your own context
89
90**Always:**
91- Run subagents in background (`run_in_background: true`) so the main thread stays unblocked
92- Lock codex model explicitly to `gpt-5.4` on every invocation
93- Send parallel dispatches in a single message, not sequential calls
94
95**Never:**
96- Run parallel agents on the same files without worktrees
97- Dispatch codex for anything you're watching live
98- Use opus for trivial tool calls
99- Run subagents in foreground when background works
100
101## Git Strategy
102
103### Direct on current branch
104- Change is < 50 lines, 1-2 files
105- No parallel agents, pure linear flow
106- Easily reversible
107- **Example:** fix a typo, add a small util, edit a config
108
109### Feature branch (`feat/<slug>`)
110- Diff is large enough to bundle commits
111- Single linear workstream, no overlapping agents
112- Want to test/review before merging
113- **Example:** `git checkout -b feat/auth-refresh` for new auth flow
114
115### Git worktree
116- Multiple subagents touching overlapping files
117- Want experiment isolation while main work continues
118- Long-running opus/codex shouldn't dirty working tree
119- **Example:**
120 ```bash
121 git worktree add ../project-feat-api feat/api-layer
122 # dispatch sonnet into ../project-feat-api
123 # main agent stays unblocked in original tree
124 git merge feat/api-layer
125 git worktree remove ../project-feat-api
126 ```
127
128## Quick Reference
129
130| task type | model | branch | parallel? | background? |
131|---|---|---|---|---|
132| transcription / extraction | haiku | current | yes | yes |
133| file scan / grep | haiku | current | yes | no |
134| bulk context gathering | haiku | current | yes | yes |
135| research synthesis | sonnet | current | yes | optional |
136| code review (logic/style) | sonnet | current | no | no |
137| scaffolding new module | sonnet | feat | no | no |
138| writing docs | sonnet | feat | no | no |
139| debugging (interactive) | sonnet | current | no | no |
140| large refactor | opus | worktree | no | yes |
141| novel architecture | opus | feat | no | no |
142| security design | opus | feat | no | no |
143| adversarial code review | codex | feat | no | yes |
144| bug hunt | codex | worktree | no | yes |
145| security audit | codex | feat | no | yes |
146
147## Codex-Specific Failure Modes
148
149Codex is **too literal**. It will fulfill the exact request and break adjacent things. Real example from HN: asked to "fix compiler warnings", it made a bunch of values nullable to silence the warnings - technically correct, broke data integrity downstream.
150
151**Codex will fail at (NEVER use it for):**
152- **UI work of any kind** - layouts, components, styling, UX, design judgment. Codex has zero taste and no eye for visual hierarchy. Always claude.
153- **Anything you need to discuss** - "yo i need to talk about this", brainstorming, exploration, "what do you think", trade-off conversations. Codex is a one-shot executor, not a collaborator. Always claude.
154- **Reading user intent** - "make it better" → won't make leaps, will pick the most literal interpretation
155- **Architecture/product judgment** - no sense of tradeoffs beyond what's written
156- **Obscure libraries** - hallucinates rather than admitting unknown
157- **Multi-file dependency chains** - gets stuck in circles
158- **Continuous conversation context** - context degrades fast, not built for back-and-forth
159
160**Codex will excel at:**
161- Tasks where being literal is a feature (test writing, mechanical refactors)
162- Clearly-bounded specs with acceptance criteria
163- Adversarial passes with explicit focus arguments
164
165**Prompt requirements for codex:**
166- One concern per prompt (don't bundle)
167- Explicit scope boundaries (which files, what behavior, what done looks like)
168- Structured sections: General → Autonomy → Code Implementation → Editing Constraints → Exploration → Plan Tool
169- Use the positional focus argument: `/codex:adversarial-review challenge whether this caching design was right`
170- Heavy XML structure beats prose for multi-step specs
171
172## Optimal Codex Workflow
173
174The pattern that works (claude + codex hybrid):
175
1761. **Claude builds** - architecture, exploration, back-and-forth iteration
1772. **Codex reviews** - dispatch `/codex:adversarial-review` with a specific focus after a major change. 6-10 min wait, actionable findings
1783. **Claude filters** - triage codex output: "which are real issues vs noise, which need design decisions"
1794. **Codex fixes** - confirmed bugs with clear specs get delegated back to codex
1805. **Batch maintenance** - queue codex with 3-5 P2 tasks at session start, work on real stuff while it churns
181
182What doesn't work:
183- Letting codex drive architecture decisions
184- Vague task descriptions
185- Auto-running codex on every claude response (drains usage fast)
186
187## Anti-patterns
188
1891. **Codex for time-sensitive work** - 10+ min latency kills interactive flow
1902. **Codex with vague prompts** - it will interpret literally and break things
1913. **Opus for trivial tool calls** - haiku does it at 5% cost
1924. **Worktrees for non-overlapping single-agent work** - merge overhead with no upside
1935. **Big refactors on main** - hard rollback, polluted history
1946. **Parallel agents on same files without worktrees** - concurrent write conflicts
1957. **Dispatching codex synchronously** - always background, never wait
1968. **Forgetting `--model gpt-5.4`** - default selection is unreliable
1979. **Codex for UI work** - zero taste, no visual judgment, will produce bricks
19810. **Codex when you need to discuss/brainstorm** - it's a one-shot executor, not a collaborator
199
200## Examples
201
202**Bulk context gathering:** 5 independent file reads → 5 haiku in parallel, single message dispatch.
203
204**New feature scaffolding (small):** sonnet, current branch, no subagents.
205
206**New feature scaffolding (big, multi-file):** sonnet, `feat/<slug>` branch, optional sonnet subagents for independent modules.
207
208**Security audit before launch:** codex in background on `feat/audit` worktree, you keep building on main.
209
210**Hard refactor across 20 files:** opus in worktree, you stay on main fixing other stuff, merge when done.
211
212**"Find all references to X across the codebase":** haiku, current branch, no subagent (Grep tool is faster).