name: model-forge description: Pick the right model for a task and the right git strategy for the work. Use when about to dispatch a subagent, start a new feature, or unsure whether to use opus/sonnet/haiku/codex, or whether to branch/worktree/work direct. NOT for actually running the work - this only routes. disable-model-invocation: false
Model Forge
Routing reference for picking the right model and the right git strategy. Consult before any subagent dispatch or non-trivial change.
Critical Rule
ALWAYS run subagents in background (run_in_background: true) UNLESS the user explicitly asks for foreground. The main thread stays unblocked, the user can keep working, and long-running dispatches don't freeze the conversation. Foreground subagents are the exception, not the default.
Model Capability Matrix
claude-opus-4-6
Heaviest reasoning, best at multi-file architecture and critical correctness. Slowest and most expensive claude. Speed/cost: ~3-5x slower than sonnet, ~5x cost premium. Use for:
- Deep architecture design and hard refactors
- Security audits requiring full mental model
- Complex math, proofs, algorithmic correctness
- Novel problems with no clear path
- Multi-system integration design
- Critical decisions where being wrong is expensive
claude-sonnet-4-6
Balanced workhorse. Default for ~90% of work and most subagent dispatches. Speed/cost: solid throughput, mid-tier cost. Use for:
- Standard feature coding and debugging
- Research synthesis and writing
- Code review for logic and readability
- Scaffolding new modules from spec
- Most multi-step subagent tasks
- Anything you'd default to without thinking
claude-haiku-4-5
Fastest and cheapest. Capable enough for mechanical work. Speed/cost: near-instant, fraction of sonnet cost. Use for:
- File reads, scans, classification
- Transcription (youtube, audio, OCR)
- Single tool calls with no reasoning
- Grep-style searches across large trees
- Generating structured metadata from raw input
- Bulk context gathering across many files
gpt-5.4 (via codex plugin)
Adversarial engineer with different priors than claude. Mechanically excellent on focused tasks, brittle on vague ones. Extremely slow.
Speed/cost: ~30x slower than sonnet (10 min vs 18s in real tests). 3-4x fewer tokens than claude on comparable work. ChatGPT Plus = 30-150 msgs/5hr window.
Benchmarks: terminal-bench 77.3% vs claude 65.4%. Debate mode (claude + codex) bug detection jumps 53% → 80%.
IMPORTANT: Always pass --model gpt-5.4 (or model: "gpt-5.4" in subagent config). Default model selection is unreliable - lock it explicitly every time.
Use for:
- Adversarial code review (highest-ROI use case - independent priors catch what claude misses)
- Focused bug fixes with a clear spec and acceptance criteria
- Mechanical refactors to a defined pattern (85-90% success on bounded scope)
- Test coverage for existing logic (stays in lane, no scope creep)
- Maintenance batch work (deps, doc fixes, webhook changes - queue 3-5 at session start)
- Terminal/CLI workflows
- Second-opinion on completed code
Routing Decision Tree
- Is the task trivial / single tool call / scan / transcription / classification? → haiku
- Standard coding / research / writing / debugging / scaffolding? → sonnet (default)
- Deep architecture / hard multi-file refactor / correctness-critical / novel? → opus
- Adversarial bug hunt / security audit / can wait 10+ min? → codex (background only)
- Time-sensitive (need result < 2 min)? → never codex, sonnet or haiku
Parallelism Rules
Dispatch in parallel when:
- Subtasks are independent (different dirs, different topics)
- Multiple haiku scanning unrelated paths
- Multiple sonnet researching separate questions
- Send all dispatches in a single message, collect results
Dispatch in background when:
- Long-running work (codex, opus refactor)
- User can keep working while it runs
- Result doesn't block immediate next step
Keep in main agent when:
- Small fast iterative loops (write -> test -> fix)
- Round-trip overhead exceeds task duration
- You need the tool results in your own context
Always:
- Run subagents in background (
run_in_background: true) so the main thread stays unblocked - Lock codex model explicitly to
gpt-5.4on every invocation - Send parallel dispatches in a single message, not sequential calls
Never:
- Run parallel agents on the same files without worktrees
- Dispatch codex for anything you're watching live
- Use opus for trivial tool calls
- Run subagents in foreground when background works
Git Strategy
Direct on current branch
- Change is < 50 lines, 1-2 files
- No parallel agents, pure linear flow
- Easily reversible
- Example: fix a typo, add a small util, edit a config
Feature branch (feat/<slug>)
- Diff is large enough to bundle commits
- Single linear workstream, no overlapping agents
- Want to test/review before merging
- Example:
git checkout -b feat/auth-refreshfor new auth flow
Git worktree
- Multiple subagents touching overlapping files
- Want experiment isolation while main work continues
- Long-running opus/codex shouldn't dirty working tree
- Example:
git worktree add ../project-feat-api feat/api-layer # dispatch sonnet into ../project-feat-api # main agent stays unblocked in original tree git merge feat/api-layer git worktree remove ../project-feat-api
Quick Reference
| task type | model | branch | parallel? | background? |
|---|---|---|---|---|
| transcription / extraction | haiku | current | yes | yes |
| file scan / grep | haiku | current | yes | no |
| bulk context gathering | haiku | current | yes | yes |
| research synthesis | sonnet | current | yes | optional |
| code review (logic/style) | sonnet | current | no | no |
| scaffolding new module | sonnet | feat | no | no |
| writing docs | sonnet | feat | no | no |
| debugging (interactive) | sonnet | current | no | no |
| large refactor | opus | worktree | no | yes |
| novel architecture | opus | feat | no | no |
| security design | opus | feat | no | no |
| adversarial code review | codex | feat | no | yes |
| bug hunt | codex | worktree | no | yes |
| security audit | codex | feat | no | yes |
Codex-Specific Failure Modes
Codex is too literal. It will fulfill the exact request and break adjacent things. Real example from HN: asked to "fix compiler warnings", it made a bunch of values nullable to silence the warnings - technically correct, broke data integrity downstream.
Codex will fail at (NEVER use it for):
- UI work of any kind - layouts, components, styling, UX, design judgment. Codex has zero taste and no eye for visual hierarchy. Always claude.
- Anything you need to discuss - "yo i need to talk about this", brainstorming, exploration, "what do you think", trade-off conversations. Codex is a one-shot executor, not a collaborator. Always claude.
- Reading user intent - "make it better" → won't make leaps, will pick the most literal interpretation
- Architecture/product judgment - no sense of tradeoffs beyond what's written
- Obscure libraries - hallucinates rather than admitting unknown
- Multi-file dependency chains - gets stuck in circles
- Continuous conversation context - context degrades fast, not built for back-and-forth
Codex will excel at:
- Tasks where being literal is a feature (test writing, mechanical refactors)
- Clearly-bounded specs with acceptance criteria
- Adversarial passes with explicit focus arguments
Prompt requirements for codex:
- One concern per prompt (don't bundle)
- Explicit scope boundaries (which files, what behavior, what done looks like)
- Structured sections: General → Autonomy → Code Implementation → Editing Constraints → Exploration → Plan Tool
- Use the positional focus argument:
/codex:adversarial-review challenge whether this caching design was right - Heavy XML structure beats prose for multi-step specs
Optimal Codex Workflow
The pattern that works (claude + codex hybrid):
- Claude builds - architecture, exploration, back-and-forth iteration
- Codex reviews - dispatch
/codex:adversarial-reviewwith a specific focus after a major change. 6-10 min wait, actionable findings - Claude filters - triage codex output: "which are real issues vs noise, which need design decisions"
- Codex fixes - confirmed bugs with clear specs get delegated back to codex
- Batch maintenance - queue codex with 3-5 P2 tasks at session start, work on real stuff while it churns
What doesn't work:
- Letting codex drive architecture decisions
- Vague task descriptions
- Auto-running codex on every claude response (drains usage fast)
Anti-patterns
- Codex for time-sensitive work - 10+ min latency kills interactive flow
- Codex with vague prompts - it will interpret literally and break things
- Opus for trivial tool calls - haiku does it at 5% cost
- Worktrees for non-overlapping single-agent work - merge overhead with no upside
- Big refactors on main - hard rollback, polluted history
- Parallel agents on same files without worktrees - concurrent write conflicts
- Dispatching codex synchronously - always background, never wait
- Forgetting
--model gpt-5.4- default selection is unreliable - Codex for UI work - zero taste, no visual judgment, will produce bricks
- Codex when you need to discuss/brainstorm - it's a one-shot executor, not a collaborator
Examples
Bulk context gathering: 5 independent file reads → 5 haiku in parallel, single message dispatch.
New feature scaffolding (small): sonnet, current branch, no subagents.
New feature scaffolding (big, multi-file): sonnet, feat/<slug> branch, optional sonnet subagents for independent modules.
Security audit before launch: codex in background on feat/audit worktree, you keep building on main.
Hard refactor across 20 files: opus in worktree, you stay on main fixing other stuff, merge when done.
"Find all references to X across the codebase": haiku, current branch, no subagent (Grep tool is faster).