CLAUDE.md

# Project Instructions for Claude

## CSS

Plain hand-written CSS, no Tailwind, no build step. Bun's HTML loader resolves
`<link rel="stylesheet">` and inlines `@import` chains automatically for both
`bun run dev` and `bun run build`.

The CSS architecture:
- `public/css/main.css` - Main entry point, imports the partials and defines tokens/reset
- `public/css/workflow.css` - Commands section, glass terminal, case studies styles
- `public/css/gallery.css`, `skill-demos.css`, `problem-section.css` - Section partials

Edit any of these directly and reload — no rebuild needed.

## Development Server

```bash
bun run dev        # Bun dev server at http://localhost:3000
bun run preview    # Build + Cloudflare Pages local preview
```

## Deployment

Hosted on Cloudflare Pages. Static assets are served from `build/`; API routes are handled via `_redirects` rewrites (JSON) and Pages Functions (downloads).

```bash
bun run deploy     # Build + deploy to Cloudflare Pages
```

## Build System

The build system compiles skills and commands from `source/` to provider-specific formats in `dist/`:

```bash
bun run build      # Build all providers
bun run rebuild    # Clean and rebuild
```

Source files use placeholders that get replaced per-provider:
- `{{model}}` - Model name (Claude, Gemini, GPT, etc.)
- `{{config_file}}` - Config file name (CLAUDE.md, .cursorrules, etc.)
- `{{ask_instruction}}` - How to ask the user questions

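A minimal sketch of what that substitution step could look like. The `providers` table and `render` function here are illustrative assumptions, not the project's actual build code:

```typescript
// Hypothetical per-provider variable table (one provider shown).
const providers: Record<string, Record<string, string>> = {
  claude: {
    model: "Claude",
    config_file: "CLAUDE.md",
    ask_instruction: "Ask the user directly in the conversation.",
  },
};

function render(source: string, provider: string): string {
  const vars = providers[provider] ?? {};
  // Replace each {{key}}; leave unknown placeholders untouched so a
  // missing key is easy to spot in the generated dist/ output.
  return source.replace(/\{\{(\w+)\}\}/g, (m, key) => vars[key] ?? m);
}
```
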
## Testing

```bash
bun run test       # Run all tests
```

Unit tests (build, detector logic) run via `bun test`. Fixture tests (jsdom-based HTML detection) run via `node --test`, because jsdom is too slow under Bun. The `test` script handles this split automatically.

## CLI

The CLI lives in this repo under `bin/` and `src/`. Published to npm as `impeccable`.

```bash
npx impeccable detect [file-or-dir-or-url...]   # detect anti-patterns
npx impeccable detect --fast --json src/         # regex-only, JSON output
npx impeccable live                              # start browser overlay server
npx impeccable skills install                    # install skills
npx impeccable --help                            # show help
```

The browser detector (`src/detect-antipatterns-browser.js`) is generated from the main engine. After changing `src/detect-antipatterns.mjs`, rebuild it:

```bash
bun run build:browser
```

**IMPORTANT**: Always use `node` (not `bun`) to run the detect CLI. jsdom is extremely slow under Bun and will cause scans of HTML files to hang for minutes.

## Versioning

There are three independently versioned components. Only bump the one(s) that actually changed:

**CLI** (npm package):
- `package.json` → `version`
- Bump when: CLI code changes (`bin/`, `src/detect-antipatterns.mjs`, etc.)

**Skills** (Claude Code plugin / skill definitions):
- `.claude-plugin/plugin.json` → `version`
- `.claude-plugin/marketplace.json` → `plugins[0].version`
- Bump when: skill content changes (`source/skills/`, skill count changes, etc.)

**Chrome extension**:
- `extension/manifest.json` → `version`
- Bump when: extension code changes (`extension/`)

**Website changelog** (`public/index.html`):
- Hero version link text + new changelog entry
- Update for user-facing changes only, not internal build/tooling details
- Use the most prominent version that changed (e.g. skills version for skill consolidation)

## Adding New Skills

When adding a new user-invocable skill, update the command count in **all** of these locations:

- `public/index.html` → meta descriptions, hero box, section lead
- `public/cheatsheet.html` → meta description, subtitle, `commandCategories`, `commandRelationships`
- `public/js/data.js` → `commandProcessSteps`, `commandCategories`, `commandRelationships`
- `public/js/components/framework-viz.js` → `commandSymbols`, `commandNumbers`
- `public/js/demos/commands/` → new demo file + import in `index.js`
- `README.md` → intro, command count, commands table
- `NOTICE.md` → steering commands count
- `AGENTS.md` → intro command count
- `.claude-plugin/plugin.json` → description
- `.claude-plugin/marketplace.json` → metadata description + plugin description

## Evals Framework (private, gitignored)

There is a controlled eval framework at `evals/` that measures whether the `/impeccable` skill improves or harms AI-generated frontend design. It runs the same brief through a model with and without the skill loaded, fingerprints every generation, and aggregates the results into a bias report. The whole `evals/` directory is gitignored — it's intended to stay private (commercial).
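
As a rough illustration of the aggregation idea (the real `evals/runner/aggregate.ts` is private and certainly more involved; the `Generation` type and `findingRates` function below are hypothetical):

```typescript
// One fingerprinted generation: which condition produced it and which
// detector findings it triggered.
type Generation = { condition: "skill-on" | "skill-off"; findings: string[] };

// Fraction of generations in a condition that triggered each finding.
// Comparing these maps across conditions is the core of a bias report.
function findingRates(
  gens: Generation[],
  condition: Generation["condition"],
): Map<string, number> {
  const subset = gens.filter((g) => g.condition === condition);
  const counts = new Map<string, number>();
  for (const g of subset) {
    for (const f of new Set(g.findings)) {
      counts.set(f, (counts.get(f) ?? 0) + 1);
    }
  }
  return new Map([...counts].map(([f, n]) => [f, n / subset.length]));
}
```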

**If you're picking up eval work in a new session, read `evals/AGENT.md` first.** It captures everything we've learned: model choices, sample size policy, lessons learned, common workflows, and gotchas. Don't try to reinvent the workflow from scratch — there's significant prior context.

### Quick orientation

- **Primary baseline model**: `gpt-5.4` with `--reasoning-effort medium`. Frontier intelligence at ~5-10× lower cost than high reasoning. **Do NOT use `--reasoning-effort high`** unless you specifically need it — reasoning tokens count against `max_completion_tokens` and burn ~$1-2/file with no quality benefit for our use case.
- **Secondary validation model**: `qwen/qwen3.6-plus` via OpenRouter. Cheap-ish, decent design quality, no reasoning controls.
- **Do NOT use Haiku as a primary eval target.** It ignores most negative rules in the skill. We learned this the hard way — it sent us down many wrong paths early on.
- **Sample size policy**: n=10 per niche for scratch iteration, **n=20 for sweep validation (the standard)**, n=50 reserved for the final published baseline. n=20 is the smallest sample where rare detector findings stabilize and A/B comparisons are statistically meaningful.

### Quick commands

```bash
# Always start the local server first — the gallery/viewer can't load via file:// (CORS)
bun run evals/runner/serve.ts

# Standard workflow: generate → detect → aggregate → snapshot
bun run evals/runner/run.ts --with-refs --model gpt-5.4 --reasoning-effort medium
bun run evals/runner/detect.ts
bun run evals/runner/aggregate.ts
bun run evals/runner/snapshot.ts <slug> --title "..." --note "..."

# Cheap targeted iteration (does not pollute current/)
bun run evals/runner/run.ts --with-refs --scratch my-test \
  --niches 06 --n 10 --condition skill-on --model qwen/qwen3.6-plus

# View results in browser
open http://localhost:8723/viewer.html
```

### Critical rules

- **Always run a small smoke test (n=2-5 on one niche) before any sweep.** Generation rate degrades over long runs, and time estimates can be off by 10-20×. We once burned 11+ hours on a sweep estimated to take 40 minutes.
- **Background long runs.** Use `run_in_background: true` for any sweep over ~50 generations. The runner is resumable, so killing and restarting is safe.
- **Don't mix prompt versions in the same dataset.** The variant.json safety check enforces this for `current/` (you must pass `--rebuild-skill-on` after a prompt edit). Scratch dirs auto-wipe on prompt change.
- **Snapshot first, change second.** Always have a known reference point in `evals/output/snapshots/` before editing the skill, so you can compare before/after.
- **The user is the source of truth on aesthetic quality.** The fingerprinter and detector are useful signals, but they do not measure "is this design good?" Have the user spot-check the gallery after any meaningful change.

See `evals/AGENT.md` for the full reference: detailed model comparison table, complete lessons learned, all common workflows, and the list of gotchas.