# Project Instructions for Claude

## CSS

Plain hand-written CSS, no Tailwind, no build step. Bun's HTML loader resolves
`<link rel="stylesheet">` and inlines `@import` chains automatically for both
`bun run dev` and `bun run build`.

The CSS architecture:
- `public/css/main.css` - Main entry point; imports the partials and defines tokens/reset
- `public/css/workflow.css` - Commands section, glass terminal, and case-study styles
- `public/css/gallery.css`, `skill-demos.css`, `problem-section.css` - Section partials

Edit any of these directly and reload; no rebuild is needed.
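
As a minimal sketch of the wiring described above (the import paths come from the list; the token and reset contents are placeholders, not the real file):

```css
/* public/css/main.css -- entry point; partials resolved by Bun's HTML loader */
@import "./workflow.css";
@import "./gallery.css";
@import "./skill-demos.css";
@import "./problem-section.css";

:root {
  --ink: #111; /* placeholder token -- real tokens live in main.css */
}
```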
15
16## Development Server
17
18```bash
19bun run dev # Bun dev server at http://localhost:3000
20bun run preview # Build + Cloudflare Pages local preview
21```
22
23## Deployment
24
25Hosted on Cloudflare Pages. Static assets served from `build/`, API routes handled via `_redirects` rewrites (JSON) and Pages Functions (downloads).
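
The exact rules live in the repo's `_redirects` file; as a hedged illustration only, a Cloudflare Pages 200 rewrite for a JSON route might look like this (the paths here are assumptions, not the real routes):

```
/api/*  /data/:splat.json  200
```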

```bash
bun run deploy    # Build + deploy to Cloudflare Pages
```

## Build System

The build system compiles skills and commands from `source/` to provider-specific formats in `dist/`:

```bash
bun run build     # Build all providers
bun run rebuild   # Clean and rebuild
```

Source files use placeholders that are replaced per provider:
- `{{model}}` - Model name (Claude, Gemini, GPT, etc.)
- `{{config_file}}` - Config file name (CLAUDE.md, .cursorrules, etc.)
- `{{ask_instruction}}` - How to ask the user questions
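
The substitution itself is plain text replacement. As a rough sketch of what the Claude provider pass does (illustrative only; the real logic is in the build scripts, and the template text here is made up):

```shell
# Illustrative only -- the real build is `bun run build`.
# Substitutes the Claude provider's values for the placeholders above.
template='Ask {{model}} to record the decision in {{config_file}}.'
rendered=$(printf '%s' "$template" \
  | sed -e 's/{{model}}/Claude/g' -e 's/{{config_file}}/CLAUDE.md/g')
printf '%s\n' "$rendered"
```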

## Testing

```bash
bun run test      # Run all tests
```

Unit tests (build and detector logic) run via `bun test`. Fixture tests (jsdom-based HTML detection) run via `node --test`, because Bun is too slow with jsdom. The `test` script handles this split automatically.
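
In shell terms, the split looks roughly like the sketch below. The directory names are assumptions; the authoritative version is the `test` script in `package.json`:

```shell
# Hypothetical sketch of the test split; real paths may differ.
run_all_tests() {
  bun test                      # unit tests: build + detector logic
  node --test test/fixtures/    # jsdom fixture tests (slow under bun)
}
```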

## CLI

The CLI lives in this repo under `bin/` and `src/`. It is published to npm as `impeccable`.

```bash
npx impeccable detect [file-or-dir-or-url...]   # detect anti-patterns
npx impeccable detect --fast --json src/        # regex-only, JSON output
npx impeccable live                             # start browser overlay server
npx impeccable skills install                   # install skills
npx impeccable --help                           # show help
```

The browser detector (`src/detect-antipatterns-browser.js`) is generated from the main engine. After changing `src/detect-antipatterns.mjs`, rebuild it:

```bash
bun run build:browser
```

**IMPORTANT**: Always use `node` (not `bun`) to run the detect CLI. Bun's jsdom implementation is extremely slow and will cause scans of HTML files to hang for minutes.

## Versioning

There are three independently versioned components. Bump only the one(s) that actually changed:

**CLI** (npm package):
- `package.json` → `version`
- Bump when: CLI code changes (`bin/`, `src/detect-antipatterns.mjs`, etc.)

**Skills** (Claude Code plugin / skill definitions):
- `.claude-plugin/plugin.json` → `version`
- `.claude-plugin/marketplace.json` → `plugins[0].version`
- Bump when: skill content changes (`source/skills/`, skill count changes, etc.)

**Chrome extension**:
- `extension/manifest.json` → `version`
- Bump when: extension code changes (`extension/`)

**Website changelog** (`public/index.html`):
- Hero version link text + a new changelog entry
- Update for user-facing changes only, not internal build/tooling details
- Use the most prominent version that changed (e.g. the skills version for a skill consolidation)
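
Before a release it helps to confirm which of the three versions actually moved. A minimal sketch, assuming a naive pattern match is good enough for these files (it grabs only the first `"version"` key, so `marketplace.json`'s nested `plugins[0].version` needs `jq` or similar instead):

```shell
# Print the first "version" value in a JSON file (naive; not a JSON parser).
json_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}
# Usage before a release:
#   json_version package.json
#   json_version .claude-plugin/plugin.json
#   json_version extension/manifest.json
```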

## Adding New Skills

When adding a new user-invocable skill, update the command count in **all** of these locations:

- `public/index.html` → meta descriptions, hero box, section lead
- `public/cheatsheet.html` → meta description, subtitle, `commandCategories`, `commandRelationships`
- `public/js/data.js` → `commandProcessSteps`, `commandCategories`, `commandRelationships`
- `public/js/components/framework-viz.js` → `commandSymbols`, `commandNumbers`
- `public/js/demos/commands/` → new demo file + import in `index.js`
- `README.md` → intro, command count, commands table
- `NOTICE.md` → steering commands count
- `AGENTS.md` → intro command count
- `.claude-plugin/plugin.json` → description
- `.claude-plugin/marketplace.json` → metadata description + plugin description
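
A quick way to catch stragglers is to grep for the old count before bumping it. A hedged sketch (the count below is a placeholder, not the project's real number):

```shell
# List files still containing the stale count; set old_count first.
old_count='14 commands'
grep -rln "$old_count" public/ README.md NOTICE.md AGENTS.md .claude-plugin/ \
  2>/dev/null || true   # `|| true`: grep exits non-zero when nothing matches
```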

## Evals Framework (private, gitignored)

There is a controlled eval framework at `evals/` that measures whether the `/impeccable` skill improves or harms AI-generated frontend design. It runs the same brief through a model with and without the skill loaded, fingerprints every generation, and aggregates the results into a bias report. The whole `evals/` directory is gitignored; it is intended to stay private (commercial).

**If you're picking up eval work in a new session, read `evals/AGENT.md` first.** It captures everything we've learned: model choices, sample-size policy, lessons learned, common workflows, and gotchas. Don't try to reinvent the workflow from scratch; there is significant prior context.

### Quick orientation

- **Primary baseline model**: `gpt-5.4` with `--reasoning-effort medium`. Frontier intelligence at ~5-10× lower cost than high reasoning. **Do NOT use `--reasoning-effort high`** unless you specifically need it; reasoning tokens count against `max_completion_tokens` and burn ~$1-2/file with no quality benefit for our use case.
- **Secondary validation model**: `qwen/qwen3.6-plus` via OpenRouter. Cheap-ish, decent design quality, no reasoning controls.
- **Do NOT use Haiku as a primary eval target.** It ignores most negative rules in the skill. We learned this the hard way; it sent us down many wrong paths early on.
- **Sample-size policy**: n=10 per niche for scratch iteration, **n=20 for sweep validation (the standard)**, n=50 reserved for the final published baseline. n=20 is the smallest sample where rare detector findings stabilize and A/B comparisons are statistically meaningful.

### Quick commands

```bash
# Always start the local server first -- the gallery/viewer can't load via file:// (CORS)
bun run evals/runner/serve.ts

# Standard workflow: generate → detect → aggregate → snapshot
bun run evals/runner/run.ts --with-refs --model gpt-5.4 --reasoning-effort medium
bun run evals/runner/detect.ts
bun run evals/runner/aggregate.ts
bun run evals/runner/snapshot.ts <slug> --title "..." --note "..."

# Cheap targeted iteration (does not pollute current/)
bun run evals/runner/run.ts --with-refs --scratch my-test \
  --niches 06 --n 10 --condition skill-on --model qwen/qwen3.6-plus

# View results in browser
open http://localhost:8723/viewer.html
```

### Critical rules

- **Always run a small smoke test (n=2-5 on one niche) before any sweep.** Throughput degrades over long runs, and time estimates can be off by 10-20×. We once burned 11+ hours on a sweep estimated to take 40 minutes.
- **Background long runs.** Use `run_in_background: true` for any sweep over ~50 generations. The runner is resumable, so killing and restarting is safe.
- **Don't mix prompt versions in the same dataset.** The variant.json safety check enforces this for `current/` (you must pass `--rebuild-skill-on` after a prompt edit). Scratch dirs auto-wipe on prompt change.
- **Snapshot first, change second.** Always have a known reference point in `evals/output/snapshots/` before editing the skill, so you can compare before/after.
- **The user is the source of truth on aesthetic quality.** The fingerprinter and detector are useful signals, but they do not measure "is this design good?" Have the user spot-check the gallery for any meaningful change.

See `evals/AGENT.md` for the full reference: the detailed model comparison table, complete lessons learned, all common workflows, and the list of gotchas.