critique.md

### Purpose

Resolve one stable target, run two independent assessments, synthesize a design critique, persist a snapshot, and ask the user what to improve next. The chat response is the primary deliverable; the snapshot is an archive/backlog for future commands.

### Hard Invariants

- Assessment A (design review) and Assessment B (detector/browser evidence) are both required.
- Assessment A must finish before detector findings enter the parent synthesis context. Detector output is deterministic, but it still anchors judgment.
- If sub-agents are unavailable, fall back sequentially: finish and record Assessment A first, then run Assessment B, then synthesize.
- A skipped detector is a failed critique run unless `detect.mjs` is missing or crashes after a real attempt.
- Viewable targets require browser inspection when available.
- Any local server started only for critique visualization must run in the background, have a recorded stop method, and be stopped before final reporting unless the user asks to keep it.
- Do not claim a user-visible overlay exists unless script injection succeeded and the detector ran in the page.
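
The background-server invariant can be sketched as a start, record, stop pattern. This is only an illustration under stated assumptions: `sleep 30` stands in for the real server command, a PID is assumed to be an acceptable recorded stop method, and `start_critique_server` / `stop_critique_server` are hypothetical names that are not part of the bundled scripts.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the PID-based stop method are assumptions.

SERVER_PID=""

start_critique_server() {
  # Run the given command in the background and record how to stop it.
  "$@" >/dev/null 2>&1 &
  SERVER_PID=$!
}

stop_critique_server() {
  # Stop the recorded server, if it is still running, and wait for it to exit
  # so nothing is left behind before final reporting.
  if [ -n "$SERVER_PID" ] && kill -0 "$SERVER_PID" 2>/dev/null; then
    kill "$SERVER_PID" 2>/dev/null || true
    wait "$SERVER_PID" 2>/dev/null || true
  fi
  SERVER_PID=""
  return 0
}

start_critique_server sleep 30   # stand-in for the real server command
# ... inspect the page here ...
stop_critique_server
```

If the user asks to keep the server, the stop call would be skipped and the recorded PID reported instead.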

### Setup

1. **Resolve the target** to a concrete file path or URL. Prefer a source path over a dev-server URL when both identify the same surface; ports drift, paths do not.
   - "the homepage" -> `site/pages/index.astro` or `index.html`
   - "the settings modal" -> the primary component file
   - "this page" -> the current URL or source file
2. **Compute the slug**:
   ```bash
   node {{scripts_path}}/critique-storage.mjs slug "<resolved-path-or-url>"
   ```
   Keep the slug for the persistence and trend steps. If the command exits non-zero, skip persistence and trend for this run, but continue the critique.
3. **Read `.impeccable/critique/ignore.md`** if it exists. Drop matching findings silently; this file is the only prior-run input that critique consumes.
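
The slug step and its failure fallback can be sketched as follows. This is a minimal sketch, not the helper's own logic: `SCRIPTS_DIR` is a hypothetical stand-in for `{{scripts_path}}`, and `setup_slug`, `SLUG`, and `PERSIST` are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch only: SCRIPTS_DIR stands in for {{scripts_path}}; setup_slug, SLUG,
# and PERSIST are hypothetical names used for illustration.

SCRIPTS_DIR="${SCRIPTS_DIR:-./scripts}"

setup_slug() {
  # $1 = resolved path or URL.
  # On success, keep the slug. On a non-zero exit or empty output, disable
  # persistence and trend for this run; the critique itself still continues.
  if SLUG="$(node "$SCRIPTS_DIR/critique-storage.mjs" slug "$1" 2>/dev/null)" \
     && [ -n "$SLUG" ]; then
    PERSIST=1
  else
    SLUG=""
    PERSIST=0
  fi
}
```

A caller would run `setup_slug "site/pages/index.astro"` once during Setup and check `PERSIST` before the snapshot steps.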

### Assessment Orchestration

Delegate Assessment A and Assessment B to separate sub-agents when possible. They must not see each other's output. Do not show findings to the user until synthesis.

<codex>
Codex sub-agent gate:
- If `spawn_agent` is exposed and the user explicitly allowed sub-agents, delegation, or parallel agent work, spawn A and B immediately.
- If `spawn_agent` is exposed but the user did not explicitly allow sub-agents, ask exactly once: "Impeccable critique is designed to run two independent sub-agents for an unanchored assessment. May I use sub-agents for this critique?" Then stop until the user answers.
- If allowed, spawn A and B. If declined, run sequentially and report `Assessment independence: degraded (sub-agents declined by user)`.
- If `spawn_agent` is not exposed, do not ask; run sequentially and report `Assessment independence: degraded (spawn_agent unavailable in this session)`.
- If spawning fails after permission, run sequentially and report `Assessment independence: degraded (sub-agent spawn failed: <exact error>)`.

Prefer `fork_context: false` with self-contained prompts containing cwd, target, live URL, references, product context, and output contract. If using `fork_context: true`, omit `agent_type`, `model`, and `reasoning_effort`.
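
The gate above reduces to a small decision table. The sketch below is illustrative only: `subagent_gate` and its argument values (`yes`/`no`, `allowed`/`declined`/`unasked`) are hypothetical, and only the `degraded (...)` reasons are taken from the bullets.

```bash
#!/usr/bin/env bash
# Sketch only: the function name and argument vocabulary are assumptions.
# $1 = is spawn_agent exposed? (yes|no)
# $2 = user permission (allowed|declined|unasked)
# $3 = spawn error text, if spawning failed after permission (optional)

subagent_gate() {
  local exposed="$1" permission="$2" spawn_error="${3:-}"
  if [ "$exposed" != "yes" ]; then
    echo "sequential: degraded (spawn_agent unavailable in this session)"
  elif [ "$permission" = "unasked" ]; then
    echo "ask-once"   # ask the permission question, then stop until answered
  elif [ "$permission" = "declined" ]; then
    echo "sequential: degraded (sub-agents declined by user)"
  elif [ -n "$spawn_error" ]; then
    echo "sequential: degraded (sub-agent spawn failed: $spawn_error)"
  else
    echo "spawn A and B"
  fi
}
```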
</codex>

If browser automation is available, each assessment creates its own new tab. Never reuse an existing tab, even if it is already at the right URL.

### Assessment A: Design Review

Read relevant source files and visually inspect the live page when browser automation is available. Think like a design director.

Evaluate:
- **AI slop**: Would someone believe "AI made this" immediately? Check all DON'T guidance from the parent Impeccable skill.
- **Holistic design**: hierarchy, IA, emotional fit, discoverability, composition, typography, color, accessibility, states, copy, and edge cases.
- **Cognitive load**: consult [cognitive-load](cognitive-load.md); report checklist failures and decision points with >4 visible options.
- **Emotional journey**: peak-end rule, emotional valleys, reassurance at high-stakes moments.
- **Nielsen heuristics**: consult [heuristics-scoring](heuristics-scoring.md); score all 10 heuristics 0-4.

Return: AI slop verdict, heuristic scores, cognitive load, emotional journey, 2-3 strengths, 3-5 priority issues, persona red flags, minor observations, and provocative questions.

### Assessment B: Detector + Browser Evidence

Run the bundled detector and browser visualization evidence. Assessment B is mandatory and must remain isolated from Assessment A until both are complete.

CLI scan:
```bash
node {{scripts_path}}/detect.mjs --json [--fast] [target]
```

- Pass markup files/directories as `[target]`; do not pass CSS-only files.
- For URLs, skip CLI scan and use browser visualization.
- For 200+ scannable files, use `--fast`; for 500+, narrow scope or ask.
- Exit code 0 = clean; 2 = findings.
- If the detector entrypoint is missing or fails to load, report deterministic scan unavailable and continue with browser/manual review.
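
The file-count thresholds and exit codes above can be sketched as two small helpers. `scan_mode` and `interpret_exit` are hypothetical names, and treating every exit code other than 0 or 2 as "unavailable" is an assumption of this sketch; the list only defines 0 and 2.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

scan_mode() {
  # $1 = number of scannable files
  local count="$1"
  if [ "$count" -ge 500 ]; then
    echo "narrow-or-ask"   # too large: narrow the scope or ask the user
  elif [ "$count" -ge 200 ]; then
    echo "fast"            # pass --fast to detect.mjs
  else
    echo "full"
  fi
}

interpret_exit() {
  # $1 = exit code of the detector run
  case "$1" in
    0) echo "clean" ;;
    2) echo "findings" ;;
    *) echo "unavailable" ;;   # assumption: any other code is a failed scan
  esac
}
```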

Browser visualization is required for a viewable target when browser automation is available. Use a localhost dev/static URL for local files; avoid `file://` unless the available browser explicitly supports this workflow. Overlay flow:

1. Create a fresh tab and navigate.
2. Preflight mutable injection by setting `document.title` and appending a `<script>` tag. Read-only evaluate APIs do not count.
3. If mutation is unavailable, skip live server, browser presentation, and injection; report fallback signal.
4. If mutation is available, start `node {{scripts_path}}/live-server.mjs --background`, present the browser if supported, label `[Human]`, scroll top, inject `http://localhost:PORT/detect.js`, wait 2-3 seconds, read `impeccable` console messages, then stop the live server.
5. For multi-view targets, inject on 3-5 representative pages.

<codex>
Codex Browser note: Use the Browser skill. Do not spend a Browser attempt on `file://`. Only call `visibility.set(true)` after mutable script injection is confirmed for the `[Human]` overlay path; verify with `get()`. Use `tab.dev.logs({ filter: "impeccable" })` for console results. Its Playwright `evaluate(...)` surface is read-only; do not rely on it for mutation.
</codex>

Return: CLI findings JSON/counts, browser console findings if applicable, false positives, and skipped/failed browser steps with concrete reasons.

After Assessment B returns usable CLI findings, reuse them. Do not rerun `detect.mjs` in the parent unless Assessment B failed, was truncated, or omitted count, rule names, or file locations.

<codex>
Codex failure accounting: final Run Notes must include target slug, ignore list, assessment independence, CLI detector, browser visibility, overlay injection, live-server cleanup, temp-file cleanup, and any fallback signal used. Do not run repo status checks, late API spelunking, or unrelated verification after the report is assembled.
</codex>

### Generate Combined Critique Report

Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.

The chat response is the primary user-facing deliverable. Present the full structured critique below in chat; do not replace it with a summary and a link. The persisted snapshot is only an archive/backlog for later commands.

<codex>
Codex final-answer note: `$impeccable critique` produces a report artifact, so the final chat response should intentionally exceed the usual concise close-out style. Do not title the final response "Critique Summary" unless the user explicitly asked for a summary.
</codex>

Structure your feedback as a design director would:

#### Design Health Score
> *Consult [heuristics-scoring](heuristics-scoring.md)*

Present the scores for Nielsen's 10 heuristics as a table:

| # | Heuristic | Score | Key Issue |
|---|-----------|-------|-----------|
| 1 | Visibility of System Status | ? | [specific finding or "n/a" if solid] |
| 2 | Match System / Real World | ? | |
| 3 | User Control and Freedom | ? | |
| 4 | Consistency and Standards | ? | |
| 5 | Error Prevention | ? | |
| 6 | Recognition Rather Than Recall | ? | |
| 7 | Flexibility and Efficiency | ? | |
| 8 | Aesthetic and Minimalist Design | ? | |
| 9 | Error Recovery | ? | |
| 10 | Help and Documentation | ? | |
| **Total** | | **??/40** | **[Rating band]** |

Be honest with scores. A 4 means genuinely excellent. Most real interfaces score 20-32.
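
The Total row is the sum of the ten 0-4 scores. A minimal sketch of that arithmetic, with range checking, might look like this; `total_score` is a hypothetical helper, and the rating band is left out because its definition lives in heuristics-scoring rather than here.

```bash
#!/usr/bin/env bash
# Sketch only: total_score is a hypothetical helper name.

total_score() {
  # Args: exactly ten heuristic scores, each an integer from 0 to 4.
  if [ "$#" -ne 10 ]; then
    echo "expected 10 scores, got $#" >&2
    return 1
  fi
  local total=0 s
  for s in "$@"; do
    case "$s" in
      [0-4]) total=$((total + s)) ;;
      *) echo "score out of range: $s" >&2; return 1 ;;
    esac
  done
  echo "$total/40"
}

total_score 3 2 3 4 2 3 2 3 2 1   # prints 25/40
```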

#### Anti-Patterns Verdict

**Start here.** Does this look AI-generated?

**LLM assessment**: Your own evaluation of AI slop tells. Cover overall aesthetic feel, layout sameness, generic composition, missed opportunities for personality.

**Deterministic scan**: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.

**Visual overlays** (if injection succeeded): Tell the user that overlays are now visible in the **[Human]** tab in their browser, highlighting the detected issues. Summarize what the console output reported. If browser visualization was attempted but injection failed, say that no reliable user-visible overlay is available and report the fallback signal instead.

#### Overall Impression
A brief gut reaction: what works, what doesn't, and the single biggest opportunity.

#### What's Working
Highlight 2-3 things done well. Be specific about why they work.

#### Priority Issues
The 3-5 most impactful design problems, ordered by importance.

For each issue, tag with **P0-P3 severity** (consult [heuristics-scoring](heuristics-scoring.md) for severity definitions):
- **[P?] What**: Name the problem clearly
- **Why it matters**: How this hurts users or undermines goals
- **Fix**: What to do about it (be concrete)
- **Suggested command**: Which command could address this (from: {{available_commands}})

#### Persona Red Flags
> *Consult [personas](personas.md)*

Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If `{{config_file}}` contains a `## Design Context` section from `impeccable teach`, also generate 1-2 project-specific personas from the audience/brand info.

For each selected persona, walk through the primary user action and list specific red flags found:

**Alex (Power User)**: No keyboard shortcuts detected. Form requires 8 clicks for primary action. Forced modal onboarding. High abandonment risk.

**Jordan (First-Timer)**: Icon-only nav in sidebar. Technical jargon in error messages ("404 Not Found"). No visible help. Will abandon at step 2.

Be specific. Name the exact elements and interactions that fail each persona. Don't write generic persona descriptions; write what broke for them.

#### Minor Observations
Quick notes on smaller issues worth addressing.

#### Questions to Consider
Provocative questions that might unlock better solutions:
- "What if the primary action were more prominent?"
- "Does this need to feel this complex?"
- "What would a confident version of this look like?"

<codex>
#### Run Notes
Keep this compact. Include status for target slug, ignore list, assessment independence, CLI detector, browser visibility, overlay injection, live server cleanup, and temp-file cleanup. For failed or skipped steps, give the concrete observed reason and the fallback signal used. In the final chat response, also include snapshot write and trend read status after persistence has run.

Codex Run Notes are final-chat only. Do not include this section in the persisted snapshot body, because persistence, trend read, and temp cleanup happen after the snapshot write and would otherwise archive stale status such as "pending after persistence."
</codex>

**Remember**:
- Be direct. Vague feedback wastes everyone's time.
- Be specific. "The submit button," not "some elements."
- Say what's wrong AND why it matters to users.
- Give concrete suggestions. Cut "consider exploring..." entirely.
- Prioritize ruthlessly. If everything is important, nothing is.
- Don't soften criticism. Developers need honest feedback to ship great design.

### Persist the Snapshot

Once the report above is finalized, write it to `.impeccable/critique/` so the user can refer back, and so `{{command_prefix}}impeccable polish` can pick up the priority issues without a copy-paste.

Skip this step if Setup produced no usable slug: the slug command exited non-zero or the slug was null (vague or root-level target).

1. **Write the body to a temp file** so you can pipe it to the helper. Use the full critique report (heuristic table, anti-patterns verdict, priority issues, persona red flags, minor observations, and questions), but stop before the "Ask the User" / "Recommended Actions" sections that come later.

   <codex>
   Codex: exclude Run Notes from the temp body file; Run Notes are final-chat only because persistence, trend read, and temp cleanup happen after the snapshot write.
   </codex>

2. **Pass the structured metadata** through `IMPECCABLE_CRITIQUE_META` (JSON), then run the write command:
   ```bash
   IMPECCABLE_CRITIQUE_META='{"target":"<user phrasing>","total_score":<n>,"p0_count":<n>,"p1_count":<n>}' \
     node {{scripts_path}}/critique-storage.mjs write <slug> <body-file>
   ```
   The helper prints the absolute path it wrote.

3. **Delete the temp body file** after the write attempt completes, whether the write succeeded or failed. If deletion fails, mention `temp-file cleanup failed: <reason>` briefly in the final output, but do not block the critique.

4. **Read the trend** for context:
   ```bash
   node {{scripts_path}}/critique-storage.mjs trend <slug> 5
   ```
   This returns a JSON array of the last 5 frontmatter entries (including the one you just wrote).

5. **Append a short trend note to the user-visible output**, after the report and before the questions:

   > **Trend for `<slug>` (last 5 runs): 24 → 28 → 32 → 29 → 32**
   > Wrote `.impeccable/critique/<filename>`.

   If this is the first run for the slug, the trend is just one score; say so: "First run for this target, no trend yet."
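
The trend note from step 5 can be sketched as a small formatter. This is an illustration under assumptions: `format_trend` is a hypothetical helper, and it takes the total scores as plain arguments (oldest first) because how they are extracted from the helper's JSON output is not specified here.

```bash
#!/usr/bin/env bash
# Sketch only: format_trend is a hypothetical helper name.

format_trend() {
  # $1 = slug; remaining args = total scores, oldest first.
  local slug="$1"; shift
  local count="$#"
  if [ "$count" -le 1 ]; then
    echo "First run for this target, no trend yet."
    return 0
  fi
  local line="$1" s
  shift
  for s in "$@"; do
    line="$line → $s"
  done
  printf '**Trend for `%s` (last %d runs): %s**\n' "$slug" "$count" "$line"
}

format_trend "homepage" 24 28 32 29 32
```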

This is fire-and-forget. Do not show the user the helper's JSON output; only the human-readable trend line and the written path. Failures here should not block the rest of the flow; print the error and move on.
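
The write-then-clean-up behavior can be sketched as a wrapper that never fails the caller. This is a sketch under assumptions, not the helper's implementation: `persist_snapshot` is a hypothetical name, and the write command is passed in as arguments so the temp body file path can be appended as its last argument.

```bash
#!/usr/bin/env bash
# Sketch only: persist_snapshot is a hypothetical wrapper name.

persist_snapshot() {
  # $1 = report body text; remaining args = the write command to run.
  # The temp body file path is appended as the command's last argument.
  local body="$1"; shift
  local tmp status=0
  tmp="$(mktemp)" || { echo "snapshot skipped: mktemp failed" >&2; return 0; }
  printf '%s\n' "$body" > "$tmp"
  "$@" "$tmp" || { status=$?; echo "snapshot write failed (exit $status)" >&2; }
  # Delete the temp file whether the write succeeded or failed.
  rm -f "$tmp" || echo "temp-file cleanup failed: could not remove $tmp" >&2
  return 0   # fire-and-forget: never block the rest of the critique
}

# Possible real usage (SCRIPTS_DIR, META, REPORT, SLUG are placeholders):
#   persist_snapshot "$REPORT" env IMPECCABLE_CRITIQUE_META="$META" \
#     node "$SCRIPTS_DIR/critique-storage.mjs" write "$SLUG"
```

Passing the command as arguments keeps the cleanup in one place, so a failed write and a successful write take the same deletion path.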

### Ask the User

**After presenting findings**, use targeted questions based on what was actually found. {{ask_instruction}} These answers will shape the action plan.

Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):

1. **Priority direction**: Based on the issues found, ask which category matters most to the user right now. For example: "I found problems with visual hierarchy, color usage, and information overload. Which area should we tackle first?" Offer the top 2-3 issue categories as options.

2. **Design intent**: If the critique found a tonal mismatch, ask whether it was intentional. For example: "The interface feels clinical and corporate. Is that the intended tone, or should it feel warmer/bolder/more playful?" Offer 2-3 tonal directions as options based on what would fix the issues found.

3. **Scope**: Ask how much the user wants to take on. For example: "I found N issues. Want to address everything, or focus on the top 3?" Offer scope options like "Top 3 only", "All issues", "Critical issues only".

4. **Constraints** (optional; only ask if relevant): If the findings touch many areas, ask if anything is off-limits. For example: "Should any sections stay as-is?" This prevents the plan from touching things the user considers done.

**Rules for questions**:
- Every question must reference specific findings from the report. Never ask generic "who is your audience?" questions.
- Keep it to 2-4 questions maximum. Respect the user's time.
- Offer concrete options, not open-ended prompts.
- If findings are straightforward (e.g., only 1-2 clear issues), skip questions and go directly to Recommended Actions.

<codex>
Codex final-question gate: The user-visible response must either include the targeted questions or explicitly say `Questions skipped: <reason>` because the findings were straightforward. Each question must include 2-3 concrete answer options tied to the actual critique findings. Do not end with only open-ended questions.
</codex>

### Recommended Actions

**After receiving the user's answers**, present a prioritized action summary reflecting the user's priorities and scope from Ask the User.

#### Action Summary

List recommended commands in priority order, based on the user's answers:

1. **`{{command_prefix}}command-name`**: Brief description of what to fix (specific context from critique findings)
2. **`{{command_prefix}}command-name`**: Brief description (specific context)
...

**Rules for recommendations**:
- Only recommend commands from: {{available_commands}}
- Order by the user's stated priorities first, then by impact
- Each item's description should carry enough context that the command knows what to focus on
- Map each Priority Issue to the appropriate command
- Skip commands that would address zero issues
- If the user chose a limited scope, only include items within that scope
- If the user marked areas as off-limits, exclude commands that would touch those areas
- End with `{{command_prefix}}impeccable polish` as the final step if any fixes were recommended
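
A few of these rules (allow-list filtering, keeping priority order, and ending with polish) can be sketched mechanically. Everything named here is illustrative: `build_action_list` is a hypothetical helper, and `layout`, `typeset`, and `audit` are made-up command names standing in for whatever `{{available_commands}}` expands to.

```bash
#!/usr/bin/env bash
# Sketch only: helper and command names are hypothetical.

build_action_list() {
  # $1 = space-separated allow-list (stand-in for {{available_commands}})
  # remaining args = candidate commands, already in priority order
  local available=" $1 "; shift
  local cmd picked=""
  for cmd in "$@"; do
    [ "$cmd" = "polish" ] && continue     # polish is always appended last
    case "$available" in
      *" $cmd "*) ;;                      # allowed: keep going
      *) continue ;;                      # not in the allow-list: drop it
    esac
    case " $picked " in
      *" $cmd "*) continue ;;             # already listed: skip duplicate
    esac
    picked="$picked $cmd"
  done
  [ -z "$picked" ] && return 0            # no fixes recommended, no polish
  for cmd in $picked; do
    echo "$cmd"
  done
  echo "polish"
}

build_action_list "layout typeset polish" typeset audit layout
```

Scope and off-limits filtering depend on the user's answers and are left to the caller in this sketch.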

After presenting the summary, tell the user:

> You can ask me to run these one at a time, all at once, or in any order you prefer.
>
> Re-run `{{command_prefix}}impeccable critique` after fixes to see your score improve.