From bad0d43f2470be7067f6b534d3c120715a4e8c4f Mon Sep 17 00:00:00 2001 From: Christian Rocha Date: Mon, 27 Apr 2026 22:29:56 -0400 Subject: [PATCH] docs(config): document api key validation methods --- AGENTS.md | 9 + internal/config/VALIDATION.md | 304 ++++++++++++++++++++++++++++++++++ internal/config/config.go | 7 + 3 files changed, 320 insertions(+) create mode 100644 internal/config/VALIDATION.md diff --git a/AGENTS.md b/AGENTS.md index f25de08542fbc0626a117645cd9f1c2f066868b6..b3ced6746b6477ca9e54dac4a1a57e540dacce0c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -81,6 +81,15 @@ internal/ `hookedTool` decorator in `internal/agent/hooked_tool.go` wraps tools at the coordinator level. Hooks run before permission checks. See `HOOKS.md` for the user-facing protocol. +- **API key validation**: `ProviderConfig.TestConnection` in + `internal/config/config.go` proves authentication per-provider rather + than hitting `/models` everywhere (many gateways expose a public + `/models`, which proves nothing). Three outcomes: validated, invalid, + or `ErrValidationUnsupported` — the last is rendered by the UI as + "saved (not verified)" and is a first-class result, not a failure. + Any PR that adds or changes a provider probe, classifier, or the + openai-compat allowlist must update `internal/config/VALIDATION.md` + and the audit table in `config_validate_test.go` in the same commit. - **CGO disabled**: builds with `CGO_ENABLED=0` and `GOEXPERIMENT=greenteagc`. diff --git a/internal/config/VALIDATION.md b/internal/config/VALIDATION.md new file mode 100644 index 0000000000000000000000000000000000000000..a594f4942de94d7c6cefbdcf74b7774a22364d81 --- /dev/null +++ b/internal/config/VALIDATION.md @@ -0,0 +1,304 @@ +# API Key Validation + +This document describes how `ProviderConfig.TestConnection` proves that a user's +API key authenticates against a given provider, and the conventions for adding +or changing a provider's validation behavior. 
+ +If you are touching `buildValidationProbe`, any `classify*` function, the +`openaiCompatModelsAllowlist`, or `APIKeyInputState*` in the API key dialog, +**you must also update this document in the same commit.** + +--- + +## The problem this layer solves + +When a user enters an API key in the onboarding dialog, Crush tells them +whether the key is valid before saving it. The obvious implementation — make +an HTTP request to the provider and see if it succeeds — turns out to be +wrong in a specific and silent way. + +Historically, Crush validated keys by calling `GET /models` and treating a +`200 OK` response as proof of authentication. This works for the native +OpenAI and Anthropic APIs, where `/models` is auth-gated. It does **not** +work for most OpenAI-compatible gateways, because `/models` on those +services is deliberately public: it powers SDK catalogs, docs sites, and +model-picker UIs that render without requiring a signup. A public `/models` +endpoint returns `200` to every caller — valid key, invalid key, no key at +all — so the response proves nothing about the caller's credentials. + +The consequence was that for roughly ten providers (AiHubMix, Avian, +Cortecs, HuggingFace Router, io.net, OpenCode Go, OpenCode Zen, QiniuCloud, +Synthetic, Venice), Crush's "Key validated" message didn't actually reflect +whether the key would work. Any string the user typed — a typo, a +copy-paste from the wrong field, the literal word `test` — was reported as +a valid key. The failure surfaced later, when the user actually tried to +run the model and got an authentication error from the provider. A +separate bug made MiniMax and MiniMax China return "validated" +unconditionally, regardless of endpoint behavior. + +The fix is to stop assuming `/models` proves authentication. For each +provider we either: + +1. Find a different endpoint that actually gates on auth (typically + account-scoped data like rate limits or credits). +2. 
Use the auth-gated `/chat/completions` endpoint with a deliberately
+   malformed body, so we can tell auth-failure apart from schema-failure
+   without running inference.
+3. Admit we cannot reliably verify the key and say so in the UI
+   ("saved, not verified") rather than faking success.
+
+This is per-provider policy by necessity, because providers disagree about
+which endpoints authenticate, which status codes they return for bad keys,
+and whether their gateways authenticate before or after validating the
+request body. Managing that per-provider-ness — without silently
+regressing back into "assume `/models` proves auth" — is what the rest of
+this document exists to do.
+
+---
+
+## Contract
+
+`TestConnection` returns one of three things:
+
+| Return value               | Meaning                                                 | UI state                                             |
+| -------------------------- | ------------------------------------------------------- | ---------------------------------------------------- |
+| `nil`                      | Authentication proven.                                  | `APIKeyInputStateVerified` ("validated")             |
+| `ErrValidationUnsupported` | No deterministic probe exists. Key saved, not verified. | `APIKeyInputStateUnverified` ("saved, not verified") |
+| Any other non-nil `error`  | Probe ran and the server rejected the key.              | `APIKeyInputStateError` ("invalid")                  |
+
+"Saved, not verified" is a **first-class outcome**, not a failure. It is
+strictly preferable to a false-positive "validated" — the original bug that
+motivated this whole machinery was the system telling users bad keys were valid
+because `/models` returned `200` to any caller.
+
+---
+
+## Why not just hit `/models`?
+
+Many OpenAI-compatible gateways intentionally expose a public `/models` endpoint
+so SDKs, docs sites, and model pickers can render a catalog without requiring
+signup. On those providers, `GET /models` returns `200 OK` regardless of the key
+— so `200` proves nothing about authentication.
+
+The fix is to pick a probe for which the server's response **depends on the key**.
+That means either: + +1. Hit an endpoint the provider actually gates on auth (typically account-scoped + data like rate limits or credits). +2. Hit an auth-gated endpoint (`/chat/completions`) with an intentionally broken + payload, so the server authenticates the caller before rejecting the body — + without actually running inference. +3. If neither is available, return `ErrValidationUnsupported` and let the UI + fall back to "saved, not verified." + +--- + +## Classifiers + +The four `classify*` functions in `config.go` cover every probe currently in +use. Keep this set small — prefer "use an existing classifier" over "add a new +one." + +| Classifier | Valid → `nil` | Invalid → error | Anything else | +| ----------------------------- | ----------------------------------------- | ------------------- | -------------------------- | +| `classifyAuthGated` | `200` | `401`, `403` | `ErrValidationUnsupported` | +| `classifyOpenAIChatMalformed` | `400`, `422` (auth passed, body rejected) | `401`, `403` | `ErrValidationUnsupported` | +| `classifyGoogleModels` | `200` | `400`, `401`, `403` | `ErrValidationUnsupported` | +| `classifyZAIModels` | anything except `401` | `401` only | (no unsupported bucket) | + +Transient statuses (`5xx`, `429`, `402`, unexpected `200` on the chat probe) +collapse into `ErrValidationUnsupported` on purpose — a flaky gateway should not +surface as "your key is bad." 
+ +--- + +## Probes by provider + +### Auth-gated endpoint (GET) + +| Provider ID | Endpoint | Auth | Classifier | +| -------------------------- | ------------------------------------ | --------------------------------- | ---------------------- | +| `openai` | `GET {base}/models` | `Authorization: Bearer` | `classifyAuthGated` | +| `openrouter` | `GET {base}/credits` | `Authorization: Bearer` | `classifyAuthGated` | +| `anthropic` | `GET {base}/models` | `x-api-key` + `anthropic-version` | `classifyAuthGated` | +| `kimi-coding` | `GET {base}/v1/models` | `x-api-key` + `anthropic-version` | `classifyAuthGated` | +| `gemini` | `GET {base}/v1beta/models?key=` | key in query | `classifyGoogleModels` | +| `venice` | `GET {base}/api_keys/rate_limits` | `Authorization: Bearer` | `classifyAuthGated` | +| `minimax`, `minimax-china` | `GET {base}/v1/models` | `x-api-key` + `anthropic-version` | `classifyAuthGated` | + +### OpenAI-compat `/models` allowlist + +For `openai-compat` providers only. Entry in this allowlist means the provider's +`/models` endpoint has been **empirically confirmed** to return `401` on a bad +key: + +| Provider ID | Probe | +| ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | +| `deepseek`, `groq`, `xai`, `zhipu`, `zhipu-coding`, `cerebras`, `nebius`, `copilot` | `GET {base}/models` + `Authorization: Bearer`, `classifyAuthGated` | +| `zai` | Same probe, but uses `classifyZAIModels` (only `401` is invalid; valid keys return assorted non-`200` statuses) | + +### Malformed-body chat probe + +Used when the provider's `/models` is public but `/chat/completions` is +auth-gated. Sends `{"__crush_probe__": true}` (missing required fields) so the +gateway authenticates the caller before rejecting the schema. No tokens +consumed. 
+ +| Provider IDs | +| ---------------------------------------------------------------------------------------------------------------- | +| `aihubmix`, `avian`, `cortecs`, `huggingface`, `ionet`, `opencode-go`, `opencode-zen`, `qiniucloud`, `synthetic` | + +All use `POST {base}/chat/completions` + `Authorization: Bearer` + +`classifyOpenAIChatMalformed`. + +### Prefix check (no network) + +| Provider ID | Rule | +| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- | +| `bedrock` | Key must start with `ABSK`. Weak signal — Bedrock's `/foundation-models` endpoint is region-specific, so we fall back to format validation. | +| `vercel` | Key must start with `vck_`. Vercel's `/models` does not gate on auth. | + +### Explicitly unverified + +| Provider ID | Why | +| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------ | +| `chutes` | Observed ambiguous response (`429`) on unauthenticated probe path; classifier cannot reliably distinguish bad-key from rate-limited. | +| `neuralwatt` | Observed classifier ambiguity on malformed-body probe. | +| Any unknown `openai-compat` provider | Default fallback. We don't assume `/models` is auth-gated for providers we haven't tested. | + +--- + +## Adding a new provider + +Follow this checklist in order: + +### 1. Identify the provider's `Type` + +Check the `type` field in the catwalk provider definition: + +- `openai`, `anthropic`, `google`, `openrouter`, `bedrock`, `vercel`, `azure`, + `vertexai` — type-based default in `buildValidationProbe` already covers you. + Stop here unless the provider has quirks. +- `openai-compat` — keep going. + +### 2. 
Test the provider's `/models` endpoint with a bad key
+
+```sh
+curl -i -H "Authorization: Bearer definitely-not-a-real-key" \
+  https://<provider-host>/v1/models
+```
+
+| Response                    | Action                                                                                              |
+| --------------------------- | --------------------------------------------------------------------------------------------------- |
+| `401` or `403`              | `/models` is auth-gated. Add the provider ID to `openaiCompatModelsAllowlist` in `config.go`. Done. |
+| `200`                       | `/models` is public. Go to step 3.                                                                  |
+| `429`, `5xx`, anything else | Ambiguous. Mark unsupported (step 5).                                                               |
+
+### 3. Test `/chat/completions` with a bad key and malformed body
+
+```sh
+curl -i -X POST \
+  -H "Authorization: Bearer definitely-not-a-real-key" \
+  -H "Content-Type: application/json" \
+  -d '{"__crush_probe__":true}' \
+  https://<provider-host>/v1/chat/completions
+```
+
+| Response       | Action                                                                                                 |
+| -------------- | ------------------------------------------------------------------------------------------------------ |
+| `401` or `403` | Auth gate works. Add the provider ID to the chat-probe `case` in `buildValidationProbe`. Go to step 4.  |
+| `400` or `422` | Gateway validates schema before auth. `/chat/completions` is not usable as a probe. Go to step 5.       |
+| Anything else  | Mark unsupported (step 5).                                                                              |
+
+### 4. Confirm the good-key behavior
+
+Using a real key, repeat the `/chat/completions` probe with the malformed body.
+Expect `400` or `422`. If you get `200`, the gateway is not validating the body
+and the probe cannot distinguish valid from invalid keys — go to step 5 instead.
+
+### 5. Mark the provider unsupported
+
+Add the provider ID to the
+`InferenceProviderChutes, InferenceProviderNeuralwatt` case in
+`buildValidationProbe` (or extend it). The UI will show "saved (not verified)"
+and the user can still use the provider.
+
+### 6. Update this document and the tests
+
+- Add the provider to the appropriate table in the "Probes by provider" section
+  above.
+- Add a test case to `TestTestConnectionOpenAICompatProviderAudit` in + `config_validate_test.go` (for probe-based providers). +- Add the provider to the appropriate list in + `TestTestConnectionPublicModelsAuthGatedChatRegression` (for chat-probe + providers) or `TestTestConnectionOpenAICompatAllowlistUsesModelsProbe` (for + allowlist providers). + +--- + +## Decision log + +### Why allowlist, not default-allow, for `openai-compat` `/models`? + +The regression that motivated this document was exactly "default-allow." Commit +`7d14abb9` (2025-10-20) expanded the `/models` check from `TypeOpenAI` to all +`openai-compat` providers, silently turning every gateway with a public model +catalog into a false-positive validator. A default-deny allowlist makes new +providers opt in explicitly — the cost is one line per provider; the benefit is +no more silent regressions of this shape. + +### Why is ZAI special? + +ZAI's `/models` endpoint is authoritative about bad keys (always `401`) but +noisy about valid keys (returns assorted non-`200` statuses, seemingly depending +on backend state). Folding it into `classifyAuthGated` would regress valid-key +detection, since everything except `200` would be classified as +`ErrValidationUnsupported`. The ZAI classifier treats `401` as invalid and +everything else as valid. + +This is fragile — if ZAI ever changes their endpoint so `401` is no longer +specific to auth, the classifier becomes wrong. It is documented here so the +next person to touch it knows why it exists. + +### Why is MiniMax in the provider-ID switch and not the `TypeAnthropic` branch? + +MiniMax's base URL is `https://api.minimax.io/anthropic`, which means +`{base}/models` resolves to `/anthropic/models` — a 404. The correct endpoint is +`{base}/v1/models`. The `TypeAnthropic` default assumes the base URL already +ends in `/v1`, which holds for native Anthropic but not for MiniMax. 
Rather than +special-case the type branch, MiniMax gets its own explicit probe entry. + +A previous bug (commit `cce8edf9`, 2026-04-23) removed MiniMax's validation +entirely by returning `nil` unconditionally. Don't reintroduce that. + +### Why does Google need its own classifier? + +Google returns `400 INVALID_ARGUMENT` for unknown API keys, not `401`. If +`classifyAuthGated` were used, bad Google keys would produce `400` → +`ErrValidationUnsupported` → "saved, not verified" — downgrading a real auth +failure into a soft warning. `classifyGoogleModels` adds `400` to the invalid +bucket specifically for this case. + +### Why not push all this into catwalk? + +Probe metadata (method, path, classifier kind) is provider-identity data and +arguably belongs alongside the rest of the catwalk provider definition. The +classifier functions and UI mapping are crush-specific behavior and should stay +here. + +This refactor hasn't been done yet because the set of providers and classifiers +is still small enough that the indirection cost (catwalk schema change, version +bump, fallback for older catwalks) outweighs the benefit. If the allowlist or +classifier set grows significantly, revisit this decision. + +--- + +## Regression history + +| Date | Commit | Effect | +| ---------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 2025-10-20 | `7d14abb9` | Expanded `TestConnection` from `openai` to `openai-compat` with generic `/models` probe. Silently false-validated bad keys for ~10 providers with public `/models`. | +| 2026-04-23 | `cce8edf9` | Removed MiniMax's format-prefix guard, causing `TestConnection` to return `nil` unconditionally for MiniMax / MiniMax China. | +| (current) | (this PR) | Replaced generic `/models` with per-provider probes, added `ErrValidationUnsupported` sentinel, wired UI "saved (not verified)" state. 
| + +Any future regression in this layer should be recorded here. diff --git a/internal/config/config.go b/internal/config/config.go index bd08b5b56ead98ab8b0ce56145ef7b65490950d1..8195a54f7804cfdbfd247aca751e7f14122af3de 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -573,6 +573,13 @@ func (c *Config) SetupAgents() { c.Agents = agents } +// API key validation lives between this block and [ProviderConfig.TestConnection] +// below. See internal/config/VALIDATION.md for the full contract, the +// per-provider probe table, the classifier inventory, and the checklist for +// adding or changing a provider's validation behavior. Any change to +// [buildValidationProbe], the classify* functions, or +// [openaiCompatModelsAllowlist] must be reflected in that document. + // ErrValidationUnsupported is returned from [ProviderConfig.TestConnection] // when the provider does not expose a deterministic endpoint that proves API // key authentication without performing inference. Callers should treat this