# Language Model Provider Extensions Plan

## Executive Summary

This document outlines a comprehensive plan to introduce **Language Model Provider Extensions** to Zed. This feature will allow third-party developers to create extensions that register new language model providers, enabling users to select and use custom language models in Zed's AI features (Agent, inline assist, commit message generation, etc.).

## Table of Contents

1. [Current Architecture Overview](#current-architecture-overview)
2. [Goals and Requirements](#goals-and-requirements)
3. [Proposed Architecture](#proposed-architecture)
4. [Implementation Phases](#implementation-phases)
5. [WIT Interface Design](#wit-interface-design)
6. [Extension Manifest Changes](#extension-manifest-changes)
7. [Migration Plan for Built-in Providers](#migration-plan-for-built-in-providers)
8. [Testing Strategy](#testing-strategy)
9. [Security Considerations](#security-considerations)
10. [Appendix: Provider-Specific Requirements](#appendix-provider-specific-requirements)

---

## Current Architecture Overview

### Key Components

#### `language_model` crate (`crates/language_model/`)
- **`LanguageModel` trait** (`src/language_model.rs:580-718`): Core trait defining model capabilities
  - `id()`, `name()`, `provider_id()`, `provider_name()`
  - `supports_images()`, `supports_tools()`, `supports_tool_choice()`
  - `max_token_count()`, `max_output_tokens()`
  - `count_tokens()` - async token counting
  - `stream_completion()` - the main completion streaming method
  - `cache_configuration()` - optional prompt caching config

- **`LanguageModelProvider` trait** (`src/language_model.rs:743-764`): Provider registration
  - `id()`, `name()`, `icon()`
  - `default_model()`, `default_fast_model()`
  - `provided_models()`, `recommended_models()`
  - `is_authenticated()`, `authenticate()`
  - `configuration_view()` - UI for provider configuration
  - `reset_credentials()`

- **`LanguageModelRegistry`** (`src/registry.rs`): Global registry for providers
  - `register_provider()` / `unregister_provider()`
  - Model selection and configuration
  - Event emission for UI updates

#### `language_models` crate (`crates/language_models/`)
Contains all built-in provider implementations:
- `provider/anthropic.rs` - Anthropic Claude models
- `provider/cloud.rs` - Zed Cloud (proxied models)
- `provider/google.rs` - Google Gemini models
- `provider/open_ai.rs` - OpenAI GPT models
- `provider/ollama.rs` - Local Ollama models
- `provider/deepseek.rs` - DeepSeek models
- `provider/open_router.rs` - OpenRouter aggregator
- `provider/bedrock.rs` - AWS Bedrock
- And more...

#### Extension System (`crates/extension_host/`, `crates/extension_api/`)
- **WIT interface** (`extension_api/wit/since_v0.6.0/`): WebAssembly Interface Types definitions
- **WASM host** (`extension_host/src/wasm_host.rs`): Executes extension WASM modules
- **Extension trait** (`extension/src/extension.rs`): Rust trait for extensions
- **HTTP client** (`extension_api/src/http_client.rs`): Existing HTTP capability for extensions

### Request/Response Flow

```
User Request
    ↓
LanguageModelRequest (crates/language_model/src/request.rs)
    ↓
Provider-specific conversion (e.g., into_anthropic(), into_open_ai())
    ↓
HTTP API call (provider-specific crate)
    ↓
Stream of provider-specific events
    ↓
Event mapping to LanguageModelCompletionEvent
    ↓
Consumer (Agent, Inline Assist, etc.)
```

### Key Data Structures

```rust
// Request
pub struct LanguageModelRequest {
    pub thread_id: Option<String>,
    pub prompt_id: Option<String>,
    pub intent: Option<CompletionIntent>,
    pub mode: Option<CompletionMode>,
    pub messages: Vec<LanguageModelRequestMessage>,
    pub tools: Vec<LanguageModelRequestTool>,
    pub tool_choice: Option<LanguageModelToolChoice>,
    pub stop: Vec<String>,
    pub temperature: Option<f32>,
    pub thinking_allowed: bool,
}

// Completion Events
pub enum LanguageModelCompletionEvent {
    Queued { position: usize },
    Started,
    UsageUpdated { amount: usize, limit: usize },
    ToolUseLimitReached,
    Stop(StopReason),
    Text(String),
    Thinking { text: String, signature: Option<String> },
    RedactedThinking { data: String },
    ToolUse(LanguageModelToolUse),
    ToolUseJsonParseError { ... },
    StartMessage { message_id: Option<String> },
    ReasoningDetails(serde_json::Value),
    UsageUpdate(TokenUsage),
}
```
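
To make the flow above concrete, here is a minimal sketch of how a consumer might drive `stream_completion` (simplified: the real method also takes a GPUI context argument and uses richer error types):

```rust
use futures::StreamExt;

// Illustrative consumer loop: collect streamed text until the model stops.
// Assumes stream_completion resolves to a stream of
// Result<LanguageModelCompletionEvent, _> items, as described above.
async fn collect_text(
    model: &dyn LanguageModel,
    request: LanguageModelRequest,
) -> anyhow::Result<String> {
    let mut events = model.stream_completion(request).await?;
    let mut text = String::new();
    while let Some(event) = events.next().await {
        match event? {
            LanguageModelCompletionEvent::Text(chunk) => text.push_str(&chunk),
            LanguageModelCompletionEvent::Stop(_) => break,
            _ => {} // queueing, thinking, tool-use, and usage events elided
        }
    }
    Ok(text)
}
```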

---

## Goals and Requirements

### Primary Goals

1. **Extensibility**: Allow any developer to add new LLM providers via extensions
2. **Parity**: Extension-based providers should have feature parity with built-in providers
3. **Performance**: Minimize overhead from WASM boundary crossings during streaming
4. **Security**: Sandbox API key handling and network access appropriately
5. **User Experience**: Seamless integration with existing model selectors and configuration UI

### Functional Requirements

1. Extensions can register one or more language model providers
2. Extensions can define multiple models per provider
3. Extensions handle authentication (API keys, OAuth, etc.)
4. Extensions implement the streaming completion API
5. Extensions can specify model capabilities (tools, images, thinking, etc.)
6. Extensions can provide token counting logic
7. Extensions can provide configuration UI components
8. Extensions receive full request context for API customization

### Non-Functional Requirements

1. Streaming should feel as responsive as built-in providers
2. Extension crashes should not crash Zed
3. API keys should never be logged or exposed
4. Extensions should be able to make arbitrary HTTP requests
5. Settings should persist across sessions

---

## Proposed Architecture

### High-Level Design

```
┌─────────────────────────────────────────────────────────────────┐
│                         Zed Application                         │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                    LanguageModelRegistry                    │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │ │
│ │  │  Built-in    │  │  Extension   │  │  Extension       │  │ │
│ │  │  Providers   │  │  Provider A  │  │  Provider B      │  │ │
│ │  │  (Anthropic, │  │  (WASM)      │  │  (WASM)          │  │ │
│ │  │   OpenAI...) │  │              │  │                  │  │ │
│ │  └──────────────┘  └──────────────┘  └──────────────────┘  │ │
│ └─────────────────────────────────────────────────────────────┘ │
│                               ↑                                 │
│                               │                                 │
│ ┌─────────────────────────────┴───────────────────────────────┐ │
│ │               ExtensionLanguageModelProvider                │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ • Bridges WASM extension to LanguageModelProvider trait │ │ │
│ │ │ • Manages streaming across WASM boundary                │ │ │
│ │ │ • Handles credential storage via credentials_provider   │ │ │
│ │ │ • Provides configuration UI scaffolding                 │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│                               ↑                                 │
│ ┌─────────────────────────────┴───────────────────────────────┐ │
│ │                   WasmHost / WasmExtension                  │ │
│ │  • Executes WASM module                                     │ │
│ │  • Provides WIT interface for LLM operations                │ │
│ │  • HTTP client for API calls                                │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

### New Components

#### 1. `ExtensionLanguageModelProvider`

A new struct in `extension_host` that implements `LanguageModelProvider` and wraps a WASM extension:

```rust
pub struct ExtensionLanguageModelProvider {
    extension: WasmExtension,
    provider_info: ExtensionLlmProviderInfo,
    state: Entity<ExtensionLlmProviderState>,
}

struct ExtensionLlmProviderState {
    is_authenticated: bool,
    available_models: Vec<ExtensionLanguageModel>,
}
```

#### 2. `ExtensionLanguageModel`

Implements `LanguageModel` trait, delegating to WASM calls:

```rust
pub struct ExtensionLanguageModel {
    extension: WasmExtension,
    model_info: ExtensionLlmModelInfo,
    provider_id: LanguageModelProviderId,
}
```

#### 3. WIT Interface Extensions

New WIT definitions for LLM provider functionality (see [WIT Interface Design](#wit-interface-design)).

---

## Implementation Phases

### Phase 1: Foundation (2-3 weeks)

**Goal**: Establish the core infrastructure for extension-based LLM providers.

#### Tasks

1. **Define WIT interface for LLM providers** (`extension_api/wit/since_v0.7.0/llm-provider.wit`)
   - Provider metadata (id, name, icon)
   - Model definitions (id, name, capabilities, limits)
   - Credential management hooks
   - Completion request/response types

2. **Create `ExtensionLanguageModelProvider`** (`extension_host/src/wasm_host/llm_provider.rs`)
   - Implement `LanguageModelProvider` trait
   - Handle provider registration/unregistration
   - Basic authentication state management

3. **Create `ExtensionLanguageModel`** (`extension_host/src/wasm_host/llm_model.rs`)
   - Implement `LanguageModel` trait
   - Simple synchronous completion (non-streaming initially)

4. **Update `ExtensionManifest`** (`extension/src/extension_manifest.rs`)
   - Add `language_model_providers` field
   - Parse provider configuration from `extension.toml`

5. **Update extension loading** (`extension_host/src/extension_host.rs`)
   - Detect LLM provider declarations in manifest
   - Register providers with `LanguageModelRegistry`
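
A sketch of the registration step (task 5), assuming a constructor on `ExtensionLanguageModelProvider` and the registry API from the overview; the names and `cx` plumbing are illustrative:

```rust
// Hypothetical hook in extension loading: register one provider per
// manifest entry, namespaced by the extension id.
fn register_llm_providers(
    extension: &WasmExtension,
    manifest: &ExtensionManifest,
    registry: &mut LanguageModelRegistry,
    cx: &mut App,
) {
    for (provider_key, entry) in &manifest.language_model_providers {
        let provider = ExtensionLanguageModelProvider::new(
            extension.clone(),
            // e.g. "my-llm-extension.my-provider"
            format!("{}.{}", manifest.id, provider_key),
            entry.clone(),
            cx,
        );
        registry.register_provider(provider, cx);
    }
}
```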

#### Deliverables
- Extensions can register a provider that appears in the model selector
- Basic (non-streaming) completions work
- Manual testing with a test extension

### Phase 2: Streaming Support (2-3 weeks)

**Goal**: Enable efficient streaming completions across the WASM boundary.

#### Tasks

1. **Design streaming protocol**
   - Option A: Chunked responses via repeated WASM calls
   - Option B: Callback-based streaming (preferred)
   - Option C: Shared memory buffer with polling

2. **Implement streaming in WIT**
   ```wit
   resource completion-stream {
       next-event: func() -> result<option<completion-event>, string>;
   }

   export stream-completion: func(
       provider-id: string,
       model-id: string,
       request: completion-request
   ) -> result<completion-stream, string>;
   ```

3. **Implement `http-response-stream` integration**
   - Extensions already have access to `fetch-stream`
   - Need to parse SSE/chunked responses in WASM
   - Map to completion events

4. **Update `ExtensionLanguageModel::stream_completion`** (see the sketch after this list)
   - Bridge WASM completion-stream to Rust BoxStream
   - Handle backpressure and cancellation

5. **Performance optimization**
   - Batch small events to reduce WASM boundary crossings
   - Consider using shared memory for large payloads
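
One way to implement task 4 is to wrap the WIT `completion-stream` resource in `futures::stream::unfold`, polling `next-event` until it returns `None`. A sketch, where `WasmCompletionStream` stands in for the generated resource handle and executor hand-off is elided:

```rust
use futures::stream::{self, BoxStream, StreamExt};

// Bridge the WIT resource into a Rust stream. Dropping the returned
// stream drops the resource, which gives us cancellation for free.
fn into_box_stream(
    wasm_stream: WasmCompletionStream,
) -> BoxStream<'static, Result<CompletionEvent, String>> {
    stream::unfold(Some(wasm_stream), |state| async move {
        let wasm_stream = state?;
        match wasm_stream.next_event().await {
            // More events to come: yield one and keep the resource alive.
            Ok(Some(event)) => Some((Ok(event), Some(wasm_stream))),
            // Stream finished cleanly.
            Ok(None) => None,
            // Yield the error once, then terminate.
            Err(error) => Some((Err(error), None)),
        }
    })
    .boxed()
}
```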

#### Deliverables
- Streaming completions work with acceptable latency
- Performance benchmarks vs built-in providers

### Phase 3: Full Feature Parity (2-3 weeks)

**Goal**: Support all advanced features that built-in providers have.

#### Tasks

1. **Tool/Function calling support**
   - Add tool definitions to request
   - Parse tool use events from response
   - Handle tool results in follow-up requests

2. **Image support**
   - Pass image data in messages
   - Handle base64 encoding/size limits

3. **Thinking/reasoning support** (for Claude-like models)
   - `Thinking` and `RedactedThinking` events
   - Thought signatures for tool calls

4. **Token counting** (see the fallback sketch after this list)
   - WIT interface for `count_tokens`
   - Allow extensions to provide custom tokenizers or call API

5. **Prompt caching configuration**
   - Cache control markers in messages
   - Cache configuration reporting

6. **Rate limiting and error handling**
   - Standard error types in WIT
   - Retry-after headers
   - Rate limit events
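
If an extension implements neither a tokenizer nor an API-based count, the host could fall back to a rough estimate. A common heuristic for English-like text is about four characters per token; a sketch using the Rust shapes of the WIT types (names approximate, illustrative only):

```rust
// Crude fallback estimate: ~4 characters per token. Non-text content
// (images, tool calls) needs provider-specific accounting and is skipped.
fn estimate_tokens(request: &CompletionRequest) -> u64 {
    let chars: usize = request
        .messages
        .iter()
        .flat_map(|message| message.content.iter())
        .map(|content| match content {
            MessageContent::Text(text) => text.len(),
            _ => 0,
        })
        .sum();
    (chars as u64).div_ceil(4)
}
```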

#### Deliverables
- Extension providers can use tools
- Extension providers can process images
- Full error handling parity

### Phase 4: Credential Management & Configuration UI (1-2 weeks)

**Goal**: Secure credential storage and user-friendly configuration.

#### Tasks

1. **Credential storage integration** (host-side sketch after this list)
   - Use existing `credentials_provider` crate
   - Extensions request credentials via WIT
   - Credentials are stored by the host; extensions fetch them on demand at request time rather than persisting them

2. **API key input flow**
   ```wit
   import request-credential: func(
       credential-type: credential-type,
       label: string,
       placeholder: string
   ) -> result<bool, string>;
   ```

3. **Configuration view scaffolding**
   - Generic configuration view that works for most providers
   - Extensions can provide additional settings via JSON schema
   - Settings stored in extension-specific namespace

4. **Environment variable support**
   - Allow specifying env var names for API keys
   - Read from environment on startup
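
A host-side sketch of the credential lookup backing these tasks, combining keychain storage (task 1) with the env-var fallback (task 4). The `read_from_keychain` helper is hypothetical shorthand for the actual `credentials_provider` call:

```rust
// Hypothetical host implementation backing the llm-get-credential import.
// Keys are namespaced per extension so one extension cannot read
// another's credentials.
async fn llm_get_credential(
    extension_id: &str,
    provider_id: &str,
    env_var: Option<&str>,
) -> Option<String> {
    // 1. Environment variable fallback, if declared in the manifest.
    if let Some(var) = env_var {
        if let Ok(value) = std::env::var(var) {
            return Some(value);
        }
    }
    // 2. Secure storage (keychain/credential manager).
    let key = format!("zed-extension:{extension_id}:{provider_id}");
    read_from_keychain(&key).await
}
```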

#### Deliverables
- Secure API key storage
- Configuration UI for extension providers
- Environment variable fallback

### Phase 5: Testing & Documentation (1-2 weeks)

**Goal**: Comprehensive testing and developer documentation.

#### Tasks

1. **Integration tests**
   - Test extension loading and registration
   - Test streaming completions
   - Test error handling
   - Test credential management

2. **Performance tests**
   - Latency benchmarks
   - Memory usage under load
   - Comparison with built-in providers

3. **Example extensions**
   - Simple OpenAI-compatible provider
   - Provider with custom authentication
   - Provider with tool support

4. **Documentation**
   - Extension developer guide
   - API reference
   - Migration guide for custom providers

#### Deliverables
- Full test coverage
- Published documentation
- Example extensions in `extensions/` directory

### Phase 6: Migration of Built-in Providers (Optional, Long-term)

**Goal**: Prove the extension system by migrating one or more built-in providers.

#### Tasks

1. **Select candidate provider** (suggested: Ollama or LM Studio, which have the simplest APIs)
2. **Create extension version**
3. **Feature parity testing**
4. **Performance comparison**
5. **Gradual rollout** (feature flag)

---

## WIT Interface Design

### New File: `extension_api/wit/since_v0.7.0/llm-provider.wit`

```wit
interface llm-provider {
    /// Information about a language model provider
    record provider-info {
        /// Unique identifier for the provider (e.g., "my-extension.my-provider")
        id: string,
        /// Display name for the provider
        name: string,
        /// Icon name from Zed's icon set (optional)
        icon: option<string>,
    }

    /// Capabilities of a language model
    record model-capabilities {
        /// Whether the model supports image inputs
        supports-images: bool,
        /// Whether the model supports tool/function calling
        supports-tools: bool,
        /// Whether the model supports tool choice (auto/any/none)
        supports-tool-choice-auto: bool,
        supports-tool-choice-any: bool,
        supports-tool-choice-none: bool,
        /// Whether the model supports extended thinking
        supports-thinking: bool,
        /// The format for tool input schemas
        tool-input-format: tool-input-format,
    }

    /// Format for tool input schemas
    enum tool-input-format {
        json-schema,
        simplified,
    }

    /// Information about a specific model
    record model-info {
        /// Unique identifier for the model
        id: string,
        /// Display name for the model
        name: string,
        /// Maximum input token count
        max-token-count: u64,
        /// Maximum output tokens (optional)
        max-output-tokens: option<u64>,
        /// Model capabilities
        capabilities: model-capabilities,
        /// Whether this is the default model for the provider
        is-default: bool,
        /// Whether this is the default fast model
        is-default-fast: bool,
    }

    /// A message in a completion request
    record request-message {
        role: message-role,
        content: list<message-content>,
        cache: bool,
    }

    enum message-role {
        user,
        assistant,
        system,
    }

    /// Content within a message
    variant message-content {
        text(string),
        image(image-data),
        tool-use(tool-use),
        tool-result(tool-result),
        thinking(thinking-content),
        redacted-thinking(string),
    }

    record image-data {
        /// Base64-encoded image data
        source: string,
        /// Estimated dimensions
        width: option<u32>,
        height: option<u32>,
    }

    record tool-use {
        id: string,
        name: string,
        /// JSON-encoded input
        input: string,
        thought-signature: option<string>,
    }

    record tool-result {
        tool-use-id: string,
        tool-name: string,
        is-error: bool,
        content: tool-result-content,
    }

    variant tool-result-content {
        text(string),
        image(image-data),
    }

    record thinking-content {
        text: string,
        signature: option<string>,
    }

    /// A tool definition
    record tool-definition {
        name: string,
        description: string,
        /// JSON Schema for input parameters
        input-schema: string,
    }

    /// Tool choice preference
    enum tool-choice {
        auto,
        any,
        none,
    }

    /// A completion request
    record completion-request {
        messages: list<request-message>,
        tools: list<tool-definition>,
        tool-choice: option<tool-choice>,
        stop-sequences: list<string>,
        temperature: option<f32>,
        thinking-allowed: bool,
        /// Maximum tokens to generate
        max-tokens: option<u64>,
    }

    /// Events emitted during completion streaming
    variant completion-event {
        /// Completion has started
        started,
        /// Text content
        text(string),
        /// Thinking/reasoning content
        thinking(thinking-content),
        /// Redacted thinking (encrypted)
        redacted-thinking(string),
        /// Tool use request
        tool-use(tool-use),
        /// Completion stopped
        stop(stop-reason),
        /// Token usage update
        usage(token-usage),
    }

    enum stop-reason {
        end-turn,
        max-tokens,
        tool-use,
    }

    record token-usage {
        input-tokens: u64,
        output-tokens: u64,
        cache-creation-input-tokens: option<u64>,
        cache-read-input-tokens: option<u64>,
    }

    /// A streaming completion response
    resource completion-stream {
        /// Get the next event from the stream.
        /// Returns none when the stream is complete.
        next-event: func() -> result<option<completion-event>, string>;
    }

    /// Credential types that can be requested
    enum credential-type {
        api-key,
        oauth-token,
    }
}
```

### Updates to `extension_api/wit/since_v0.7.0/extension.wit`

```wit
world extension {
    // ... existing imports ...
    import llm-provider;

    use llm-provider.{
        provider-info, model-info, completion-request,
        completion-stream, credential-type
    };

    /// Returns information about language model providers offered by this extension
    export llm-providers: func() -> list<provider-info>;

    /// Returns the models available for a provider
    export llm-provider-models: func(provider-id: string) -> result<list<model-info>, string>;

    /// Check if the provider is authenticated
    export llm-provider-is-authenticated: func(provider-id: string) -> bool;

    /// Attempt to authenticate the provider
    export llm-provider-authenticate: func(provider-id: string) -> result<_, string>;

    /// Reset credentials for the provider
    export llm-provider-reset-credentials: func(provider-id: string) -> result<_, string>;

    /// Count tokens for a request
    export llm-count-tokens: func(
        provider-id: string,
        model-id: string,
        request: completion-request
    ) -> result<u64, string>;

    /// Stream a completion
    export llm-stream-completion: func(
        provider-id: string,
        model-id: string,
        request: completion-request
    ) -> result<completion-stream, string>;

    /// Request a credential from the user
    import llm-request-credential: func(
        provider-id: string,
        credential-type: credential-type,
        label: string,
        placeholder: string
    ) -> result<bool, string>;

    /// Get a stored credential
    import llm-get-credential: func(provider-id: string) -> option<string>;

    /// Store a credential
    import llm-store-credential: func(provider-id: string, value: string) -> result<_, string>;

    /// Delete a stored credential
    import llm-delete-credential: func(provider-id: string) -> result<_, string>;
}
```

---

## Extension Manifest Changes

### Updated `extension.toml` Schema

```toml
id = "my-llm-extension"
name = "My LLM Provider"
description = "Adds support for My LLM API"
version = "1.0.0"
schema_version = 1
authors = ["Developer <dev@example.com>"]
repository = "https://github.com/example/my-llm-extension"

[lib]
kind = "rust"
version = "0.7.0"

# New section for LLM providers
[language_model_providers.my-provider]
name = "My LLM"
icon = "sparkle" # Optional, from Zed's icon set

# Optional: Default models to show even before API connection
[[language_model_providers.my-provider.models]]
id = "my-model-large"
name = "My Model Large"
max_token_count = 200000
max_output_tokens = 8192
supports_images = true
supports_tools = true

[[language_model_providers.my-provider.models]]
id = "my-model-small"
name = "My Model Small"
max_token_count = 100000
max_output_tokens = 4096
supports_images = false
supports_tools = true

# Optional: Environment variable for API key
[language_model_providers.my-provider.auth]
env_var = "MY_LLM_API_KEY"
credential_label = "API Key"
```

### `ExtensionManifest` Changes

```rust
// In extension/src/extension_manifest.rs

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelProviderManifestEntry {
    pub name: String,
    #[serde(default)]
    pub icon: Option<String>,
    #[serde(default)]
    pub models: Vec<LanguageModelManifestEntry>,
    #[serde(default)]
    pub auth: Option<LanguageModelAuthConfig>,
}

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelManifestEntry {
    pub id: String,
    pub name: String,
    #[serde(default)]
    pub max_token_count: u64,
    #[serde(default)]
    pub max_output_tokens: Option<u64>,
    #[serde(default)]
    pub supports_images: bool,
    #[serde(default)]
    pub supports_tools: bool,
    #[serde(default)]
    pub supports_thinking: bool,
}

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelAuthConfig {
    pub env_var: Option<String>,
    pub credential_label: Option<String>,
}

// Add to ExtensionManifest struct:
pub struct ExtensionManifest {
    // ... existing fields ...
    #[serde(default)]
    pub language_model_providers: BTreeMap<Arc<str>, LanguageModelProviderManifestEntry>,
}
```

---

## Migration Plan for Built-in Providers

This section analyzes each built-in provider and what would be required to implement each one as an extension.

### Provider Comparison Matrix

| Provider | API Style | Auth Method | Special Features | Migration Complexity |
|----------|-----------|-------------|------------------|----------------------|
| Anthropic | REST/SSE | API Key | Thinking, Caching, Tool signatures | High |
| OpenAI | REST/SSE | API Key | Reasoning effort, Prompt caching | Medium |
| Google | REST/SSE | API Key | Thinking, Tool signatures | High |
| Ollama | REST/SSE | None (local) | Dynamic model discovery | Low |
| DeepSeek | REST/SSE | API Key | Reasoning mode | Medium |
| OpenRouter | REST/SSE | API Key | Reasoning details, Model routing | Medium |
| LM Studio | REST/SSE | None (local) | OpenAI-compatible | Low |
| Bedrock | AWS SDK | AWS Credentials | Multiple underlying providers | High |
| Zed Cloud | Zed Auth | Zed Account | Proxied providers | N/A (keep built-in) |

### Provider-by-Provider Analysis

#### Anthropic (`provider/anthropic.rs`)

**Current Implementation Highlights:**
- Uses `anthropic` crate for API types and streaming
- Custom event mapper (`AnthropicEventMapper`) for SSE → completion events
- Supports thinking/reasoning with thought signatures
- Prompt caching with cache control markers
- Beta headers for experimental features

**Extension Requirements:**
- Full SSE parsing in WASM
- Complex event mapping logic
- Thinking content with signatures
- Cache configuration reporting

**Unique Challenges:**
```rust
// Thought signatures in tool use
pub struct LanguageModelToolUse {
    pub thought_signature: Option<String>, // Anthropic-specific
    // ... other fields ...
}

// Thinking events with signatures
Thinking { text: String, signature: Option<String> }
```

**Migration Approach:**
1. Port `anthropic` crate types to extension-compatible structures
2. Implement SSE parser in extension (can use existing `fetch-stream`)
3. Map Anthropic events to generic completion events
4. Handle beta headers via custom HTTP headers

#### OpenAI (`provider/open_ai.rs`)

**Current Implementation Highlights:**
- Uses `open_ai` crate for API types
- Tiktoken-based token counting
- Parallel tool calls support
- Reasoning effort parameter (o1/o3 models)

**Extension Requirements:**
- SSE parsing (standard format)
- Token counting (could call API or use simplified estimate)
- Tool call aggregation across chunks

**Unique Challenges:**
```rust
// Reasoning effort for o-series models
pub reasoning_effort: Option<String>, // "low", "medium", "high"

// Prompt cache key (preview feature)
pub prompt_cache_key: Option<String>,
```

**Migration Approach:**
1. Standard SSE parsing
2. Token counting via API or tiktoken WASM port
3. Support reasoning_effort as model-specific config

#### Google/Gemini (`provider/google.rs`)

**Current Implementation Highlights:**
- Uses `google_ai` crate
- Different API structure from OpenAI/Anthropic
- Thinking support similar to Anthropic
- Tool signatures in function calls

**Extension Requirements:**
- Different request/response format
- Thinking content handling
- Tool signature preservation

**Unique Challenges:**
```rust
// Google uses a different content structure
enum ContentPart {
    Text { text: String },
    InlineData { mime_type: String, data: String },
    FunctionCall { name: String, args: Value },
    FunctionResponse { name: String, response: Value },
}
```

**Migration Approach:**
1. Implement Google-specific request building
2. Map Google events to generic completion events
3. Handle thinking/function call signatures

#### Ollama (`provider/ollama.rs`)

**Current Implementation Highlights:**
- Local-only, no authentication needed
- Dynamic model discovery via API
- OpenAI-compatible chat endpoint
- Simple streaming format

**Extension Requirements:**
- API URL configuration
- Model list fetching
- Basic streaming

**Why This is a Good First Migration Target:**
- No authentication complexity
- Simple API format
- Dynamic model discovery is isolated
- Good test case for local provider pattern

**Migration Approach:**
1. Configuration for API URL
2. Model discovery endpoint call (sketched below)
3. OpenAI-compatible streaming
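
A sketch of step 2: Ollama lists installed models at `GET /api/tags`, so discovery is a single JSON request. This assumes a non-streaming `fetch` alongside the existing `fetch-stream`, with the request shape used in Appendix D:

```rust
// Hypothetical discovery call from inside the extension. Ollama's
// /api/tags endpoint returns {"models": [{"name": "llama3", ...}, ...]}.
fn discover_ollama_models(api_url: &str) -> Result<Vec<String>, String> {
    let response = http_client::fetch(HttpRequest {
        method: HttpMethod::Get,
        url: format!("{api_url}/api/tags"),
        headers: vec![],
        body: None,
        redirect_policy: RedirectPolicy::NoFollow,
    })?;

    let parsed: serde_json::Value =
        serde_json::from_slice(&response.body).map_err(|e| e.to_string())?;
    Ok(parsed["models"]
        .as_array()
        .map(|models| {
            models
                .iter()
                .filter_map(|model| model["name"].as_str().map(String::from))
                .collect()
        })
        .unwrap_or_default())
}
```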

#### DeepSeek (`provider/deepseek.rs`)

**Current Implementation Highlights:**
- OpenAI-compatible API with extensions
- Reasoner model support
- Different handling for reasoning vs standard models

**Extension Requirements:**
- API key authentication
- Model-specific request modifications
- Reasoning content handling

**Migration Approach:**
1. Standard OpenAI-compatible base
2. Special handling for reasoner model
3. Temperature disabled for reasoning

#### OpenRouter (`provider/open_router.rs`)

**Current Implementation Highlights:**
- Aggregates multiple providers
- Dynamic model fetching
- Reasoning details preservation
- Tool call signatures

**Extension Requirements:**
- API key authentication
- Model list from API
- Reasoning details in responses

**Migration Approach:**
1. Model discovery from API
2. Standard OpenAI-compatible streaming
3. Preserve reasoning_details in events

#### LM Studio (`provider/lmstudio.rs`)

**Current Implementation Highlights:**
- Local-only, OpenAI-compatible
- Model discovery from API
- Simple configuration

**Why This is a Good First Migration Target:**
- No authentication
- OpenAI-compatible (reusable streaming code)
- Similar to Ollama

#### Bedrock (`provider/bedrock.rs`)

**Current Implementation Highlights:**
- AWS SDK-based authentication
- Multiple authentication methods (IAM, Profile, etc.)
- Proxies to Claude, Llama, etc.

**Extension Requirements:**
- AWS credential handling (complex)
- AWS Signature V4 signing
- Region configuration

**Why This Should Stay Built-in (Initially):**
- AWS credential management is complex
- SDK dependency not easily portable to WASM
- Security implications of AWS credentials in extensions

---

## Testing Strategy

### Unit Tests

```rust
// extension_host/src/wasm_host/llm_provider_tests.rs

#[gpui::test]
async fn test_extension_provider_registration(cx: &mut TestAppContext) {
    // Load test extension with LLM provider
    // Verify provider appears in registry
    // Verify models are listed correctly
}

#[gpui::test]
async fn test_extension_streaming_completion(cx: &mut TestAppContext) {
    // Create mock HTTP server
    // Load extension
    // Send completion request
    // Verify streaming events received correctly
}

#[gpui::test]
async fn test_extension_tool_calling(cx: &mut TestAppContext) {
    // Test tool definitions are passed correctly
    // Test tool use events are parsed
    // Test tool results can be sent back
}

#[gpui::test]
async fn test_extension_credential_management(cx: &mut TestAppContext) {
    // Test credential storage
    // Test credential retrieval
    // Test authentication state
}

#[gpui::test]
async fn test_extension_error_handling(cx: &mut TestAppContext) {
    // Test API errors are propagated correctly
    // Test rate limiting is handled
    // Test network errors are handled
}
```

### Integration Tests

```rust
// crates/extension_host/src/extension_store_test.rs (additions)

#[gpui::test]
async fn test_llm_extension_lifecycle(cx: &mut TestAppContext) {
    // Install extension with LLM provider
    // Verify provider registered
    // Configure credentials
    // Make completion request
    // Uninstall extension
    // Verify provider unregistered
}
```

### Manual Testing Checklist

1. **Provider Discovery**
   - [ ] Extension provider appears in model selector
   - [ ] Provider icon displays correctly
   - [ ] Models list correctly

2. **Authentication**
   - [ ] API key prompt appears when not authenticated
   - [ ] API key is stored securely
   - [ ] Environment variable fallback works
   - [ ] "Reset credentials" works

3. **Completions**
   - [ ] Basic text completion works
   - [ ] Streaming is smooth (no jank)
   - [ ] Long responses complete successfully
   - [ ] Cancellation works

4. **Advanced Features**
   - [ ] Tool calling works (Agent panel)
   - [ ] Image inputs work (if supported)
   - [ ] Thinking/reasoning displays correctly

5. **Error Handling**
   - [ ] Invalid API key shows error
   - [ ] Rate limiting shows appropriate message
   - [ ] Network errors are handled gracefully

6. **Performance**
   - [ ] First token latency acceptable (<500ms overhead)
   - [ ] Memory usage reasonable
   - [ ] No memory leaks on repeated requests

---

## Security Considerations

### Credential Handling

1. **Host-managed credential storage**
   - Zed stores credentials in secure storage (keychain/credential manager)
   - Extensions never persist raw credentials; they fetch them on demand via the `llm-get-credential` import when building requests
   - Credential values are never written to extension settings, logs, or the extension's work directory

2. **Credential scope isolation**
   - Each extension has its own credential namespace
   - Extensions cannot access other extensions' credentials
   - Provider ID is prefixed with extension ID

3. **Audit logging**
   - Log when credentials are accessed (not the values)
   - Log when credentials are modified

### Network Access

1. **HTTP request validation**
   - Extensions already have HTTP access via `fetch` / `fetch-stream`
   - Consider domain allowlisting for LLM providers
   - Log outbound requests for debugging

2. **Request/Response inspection**
   - API keys in headers should be redacted in logs
   - Response bodies may contain sensitive data

### Extension Sandbox

1. **WASM isolation**
   - Extensions run in WASM sandbox
   - Cannot access filesystem outside work directory
   - Cannot access other extensions' data

2. **Resource limits**
   - Memory limits per extension
   - CPU time limits (epoch-based interruption already exists)
   - Concurrent request limits

### Capability Requirements

```toml
# Extensions with LLM providers should declare:
[[capabilities]]
kind = "network:http"
domains = ["api.example.com"] # Optional domain restriction

[[capabilities]]
kind = "credential:store"
```

---

## Appendix: Provider-Specific Requirements

### A. Anthropic Implementation Details

**Request Format:**
```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 8192,
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
  ],
  "system": [{"type": "text", "text": "You are helpful"}],
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 10000}
}
```

**SSE Events:**
- `message_start` - Contains message ID, model, usage
- `content_block_start` - Starts text/tool_use/thinking block
- `content_block_delta` - Incremental content (text_delta, input_json_delta, thinking_delta)
- `content_block_stop` - Block complete
- `message_delta` - Stop reason, final usage
- `message_stop` - End of message

**Special Considerations:**
- Beta headers for thinking: `anthropic-beta: interleaved-thinking-2025-05-14`
- Cache control markers in messages
- Thought signatures on tool uses
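
Condensing `AnthropicEventMapper`'s job into the WIT event vocabulary, the core of the mapping an extension port would need looks roughly like this (SSE framing, block-index tracking, and the `parse_stop_reason` helper are elided):

```rust
// Sketch: map a parsed Anthropic SSE event to a generic completion event.
// Real code also accumulates partial tool-use JSON from input_json_delta
// events until content_block_stop.
fn map_anthropic_event(
    event_type: &str,
    data: &serde_json::Value,
) -> Option<CompletionEvent> {
    match event_type {
        "content_block_delta" => match data["delta"]["type"].as_str()? {
            "text_delta" => Some(CompletionEvent::Text(
                data["delta"]["text"].as_str()?.to_string(),
            )),
            "thinking_delta" => Some(CompletionEvent::Thinking(ThinkingContent {
                text: data["delta"]["thinking"].as_str()?.to_string(),
                signature: None,
            })),
            _ => None, // input_json_delta goes to the tool-use accumulator
        },
        "message_delta" => data["delta"]["stop_reason"]
            .as_str()
            .map(|reason| CompletionEvent::Stop(parse_stop_reason(reason))),
        _ => None, // message_start, content_block_start/stop, ping, ...
    }
}
```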

### B. OpenAI Implementation Details

**Request Format:**
```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ],
  "stream": true,
  "tools": [...],
  "max_completion_tokens": 4096
}
```

**SSE Events:**
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"tool_calls":[...]}}]}
data: [DONE]
```

**Special Considerations:**
- `reasoning_effort` for o-series models
- `parallel_tool_calls` option
- Token counting via tiktoken
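
Tool call aggregation is the fiddly part: OpenAI streams each call as fragments keyed by `index` (the `id` and `name` arrive once, then the argument JSON in pieces), and they must be stitched together before emitting a single tool-use event. A sketch:

```rust
// Accumulates streamed tool_calls deltas until finish_reason is
// "tool_calls", at which point each builder becomes one tool-use event.
#[derive(Default)]
struct ToolCallBuilder {
    id: String,
    name: String,
    arguments: String, // JSON accumulated across chunks
}

fn apply_tool_call_delta(
    builders: &mut Vec<ToolCallBuilder>,
    index: usize,
    id: Option<&str>,
    name: Option<&str>,
    argument_fragment: Option<&str>,
) {
    if builders.len() <= index {
        builders.resize_with(index + 1, Default::default);
    }
    let builder = &mut builders[index];
    if let Some(id) = id {
        builder.id.push_str(id);
    }
    if let Some(name) = name {
        builder.name.push_str(name);
    }
    if let Some(fragment) = argument_fragment {
        builder.arguments.push_str(fragment);
    }
}
```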

### C. Google/Gemini Implementation Details

**Request Format:**
```json
{
  "contents": [
    {"role": "user", "parts": [{"text": "Hello"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 0.7
  },
  "tools": [...]
}
```

**Response Format:**
```json
{
  "candidates": [{
    "content": {
      "parts": [
        {"text": "Response"},
        {"functionCall": {"name": "...", "args": {...}}}
      ]
    }
  }]
}
```

**Special Considerations:**
- Different streaming format (line-delimited JSON rather than SSE)
- Tool signatures in function calls
- Thinking support similar to Anthropic
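
Because the stream (per the note above) is line-delimited JSON rather than SSE, the parser is a buffered line splitter. A sketch that also tolerates the array/comma framing Gemini emits around streamed chunks:

```rust
// Sketch: split buffered chunks into complete JSON lines, each of which
// is one streamed response object to walk for candidates/parts.
#[derive(Default)]
struct GeminiStreamParser {
    buffer: String,
}

impl GeminiStreamParser {
    fn push_chunk(&mut self, chunk: &str) -> Vec<serde_json::Value> {
        self.buffer.push_str(chunk);
        let mut responses = Vec::new();
        while let Some(newline) = self.buffer.find('\n') {
            let line: String = self.buffer.drain(..=newline).collect();
            // Strip array/comma framing around streamed response objects.
            let line = line.trim().trim_matches(|c| c == ',' || c == '[' || c == ']');
            if line.is_empty() {
                continue;
            }
            if let Ok(value) = serde_json::from_str::<serde_json::Value>(line) {
                responses.push(value);
            }
        }
        responses
    }
}
```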

### D. OpenAI-Compatible Providers (Ollama, LM Studio, DeepSeek)

These providers can share a common implementation:

**Shared Code:**
```rust
// In the extension: one streaming entry point shared by all
// OpenAI-compatible providers.
fn stream_openai_compatible(
    api_url: &str,
    api_key: Option<&str>,
    request: CompletionRequest,
) -> Result<CompletionStream, String> {
    let request_body = build_openai_request(request);
    let stream = http_client::fetch_stream(HttpRequest {
        method: HttpMethod::Post,
        url: format!("{}/v1/chat/completions", api_url),
        headers: build_headers(api_key),
        body: Some(serde_json::to_vec(&request_body).map_err(|e| e.to_string())?),
        redirect_policy: RedirectPolicy::NoFollow,
    })?;

    Ok(OpenAiStreamParser::new(stream))
}
```
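
The `OpenAiStreamParser` used above wraps the HTTP stream and parses SSE lines into completion events; the per-line core is small. A sketch that omits buffering of partial lines across chunks and hands tool-call deltas to the aggregation path shown in Appendix B:

```rust
// Sketch: one SSE line -> at most one completion event.
fn parse_sse_line(line: &str) -> Option<CompletionEvent> {
    let data = line.strip_prefix("data: ")?;
    if data == "[DONE]" {
        return Some(CompletionEvent::Stop(StopReason::EndTurn));
    }
    let value: serde_json::Value = serde_json::from_str(data).ok()?;
    let delta = &value["choices"][0]["delta"];
    if let Some(text) = delta["content"].as_str() {
        return Some(CompletionEvent::Text(text.to_string()));
    }
    None // tool_calls deltas are handled by the aggregator
}
```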

### E. Example Extension: Simple OpenAI-Compatible Provider

```rust
// src/my_provider.rs
use zed_extension_api::{self as zed, Result};
use zed_extension_api::http_client::{HttpMethod, HttpRequest, RedirectPolicy};

struct MyLlmExtension {
    api_key: Option<String>,
}

impl zed::Extension for MyLlmExtension {
    fn new() -> Self {
        Self { api_key: None }
    }

    fn llm_providers(&self) -> Vec<zed::LlmProviderInfo> {
        vec![zed::LlmProviderInfo {
            id: "my-provider".into(),
            name: "My LLM Provider".into(),
            icon: Some("sparkle".into()),
        }]
    }

    fn llm_provider_models(&self, _provider_id: &str) -> Result<Vec<zed::LlmModelInfo>> {
        Ok(vec![zed::LlmModelInfo {
            id: "my-model".into(),
            name: "My Model".into(),
            max_token_count: 128000,
            max_output_tokens: Some(4096),
            capabilities: zed::LlmModelCapabilities {
                supports_images: true,
                supports_tools: true,
                ..Default::default()
            },
            is_default: true,
            is_default_fast: false,
        }])
    }

    fn llm_provider_is_authenticated(&self, _provider_id: &str) -> bool {
        self.api_key.is_some() || std::env::var("MY_API_KEY").is_ok()
    }

    fn llm_provider_authenticate(&mut self, provider_id: &str) -> Result<()> {
        // Reuse a previously stored credential if one exists.
        if let Some(key) = zed::llm_get_credential(provider_id)? {
            self.api_key = Some(key);
            return Ok(());
        }

        // Otherwise prompt the user; on success the host stores the key.
        if zed::llm_request_credential(
            provider_id,
            zed::CredentialType::ApiKey,
            "API Key",
            "Enter your API key",
        )? {
            self.api_key = zed::llm_get_credential(provider_id)?;
        }

        Ok(())
    }

    fn llm_stream_completion(
        &self,
        _provider_id: &str,
        model_id: &str,
        request: zed::LlmCompletionRequest,
    ) -> Result<zed::LlmCompletionStream> {
        let api_key = self
            .api_key
            .clone()
            .or_else(|| std::env::var("MY_API_KEY").ok())
            .ok_or("Not authenticated")?;

        let body = serde_json::json!({
            "model": model_id,
            // convert_messages (not shown) maps request messages to the
            // provider's JSON shape.
            "messages": self.convert_messages(&request.messages),
            "stream": true,
            "max_tokens": request.max_tokens.unwrap_or(4096),
        });

        let stream = HttpRequest::builder()
            .method(HttpMethod::Post)
            .url("https://api.my-provider.com/v1/chat/completions")
            .header("Authorization", format!("Bearer {}", api_key))
            .header("Content-Type", "application/json")
            .body(serde_json::to_vec(&body).map_err(|e| e.to_string())?)
            .build()?
            .fetch_stream()?;

        Ok(zed::LlmCompletionStream::new(OpenAiStreamParser::new(stream)))
    }
}

zed::register_extension!(MyLlmExtension);
```

---

## Timeline Summary

| Phase | Duration | Key Deliverables |
|-------|----------|------------------|
| 1. Foundation | 2-3 weeks | WIT interface, basic provider registration |
| 2. Streaming | 2-3 weeks | Efficient streaming across WASM boundary |
| 3. Full Features | 2-3 weeks | Tools, images, thinking support |
| 4. Credentials & UI | 1-2 weeks | Secure credentials, configuration UI |
| 5. Testing & Docs | 1-2 weeks | Tests, documentation, examples |
| 6. Migration (optional) | Ongoing | Migrate built-in providers |

**Total estimated time: 8-13 weeks**

---

## Open Questions

1. **Streaming efficiency**: Is callback-based streaming feasible in WASM, or should we use polling?

2. **Token counting**: Should we require extensions to implement token counting, or provide a fallback estimation?

3. **Configuration UI**: Should extensions be able to provide custom UI components, or just JSON schema-driven forms?

4. **Provider priorities**: Should extension providers appear before or after built-in providers in the selector?

5. **Backward compatibility**: How do we handle extensions built against older WIT versions when adding new LLM features?

6. **Rate limiting**: Should the host help with rate limiting, or leave it entirely to extensions?

---

## Conclusion

This plan provides a comprehensive roadmap for implementing Language Model Provider Extensions in Zed. The phased approach allows for incremental delivery of value while building toward full feature parity with built-in providers.

The key architectural decisions are:
1. **WIT-based interface** for WASM interop, consistent with existing extension patterns
2. **Streaming via resources** to minimize WASM boundary crossing overhead
3. **Host-managed credentials** for security
4. **Manifest-based discovery** for static model information

The migration analysis shows that simpler providers (Ollama, LM Studio) can be migrated first as proof of concept, while more complex providers (Anthropic, Bedrock) may remain built-in initially.