# Language Model Provider Extensions Plan

## Executive Summary

This document outlines a comprehensive plan to introduce **Language Model Provider Extensions** to Zed. This feature will allow third-party developers to create extensions that register new language model providers, enabling users to select and use custom language models in Zed's AI features (Agent, inline assist, commit message generation, etc.).

## Table of Contents

1. [Current Architecture Overview](#current-architecture-overview)
2. [Goals and Requirements](#goals-and-requirements)
3. [Proposed Architecture](#proposed-architecture)
4. [Implementation Phases](#implementation-phases)
5. [WIT Interface Design](#wit-interface-design)
6. [Extension Manifest Changes](#extension-manifest-changes)
7. [Migration Plan for Built-in Providers](#migration-plan-for-built-in-providers)
8. [Testing Strategy](#testing-strategy)
9. [Security Considerations](#security-considerations)
10. [Appendix: Provider-Specific Requirements](#appendix-provider-specific-requirements)

---

## Current Architecture Overview

### Key Components

#### `language_model` crate (`crates/language_model/`)
- **`LanguageModel` trait** (`src/language_model.rs:580-718`): Core trait defining model capabilities
  - `id()`, `name()`, `provider_id()`, `provider_name()`
  - `supports_images()`, `supports_tools()`, `supports_tool_choice()`
  - `max_token_count()`, `max_output_tokens()`
  - `count_tokens()` - async token counting
  - `stream_completion()` - the main completion streaming method
  - `cache_configuration()` - optional prompt caching config

- **`LanguageModelProvider` trait** (`src/language_model.rs:743-764`): Provider registration
  - `id()`, `name()`, `icon()`
  - `default_model()`, `default_fast_model()`
  - `provided_models()`, `recommended_models()`
  - `is_authenticated()`, `authenticate()`
  - `configuration_view()` - UI for provider configuration
  - `reset_credentials()`

- **`LanguageModelRegistry`** (`src/registry.rs`): Global registry for providers
  - `register_provider()` / `unregister_provider()`
  - Model selection and configuration
  - Event emission for UI updates

#### `language_models` crate (`crates/language_models/`)
Contains all built-in provider implementations:
- `provider/anthropic.rs` - Anthropic Claude models
- `provider/cloud.rs` - Zed Cloud (proxied models)
- `provider/google.rs` - Google Gemini models
- `provider/open_ai.rs` - OpenAI GPT models
- `provider/ollama.rs` - Local Ollama models
- `provider/deepseek.rs` - DeepSeek models
- `provider/open_router.rs` - OpenRouter aggregator
- `provider/bedrock.rs` - AWS Bedrock
- And more...

#### Extension System (`crates/extension_host/`, `crates/extension_api/`)
- **WIT interface** (`extension_api/wit/since_v0.6.0/`): WebAssembly Interface Types definitions
- **WASM host** (`extension_host/src/wasm_host.rs`): Executes extension WASM modules
- **Extension trait** (`extension/src/extension.rs`): Rust trait for extensions
- **HTTP client** (`extension_api/src/http_client.rs`): Existing HTTP capability for extensions

### Request/Response Flow

```
User Request
    ↓
LanguageModelRequest (crates/language_model/src/request.rs)
    ↓
Provider-specific conversion (e.g., into_anthropic(), into_open_ai())
    ↓
HTTP API call (provider-specific crate)
    ↓
Stream of provider-specific events
    ↓
Event mapping to LanguageModelCompletionEvent
    ↓
Consumer (Agent, Inline Assist, etc.)
```

### Key Data Structures

```rust
// Request
pub struct LanguageModelRequest {
    pub thread_id: Option<String>,
    pub prompt_id: Option<String>,
    pub intent: Option<CompletionIntent>,
    pub mode: Option<CompletionMode>,
    pub messages: Vec<LanguageModelRequestMessage>,
    pub tools: Vec<LanguageModelRequestTool>,
    pub tool_choice: Option<LanguageModelToolChoice>,
    pub stop: Vec<String>,
    pub temperature: Option<f32>,
    pub thinking_allowed: bool,
}

// Completion Events
pub enum LanguageModelCompletionEvent {
    Queued { position: usize },
    Started,
    UsageUpdated { amount: usize, limit: usize },
    ToolUseLimitReached,
    Stop(StopReason),
    Text(String),
    Thinking { text: String, signature: Option<String> },
    RedactedThinking { data: String },
    ToolUse(LanguageModelToolUse),
    ToolUseJsonParseError { ... },
    StartMessage { message_id: Option<String> },
    ReasoningDetails(serde_json::Value),
    UsageUpdate(TokenUsage),
}
```
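
To make the flow above concrete, here is a minimal sketch of how a consumer might drive `stream_completion` (simplified: the real method also takes a GPUI context argument and uses richer error types):

```rust
use futures::StreamExt;

// Illustrative consumer loop: collect streamed text until the model stops.
// Assumes stream_completion resolves to a stream of
// Result<LanguageModelCompletionEvent, _> items, as described above.
async fn collect_text(
    model: &dyn LanguageModel,
    request: LanguageModelRequest,
) -> anyhow::Result<String> {
    let mut events = model.stream_completion(request).await?;
    let mut text = String::new();
    while let Some(event) = events.next().await {
        match event? {
            LanguageModelCompletionEvent::Text(chunk) => text.push_str(&chunk),
            LanguageModelCompletionEvent::Stop(_) => break,
            _ => {} // queueing, thinking, tool-use, and usage events elided
        }
    }
    Ok(text)
}
```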

---

## Goals and Requirements

### Primary Goals

1. **Extensibility**: Allow any developer to add new LLM providers via extensions
2. **Parity**: Extension-based providers should have feature parity with built-in providers
3. **Performance**: Minimize overhead from WASM boundary crossings during streaming
4. **Security**: Sandbox API key handling and network access appropriately
5. **User Experience**: Seamless integration with existing model selectors and configuration UI

### Functional Requirements

1. Extensions can register one or more language model providers
2. Extensions can define multiple models per provider
3. Extensions handle authentication (API keys, OAuth, etc.)
4. Extensions implement the streaming completion API
5. Extensions can specify model capabilities (tools, images, thinking, etc.)
6. Extensions can provide token counting logic
7. Extensions can provide configuration UI components
8. Extensions receive full request context for API customization

### Non-Functional Requirements

1. Streaming should feel as responsive as built-in providers
2. Extension crashes should not crash Zed
3. API keys should never be logged or exposed
4. Extensions should be able to make arbitrary HTTP requests
5. Settings should persist across sessions

---

## Proposed Architecture

### High-Level Design

```
┌─────────────────────────────────────────────────────────────────┐
│                         Zed Application                         │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                    LanguageModelRegistry                    │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │ │
│ │  │  Built-in    │  │  Extension   │  │  Extension       │  │ │
│ │  │  Providers   │  │  Provider A  │  │  Provider B      │  │ │
│ │  │  (Anthropic, │  │  (WASM)      │  │  (WASM)          │  │ │
│ │  │   OpenAI...) │  │              │  │                  │  │ │
│ │  └──────────────┘  └──────────────┘  └──────────────────┘  │ │
│ └─────────────────────────────────────────────────────────────┘ │
│                               ↑                                 │
│                               │                                 │
│ ┌─────────────────────────────┴───────────────────────────────┐ │
│ │               ExtensionLanguageModelProvider                │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ • Bridges WASM extension to LanguageModelProvider trait │ │ │
│ │ │ • Manages streaming across WASM boundary                │ │ │
│ │ │ • Handles credential storage via credentials_provider   │ │ │
│ │ │ • Provides configuration UI scaffolding                 │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│                               ↑                                 │
│ ┌─────────────────────────────┴───────────────────────────────┐ │
│ │                   WasmHost / WasmExtension                  │ │
│ │  • Executes WASM module                                     │ │
│ │  • Provides WIT interface for LLM operations                │ │
│ │  • HTTP client for API calls                                │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

### New Components

#### 1. `ExtensionLanguageModelProvider`

A new struct in `extension_host` that implements `LanguageModelProvider` and wraps a WASM extension:

```rust
pub struct ExtensionLanguageModelProvider {
    extension: WasmExtension,
    provider_info: ExtensionLlmProviderInfo,
    state: Entity<ExtensionLlmProviderState>,
}

struct ExtensionLlmProviderState {
    is_authenticated: bool,
    available_models: Vec<ExtensionLanguageModel>,
}
```

#### 2. `ExtensionLanguageModel`

Implements `LanguageModel` trait, delegating to WASM calls:

```rust
pub struct ExtensionLanguageModel {
    extension: WasmExtension,
    model_info: ExtensionLlmModelInfo,
    provider_id: LanguageModelProviderId,
}
```

#### 3. WIT Interface Extensions

New WIT definitions for LLM provider functionality (see [WIT Interface Design](#wit-interface-design)).

---

## Implementation Phases

### Phase 1: Foundation (2-3 weeks)

**Goal**: Establish the core infrastructure for extension-based LLM providers.

#### Tasks

1. **Define WIT interface for LLM providers** (`extension_api/wit/since_v0.7.0/llm-provider.wit`)
   - Provider metadata (id, name, icon)
   - Model definitions (id, name, capabilities, limits)
   - Credential management hooks
   - Completion request/response types

2. **Create `ExtensionLanguageModelProvider`** (`extension_host/src/wasm_host/llm_provider.rs`)
   - Implement `LanguageModelProvider` trait
   - Handle provider registration/unregistration
   - Basic authentication state management

3. **Create `ExtensionLanguageModel`** (`extension_host/src/wasm_host/llm_model.rs`)
   - Implement `LanguageModel` trait
   - Simple synchronous completion (non-streaming initially)

4. **Update `ExtensionManifest`** (`extension/src/extension_manifest.rs`)
   - Add `language_model_providers` field
   - Parse provider configuration from `extension.toml`

5. **Update extension loading** (`extension_host/src/extension_host.rs`)
   - Detect LLM provider declarations in manifest
   - Register providers with `LanguageModelRegistry`
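
A sketch of the registration step (task 5), assuming a constructor on `ExtensionLanguageModelProvider` and the registry API from the overview; the names and `cx` plumbing are illustrative:

```rust
// Hypothetical hook in extension loading: register one provider per
// manifest entry, namespaced by the extension id.
fn register_llm_providers(
    extension: &WasmExtension,
    manifest: &ExtensionManifest,
    registry: &mut LanguageModelRegistry,
    cx: &mut App,
) {
    for (provider_key, entry) in &manifest.language_model_providers {
        let provider = ExtensionLanguageModelProvider::new(
            extension.clone(),
            // e.g. "my-llm-extension.my-provider"
            format!("{}.{}", manifest.id, provider_key),
            entry.clone(),
            cx,
        );
        registry.register_provider(provider, cx);
    }
}
```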

#### Deliverables
- Extensions can register a provider that appears in the model selector
- Basic (non-streaming) completions work
- Manual testing with a test extension

### Phase 2: Streaming Support (2-3 weeks)

**Goal**: Enable efficient streaming completions across the WASM boundary.

#### Tasks

1. **Design streaming protocol**
   - Option A: Chunked responses via repeated WASM calls
   - Option B: Callback-based streaming (preferred)
   - Option C: Shared memory buffer with polling

2. **Implement streaming in WIT**
   ```wit
   resource completion-stream {
       next-event: func() -> result<option<completion-event>, string>;
   }

   export stream-completion: func(
       provider-id: string,
       model-id: string,
       request: completion-request
   ) -> result<completion-stream, string>;
   ```

3. **Implement `http-response-stream` integration**
   - Extensions already have access to `fetch-stream`
   - Need to parse SSE/chunked responses in WASM
   - Map to completion events

4. **Update `ExtensionLanguageModel::stream_completion`** (see the sketch after this list)
   - Bridge WASM completion-stream to Rust BoxStream
   - Handle backpressure and cancellation

5. **Performance optimization**
   - Batch small events to reduce WASM boundary crossings
   - Consider using shared memory for large payloads
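
One way to implement task 4 is to wrap the WIT `completion-stream` resource in `futures::stream::unfold`, polling `next-event` until it returns `None`. A sketch, where `WasmCompletionStream` stands in for the generated resource handle and executor hand-off is elided:

```rust
use futures::stream::{self, BoxStream, StreamExt};

// Bridge the WIT resource into a Rust stream. Dropping the returned
// stream drops the resource, which gives us cancellation for free.
fn into_box_stream(
    wasm_stream: WasmCompletionStream,
) -> BoxStream<'static, Result<CompletionEvent, String>> {
    stream::unfold(Some(wasm_stream), |state| async move {
        let wasm_stream = state?;
        match wasm_stream.next_event().await {
            // More events to come: yield one and keep the resource alive.
            Ok(Some(event)) => Some((Ok(event), Some(wasm_stream))),
            // Stream finished cleanly.
            Ok(None) => None,
            // Yield the error once, then terminate.
            Err(error) => Some((Err(error), None)),
        }
    })
    .boxed()
}
```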

#### Deliverables
- Streaming completions work with acceptable latency
- Performance benchmarks vs built-in providers

### Phase 3: Full Feature Parity (2-3 weeks)

**Goal**: Support all advanced features that built-in providers have.

#### Tasks

1. **Tool/Function calling support**
   - Add tool definitions to request
   - Parse tool use events from response
   - Handle tool results in follow-up requests

2. **Image support**
   - Pass image data in messages
   - Handle base64 encoding/size limits

3. **Thinking/reasoning support** (for Claude-like models)
   - `Thinking` and `RedactedThinking` events
   - Thought signatures for tool calls

4. **Token counting** (see the fallback sketch after this list)
   - WIT interface for `count_tokens`
   - Allow extensions to provide custom tokenizers or call API

5. **Prompt caching configuration**
   - Cache control markers in messages
   - Cache configuration reporting

6. **Rate limiting and error handling**
   - Standard error types in WIT
   - Retry-after headers
   - Rate limit events
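
If an extension implements neither a tokenizer nor an API-based count, the host could fall back to a rough estimate. A common heuristic for English-like text is about four characters per token; a sketch using the Rust shapes of the WIT types (names approximate, illustrative only):

```rust
// Crude fallback estimate: ~4 characters per token. Non-text content
// (images, tool calls) needs provider-specific accounting and is skipped.
fn estimate_tokens(request: &CompletionRequest) -> u64 {
    let chars: usize = request
        .messages
        .iter()
        .flat_map(|message| message.content.iter())
        .map(|content| match content {
            MessageContent::Text(text) => text.len(),
            _ => 0,
        })
        .sum();
    (chars as u64).div_ceil(4)
}
```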

#### Deliverables
- Extension providers can use tools
- Extension providers can process images
- Full error handling parity

### Phase 4: Credential Management & Configuration UI (1-2 weeks)

**Goal**: Secure credential storage and user-friendly configuration.

#### Tasks

1. **Credential storage integration** (host-side sketch after this list)
   - Use existing `credentials_provider` crate
   - Extensions request credentials via WIT
   - Credentials are stored by the host; extensions fetch them on demand at request time rather than persisting them

2. **API key input flow**
   ```wit
   import request-credential: func(
       credential-type: credential-type,
       label: string,
       placeholder: string
   ) -> result<bool, string>;
   ```

3. **Configuration view scaffolding**
   - Generic configuration view that works for most providers
   - Extensions can provide additional settings via JSON schema
   - Settings stored in extension-specific namespace

4. **Environment variable support**
   - Allow specifying env var names for API keys
   - Read from environment on startup
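
A host-side sketch of the credential lookup backing these tasks, combining keychain storage (task 1) with the env-var fallback (task 4). The `read_from_keychain` helper is hypothetical shorthand for the actual `credentials_provider` call:

```rust
// Hypothetical host implementation backing the llm-get-credential import.
// Keys are namespaced per extension so one extension cannot read
// another's credentials.
async fn llm_get_credential(
    extension_id: &str,
    provider_id: &str,
    env_var: Option<&str>,
) -> Option<String> {
    // 1. Environment variable fallback, if declared in the manifest.
    if let Some(var) = env_var {
        if let Ok(value) = std::env::var(var) {
            return Some(value);
        }
    }
    // 2. Secure storage (keychain/credential manager).
    let key = format!("zed-extension:{extension_id}:{provider_id}");
    read_from_keychain(&key).await
}
```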

#### Deliverables
- Secure API key storage
- Configuration UI for extension providers
- Environment variable fallback

### Phase 5: Testing & Documentation (1-2 weeks)

**Goal**: Comprehensive testing and developer documentation.

#### Tasks

1. **Integration tests**
   - Test extension loading and registration
   - Test streaming completions
   - Test error handling
   - Test credential management

2. **Performance tests**
   - Latency benchmarks
   - Memory usage under load
   - Comparison with built-in providers

3. **Example extensions**
   - Simple OpenAI-compatible provider
   - Provider with custom authentication
   - Provider with tool support

4. **Documentation**
   - Extension developer guide
   - API reference
   - Migration guide for custom providers

#### Deliverables
- Full test coverage
- Published documentation
- Example extensions in `extensions/` directory

### Phase 6: Migration of Built-in Providers (Optional, Long-term)

**Goal**: Prove the extension system by migrating one or more built-in providers.

#### Tasks

1. **Select candidate provider** (suggested: Ollama or LM Studio, which have the simplest APIs)
2. **Create extension version**
3. **Feature parity testing**
4. **Performance comparison**
5. **Gradual rollout** (feature flag)

---

## WIT Interface Design

### New File: `extension_api/wit/since_v0.7.0/llm-provider.wit`

```wit
interface llm-provider {
    /// Information about a language model provider
    record provider-info {
        /// Unique identifier for the provider (e.g., "my-extension.my-provider")
        id: string,
        /// Display name for the provider
        name: string,
        /// Icon name from Zed's icon set (optional)
        icon: option<string>,
    }

    /// Capabilities of a language model
    record model-capabilities {
        /// Whether the model supports image inputs
        supports-images: bool,
        /// Whether the model supports tool/function calling
        supports-tools: bool,
        /// Whether the model supports tool choice (auto/any/none)
        supports-tool-choice-auto: bool,
        supports-tool-choice-any: bool,
        supports-tool-choice-none: bool,
        /// Whether the model supports extended thinking
        supports-thinking: bool,
        /// The format for tool input schemas
        tool-input-format: tool-input-format,
    }

    /// Format for tool input schemas
    enum tool-input-format {
        json-schema,
        simplified,
    }

    /// Information about a specific model
    record model-info {
        /// Unique identifier for the model
        id: string,
        /// Display name for the model
        name: string,
        /// Maximum input token count
        max-token-count: u64,
        /// Maximum output tokens (optional)
        max-output-tokens: option<u64>,
        /// Model capabilities
        capabilities: model-capabilities,
        /// Whether this is the default model for the provider
        is-default: bool,
        /// Whether this is the default fast model
        is-default-fast: bool,
    }

    /// A message in a completion request
    record request-message {
        role: message-role,
        content: list<message-content>,
        cache: bool,
    }

    enum message-role {
        user,
        assistant,
        system,
    }

    /// Content within a message
    variant message-content {
        text(string),
        image(image-data),
        tool-use(tool-use),
        tool-result(tool-result),
        thinking(thinking-content),
        redacted-thinking(string),
    }

    record image-data {
        /// Base64-encoded image data
        source: string,
        /// Estimated dimensions
        width: option<u32>,
        height: option<u32>,
    }

    record tool-use {
        id: string,
        name: string,
        /// JSON-encoded input
        input: string,
        thought-signature: option<string>,
    }

    record tool-result {
        tool-use-id: string,
        tool-name: string,
        is-error: bool,
        content: tool-result-content,
    }

    variant tool-result-content {
        text(string),
        image(image-data),
    }

    record thinking-content {
        text: string,
        signature: option<string>,
    }

    /// A tool definition
    record tool-definition {
        name: string,
        description: string,
        /// JSON Schema for input parameters
        input-schema: string,
    }

    /// Tool choice preference
    enum tool-choice {
        auto,
        any,
        none,
    }

    /// A completion request
    record completion-request {
        messages: list<request-message>,
        tools: list<tool-definition>,
        tool-choice: option<tool-choice>,
        stop-sequences: list<string>,
        temperature: option<f32>,
        thinking-allowed: bool,
        /// Maximum tokens to generate
        max-tokens: option<u64>,
    }

    /// Events emitted during completion streaming
    variant completion-event {
        /// Completion has started
        started,
        /// Text content
        text(string),
        /// Thinking/reasoning content
        thinking(thinking-content),
        /// Redacted thinking (encrypted)
        redacted-thinking(string),
        /// Tool use request
        tool-use(tool-use),
        /// Completion stopped
        stop(stop-reason),
        /// Token usage update
        usage(token-usage),
    }

    enum stop-reason {
        end-turn,
        max-tokens,
        tool-use,
    }

    record token-usage {
        input-tokens: u64,
        output-tokens: u64,
        cache-creation-input-tokens: option<u64>,
        cache-read-input-tokens: option<u64>,
    }

    /// A streaming completion response
    resource completion-stream {
        /// Get the next event from the stream.
        /// Returns none when the stream is complete.
        next-event: func() -> result<option<completion-event>, string>;
    }

    /// Credential types that can be requested
    enum credential-type {
        api-key,
        oauth-token,
    }
}
```

### Updates to `extension_api/wit/since_v0.7.0/extension.wit`

```wit
world extension {
    // ... existing imports ...
    import llm-provider;

    use llm-provider.{
        provider-info, model-info, completion-request,
        completion-stream, credential-type
    };

    /// Returns information about language model providers offered by this extension
    export llm-providers: func() -> list<provider-info>;

    /// Returns the models available for a provider
    export llm-provider-models: func(provider-id: string) -> result<list<model-info>, string>;

    /// Check if the provider is authenticated
    export llm-provider-is-authenticated: func(provider-id: string) -> bool;

    /// Attempt to authenticate the provider
    export llm-provider-authenticate: func(provider-id: string) -> result<_, string>;

    /// Reset credentials for the provider
    export llm-provider-reset-credentials: func(provider-id: string) -> result<_, string>;

    /// Count tokens for a request
    export llm-count-tokens: func(
        provider-id: string,
        model-id: string,
        request: completion-request
    ) -> result<u64, string>;

    /// Stream a completion
    export llm-stream-completion: func(
        provider-id: string,
        model-id: string,
        request: completion-request
    ) -> result<completion-stream, string>;

    /// Request a credential from the user
    import llm-request-credential: func(
        provider-id: string,
        credential-type: credential-type,
        label: string,
        placeholder: string
    ) -> result<bool, string>;

    /// Get a stored credential
    import llm-get-credential: func(provider-id: string) -> option<string>;

    /// Store a credential
    import llm-store-credential: func(provider-id: string, value: string) -> result<_, string>;

    /// Delete a stored credential
    import llm-delete-credential: func(provider-id: string) -> result<_, string>;
}
```

---

## Extension Manifest Changes

### Updated `extension.toml` Schema

```toml
id = "my-llm-extension"
name = "My LLM Provider"
description = "Adds support for My LLM API"
version = "1.0.0"
schema_version = 1
authors = ["Developer <dev@example.com>"]
repository = "https://github.com/example/my-llm-extension"

[lib]
kind = "rust"
version = "0.7.0"

# New section for LLM providers
[language_model_providers.my-provider]
name = "My LLM"
icon = "sparkle" # Optional, from Zed's icon set

# Optional: Default models to show even before API connection
[[language_model_providers.my-provider.models]]
id = "my-model-large"
name = "My Model Large"
max_token_count = 200000
max_output_tokens = 8192
supports_images = true
supports_tools = true

[[language_model_providers.my-provider.models]]
id = "my-model-small"
name = "My Model Small"
max_token_count = 100000
max_output_tokens = 4096
supports_images = false
supports_tools = true

# Optional: Environment variable for API key
[language_model_providers.my-provider.auth]
env_var = "MY_LLM_API_KEY"
credential_label = "API Key"
```

### `ExtensionManifest` Changes

```rust
// In extension/src/extension_manifest.rs

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelProviderManifestEntry {
    pub name: String,
    #[serde(default)]
    pub icon: Option<String>,
    #[serde(default)]
    pub models: Vec<LanguageModelManifestEntry>,
    #[serde(default)]
    pub auth: Option<LanguageModelAuthConfig>,
}

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelManifestEntry {
    pub id: String,
    pub name: String,
    #[serde(default)]
    pub max_token_count: u64,
    #[serde(default)]
    pub max_output_tokens: Option<u64>,
    #[serde(default)]
    pub supports_images: bool,
    #[serde(default)]
    pub supports_tools: bool,
    #[serde(default)]
    pub supports_thinking: bool,
}

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelAuthConfig {
    pub env_var: Option<String>,
    pub credential_label: Option<String>,
}

// Add to ExtensionManifest struct:
pub struct ExtensionManifest {
    // ... existing fields ...
    #[serde(default)]
    pub language_model_providers: BTreeMap<Arc<str>, LanguageModelProviderManifestEntry>,
}
```

---

## Migration Plan for Built-in Providers

This section analyzes each built-in provider and what would be required to implement each one as an extension.

### Provider Comparison Matrix

| Provider | API Style | Auth Method | Special Features | Migration Complexity |
|----------|-----------|-------------|------------------|----------------------|
| Anthropic | REST/SSE | API Key | Thinking, Caching, Tool signatures | High |
| OpenAI | REST/SSE | API Key | Reasoning effort, Prompt caching | Medium |
| Google | REST/SSE | API Key | Thinking, Tool signatures | High |
| Ollama | REST/SSE | None (local) | Dynamic model discovery | Low |
| DeepSeek | REST/SSE | API Key | Reasoning mode | Medium |
| OpenRouter | REST/SSE | API Key | Reasoning details, Model routing | Medium |
| LM Studio | REST/SSE | None (local) | OpenAI-compatible | Low |
| Bedrock | AWS SDK | AWS Credentials | Multiple underlying providers | High |
| Zed Cloud | Zed Auth | Zed Account | Proxied providers | N/A (keep built-in) |

### Provider-by-Provider Analysis

#### Anthropic (`provider/anthropic.rs`)

**Current Implementation Highlights:**
- Uses `anthropic` crate for API types and streaming
- Custom event mapper (`AnthropicEventMapper`) for SSE → completion events
- Supports thinking/reasoning with thought signatures
- Prompt caching with cache control markers
- Beta headers for experimental features

**Extension Requirements:**
- Full SSE parsing in WASM
- Complex event mapping logic
- Thinking content with signatures
- Cache configuration reporting

**Unique Challenges:**
```rust
// Thought signatures in tool use
pub struct LanguageModelToolUse {
    pub thought_signature: Option<String>, // Anthropic-specific
    // ... other fields ...
}

// Thinking events with signatures
Thinking { text: String, signature: Option<String> }
```

**Migration Approach:**
1. Port `anthropic` crate types to extension-compatible structures
2. Implement SSE parser in extension (can use existing `fetch-stream`)
3. Map Anthropic events to generic completion events
4. Handle beta headers via custom HTTP headers

#### OpenAI (`provider/open_ai.rs`)

**Current Implementation Highlights:**
- Uses `open_ai` crate for API types
- Tiktoken-based token counting
- Parallel tool calls support
- Reasoning effort parameter (o1/o3 models)

**Extension Requirements:**
- SSE parsing (standard format)
- Token counting (could call API or use simplified estimate)
- Tool call aggregation across chunks

**Unique Challenges:**
```rust
// Reasoning effort for o-series models
pub reasoning_effort: Option<String>, // "low", "medium", "high"

// Prompt cache key (preview feature)
pub prompt_cache_key: Option<String>,
```

**Migration Approach:**
1. Standard SSE parsing
2. Token counting via API or tiktoken WASM port
3. Support reasoning_effort as model-specific config

#### Google/Gemini (`provider/google.rs`)

**Current Implementation Highlights:**
- Uses `google_ai` crate
- Different API structure from OpenAI/Anthropic
- Thinking support similar to Anthropic
- Tool signatures in function calls

**Extension Requirements:**
- Different request/response format
- Thinking content handling
- Tool signature preservation

**Unique Challenges:**
```rust
// Google uses a different content structure
enum ContentPart {
    Text { text: String },
    InlineData { mime_type: String, data: String },
    FunctionCall { name: String, args: Value },
    FunctionResponse { name: String, response: Value },
}
```

**Migration Approach:**
1. Implement Google-specific request building
2. Map Google events to generic completion events
3. Handle thinking/function call signatures

#### Ollama (`provider/ollama.rs`)

**Current Implementation Highlights:**
- Local-only, no authentication needed
- Dynamic model discovery via API
- OpenAI-compatible chat endpoint
- Simple streaming format

**Extension Requirements:**
- API URL configuration
- Model list fetching
- Basic streaming

**Why This is a Good First Migration Target:**
- No authentication complexity
- Simple API format
- Dynamic model discovery is isolated
- Good test case for local provider pattern

**Migration Approach:**
1. Configuration for API URL
2. Model discovery endpoint call (sketched below)
3. OpenAI-compatible streaming
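
A sketch of step 2: Ollama lists installed models at `GET /api/tags`, so discovery is a single JSON request. This assumes a non-streaming `fetch` alongside the existing `fetch-stream`, with the request shape used in Appendix D:

```rust
// Hypothetical discovery call from inside the extension. Ollama's
// /api/tags endpoint returns {"models": [{"name": "llama3", ...}, ...]}.
fn discover_ollama_models(api_url: &str) -> Result<Vec<String>, String> {
    let response = http_client::fetch(HttpRequest {
        method: HttpMethod::Get,
        url: format!("{api_url}/api/tags"),
        headers: vec![],
        body: None,
        redirect_policy: RedirectPolicy::NoFollow,
    })?;

    let parsed: serde_json::Value =
        serde_json::from_slice(&response.body).map_err(|e| e.to_string())?;
    Ok(parsed["models"]
        .as_array()
        .map(|models| {
            models
                .iter()
                .filter_map(|model| model["name"].as_str().map(String::from))
                .collect()
        })
        .unwrap_or_default())
}
```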

#### DeepSeek (`provider/deepseek.rs`)

**Current Implementation Highlights:**
- OpenAI-compatible API with extensions
- Reasoner model support
- Different handling for reasoning vs standard models

**Extension Requirements:**
- API key authentication
- Model-specific request modifications
- Reasoning content handling

**Migration Approach:**
1. Standard OpenAI-compatible base
2. Special handling for reasoner model
3. Temperature disabled for reasoning

#### OpenRouter (`provider/open_router.rs`)

**Current Implementation Highlights:**
- Aggregates multiple providers
- Dynamic model fetching
- Reasoning details preservation
- Tool call signatures

**Extension Requirements:**
- API key authentication
- Model list from API
- Reasoning details in responses

**Migration Approach:**
1. Model discovery from API
2. Standard OpenAI-compatible streaming
3. Preserve reasoning_details in events

#### LM Studio (`provider/lmstudio.rs`)

**Current Implementation Highlights:**
- Local-only, OpenAI-compatible
- Model discovery from API
- Simple configuration

**Why This is a Good First Migration Target:**
- No authentication
- OpenAI-compatible (reusable streaming code)
- Similar to Ollama

#### Bedrock (`provider/bedrock.rs`)

**Current Implementation Highlights:**
- AWS SDK-based authentication
- Multiple authentication methods (IAM, Profile, etc.)
- Proxies to Claude, Llama, etc.

**Extension Requirements:**
- AWS credential handling (complex)
- AWS Signature V4 signing
- Region configuration

**Why This Should Stay Built-in (Initially):**
- AWS credential management is complex
- SDK dependency not easily portable to WASM
- Security implications of AWS credentials in extensions

---

## Testing Strategy

### Unit Tests

```rust
// extension_host/src/wasm_host/llm_provider_tests.rs

#[gpui::test]
async fn test_extension_provider_registration(cx: &mut TestAppContext) {
    // Load test extension with LLM provider
    // Verify provider appears in registry
    // Verify models are listed correctly
}

#[gpui::test]
async fn test_extension_streaming_completion(cx: &mut TestAppContext) {
    // Create mock HTTP server
    // Load extension
    // Send completion request
    // Verify streaming events received correctly
}

#[gpui::test]
async fn test_extension_tool_calling(cx: &mut TestAppContext) {
    // Test tool definitions are passed correctly
    // Test tool use events are parsed
    // Test tool results can be sent back
}

#[gpui::test]
async fn test_extension_credential_management(cx: &mut TestAppContext) {
    // Test credential storage
    // Test credential retrieval
    // Test authentication state
}

#[gpui::test]
async fn test_extension_error_handling(cx: &mut TestAppContext) {
    // Test API errors are propagated correctly
    // Test rate limiting is handled
    // Test network errors are handled
}
```

### Integration Tests

```rust
// crates/extension_host/src/extension_store_test.rs (additions)

#[gpui::test]
async fn test_llm_extension_lifecycle(cx: &mut TestAppContext) {
    // Install extension with LLM provider
    // Verify provider registered
    // Configure credentials
    // Make completion request
    // Uninstall extension
    // Verify provider unregistered
}
```

### Manual Testing Checklist

1. **Provider Discovery**
   - [ ] Extension provider appears in model selector
   - [ ] Provider icon displays correctly
   - [ ] Models list correctly

2. **Authentication**
   - [ ] API key prompt appears when not authenticated
   - [ ] API key is stored securely
   - [ ] Environment variable fallback works
   - [ ] "Reset credentials" works

3. **Completions**
   - [ ] Basic text completion works
   - [ ] Streaming is smooth (no jank)
   - [ ] Long responses complete successfully
   - [ ] Cancellation works

4. **Advanced Features**
   - [ ] Tool calling works (Agent panel)
   - [ ] Image inputs work (if supported)
   - [ ] Thinking/reasoning displays correctly

5. **Error Handling**
   - [ ] Invalid API key shows error
   - [ ] Rate limiting shows appropriate message
   - [ ] Network errors are handled gracefully

6. **Performance**
   - [ ] First token latency acceptable (<500ms overhead)
   - [ ] Memory usage reasonable
   - [ ] No memory leaks on repeated requests

---

## Security Considerations

### Credential Handling

1. **Host-managed credential storage**
   - Zed stores credentials in secure storage (keychain/credential manager)
   - Extensions never persist raw credentials; they fetch them on demand via the `llm-get-credential` import when building requests
   - Credential values are never written to extension settings, logs, or the extension's work directory

2. **Credential scope isolation**
   - Each extension has its own credential namespace
   - Extensions cannot access other extensions' credentials
   - Provider ID is prefixed with extension ID

3. **Audit logging**
   - Log when credentials are accessed (not the values)
   - Log when credentials are modified

### Network Access

1. **HTTP request validation**
   - Extensions already have HTTP access via `fetch` / `fetch-stream`
   - Consider domain allowlisting for LLM providers
   - Log outbound requests for debugging

2. **Request/Response inspection**
   - API keys in headers should be redacted in logs
   - Response bodies may contain sensitive data

### Extension Sandbox

1. **WASM isolation**
   - Extensions run in WASM sandbox
   - Cannot access filesystem outside work directory
   - Cannot access other extensions' data

2. **Resource limits**
   - Memory limits per extension
   - CPU time limits (epoch-based interruption already exists)
   - Concurrent request limits

### Capability Requirements

```toml
# Extensions with LLM providers should declare:
[[capabilities]]
kind = "network:http"
domains = ["api.example.com"] # Optional domain restriction

[[capabilities]]
kind = "credential:store"
```

---

## Appendix: Provider-Specific Requirements

### A. Anthropic Implementation Details

**Request Format:**
```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 8192,
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
  ],
  "system": [{"type": "text", "text": "You are helpful"}],
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 10000}
}
```

**SSE Events:**
- `message_start` - Contains message ID, model, usage
- `content_block_start` - Starts text/tool_use/thinking block
- `content_block_delta` - Incremental content (text_delta, input_json_delta, thinking_delta)
- `content_block_stop` - Block complete
- `message_delta` - Stop reason, final usage
- `message_stop` - End of message

**Special Considerations:**
- Beta headers for thinking: `anthropic-beta: interleaved-thinking-2025-05-14`
- Cache control markers in messages
- Thought signatures on tool uses
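
Condensing `AnthropicEventMapper`'s job into the WIT event vocabulary, the core of the mapping an extension port would need looks roughly like this (SSE framing, block-index tracking, and the `parse_stop_reason` helper are elided):

```rust
// Sketch: map a parsed Anthropic SSE event to a generic completion event.
// Real code also accumulates partial tool-use JSON from input_json_delta
// events until content_block_stop.
fn map_anthropic_event(
    event_type: &str,
    data: &serde_json::Value,
) -> Option<CompletionEvent> {
    match event_type {
        "content_block_delta" => match data["delta"]["type"].as_str()? {
            "text_delta" => Some(CompletionEvent::Text(
                data["delta"]["text"].as_str()?.to_string(),
            )),
            "thinking_delta" => Some(CompletionEvent::Thinking(ThinkingContent {
                text: data["delta"]["thinking"].as_str()?.to_string(),
                signature: None,
            })),
            _ => None, // input_json_delta goes to the tool-use accumulator
        },
        "message_delta" => data["delta"]["stop_reason"]
            .as_str()
            .map(|reason| CompletionEvent::Stop(parse_stop_reason(reason))),
        _ => None, // message_start, content_block_start/stop, ping, ...
    }
}
```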

### B. OpenAI Implementation Details

**Request Format:**
```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ],
  "stream": true,
  "tools": [...],
  "max_completion_tokens": 4096
}
```

**SSE Events:**
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"tool_calls":[...]}}]}
data: [DONE]
```

**Special Considerations:**
- `reasoning_effort` for o-series models
- `parallel_tool_calls` option
- Token counting via tiktoken
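
Tool call aggregation is the fiddly part: OpenAI streams each call as fragments keyed by `index` (the `id` and `name` arrive once, then the argument JSON in pieces), and they must be stitched together before emitting a single tool-use event. A sketch:

```rust
// Accumulates streamed tool_calls deltas until finish_reason is
// "tool_calls", at which point each builder becomes one tool-use event.
#[derive(Default)]
struct ToolCallBuilder {
    id: String,
    name: String,
    arguments: String, // JSON accumulated across chunks
}

fn apply_tool_call_delta(
    builders: &mut Vec<ToolCallBuilder>,
    index: usize,
    id: Option<&str>,
    name: Option<&str>,
    argument_fragment: Option<&str>,
) {
    if builders.len() <= index {
        builders.resize_with(index + 1, Default::default);
    }
    let builder = &mut builders[index];
    if let Some(id) = id {
        builder.id.push_str(id);
    }
    if let Some(name) = name {
        builder.name.push_str(name);
    }
    if let Some(fragment) = argument_fragment {
        builder.arguments.push_str(fragment);
    }
}
```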

### C. Google/Gemini Implementation Details

**Request Format:**
```json
{
  "contents": [
    {"role": "user", "parts": [{"text": "Hello"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 0.7
  },
  "tools": [...]
}
```

**Response Format:**
```json
{
  "candidates": [{
    "content": {
      "parts": [
        {"text": "Response"},
        {"functionCall": {"name": "...", "args": {...}}}
      ]
    }
  }]
}
```

**Special Considerations:**
- Different streaming format (line-delimited JSON rather than SSE)
- Tool signatures in function calls
- Thinking support similar to Anthropic
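
Because the stream (per the note above) is line-delimited JSON rather than SSE, the parser is a buffered line splitter. A sketch that also tolerates the array/comma framing Gemini emits around streamed chunks:

```rust
// Sketch: split buffered chunks into complete JSON lines, each of which
// is one streamed response object to walk for candidates/parts.
#[derive(Default)]
struct GeminiStreamParser {
    buffer: String,
}

impl GeminiStreamParser {
    fn push_chunk(&mut self, chunk: &str) -> Vec<serde_json::Value> {
        self.buffer.push_str(chunk);
        let mut responses = Vec::new();
        while let Some(newline) = self.buffer.find('\n') {
            let line: String = self.buffer.drain(..=newline).collect();
            // Strip array/comma framing around streamed response objects.
            let line = line.trim().trim_matches(|c| c == ',' || c == '[' || c == ']');
            if line.is_empty() {
                continue;
            }
            if let Ok(value) = serde_json::from_str::<serde_json::Value>(line) {
                responses.push(value);
            }
        }
        responses
    }
}
```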

### D. OpenAI-Compatible Providers (Ollama, LM Studio, DeepSeek)

These providers can share a common implementation:

**Shared Code:**
```rust
// In the extension: one streaming entry point shared by all
// OpenAI-compatible providers.
fn stream_openai_compatible(
    api_url: &str,
    api_key: Option<&str>,
    request: CompletionRequest,
) -> Result<CompletionStream, String> {
    let request_body = build_openai_request(request);
    let stream = http_client::fetch_stream(HttpRequest {
        method: HttpMethod::Post,
        url: format!("{}/v1/chat/completions", api_url),
        headers: build_headers(api_key),
        body: Some(serde_json::to_vec(&request_body).map_err(|e| e.to_string())?),
        redirect_policy: RedirectPolicy::NoFollow,
    })?;

    Ok(OpenAiStreamParser::new(stream))
}
```
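
The `OpenAiStreamParser` used above wraps the HTTP stream and parses SSE lines into completion events; the per-line core is small. A sketch that omits buffering of partial lines across chunks and hands tool-call deltas to the aggregation path shown in Appendix B:

```rust
// Sketch: one SSE line -> at most one completion event.
fn parse_sse_line(line: &str) -> Option<CompletionEvent> {
    let data = line.strip_prefix("data: ")?;
    if data == "[DONE]" {
        return Some(CompletionEvent::Stop(StopReason::EndTurn));
    }
    let value: serde_json::Value = serde_json::from_str(data).ok()?;
    let delta = &value["choices"][0]["delta"];
    if let Some(text) = delta["content"].as_str() {
        return Some(CompletionEvent::Text(text.to_string()));
    }
    None // tool_calls deltas are handled by the aggregator
}
```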

### E. Example Extension: Simple OpenAI-Compatible Provider

```rust
// src/my_provider.rs
use zed_extension_api::{self as zed, Result};
use zed_extension_api::http_client::{HttpMethod, HttpRequest, RedirectPolicy};

struct MyLlmExtension {
    api_key: Option<String>,
}

impl zed::Extension for MyLlmExtension {
    fn new() -> Self {
        Self { api_key: None }
    }

    fn llm_providers(&self) -> Vec<zed::LlmProviderInfo> {
        vec![zed::LlmProviderInfo {
            id: "my-provider".into(),
            name: "My LLM Provider".into(),
            icon: Some("sparkle".into()),
        }]
    }

    fn llm_provider_models(&self, _provider_id: &str) -> Result<Vec<zed::LlmModelInfo>> {
        Ok(vec![zed::LlmModelInfo {
            id: "my-model".into(),
            name: "My Model".into(),
            max_token_count: 128000,
            max_output_tokens: Some(4096),
            capabilities: zed::LlmModelCapabilities {
                supports_images: true,
                supports_tools: true,
                ..Default::default()
            },
            is_default: true,
            is_default_fast: false,
        }])
    }

    fn llm_provider_is_authenticated(&self, _provider_id: &str) -> bool {
        self.api_key.is_some() || std::env::var("MY_API_KEY").is_ok()
    }

    fn llm_provider_authenticate(&mut self, provider_id: &str) -> Result<()> {
        // Reuse a previously stored credential if one exists.
        if let Some(key) = zed::llm_get_credential(provider_id)? {
            self.api_key = Some(key);
            return Ok(());
        }

        // Otherwise prompt the user; on success the host stores the key.
        if zed::llm_request_credential(
            provider_id,
            zed::CredentialType::ApiKey,
            "API Key",
            "Enter your API key",
        )? {
            self.api_key = zed::llm_get_credential(provider_id)?;
        }

        Ok(())
    }

    fn llm_stream_completion(
        &self,
        _provider_id: &str,
        model_id: &str,
        request: zed::LlmCompletionRequest,
    ) -> Result<zed::LlmCompletionStream> {
        let api_key = self
            .api_key
            .clone()
            .or_else(|| std::env::var("MY_API_KEY").ok())
            .ok_or("Not authenticated")?;

        let body = serde_json::json!({
            "model": model_id,
            // convert_messages (not shown) maps request messages to the
            // provider's JSON shape.
            "messages": self.convert_messages(&request.messages),
            "stream": true,
            "max_tokens": request.max_tokens.unwrap_or(4096),
        });

        let stream = HttpRequest::builder()
            .method(HttpMethod::Post)
            .url("https://api.my-provider.com/v1/chat/completions")
            .header("Authorization", format!("Bearer {}", api_key))
            .header("Content-Type", "application/json")
            .body(serde_json::to_vec(&body).map_err(|e| e.to_string())?)
            .build()?
            .fetch_stream()?;

        Ok(zed::LlmCompletionStream::new(OpenAiStreamParser::new(stream)))
    }
}

zed::register_extension!(MyLlmExtension);
```

---

## Timeline Summary

| Phase | Duration | Key Deliverables |
|-------|----------|------------------|
| 1. Foundation | 2-3 weeks | WIT interface, basic provider registration |
| 2. Streaming | 2-3 weeks | Efficient streaming across WASM boundary |
| 3. Full Features | 2-3 weeks | Tools, images, thinking support |
| 4. Credentials & UI | 1-2 weeks | Secure credentials, configuration UI |
| 5. Testing & Docs | 1-2 weeks | Tests, documentation, examples |
| 6. Migration (optional) | Ongoing | Migrate built-in providers |

**Total estimated time: 8-13 weeks**

---

## Open Questions

1. **Streaming efficiency**: Is callback-based streaming feasible in WASM, or should we use polling?

2. **Token counting**: Should we require extensions to implement token counting, or provide a fallback estimation?

3. **Configuration UI**: Should extensions be able to provide custom UI components, or just JSON schema-driven forms?

4. **Provider priorities**: Should extension providers appear before or after built-in providers in the selector?

5. **Backward compatibility**: How do we handle extensions built against older WIT versions when adding new LLM features?

6. **Rate limiting**: Should the host help with rate limiting, or leave it entirely to extensions?

---

## Conclusion

This plan provides a comprehensive roadmap for implementing Language Model Provider Extensions in Zed. The phased approach allows for incremental delivery of value while building toward full feature parity with built-in providers.

The key architectural decisions are:
1. **WIT-based interface** for WASM interop, consistent with existing extension patterns
2. **Streaming via resources** to minimize WASM boundary crossing overhead
3. **Host-managed credentials** for security
4. **Manifest-based discovery** for static model information

The migration analysis shows that simpler providers (Ollama, LM Studio) can be migrated first as proof of concept, while more complex providers (Anthropic, Bedrock) may remain built-in initially.