Language Model Provider Extensions Plan

Executive Summary

This document outlines a comprehensive plan to introduce Language Model Provider Extensions to Zed. This feature will allow third-party developers to create extensions that register new language model providers, enabling users to select and use custom language models in Zed's AI features (Agent, inline assist, commit message generation, etc.).

Table of Contents

  1. Current Architecture Overview
  2. Goals and Requirements
  3. Proposed Architecture
  4. Implementation Phases
  5. WIT Interface Design
  6. Extension Manifest Changes
  7. Migration Plan for Built-in Providers
  8. Testing Strategy
  9. Security Considerations
  10. Appendix: Provider-Specific Requirements

Current Architecture Overview

Key Components

language_model crate (crates/language_model/)

  • LanguageModel trait (src/language_model.rs:580-718): Core trait defining model capabilities

    • id(), name(), provider_id(), provider_name()
    • supports_images(), supports_tools(), supports_tool_choice()
    • max_token_count(), max_output_tokens()
    • count_tokens() - async token counting
    • stream_completion() - the main completion streaming method
    • cache_configuration() - optional prompt caching config
  • LanguageModelProvider trait (src/language_model.rs:743-764): Provider registration

    • id(), name(), icon()
    • default_model(), default_fast_model()
    • provided_models(), recommended_models()
    • is_authenticated(), authenticate()
    • configuration_view() - UI for provider configuration
    • reset_credentials()
  • LanguageModelRegistry (src/registry.rs): Global registry for providers

    • register_provider() / unregister_provider()
    • Model selection and configuration
    • Event emission for UI updates

language_models crate (crates/language_models/)

Contains all built-in provider implementations:

  • provider/anthropic.rs - Anthropic Claude models
  • provider/cloud.rs - Zed Cloud (proxied models)
  • provider/google.rs - Google Gemini models
  • provider/open_ai.rs - OpenAI GPT models
  • provider/ollama.rs - Local Ollama models
  • provider/deepseek.rs - DeepSeek models
  • provider/open_router.rs - OpenRouter aggregator
  • provider/bedrock.rs - AWS Bedrock
  • And more...

Extension System (crates/extension_host/, crates/extension_api/)

  • WIT interface (extension_api/wit/since_v0.6.0/): WebAssembly Interface Types definitions
  • WASM host (extension_host/src/wasm_host.rs): Executes extension WASM modules
  • Extension trait (extension/src/extension.rs): Rust trait for extensions
  • HTTP client (extension_api/src/http_client.rs): Existing HTTP capability for extensions

Request/Response Flow

User Request
    ↓
LanguageModelRequest (crates/language_model/src/request.rs)
    ↓
Provider-specific conversion (e.g., into_anthropic(), into_open_ai())
    ↓
HTTP API call (provider-specific crate)
    ↓
Stream of provider-specific events
    ↓
Event mapping to LanguageModelCompletionEvent
    ↓
Consumer (Agent, Inline Assist, etc.)

Key Data Structures

// Request
pub struct LanguageModelRequest {
    pub thread_id: Option<String>,
    pub prompt_id: Option<String>,
    pub intent: Option<CompletionIntent>,
    pub mode: Option<CompletionMode>,
    pub messages: Vec<LanguageModelRequestMessage>,
    pub tools: Vec<LanguageModelRequestTool>,
    pub tool_choice: Option<LanguageModelToolChoice>,
    pub stop: Vec<String>,
    pub temperature: Option<f32>,
    pub thinking_allowed: bool,
}

// Completion Events
pub enum LanguageModelCompletionEvent {
    Queued { position: usize },
    Started,
    UsageUpdated { amount: usize, limit: usize },
    ToolUseLimitReached,
    Stop(StopReason),
    Text(String),
    Thinking { text: String, signature: Option<String> },
    RedactedThinking { data: String },
    ToolUse(LanguageModelToolUse),
    ToolUseJsonParseError { ... },
    StartMessage { message_id: Option<String> },
    ReasoningDetails(serde_json::Value),
    UsageUpdate(TokenUsage),
}
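
To make the "event mapping" step in the flow above concrete, the sketch below shows, in heavily simplified form, how a provider implementation might translate its own streaming deltas into LanguageModelCompletionEvent values. ProviderDelta and map_delta are illustrative names, not existing Zed types.

// Illustrative only: `ProviderDelta` stands in for whatever a concrete
// provider's SSE parser produces; the right-hand side uses the
// LanguageModelCompletionEvent enum shown above.
enum ProviderDelta {
    Text(String),
    Finished(StopReason),
}

fn map_delta(delta: ProviderDelta) -> LanguageModelCompletionEvent {
    match delta {
        ProviderDelta::Text(text) => LanguageModelCompletionEvent::Text(text),
        ProviderDelta::Finished(reason) => LanguageModelCompletionEvent::Stop(reason),
    }
}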

Goals and Requirements

Primary Goals

  1. Extensibility: Allow any developer to add new LLM providers via extensions
  2. Parity: Extension-based providers should have feature parity with built-in providers
  3. Performance: Minimize overhead from WASM boundary crossings during streaming
  4. Security: Sandbox API key handling and network access appropriately
  5. User Experience: Seamless integration with existing model selectors and configuration UI

Functional Requirements

  1. Extensions can register one or more language model providers
  2. Extensions can define multiple models per provider
  3. Extensions handle authentication (API keys, OAuth, etc.)
  4. Extensions implement the streaming completion API
  5. Extensions can specify model capabilities (tools, images, thinking, etc.)
  6. Extensions can provide token counting logic
  7. Extensions can provide configuration UI components
  8. Extensions receive full request context for API customization

Non-Functional Requirements

  1. Streaming should feel as responsive as built-in providers
  2. Extension crashes should not crash Zed
  3. API keys should never be logged or exposed
  4. Extensions should be able to make arbitrary HTTP requests
  5. Settings should persist across sessions

Proposed Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                         Zed Application                          │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                  LanguageModelRegistry                       ││
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  ││
│  │  │ Built-in     │  │ Extension    │  │ Extension        │  ││
│  │  │ Providers    │  │ Provider A   │  │ Provider B       │  ││
│  │  │ (Anthropic,  │  │ (WASM)       │  │ (WASM)           │  ││
│  │  │  OpenAI...)  │  │              │  │                  │  ││
│  │  └──────────────┘  └──────────────┘  └──────────────────┘  ││
│  └─────────────────────────────────────────────────────────────┘│
│                              ↑                                   │
│                              │                                   │
│  ┌───────────────────────────┴─────────────────────────────────┐│
│  │              ExtensionLanguageModelProvider                  ││
│  │  ┌─────────────────────────────────────────────────────────┐││
│  │  │ • Bridges WASM extension to LanguageModelProvider trait │││
│  │  │ • Manages streaming across WASM boundary                │││
│  │  │ • Handles credential storage via credentials_provider   │││
│  │  │ • Provides configuration UI scaffolding                 │││
│  │  └─────────────────────────────────────────────────────────┘││
│  └─────────────────────────────────────────────────────────────┘│
│                              ↑                                   │
│  ┌───────────────────────────┴─────────────────────────────────┐│
│  │                    WasmHost / WasmExtension                  ││
│  │  • Executes WASM module                                      ││
│  │  • Provides WIT interface for LLM operations                 ││
│  │  • HTTP client for API calls                                 ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘

New Components

1. ExtensionLanguageModelProvider

A new struct in extension_host that implements LanguageModelProvider and wraps a WASM extension:

pub struct ExtensionLanguageModelProvider {
    extension: WasmExtension,
    provider_info: ExtensionLlmProviderInfo,
    state: Entity<ExtensionLlmProviderState>,
}

struct ExtensionLlmProviderState {
    is_authenticated: bool,
    available_models: Vec<ExtensionLanguageModel>,
}

2. ExtensionLanguageModel

Implements LanguageModel trait, delegating to WASM calls:

pub struct ExtensionLanguageModel {
    extension: WasmExtension,
    model_info: ExtensionLlmModelInfo,
    provider_id: LanguageModelProviderId,
}
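
A rough sketch of that delegation, assuming a WasmExtension method for invoking the guest's stream-completion export. The call_llm_stream_completion name, the wit module path, and the wit_stream_to_box_stream helper (sketched in Phase 2 below) are placeholders, not existing APIs:

use futures::stream::BoxStream;

impl ExtensionLanguageModel {
    // Forwards the request to the WASM guest and adapts the returned WIT
    // `completion-stream` resource into the BoxStream that
    // LanguageModel::stream_completion must return.
    async fn stream_completion_via_wasm(
        &self,
        request: wit::CompletionRequest,
    ) -> Result<BoxStream<'static, Result<LanguageModelCompletionEvent, String>>, String> {
        let stream_resource = self
            .extension
            .call_llm_stream_completion(
                self.provider_id.0.to_string(),
                self.model_info.id.clone(),
                request,
            )
            .await?;
        Ok(wit_stream_to_box_stream(stream_resource))
    }
}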

3. WIT Interface Extensions

New WIT definitions for LLM provider functionality (see WIT Interface Design).


Implementation Phases

Phase 1: Foundation (2-3 weeks)

Goal: Establish the core infrastructure for extension-based LLM providers.

Tasks

  1. Define WIT interface for LLM providers (extension_api/wit/since_v0.7.0/llm-provider.wit)

    • Provider metadata (id, name, icon)
    • Model definitions (id, name, capabilities, limits)
    • Credential management hooks
    • Completion request/response types
  2. Create ExtensionLanguageModelProvider (extension_host/src/wasm_host/llm_provider.rs)

    • Implement LanguageModelProvider trait
    • Handle provider registration/unregistration
    • Basic authentication state management
  3. Create ExtensionLanguageModel (extension_host/src/wasm_host/llm_model.rs)

    • Implement LanguageModel trait
    • Simple synchronous completion (non-streaming initially)
  4. Update ExtensionManifest (extension/src/extension_manifest.rs)

    • Add language_model_providers field
    • Parse provider configuration from extension.toml
  5. Update extension loading (extension_host/src/extension_host.rs)

    • Detect LLM provider declarations in manifest
    • Register providers with LanguageModelRegistry (see the sketch after this list)
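
A minimal sketch of that registration step, assuming the manifest field proposed later in this document. register_provider is the existing registry API noted in the architecture overview; the function itself, the ExtensionLanguageModelProvider constructor, and the exact gpui signatures are placeholders:

fn register_language_model_providers(
    extension: WasmExtension,
    manifest: &ExtensionManifest,
    cx: &mut App,
) {
    LanguageModelRegistry::global(cx).update(cx, |registry, cx| {
        for (provider_slug, entry) in &manifest.language_model_providers {
            // One provider instance per [language_model_providers.*] entry.
            let provider = ExtensionLanguageModelProvider::new(
                extension.clone(),
                provider_slug.clone(),
                entry.clone(),
                cx,
            );
            registry.register_provider(provider, cx);
        }
    });
}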

Deliverables

  • Extensions can register a provider that appears in model selector
  • Basic (non-streaming) completions work
  • Manual testing with a test extension

Phase 2: Streaming Support (2-3 weeks)

Goal: Enable efficient streaming completions across the WASM boundary.

Tasks

  1. Design streaming protocol

    • Option A: Chunked responses via repeated WASM calls
    • Option B: Callback-based streaming (preferred)
    • Option C: Shared memory buffer with polling
  2. Implement streaming in WIT

    resource completion-stream {
        next-event: func() -> result<option<completion-event>, string>;
    }
    
    export stream-completion: func(
        provider-id: string,
        model-id: string,
        request: completion-request
    ) -> result<completion-stream, string>;
    
  3. Implement http-response-stream integration

    • Extensions already have access to fetch-stream
    • Need to parse SSE/chunked responses in WASM
    • Map to completion events
  4. Update ExtensionLanguageModel::stream_completion (see the sketch after this list)

    • Bridge WASM completion-stream to Rust BoxStream
    • Handle backpressure and cancellation
  5. Performance optimization

    • Batch small events to reduce WASM boundary crossings
    • Consider using shared memory for large payloads
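
A minimal sketch of the bridging mentioned in Task 4, assuming a polling-style binding where next-event is an async call into the guest. completion_event_from_wit names an assumed conversion from the WIT variant into the host enum, and backpressure/cancellation handling is elided:

use futures::{stream, StreamExt as _};

// Poll the WIT `completion-stream` resource until it reports None, converting
// each event into the host-side enum as it arrives.
fn wit_stream_to_box_stream(
    stream_resource: WitCompletionStream,
) -> BoxStream<'static, Result<LanguageModelCompletionEvent, String>> {
    stream::unfold(stream_resource, |stream_resource| async move {
        match stream_resource.next_event().await {
            Ok(Some(event)) => {
                Some((Ok(completion_event_from_wit(event)), stream_resource))
            }
            Ok(None) => None, // stream finished cleanly
            Err(error) => Some((Err(error), stream_resource)),
        }
    })
    .boxed()
}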

Deliverables

  • Streaming completions work with acceptable latency
  • Performance benchmarks vs built-in providers

Phase 3: Full Feature Parity (2-3 weeks)

Goal: Support all advanced features that built-in providers have.

Tasks

  1. Tool/Function calling support

    • Add tool definitions to request
    • Parse tool use events from response
    • Handle tool results in follow-up requests
  2. Image support

    • Pass image data in messages
    • Handle base64 encoding/size limits
  3. Thinking/reasoning support (for Claude-like models)

    • Thinking and RedactedThinking events
    • Thought signatures for tool calls
  4. Token counting (see the sketch after this list)

    • WIT interface for count_tokens
    • Allow extensions to provide custom tokenizers or call API
  5. Prompt caching configuration

    • Cache control markers in messages
    • Cache configuration reporting
  6. Rate limiting and error handling

    • Standard error types in WIT
    • Retry-after headers
    • Rate limit events
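
For Task 4, a provider without an exact tokenizer could fall back to a rough character-based estimate (roughly four characters per token for English text). This is purely a fallback sketch; the field names follow the completion-request record proposed in the WIT design:

fn estimate_token_count(request: &CompletionRequest) -> u64 {
    let chars: usize = request
        .messages
        .iter()
        .flat_map(|message| message.content.iter())
        .map(|content| match content {
            MessageContent::Text(text) => text.len(),
            // Images and tool payloads need provider-specific accounting;
            // charge a coarse flat cost here.
            _ => 256,
        })
        .sum();
    (chars / 4) as u64
}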

Deliverables

  • Extension providers can use tools
  • Extension providers can process images
  • Full error handling parity

Phase 4: Credential Management & Configuration UI (1-2 weeks)

Goal: Secure credential storage and user-friendly configuration.

Tasks

  1. Credential storage integration

    • Use existing credentials_provider crate
    • Extensions request credentials via WIT
    • Raw credential values stay in the host's secure storage and are handed to the guest only on demand, via the llm-get-credential import, when building a request
  2. API key input flow

    import request-credential: func(
        credential-type: credential-type,
        label: string,
        placeholder: string
    ) -> result<bool, string>;
    
  3. Configuration view scaffolding

    • Generic configuration view that works for most providers
    • Extensions can provide additional settings via JSON schema
    • Settings stored in extension-specific namespace
  4. Environment variable support

    • Allow specifying env var names for API keys
    • Read from environment on startup (see the sketch after this list)
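
A sketch of the host-side resolution order for Tasks 1 and 4: secure storage first, then the env var named in the manifest's auth section. read_credential stands in for whatever the credentials_provider crate exposes and is not a real API name:

fn resolve_api_key(
    credential_key: &str,
    auth: Option<&LanguageModelAuthConfig>,
) -> Option<String> {
    // Prefer the value stored in the OS keychain / credential manager.
    if let Some(stored) = read_credential(credential_key) {
        return Some(stored);
    }
    // Fall back to the environment variable declared in extension.toml.
    auth.and_then(|auth| auth.env_var.as_deref())
        .and_then(|var_name| std::env::var(var_name).ok())
}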

Deliverables

  • Secure API key storage
  • Configuration UI for extension providers
  • Environment variable fallback

Phase 5: Testing & Documentation (1-2 weeks)

Goal: Comprehensive testing and developer documentation.

Tasks

  1. Integration tests

    • Test extension loading and registration
    • Test streaming completions
    • Test error handling
    • Test credential management
  2. Performance tests

    • Latency benchmarks
    • Memory usage under load
    • Comparison with built-in providers
  3. Example extensions

    • Simple OpenAI-compatible provider
    • Provider with custom authentication
    • Provider with tool support
  4. Documentation

    • Extension developer guide
    • API reference
    • Migration guide for custom providers

Deliverables

  • Full test coverage
  • Published documentation
  • Example extensions in extensions/ directory

Phase 6: Migration of Built-in Providers (Optional, Long-term)

Goal: Prove the extension system by migrating one or more built-in providers.

Tasks

  1. Select a candidate provider (suggested: Ollama or LM Studio, which have the simplest APIs)
  2. Create extension version
  3. Feature parity testing
  4. Performance comparison
  5. Gradual rollout behind a feature flag

WIT Interface Design

New File: extension_api/wit/since_v0.7.0/llm-provider.wit

interface llm-provider {
    /// Information about a language model provider
    record provider-info {
        /// Unique identifier for the provider (e.g., "my-extension.my-provider")
        id: string,
        /// Display name for the provider
        name: string,
        /// Icon name from Zed's icon set (optional)
        icon: option<string>,
    }

    /// Capabilities of a language model
    record model-capabilities {
        /// Whether the model supports image inputs
        supports-images: bool,
        /// Whether the model supports tool/function calling
        supports-tools: bool,
        /// Whether the model supports tool choice (auto/any/none)
        supports-tool-choice-auto: bool,
        supports-tool-choice-any: bool,
        supports-tool-choice-none: bool,
        /// Whether the model supports extended thinking
        supports-thinking: bool,
        /// The format for tool input schemas
        tool-input-format: tool-input-format,
    }

    /// Format for tool input schemas
    enum tool-input-format {
        json-schema,
        simplified,
    }

    /// Information about a specific model
    record model-info {
        /// Unique identifier for the model
        id: string,
        /// Display name for the model
        name: string,
        /// Maximum input token count
        max-token-count: u64,
        /// Maximum output tokens (optional)
        max-output-tokens: option<u64>,
        /// Model capabilities
        capabilities: model-capabilities,
        /// Whether this is the default model for the provider
        is-default: bool,
        /// Whether this is the default fast model
        is-default-fast: bool,
    }

    /// A message in a completion request
    record request-message {
        role: message-role,
        content: list<message-content>,
        cache: bool,
    }

    enum message-role {
        user,
        assistant,
        system,
    }

    /// Content within a message
    variant message-content {
        text(string),
        image(image-data),
        tool-use(tool-use),
        tool-result(tool-result),
        thinking(thinking-content),
        redacted-thinking(string),
    }

    record image-data {
        /// Base64-encoded image data
        source: string,
        /// Estimated dimensions
        width: option<u32>,
        height: option<u32>,
    }

    record tool-use {
        id: string,
        name: string,
        input: string, // JSON string
        thought-signature: option<string>,
    }

    record tool-result {
        tool-use-id: string,
        tool-name: string,
        is-error: bool,
        content: tool-result-content,
    }

    variant tool-result-content {
        text(string),
        image(image-data),
    }

    record thinking-content {
        text: string,
        signature: option<string>,
    }

    /// A tool definition
    record tool-definition {
        name: string,
        description: string,
        /// JSON Schema for input parameters
        input-schema: string,
    }

    /// Tool choice preference
    enum tool-choice {
        auto,
        any,
        none,
    }

    /// A completion request
    record completion-request {
        messages: list<request-message>,
        tools: list<tool-definition>,
        tool-choice: option<tool-choice>,
        stop-sequences: list<string>,
        temperature: option<f32>,
        thinking-allowed: bool,
        /// Maximum tokens to generate
        max-tokens: option<u64>,
    }

    /// Events emitted during completion streaming
    variant completion-event {
        /// Completion has started
        started,
        /// Text content
        text(string),
        /// Thinking/reasoning content
        thinking(thinking-content),
        /// Redacted thinking (encrypted)
        redacted-thinking(string),
        /// Tool use request
        tool-use(tool-use),
        /// Completion stopped
        stop(stop-reason),
        /// Token usage update
        usage(token-usage),
    }

    enum stop-reason {
        end-turn,
        max-tokens,
        tool-use,
    }

    record token-usage {
        input-tokens: u64,
        output-tokens: u64,
        cache-creation-input-tokens: option<u64>,
        cache-read-input-tokens: option<u64>,
    }

    /// A streaming completion response
    resource completion-stream {
        /// Get the next event from the stream.
        /// Returns None when the stream is complete.
        next-event: func() -> result<option<completion-event>, string>;
    }

    /// Credential types that can be requested
    enum credential-type {
        api-key,
        oauth-token,
    }
}

Updates to extension_api/wit/since_v0.7.0/extension.wit

world extension {
    // ... existing imports ...
    import llm-provider;
    
    use llm-provider.{
        provider-info, model-info, completion-request, 
        completion-stream, credential-type
    };

    /// Returns information about language model providers offered by this extension
    export llm-providers: func() -> list<provider-info>;

    /// Returns the models available for a provider
    export llm-provider-models: func(provider-id: string) -> result<list<model-info>, string>;

    /// Check if the provider is authenticated
    export llm-provider-is-authenticated: func(provider-id: string) -> bool;

    /// Attempt to authenticate the provider
    export llm-provider-authenticate: func(provider-id: string) -> result<_, string>;

    /// Reset credentials for the provider
    export llm-provider-reset-credentials: func(provider-id: string) -> result<_, string>;

    /// Count tokens for a request
    export llm-count-tokens: func(
        provider-id: string, 
        model-id: string, 
        request: completion-request
    ) -> result<u64, string>;

    /// Stream a completion
    export llm-stream-completion: func(
        provider-id: string,
        model-id: string,
        request: completion-request
    ) -> result<completion-stream, string>;

    /// Request a credential from the user
    import llm-request-credential: func(
        provider-id: string,
        credential-type: credential-type,
        label: string,
        placeholder: string
    ) -> result<bool, string>;

    /// Get a stored credential
    import llm-get-credential: func(provider-id: string) -> option<string>;

    /// Store a credential
    import llm-store-credential: func(provider-id: string, value: string) -> result<_, string>;

    /// Delete a stored credential
    import llm-delete-credential: func(provider-id: string) -> result<_, string>;
}

Extension Manifest Changes

Updated extension.toml Schema

id = "my-llm-extension"
name = "My LLM Provider"
description = "Adds support for My LLM API"
version = "1.0.0"
schema_version = 1
authors = ["Developer <dev@example.com>"]
repository = "https://github.com/example/my-llm-extension"

[lib]
kind = "rust"
version = "0.7.0"

# New section for LLM providers
[language_model_providers.my-provider]
name = "My LLM"
icon = "sparkle"  # Optional, from Zed's icon set

# Optional: Default models to show even before API connection
[[language_model_providers.my-provider.models]]
id = "my-model-large"
name = "My Model Large"
max_token_count = 200000
max_output_tokens = 8192
supports_images = true
supports_tools = true

[[language_model_providers.my-provider.models]]
id = "my-model-small"
name = "My Model Small"
max_token_count = 100000
max_output_tokens = 4096
supports_images = false
supports_tools = true

# Optional: Environment variable for API key
[language_model_providers.my-provider.auth]
env_var = "MY_LLM_API_KEY"
credential_label = "API Key"

ExtensionManifest Changes

// In extension/src/extension_manifest.rs

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelProviderManifestEntry {
    pub name: String,
    #[serde(default)]
    pub icon: Option<String>,
    #[serde(default)]
    pub models: Vec<LanguageModelManifestEntry>,
    #[serde(default)]
    pub auth: Option<LanguageModelAuthConfig>,
}

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelManifestEntry {
    pub id: String,
    pub name: String,
    #[serde(default)]
    pub max_token_count: u64,
    #[serde(default)]
    pub max_output_tokens: Option<u64>,
    #[serde(default)]
    pub supports_images: bool,
    #[serde(default)]
    pub supports_tools: bool,
    #[serde(default)]
    pub supports_thinking: bool,
}

#[derive(Clone, Default, PartialEq, Eq, Debug, Deserialize, Serialize)]
pub struct LanguageModelAuthConfig {
    pub env_var: Option<String>,
    pub credential_label: Option<String>,
}

// Add to ExtensionManifest struct:
pub struct ExtensionManifest {
    // ... existing fields ...
    #[serde(default)]
    pub language_model_providers: BTreeMap<Arc<str>, LanguageModelProviderManifestEntry>,
}

Migration Plan for Built-in Providers

This section analyzes each built-in provider and what would be required to implement them as extensions.

Provider Comparison Matrix

| Provider | API Style | Auth Method | Special Features | Migration Complexity |
|----------|-----------|-------------|------------------|-----------------------|
| Anthropic | REST/SSE | API Key | Thinking, Caching, Tool signatures | High |
| OpenAI | REST/SSE | API Key | Reasoning effort, Prompt caching | Medium |
| Google | REST/SSE | API Key | Thinking, Tool signatures | High |
| Ollama | REST/SSE | None (local) | Dynamic model discovery | Low |
| DeepSeek | REST/SSE | API Key | Reasoning mode | Medium |
| OpenRouter | REST/SSE | API Key | Reasoning details, Model routing | Medium |
| LM Studio | REST/SSE | None (local) | OpenAI-compatible | Low |
| Bedrock | AWS SDK | AWS Credentials | Multiple underlying providers | High |
| Zed Cloud | Zed Auth | Zed Account | Proxied providers | N/A (keep built-in) |

Provider-by-Provider Analysis

Anthropic (provider/anthropic.rs)

Current Implementation Highlights:

  • Uses anthropic crate for API types and streaming
  • Custom event mapper (AnthropicEventMapper) for SSE → completion events
  • Supports thinking/reasoning with thought signatures
  • Prompt caching with cache control markers
  • Beta headers for experimental features

Extension Requirements:

  • Full SSE parsing in WASM
  • Complex event mapping logic
  • Thinking content with signatures
  • Cache configuration reporting

Unique Challenges:

// Thought signatures in tool use
pub struct LanguageModelToolUse {
    pub thought_signature: Option<String>, // Anthropic-specific
}

// Thinking events with signatures
Thinking { text: String, signature: Option<String> }

Migration Approach:

  1. Port anthropic crate types to extension-compatible structures
  2. Implement SSE parser in extension (can use existing fetch-stream)
  3. Map Anthropic events to generic completion events
  4. Handle beta headers via custom HTTP headers

OpenAI (provider/open_ai.rs)

Current Implementation Highlights:

  • Uses open_ai crate for API types
  • Tiktoken-based token counting
  • Parallel tool calls support
  • Reasoning effort parameter (o1/o3 models)

Extension Requirements:

  • SSE parsing (standard format)
  • Token counting (could call API or use simplified estimate)
  • Tool call aggregation across chunks

Unique Challenges:

// Reasoning effort for o-series models
pub reasoning_effort: Option<String>, // "low", "medium", "high"

// Prompt cache key (preview feature)
pub prompt_cache_key: Option<String>,

Migration Approach:

  1. Standard SSE parsing
  2. Token counting via API or tiktoken WASM port
  3. Support reasoning_effort as model-specific config

Google/Gemini (provider/google.rs)

Current Implementation Highlights:

  • Uses google_ai crate
  • Different API structure from OpenAI/Anthropic
  • Thinking support similar to Anthropic
  • Tool signatures in function calls

Extension Requirements:

  • Different request/response format
  • Thinking content handling
  • Tool signature preservation

Unique Challenges:

// Google uses different content structure
enum ContentPart {
    Text { text: String },
    InlineData { mime_type: String, data: String },
    FunctionCall { name: String, args: Value },
    FunctionResponse { name: String, response: Value },
}

Migration Approach:

  1. Implement Google-specific request building
  2. Map Google events to generic completion events
  3. Handle thinking/function call signatures

Ollama (provider/ollama.rs)

Current Implementation Highlights:

  • Local-only, no authentication needed
  • Dynamic model discovery via API
  • OpenAI-compatible chat endpoint
  • Simple streaming format

Extension Requirements:

  • API URL configuration
  • Model list fetching
  • Basic streaming

Why This is a Good First Migration Target:

  • No authentication complexity
  • Simple API format
  • Dynamic model discovery is isolated
  • Good test case for local provider pattern

Migration Approach:

  1. Configuration for API URL
  2. Model discovery endpoint call (sketched below)
  3. OpenAI-compatible streaming
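
A sketch of step 2 from inside an extension, using the existing HTTP capability. The request-builder and fetch calls mirror the hypothetical extension API surface used in Appendix E; Ollama's /api/tags endpoint lists locally installed models, and the token limit below is an assumed default since the endpoint does not report one:

fn discover_ollama_models(api_url: &str) -> Result<Vec<zed::LlmModelInfo>, String> {
    let response = zed::http_client::HttpRequest::builder()
        .method(zed::http_client::HttpMethod::Get)
        .url(format!("{}/api/tags", api_url))
        .build()?
        .fetch()?;

    #[derive(serde::Deserialize)]
    struct Tags {
        models: Vec<Model>,
    }
    #[derive(serde::Deserialize)]
    struct Model {
        name: String,
    }

    let tags: Tags = serde_json::from_slice(&response.body).map_err(|e| e.to_string())?;
    Ok(tags
        .models
        .into_iter()
        .map(|model| zed::LlmModelInfo {
            id: model.name.clone(),
            name: model.name,
            max_token_count: 128_000, // not reported by /api/tags; assumed default
            max_output_tokens: None,
            capabilities: zed::LlmModelCapabilities::default(),
            is_default: false,
            is_default_fast: false,
        })
        .collect())
}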

DeepSeek (provider/deepseek.rs)

Current Implementation Highlights:

  • OpenAI-compatible API with extensions
  • Reasoner model support
  • Different handling for reasoning vs standard models

Extension Requirements:

  • API key authentication
  • Model-specific request modifications
  • Reasoning content handling

Migration Approach:

  1. Standard OpenAI-compatible base
  2. Special handling for reasoner model
  3. Temperature disabled for reasoning

OpenRouter (provider/open_router.rs)

Current Implementation Highlights:

  • Aggregates multiple providers
  • Dynamic model fetching
  • Reasoning details preservation
  • Tool call signatures

Extension Requirements:

  • API key authentication
  • Model list from API
  • Reasoning details in responses

Migration Approach:

  1. Model discovery from API
  2. Standard OpenAI-compatible streaming
  3. Preserve reasoning_details in events

LM Studio (provider/lmstudio.rs)

Current Implementation Highlights:

  • Local-only, OpenAI-compatible
  • Model discovery from API
  • Simple configuration

Why This is a Good First Migration Target:

  • No authentication
  • OpenAI-compatible (reusable streaming code)
  • Similar to Ollama

Bedrock (provider/bedrock.rs)

Current Implementation Highlights:

  • AWS SDK-based authentication
  • Multiple authentication methods (IAM, Profile, etc.)
  • Proxies to Claude, Llama, etc.

Extension Requirements:

  • AWS credential handling (complex)
  • AWS Signature V4 signing
  • Region configuration

Why This Should Stay Built-in (Initially):

  • AWS credential management is complex
  • SDK dependency not easily portable to WASM
  • Security implications of AWS credentials in extensions

Testing Strategy

Unit Tests

// extension_host/src/wasm_host/llm_provider_tests.rs

#[gpui::test]
async fn test_extension_provider_registration(cx: &mut TestAppContext) {
    // Load test extension with LLM provider
    // Verify provider appears in registry
    // Verify models are listed correctly
}

#[gpui::test]
async fn test_extension_streaming_completion(cx: &mut TestAppContext) {
    // Create mock HTTP server
    // Load extension
    // Send completion request
    // Verify streaming events received correctly
}

#[gpui::test]
async fn test_extension_tool_calling(cx: &mut TestAppContext) {
    // Test tool definitions are passed correctly
    // Test tool use events are parsed
    // Test tool results can be sent back
}

#[gpui::test]
async fn test_extension_credential_management(cx: &mut TestAppContext) {
    // Test credential storage
    // Test credential retrieval
    // Test authentication state
}

#[gpui::test]
async fn test_extension_error_handling(cx: &mut TestAppContext) {
    // Test API errors are propagated correctly
    // Test rate limiting is handled
    // Test network errors are handled
}

Integration Tests

// crates/extension_host/src/extension_store_test.rs (additions)

#[gpui::test]
async fn test_llm_extension_lifecycle(cx: &mut TestAppContext) {
    // Install extension with LLM provider
    // Verify provider registered
    // Configure credentials
    // Make completion request
    // Uninstall extension
    // Verify provider unregistered
}

Manual Testing Checklist

  1. Provider Discovery

    • Extension provider appears in model selector
    • Provider icon displays correctly
    • Models list correctly
  2. Authentication

    • API key prompt appears when not authenticated
    • API key is stored securely
    • Environment variable fallback works
    • "Reset credentials" works
  3. Completions

    • Basic text completion works
    • Streaming is smooth (no jank)
    • Long responses complete successfully
    • Cancellation works
  4. Advanced Features

    • Tool calling works (Agent panel)
    • Image inputs work (if supported)
    • Thinking/reasoning displays correctly
  5. Error Handling

    • Invalid API key shows error
    • Rate limiting shows appropriate message
    • Network errors are handled gracefully
  6. Performance

    • First token latency acceptable (<500ms overhead)
    • Memory usage reasonable
    • No memory leaks on repeated requests

Security Considerations

Credential Handling

  1. Minimize raw credential exposure to WASM

    • Zed stores credentials in secure storage (keychain/credential manager)
    • Extensions obtain credential values only on demand, via the llm-get-credential import, and must not persist or log them
    • Authentication status checks (llm-provider-is-authenticated) return only a boolean, never the credential itself
  2. Credential scope isolation

    • Each extension has its own credential namespace
    • Extensions cannot access other extensions' credentials
    • Provider ID is prefixed with the extension ID (see the key-derivation sketch after this list)
  3. Audit logging

    • Log when credentials are accessed (not the values)
    • Log when credentials are modified
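
A one-line illustration of the namespacing described above: the storage key always combines the extension id and the provider id, so extensions cannot address each other's entries. The exact key format shown is illustrative:

fn credential_storage_key(extension_id: &str, provider_id: &str) -> String {
    // e.g. "extension:my-llm-extension:llm-provider:my-provider"
    format!("extension:{}:llm-provider:{}", extension_id, provider_id)
}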

Network Access

  1. HTTP request validation

    • Extensions already have HTTP access via fetch / fetch-stream
    • Consider domain allowlisting for LLM providers
    • Log outbound requests for debugging
  2. Request/Response inspection

    • API keys in headers should be redacted in logs
    • Response bodies may contain sensitive data

Extension Sandbox

  1. WASM isolation

    • Extensions run in WASM sandbox
    • Cannot access filesystem outside work directory
    • Cannot access other extensions' data
  2. Resource limits

    • Memory limits per extension
    • CPU time limits (epoch-based interruption already exists)
    • Concurrent request limits

Capability Requirements

# Extensions with LLM providers should declare:
[[capabilities]]
kind = "network:http"
domains = ["api.example.com"]  # Optional domain restriction

[[capabilities]]
kind = "credential:store"

Appendix: Provider-Specific Requirements

A. Anthropic Implementation Details

Request Format:

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 8192,
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
  ],
  "system": [{"type": "text", "text": "You are helpful"}],
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 10000}
}

SSE Events:

  • message_start - Contains message ID, model, usage
  • content_block_start - Starts text/tool_use/thinking block
  • content_block_delta - Incremental content (text_delta, input_json_delta, thinking_delta)
  • content_block_stop - Block complete
  • message_delta - Stop reason, final usage
  • message_stop - End of message

Special Considerations:

  • Beta headers for thinking: anthropic-beta: interleaved-thinking-2025-05-14
  • Cache control markers in messages
  • Thought signatures on tool uses
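
To illustrate the event mapping an Anthropic extension would perform, the sketch below dispatches on the SSE event names listed above and emits the generic completion events from the WIT design. It is simplified: tool-use accumulation from input_json_delta is stateful and omitted here:

fn map_anthropic_event(
    event_name: &str,
    data: &serde_json::Value,
) -> Option<CompletionEvent> {
    match event_name {
        "content_block_delta" => {
            let delta = &data["delta"];
            match delta["type"].as_str()? {
                "text_delta" => Some(CompletionEvent::Text(
                    delta["text"].as_str()?.to_string(),
                )),
                "thinking_delta" => Some(CompletionEvent::Thinking(ThinkingContent {
                    text: delta["thinking"].as_str()?.to_string(),
                    signature: None,
                })),
                // input_json_delta contributes to an in-progress tool-use block
                _ => None,
            }
        }
        "message_delta" => match data["delta"]["stop_reason"].as_str()? {
            "max_tokens" => Some(CompletionEvent::Stop(StopReason::MaxTokens)),
            "tool_use" => Some(CompletionEvent::Stop(StopReason::ToolUse)),
            _ => Some(CompletionEvent::Stop(StopReason::EndTurn)),
        },
        // message_start, content_block_start/stop, message_stop handled elsewhere
        _ => None,
    }
}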

B. OpenAI Implementation Details

Request Format:

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ],
  "stream": true,
  "tools": [...],
  "max_completion_tokens": 4096
}

SSE Events:

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"tool_calls":[...]}}]}
data: [DONE]

Special Considerations:

  • reasoning_effort for o-series models
  • parallel_tool_calls option
  • Token counting via tiktoken

C. Google/Gemini Implementation Details

Request Format:

{
  "contents": [
    {"role": "user", "parts": [{"text": "Hello"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 0.7
  },
  "tools": [...]
}

Response Format:

{
  "candidates": [{
    "content": {
      "parts": [
        {"text": "Response"},
        {"functionCall": {"name": "...", "args": {...}}}
      ]
    }
  }]
}

Special Considerations:

  • Different streaming format (not SSE, line-delimited JSON; see the sketch after this list)
  • Tool signatures in function calls
  • Thinking support similar to Anthropic
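
A sketch of handling that streamed format: each chunk is parsed as a JSON object and the text parts of the first candidate are emitted as events. functionCall parts and error handling are omitted:

fn parse_gemini_chunk(line: &str) -> Vec<CompletionEvent> {
    let Ok(value) = serde_json::from_str::<serde_json::Value>(line) else {
        return Vec::new();
    };
    value["candidates"][0]["content"]["parts"]
        .as_array()
        .into_iter()
        .flatten()
        .filter_map(|part| part["text"].as_str())
        .map(|text| CompletionEvent::Text(text.to_string()))
        .collect()
}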

D. OpenAI-Compatible Providers (Ollama, LM Studio, DeepSeek)

These providers can share common implementation:

Shared Code:

// In extension
fn stream_openai_compatible(
    api_url: &str,
    api_key: Option<&str>,
    request: CompletionRequest,
) -> Result<CompletionStream, String> {
    let request_body = build_openai_request(request);
    let stream = http_client::fetch_stream(HttpRequest {
        method: HttpMethod::Post,
        url: format!("{}/v1/chat/completions", api_url),
        headers: build_headers(api_key),
        body: Some(serde_json::to_vec(&request_body).map_err(|e| e.to_string())?),
        redirect_policy: RedirectPolicy::NoFollow,
    })?;
    
    Ok(OpenAiStreamParser::new(stream))
}
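
The OpenAiStreamParser referenced above is not defined elsewhere in this plan, so a minimal sketch of its core loop follows. In a real extension it would also own the HTTP byte stream and back the completion-stream resource; here it only buffers incoming bytes, splits out SSE data: lines, turns each chunk's delta content into a text event, and stops at the [DONE] sentinel. Tool-call deltas and multi-choice responses are ignored for brevity:

struct OpenAiStreamParser {
    buffer: String,
}

impl OpenAiStreamParser {
    // Feed raw bytes from the HTTP stream; returns any completion events that
    // became available as a result.
    fn push_bytes(&mut self, bytes: &[u8]) -> Vec<CompletionEvent> {
        self.buffer.push_str(&String::from_utf8_lossy(bytes));
        let mut events = Vec::new();
        while let Some(newline) = self.buffer.find('\n') {
            let line: String = self.buffer.drain(..=newline).collect();
            let line = line.trim();
            let Some(data) = line.strip_prefix("data:") else { continue };
            let data = data.trim();
            if data == "[DONE]" {
                events.push(CompletionEvent::Stop(StopReason::EndTurn));
                continue;
            }
            if let Ok(chunk) = serde_json::from_str::<serde_json::Value>(data) {
                if let Some(text) = chunk["choices"][0]["delta"]["content"].as_str() {
                    events.push(CompletionEvent::Text(text.to_string()));
                }
            }
        }
        events
    }
}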

E. Example Extension: Simple OpenAI-Compatible Provider

// src/my_provider.rs
use zed_extension_api::{self as zed, Result};
use zed_extension_api::http_client::{HttpMethod, HttpRequest, RedirectPolicy};

struct MyLlmExtension {
    api_key: Option<String>,
}

impl zed::Extension for MyLlmExtension {
    fn new() -> Self {
        Self { api_key: None }
    }

    fn llm_providers(&self) -> Vec<zed::LlmProviderInfo> {
        vec![zed::LlmProviderInfo {
            id: "my-provider".into(),
            name: "My LLM Provider".into(),
            icon: Some("sparkle".into()),
        }]
    }

    fn llm_provider_models(&self, provider_id: &str) -> Result<Vec<zed::LlmModelInfo>> {
        Ok(vec![
            zed::LlmModelInfo {
                id: "my-model".into(),
                name: "My Model".into(),
                max_token_count: 128000,
                max_output_tokens: Some(4096),
                capabilities: zed::LlmModelCapabilities {
                    supports_images: true,
                    supports_tools: true,
                    ..Default::default()
                },
                is_default: true,
                is_default_fast: false,
            }
        ])
    }

    fn llm_provider_is_authenticated(&self, _provider_id: &str) -> bool {
        self.api_key.is_some() || std::env::var("MY_API_KEY").is_ok()
    }

    fn llm_provider_authenticate(&mut self, provider_id: &str) -> Result<()> {
        if let Some(key) = zed::llm_get_credential(provider_id)? {
            self.api_key = Some(key);
            return Ok(());
        }
        
        if zed::llm_request_credential(
            provider_id,
            zed::CredentialType::ApiKey,
            "API Key",
            "Enter your API key",
        )? {
            self.api_key = zed::llm_get_credential(provider_id)?;
        }
        
        Ok(())
    }

    fn llm_stream_completion(
        &self,
        provider_id: &str,
        model_id: &str,
        request: zed::LlmCompletionRequest,
    ) -> Result<zed::LlmCompletionStream> {
        let api_key = self
            .api_key
            .clone()
            .or_else(|| std::env::var("MY_API_KEY").ok())
            .ok_or("Not authenticated")?;

        let body = serde_json::json!({
            "model": model_id,
            "messages": self.convert_messages(&request.messages),
            "stream": true,
            "max_tokens": request.max_tokens.unwrap_or(4096),
        });

        let stream = HttpRequest::builder()
            .method(HttpMethod::Post)
            .url("https://api.my-provider.com/v1/chat/completions")
            .header("Authorization", format!("Bearer {}", api_key))
            .header("Content-Type", "application/json")
            .body(serde_json::to_vec(&body).map_err(|e| e.to_string())?)
            .build()?
            .fetch_stream()?;

        Ok(zed::LlmCompletionStream::new(OpenAiStreamParser::new(stream)))
    }
}

zed::register_extension!(MyLlmExtension);

Timeline Summary

| Phase | Duration | Key Deliverables |
|-------|----------|------------------|
| 1. Foundation | 2-3 weeks | WIT interface, basic provider registration |
| 2. Streaming | 2-3 weeks | Efficient streaming across WASM boundary |
| 3. Full Features | 2-3 weeks | Tools, images, thinking support |
| 4. Credentials & UI | 1-2 weeks | Secure credentials, configuration UI |
| 5. Testing & Docs | 1-2 weeks | Tests, documentation, examples |
| 6. Migration (optional) | Ongoing | Migrate built-in providers |

Total estimated time: 8-13 weeks


Open Questions

  1. Streaming efficiency: Is callback-based streaming feasible in WASM, or should we use polling?

  2. Token counting: Should we require extensions to implement token counting, or provide a fallback estimation?

  3. Configuration UI: Should extensions be able to provide custom UI components, or just JSON schema-driven forms?

  4. Provider priorities: Should extension providers appear before or after built-in providers in the selector?

  5. Backward compatibility: How do we handle extensions built against older WIT versions when adding new LLM features?

  6. Rate limiting: Should the host help with rate limiting, or leave it entirely to extensions?


Conclusion

This plan provides a comprehensive roadmap for implementing Language Model Provider Extensions in Zed. The phased approach allows for incremental delivery of value while building toward full feature parity with built-in providers.

The key architectural decisions are:

  1. WIT-based interface for WASM interop, consistent with existing extension patterns
  2. Streaming via resources to minimize WASM boundary crossing overhead
  3. Host-managed credentials for security
  4. Manifest-based discovery for static model information

The migration analysis shows that simpler providers (Ollama, LM Studio) can be migrated first as proof of concept, while more complex providers (Anthropic, Bedrock) may remain built-in initially.