Language Model Provider Extensions - Implementation Guide
Purpose
This document provides a detailed guide for completing the implementation of Language Model Provider Extensions in Zed. It explains what has been done, what remains, and how to complete the work.
For the full design and rationale, see `language_model_provider_extensions_plan.md`.
Core Design Principle
Extensions handle ALL provider-specific logic. This means:
- Thought signatures (Anthropic)
- Reasoning effort parameters (OpenAI o-series)
- Cache control markers
- Parallel tool calls
- SSE/streaming format parsing
- Any other provider-specific features
Zed's core should have zero knowledge of these details. The extension API must be generic enough that extensions can implement any provider without Zed changes.
Current Status: STREAMING API COMPLETE ✅
The core plumbing and streaming API are now complete. Extensions can:
- Declare LLM providers in their manifest
- Be queried for providers and models at load time
- Have their providers registered with the `LanguageModelRegistry`
- Have their providers unregistered when the extension is unloaded
- Stream completions using the new polling-based API
What's NOT done yet:
- Credential UI prompt support (`llm_request_credential` returns `false`)
- Model refresh mechanism
- A working test extension that demonstrates the feature (requires WASM build)
- End-to-end testing with a real extension
What Has Been Completed
1. WIT Interface Definition ✅
Location: crates/extension_api/wit/since_v0.7.0/
Created all WIT files for v0.7.0:
- `llm-provider.wit` - Core LLM types (`ProviderInfo`, `ModelInfo`, `CompletionRequest`, `CompletionEvent`, etc.)
- `extension.wit` - Updated with LLM exports/imports
Key types in llm-provider.wit:
record provider-info {
id: string,
name: string,
icon: option<string>,
}
record model-info {
id: string,
name: string,
max-token-count: u64,
max-output-tokens: option<u64>,
capabilities: model-capabilities,
is-default: bool,
is-default-fast: bool,
}
variant completion-event {
started,
text(string),
thinking(thinking-content),
redacted-thinking(string),
tool-use(tool-use),
tool-use-json-parse-error(tool-use-json-parse-error),
stop(stop-reason),
usage(token-usage),
reasoning-details(string),
}
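For orientation, this is approximately the Rust enum wit-bindgen generates from that variant (the payload type names mirror the WIT records; section 7 notes these are re-exported from `extension_api` with an `Llm` prefix):

```rust
// Approximate generated shape -- exact derives and module path depend on
// the wit_bindgen::generate! invocation.
pub enum CompletionEvent {
    Started,
    Text(String),
    Thinking(ThinkingContent),
    RedactedThinking(String),
    ToolUse(ToolUse),
    ToolUseJsonParseError(ToolUseJsonParseError),
    Stop(StopReason),
    Usage(TokenUsage),
    ReasoningDetails(String),
}
```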
Key exports in extension.wit:
export llm-providers: func() -> list<provider-info>;
export llm-provider-models: func(provider-id: string) -> result<list<model-info>, string>;
export llm-provider-is-authenticated: func(provider-id: string) -> bool;
export llm-provider-authenticate: func(provider-id: string) -> result<_, string>;
export llm-stream-completion-start: func(provider-id: string, model-id: string, request: completion-request) -> result<string, string>;
export llm-stream-completion-next: func(stream-id: string) -> result<option<completion-event>, string>;
export llm-stream-completion-close: func(stream-id: string);
Note: The streaming API uses a polling-based approach with explicit stream IDs instead of a resource handle. This avoids complexity with cross-boundary resource ownership in the WASM component model.
Key imports in extension.wit:
import llm-get-credential: func(provider-id: string) -> option<string>;
import llm-store-credential: func(provider-id: string, value: string) -> result<_, string>;
import llm-delete-credential: func(provider-id: string) -> result<_, string>;
import llm-get-env-var: func(name: string) -> option<string>;
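For illustration, an extension might combine these imports in its authentication check roughly as follows. This is a sketch: the env var name is invented, and the Rust binding names are assumed to follow wit-bindgen's snake_case conversion of the WIT imports.

```rust
// Assumed bindings: wit-bindgen exposes the imports above as free functions.
use zed_extension_api::{llm_get_credential, llm_get_env_var};

fn is_authenticated(provider_id: &str) -> bool {
    // Prefer a stored credential; fall back to an environment variable.
    llm_get_credential(provider_id).is_some()
        || llm_get_env_var("EXAMPLE_API_KEY").is_some()
}
```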
2. Extension Manifest Changes ✅
Location: crates/extension/src/extension_manifest.rs
Added these types:
pub struct LanguageModelProviderManifestEntry {
pub name: String,
pub icon: Option<String>,
pub models: Vec<LanguageModelManifestEntry>,
pub auth: Option<LanguageModelAuthConfig>,
}
pub struct LanguageModelManifestEntry {
pub id: String,
pub name: String,
pub max_token_count: u64,
pub max_output_tokens: Option<u64>,
pub supports_images: bool,
pub supports_tools: bool,
pub supports_thinking: bool,
}
pub struct LanguageModelAuthConfig {
pub env_var: Option<String>,
pub credential_label: Option<String>,
}
Added to ExtensionManifest:
pub language_model_providers: BTreeMap<Arc<str>, LanguageModelProviderManifestEntry>,
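For example, a manifest entry for a hypothetical provider would deserialize into something like this (all values invented for illustration):

```rust
// Hypothetical values showing how a parsed entry fits the types above.
let entry = LanguageModelProviderManifestEntry {
    name: "Example Provider".to_string(),
    icon: None,
    models: vec![LanguageModelManifestEntry {
        id: "example-model".to_string(),
        name: "Example Model".to_string(),
        max_token_count: 128_000,
        max_output_tokens: Some(8_192),
        supports_images: false,
        supports_tools: true,
        supports_thinking: false,
    }],
    auth: Some(LanguageModelAuthConfig {
        env_var: Some("EXAMPLE_API_KEY".to_string()),
        credential_label: Some("API Key".to_string()),
    }),
};
```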
3. Host-Side Provider/Model Structs ✅
Location: crates/extension_host/src/wasm_host/llm_provider.rs
Created ExtensionLanguageModelProvider implementing LanguageModelProvider:
- Wraps a `WasmExtension` and `LlmProviderInfo`
- Delegates to extension calls for authentication, model listing, etc.
- Returns `ExtensionLanguageModel` instances
- Implements `LanguageModelProviderState` for UI observation
Created ExtensionLanguageModel implementing LanguageModel:
- Wraps extension + model info
- Implements `stream_completion` by calling the extension's `llm-stream-completion` functions
- Converts between Zed's `LanguageModelRequest` and WIT's `CompletionRequest`
- Handles streaming via a polling-based approach with explicit stream IDs
Key implementation details:
- The `stream_completion` method uses a polling loop that calls `llm_stream_completion_start`, then repeatedly calls `llm_stream_completion_next` until the stream is complete, and finally calls `llm_stream_completion_close` to clean up (see the sketch below)
- Credential storage uses gpui's `cx.read_credentials()`, `cx.write_credentials()`, and `cx.delete_credentials()` APIs
- The `new()` method now accepts a `models: Vec<LlmModelInfo>` parameter to populate available models at registration time
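As a self-contained sketch of how that polling loop can be shaped into a `futures::Stream` (the `MockExtension` type stands in for the real WASM extension handle; the close call is omitted for brevity, and the actual implementation lives in `llm_provider.rs`):

```rust
use futures::stream::{self, Stream};

// Stand-in for the real extension handle; next_event mirrors
// llm_stream_completion_next's Result<Option<event>, String> shape.
struct MockExtension {
    events: Vec<String>,
}

impl MockExtension {
    async fn next_event(&mut self, _stream_id: &str) -> Result<Option<String>, String> {
        Ok(self.events.pop())
    }
}

fn completion_events(
    ext: MockExtension,
    stream_id: String,
) -> impl Stream<Item = Result<String, String>> {
    stream::unfold(Some((ext, stream_id)), |state| async move {
        let (mut ext, id) = state?;
        match ext.next_event(&id).await {
            Ok(Some(event)) => Some((Ok(event), Some((ext, id)))), // yield, keep polling
            Ok(None) => None,                                      // stream complete
            Err(e) => Some((Err(e), None)),                        // yield error, then end
        }
    })
}
```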
4. Extension Host Proxy ✅
Location: crates/extension/src/extension_host_proxy.rs
Added ExtensionLanguageModelProviderProxy trait:
pub type LanguageModelProviderRegistration = Box<dyn FnOnce(&mut App) + Send + Sync + 'static>;
pub trait ExtensionLanguageModelProviderProxy: Send + Sync + 'static {
fn register_language_model_provider(
&self,
provider_id: Arc<str>,
register_fn: LanguageModelProviderRegistration,
cx: &mut App,
);
fn unregister_language_model_provider(&self, provider_id: Arc<str>, cx: &mut App);
}
The proxy uses a boxed closure pattern. This allows extension_host to create the ExtensionLanguageModelProvider (which requires WasmExtension), while letting language_models handle the actual registry registration.
5. Proxy Implementation ✅
Location: crates/language_models/src/extension.rs
pub struct ExtensionLanguageModelProxy {
registry: Entity<LanguageModelRegistry>,
}
impl ExtensionLanguageModelProviderProxy for ExtensionLanguageModelProxy {
fn register_language_model_provider(
&self,
_provider_id: Arc<str>,
register_fn: LanguageModelProviderRegistration,
cx: &mut App,
) {
register_fn(cx);
}
fn unregister_language_model_provider(&self, provider_id: Arc<str>, cx: &mut App) {
self.registry.update(cx, |registry, cx| {
registry.unregister_provider(LanguageModelProviderId::from(provider_id), cx);
});
}
}
The proxy is registered during language_models::init().
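A minimal sketch of that registration; `ExtensionHostProxy::default_global` is the existing accessor, while the `register_language_model_provider_proxy` helper name is assumed to follow the other `register_*_proxy` methods:

```rust
// Inside language_models::init() -- a sketch, not the verbatim code.
let proxy = extension::ExtensionHostProxy::default_global(cx);
proxy.register_language_model_provider_proxy(ExtensionLanguageModelProxy {
    registry: LanguageModelRegistry::global(cx),
});
```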
6. Extension Loading Wiring ✅
Location: crates/extension_host/src/extension_host.rs
In extensions_updated():
Unloading (around line 1217):
for provider_id in extension.manifest.language_model_providers.keys() {
let full_provider_id: Arc<str> = format!("{}:{}", extension_id, provider_id).into();
self.proxy.unregister_language_model_provider(full_provider_id, cx);
}
Loading (around line 1383): After loading a wasm extension, we query for LLM providers and models:
if !extension.manifest.language_model_providers.is_empty() {
let providers_result = wasm_extension
.call(|ext, store| {
async move { ext.call_llm_providers(store).await }.boxed()
})
.await;
if let Ok(Ok(providers)) = providers_result {
for provider_info in providers {
// Query for models...
let models_result = wasm_extension.call(...).await;
// Store provider_info and models for registration
}
}
}
Then during registration (around line 1511):
for (provider_info, models) in llm_providers_with_models {
let provider_id: Arc<str> = format!("{}:{}", manifest.id, provider_info.id).into();
this.proxy.register_language_model_provider(
provider_id,
Box::new(move |cx: &mut App| {
let provider = Arc::new(ExtensionLanguageModelProvider::new(
wasm_ext, pinfo, mods, cx,
));
language_model::LanguageModelRegistry::global(cx).update(
cx,
|registry, cx| {
registry.register_provider(provider, cx);
},
);
}),
cx,
);
}
7. Extension API Updates ✅
Location: crates/extension_api/src/extension_api.rs
- Updated `wit_bindgen::generate!` to use `./wit/since_v0.7.0`
- Added LLM type re-exports (prefixed with `Llm` for clarity)
- Added LLM methods to the `Extension` trait with default implementations
- Added `wit::Guest` implementations for the LLM functions
The default implementations ensure backward compatibility:
fn llm_providers(&self) -> Vec<LlmProviderInfo> {
Vec::new() // Extensions without LLM providers return empty
}
fn llm_provider_models(&self, _provider_id: &str) -> Result<Vec<LlmModelInfo>, String> {
Ok(Vec::new())
}
fn llm_stream_completion_start(...) -> Result<String, String> {
Err("`llm_stream_completion_start` not implemented".to_string())
}
fn llm_stream_completion_next(
    &mut self,
    _stream_id: &str,
) -> Result<Option<LlmCompletionEvent>, String> {
    Err("`llm_stream_completion_next` not implemented".to_string())
}
fn llm_stream_completion_close(&mut self, _stream_id: &str) { /* no-op by default */ }
8. Test Files Updated ✅
Added language_model_providers: BTreeMap::default() to all test manifests:
- `crates/extension/src/extension_manifest.rs` (test module)
- `crates/extension_host/src/extension_store_test.rs`
- `crates/extension_host/src/capability_granter.rs` (test module)
- `crates/extension_host/benches/extension_compilation_benchmark.rs`
What Remains To Be Done
Task 1: Test the Streaming Completion Flow (HIGH PRIORITY) - ARCHITECTURE UPDATED ✅
The streaming API has been updated to use a polling-based approach instead of a resource handle pattern.
This was necessary because the original design had a fundamental issue: the completion-stream resource
was defined in an imported interface but returned from an exported function, creating ownership ambiguity.
New API:
- `llm-stream-completion-start` - Returns a stream ID (string)
- `llm-stream-completion-next` - Poll for the next event using the stream ID
- `llm-stream-completion-close` - Clean up the stream when done
Still needs testing:
- Create a test extension that implements a simple LLM provider
- Verify the polling-based streaming works correctly through the WASM boundary
- Test error handling and edge cases
Location to test: crates/extension_host/src/wasm_host/llm_provider.rs - the stream_completion method on ExtensionLanguageModel.
Task 2: Credential UI Prompt Support (MEDIUM PRIORITY)
Location: crates/extension_host/src/wasm_host/wit/since_v0_7_0.rs
The llm_request_credential host function currently returns Ok(Ok(false)):
async fn llm_request_credential(
&mut self,
_provider_id: String,
_credential_type: llm_provider::CredentialType,
_label: String,
_placeholder: String,
) -> wasmtime::Result<Result<bool, String>> {
// TODO: Implement actual UI prompting
Ok(Ok(false))
}
What needs to happen:
- Show a dialog to the user asking for the credential
- Wait for user input
- Return `true` if provided, `false` if cancelled
- The extension can then use `llm_store_credential` to save it
This requires UI work and async coordination with gpui windows.
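The async coordination could take roughly this shape (a hypothetical, self-contained sketch using a oneshot channel; not an actual gpui API):

```rust
use futures::channel::oneshot;

// Hypothetical flow: the host opens a dialog on the UI side and hands it a
// oneshot sender; the WASM host function awaits the user's decision.
async fn request_credential_flow(
    open_dialog: impl FnOnce(oneshot::Sender<Option<String>>),
) -> bool {
    let (tx, rx) = oneshot::channel();
    open_dialog(tx); // UI code sends Some(value) on submit, None on cancel
    match rx.await {
        Ok(Some(_value)) => true, // extension follows up with llm_store_credential
        _ => false,               // cancelled, or the dialog was dropped
    }
}
```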
Task 3: Handle Model Refresh (LOW PRIORITY - can be follow-up)
Currently models are only queried once at registration time. Options for improvement:
- Add a refresh mechanism that re-queries `call_llm_provider_models`
- Add a notification mechanism where extensions can signal that models have changed
- Automatic refresh on authentication
Recommendation: Start with refresh-on-authentication as a fast-follow.
Task 4: Create a Test Extension (LOW PRIORITY - but very useful)
Note: Creating a working test extension requires building a WASM component, which needs:
- The `wasm32-wasip1` Rust target: `rustup target add wasm32-wasip1`
- Building with `cargo build --target wasm32-wasip1 --release`
- Placing the resulting `.wasm` file in the extension directory
The existing extensions/test-extension has a pre-built WASM file checked in. To test LLM
provider functionality, either:
- Rebuild the test-extension WASM with LLM provider code
- Create a new extension and build it locally
Example test extension that demonstrates the LLM provider API:
extensions/test-llm-provider/
├── extension.toml
├── Cargo.toml
└── src/
└── lib.rs
extension.toml:
id = "test-llm-provider"
name = "Test LLM Provider"
version = "0.1.0"
schema_version = 1
[language_model_providers.test-provider]
name = "Test Provider"
src/lib.rs:
use zed_extension_api::{self as zed, *};
use std::collections::HashMap;
use std::sync::Mutex;
struct TestExtension {
streams: Mutex<HashMap<String, Vec<LlmCompletionEvent>>>,
next_stream_id: Mutex<u64>,
}
impl zed::Extension for TestExtension {
fn new() -> Self {
Self {
streams: Mutex::new(HashMap::new()),
next_stream_id: Mutex::new(0),
}
}
fn llm_providers(&self) -> Vec<LlmProviderInfo> {
vec![LlmProviderInfo {
id: "test-provider".into(),
name: "Test Provider".into(),
icon: None,
}]
}
fn llm_provider_models(&self, _provider_id: &str) -> Result<Vec<LlmModelInfo>, String> {
Ok(vec![LlmModelInfo {
id: "test-model".into(),
name: "Test Model".into(),
max_token_count: 4096,
max_output_tokens: Some(1024),
capabilities: LlmModelCapabilities {
supports_images: false,
supports_tools: false,
supports_tool_choice_auto: false,
supports_tool_choice_any: false,
supports_tool_choice_none: false,
supports_thinking: false,
tool_input_format: LlmToolInputFormat::JsonSchema,
},
is_default: true,
is_default_fast: true,
}])
}
fn llm_stream_completion_start(
&mut self,
_provider_id: &str,
_model_id: &str,
_request: &LlmCompletionRequest,
) -> Result<String, String> {
// Create a simple response with test events
let events = vec![
LlmCompletionEvent::Started,
LlmCompletionEvent::Text("Hello, ".into()),
LlmCompletionEvent::Text("world!".into()),
LlmCompletionEvent::Stop(LlmStopReason::EndTurn),
];
let mut id = self.next_stream_id.lock().unwrap();
let stream_id = format!("stream-{}", *id);
*id += 1;
self.streams.lock().unwrap().insert(stream_id.clone(), events);
Ok(stream_id)
}
fn llm_stream_completion_next(
&mut self,
stream_id: &str,
) -> Result<Option<LlmCompletionEvent>, String> {
let mut streams = self.streams.lock().unwrap();
if let Some(events) = streams.get_mut(stream_id) {
if events.is_empty() {
Ok(None)
} else {
Ok(Some(events.remove(0)))
}
} else {
Err(format!("Unknown stream: {}", stream_id))
}
}
fn llm_stream_completion_close(&mut self, stream_id: &str) {
self.streams.lock().unwrap().remove(stream_id);
}
}
zed::register_extension!(TestExtension);
File-by-File Checklist
Completed ✅
- `crates/extension_api/wit/since_v0.7.0/llm-provider.wit` - LLM types defined
- `crates/extension_api/wit/since_v0.7.0/extension.wit` - LLM exports/imports added
- `crates/extension_api/src/extension_api.rs` - Extension trait + Guest impl updated for v0.7.0
- `crates/extension/src/extension_manifest.rs` - Manifest types added
- `crates/extension/src/extension_host_proxy.rs` - Proxy trait added
- `crates/extension_host/src/wasm_host/llm_provider.rs` - Provider/Model structs created
- `crates/extension_host/src/wasm_host/wit.rs` - LLM types exported, Extension enum updated
- `crates/extension_host/src/wasm_host/wit/since_v0_7_0.rs` - Host trait implementations
- `crates/extension_host/src/wasm_host/wit/since_v0_6_0.rs` - Rewritten to use latest types
- `crates/extension_host/src/extension_host.rs` - Wired up LLM provider registration/unregistration
- `crates/extension_host/Cargo.toml` - Dependencies added
- `crates/language_models/src/extension.rs` - Proxy implementation
- `crates/language_models/src/language_models.rs` - Proxy registration
- `crates/language_models/Cargo.toml` - Extension dependency added
Should Implement (Follow-up PRs)
- `llm_request_credential` UI implementation
- Model refresh mechanism
- Test extension for validation
- Documentation for extension authors
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ Extension Host │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ extensions_updated() │ │
│ │ │ │
│ │ 1. Load WasmExtension │ │
│ │ 2. Query llm_providers() and llm_provider_models() │ │
│ │ 3. Call proxy.register_language_model_provider() │ │
│ └───────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼───────────────────────────────────┐ │
│ │ ExtensionLanguageModelProvider │ │
│ │ - Wraps WasmExtension │ │
│ │ - Implements LanguageModelProvider │ │
│ │ - Creates ExtensionLanguageModel instances │ │
│ └───────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼───────────────────────────────────┐ │
│ │ ExtensionLanguageModel │ │
│ │ - Implements LanguageModel │ │
│ │ - stream_completion() calls extension via WASM │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
│ Proxy (boxed closure)
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Language Models Crate │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ ExtensionLanguageModelProxy │ │
│ │ - Implements ExtensionLanguageModelProviderProxy │ │
│ │ - Calls register_fn closure │ │
│ │ - Unregisters from LanguageModelRegistry │ │
│ └───────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼───────────────────────────────────┐ │
│ │ LanguageModelRegistry │ │
│ │ - Stores all providers (built-in + extension) │ │
│ │ - Provides models to UI │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Key Code Patterns
1. Provider ID Format
Provider IDs are formatted as {extension_id}:{provider_id} to ensure uniqueness:
let provider_id: Arc<str> = format!("{}:{}", manifest.id, provider_info.id).into();
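Going the other way (for display or unregistration), the composite ID can be split on the first colon (illustrative only):

```rust
// Recover the parts of a fully-qualified provider ID.
let full_id = "my-extension:my-provider";
let (extension_id, provider_id) = full_id.split_once(':').unwrap();
assert_eq!(extension_id, "my-extension");
assert_eq!(provider_id, "my-provider");
```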
2. Triple-Nested Result Handling
When calling extension methods, results are nested:
- Outer `Result`: from channel operations (anyhow error)
- Middle `Result`: from the WASM call (anyhow error)
- Inner `Result<T, String>`: from extension logic
let models_result = wasm_extension.call(...).await;
let models: Vec<LlmModelInfo> = match models_result {
Ok(Ok(Ok(models))) => models,
Ok(Ok(Err(e))) => { /* extension returned error */ }
Ok(Err(e)) => { /* WASM call failed */ }
Err(e) => { /* channel operation failed */ }
};
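If you prefer flattening over a three-deep match, the `?` operator collapses the two anyhow layers, leaving only the extension's error to map. A self-contained sketch, with `Vec<String>` standing in for the model list:

```rust
use anyhow::{anyhow, Result};

// Collapse channel + WASM errors with `?`, then convert the extension's
// String error into anyhow for uniform handling.
fn flatten_models(
    models_result: Result<Result<Result<Vec<String>, String>>>,
) -> Result<Vec<String>> {
    models_result??.map_err(|e| anyhow!("extension returned error: {e}"))
}
```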
3. Polling-Based Streaming Pattern
The streaming API uses explicit stream IDs with polling instead of resource handles:
// Start the stream and get an ID
let stream_id = ext.call_llm_stream_completion_start(store, provider_id, model_id, request).await?;
// Poll for events in a loop
loop {
match ext.call_llm_stream_completion_next(store, &stream_id).await? {
Ok(Some(event)) => { /* process event */ }
Ok(None) => break, // Stream complete
Err(e) => { /* handle error */ }
}
}
// Clean up
ext.call_llm_stream_completion_close(store, &stream_id).await;
This pattern avoids the complexity of cross-boundary resource ownership in the WASM component model.
4. Default Trait Implementations
All LLM methods in the Extension trait have defaults so existing extensions continue to work:
fn llm_providers(&self) -> Vec<LlmProviderInfo> {
Vec::new() // No providers by default
}
Common Pitfalls
- Type confusion: WIT bindgen creates NEW types for each version. `Completion` from v0.6.0 bindgen is different from v0.7.0's. This is why we map older interfaces to `latest::`.
- Import paths: After `pub use self::zed::extension::*;`, types are available without a prefix. Types in sub-interfaces (like `lsp::CompletionKind`) need explicit imports.
- Async closures: Extension calls use the `extension.call(|ext, store| async move { ... }.boxed())` pattern. The closure must be `'static + Send`.
- Stream ID management: Extensions must track their active streams using the stream IDs returned from `llm_stream_completion_start`. The host will call `llm_stream_completion_close` when done.
- Result nesting: `extension.call(...)` wraps the closure's return type in `Result<T>`, so if the closure returns `Result<Result<X, String>>`, you get `Result<Result<Result<X, String>>>`. Unwrap carefully!
- Proxy type boundaries: The `extension` crate shouldn't depend on `extension_host`. The proxy trait uses a boxed closure to pass the registration logic without needing to share types.
- Resource ownership in WIT: Be careful when defining resources in imported interfaces but returning them from exported functions; this creates ownership ambiguity. The streaming API was changed to use polling to avoid this issue.
Testing
All existing tests pass:
cargo test -p extension_host --lib
# 3 tests pass
./script/clippy
# No warnings
To test the full flow manually:
- Create a test extension with LLM provider
- Build and install it
- Check if it appears in the model selector
- Try making a completion request
Relevant Files for Reference
How providers are registered
- `crates/language_model/src/registry.rs` - `LanguageModelRegistry::register_provider`
How other extension proxies work
- `crates/extension/src/extension_host_proxy.rs` - the proxy pattern
- `crates/project/src/context_server_store/extension.rs` - context server proxy implementation
How extensions are loaded
- `crates/extension_host/src/extension_host.rs` - the `extensions_updated` method
WasmExtension call pattern
- `crates/extension_host/src/wasm_host.rs` - the `WasmExtension::call` method
Questions for Follow-up
- Where should configuration UI live? The current implementation uses an empty config view. Should extension providers have configurable settings?
- How to handle extension reload? Currently, in-flight completions will fail if the extension is unloaded. Should we add graceful handling?
- Should there be rate limiting? If an extension's provider misbehaves, should Zed throttle or disable it?
- Icon support: The `provider_info.icon` field exists but `icon()` on the provider returns `ui::IconName::ZedAssistant`. Should we add custom icon support?