Docs/privacy documentation refresh (#50522)

Lucas White, Claude Sonnet 4.6, and Zed Zippy created 1 month ago

Before you mark this PR as ready for review, make sure that you have:
- [x] Added solid test coverage and/or screenshots from doing manual
testing
- [x] Done a self-review taking into account security and performance
aspects
- [x] Aligned any UI changes with the [UI
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)

Release Notes:

- Updated Privacy and Telemetry docs for improved clarity

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zed Zippy <234243425+zed-zippy[bot]@users.noreply.github.com>

Change summary

docs/src/ai/ai-improvement.md       | 107 ++++++++++++++++++------------
docs/src/ai/privacy-and-security.md |  19 ++--
docs/src/telemetry.md               |  41 ++++++-----
3 files changed, 97 insertions(+), 70 deletions(-)

Detailed changes

docs/src/ai/ai-improvement.md

@@ -3,73 +3,99 @@ title: AI Improvement and Data Collection - Zed
 description: Zed's opt-in approach to AI data collection for improving the agent panel and edit predictions.
 ---
 
-# Zed AI Improvement
+# Zed AI Features and Privacy
 
-## Agent Panel
+## Overview
 
-### Opt-In
+AI features in Zed include:
 
-When you use the Agent Panel through any of these means:
+- [Agent Panel](./agent-panel.md)
+- [Edit Predictions](./edit-prediction.md)
+- [Inline Assist](./inline-assistant.md)
+- [Text Threads](./text-threads.md)
+- Auto Git Commit Message Generation
 
-- [Zed's hosted models](./subscription.md)
-- [connecting a non-Zed AI service via API key](./llm-providers.md)
-- using an [external agent](./external-agents.md)
+By default, Zed does not store your prompts or code context. This data is sent to your selected AI provider (e.g., Anthropic, OpenAI, Google, or xAI) to generate responses, then discarded. Zed will not use your data to evaluate or improve AI features unless you explicitly share it (see [AI Feedback with Ratings](#ai-feedback-with-ratings)) or you opt in to edit prediction training data collection (see [Edit Predictions](#edit-predictions)).
+
+Zed is model-agnostic by design, and none of this changes based on which provider you choose. You can use your own API keys or Zed's hosted models without any data being retained.
+
+### Data Retention and Training
 
-Zed does not persistently store user content or use user content to evaluate and/or improve our AI features, unless it is explicitly shared with Zed. Each share is opt-in, and sharing once will not cause future content or data to be shared again.
+Zed's Agent Panel can be used via:
 
-> Note that rating responses will send your data related to that response to Zed's servers.
-> **_If you don't want data persisted on Zed's servers, don't rate_**. We will not collect data for improving our Agentic offering without you explicitly rating responses.
+- [Zed's hosted models](./subscription.md)
+- [connecting a non-Zed AI service via API key](./llm-providers.md)
+- using an [external agent](./external-agents.md) via ACP
 
-When using upstream services through Zed's hosted models, we require assurances from our service providers that your user content won't be used for training models.
+When using Zed's hosted models, we require assurances from our service providers that your user content won't be used for training models.
 
 | Provider  | No Training Guarantee                                   | Zero-Data Retention (ZDR)                                                                                                                     |
 | --------- | ------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
 | Anthropic | [Yes](https://www.anthropic.com/legal/commercial-terms) | [Yes](https://privacy.anthropic.com/en/articles/8956058-i-have-a-zero-data-retention-agreement-with-anthropic-what-products-does-it-apply-to) |
 | Google    | [Yes](https://cloud.google.com/terms/service-terms)     | [Yes](https://cloud.google.com/terms/service-terms), see Service Terms sections 17 and 19h                                                    |
 | OpenAI    | [Yes](https://openai.com/enterprise-privacy/)           | [Yes](https://platform.openai.com/docs/guides/your-data)                                                                                      |
+| xAI       | [Yes](https://x.ai/legal/faq-enterprise)                | [Yes](https://x.ai/legal/faq-enterprise)                                                                                                      |
 
 When you use your own API keys or external agents, **Zed does not have control over how your data is used by that service provider.**
 You should reference your agreement with each service provider to understand what terms and conditions apply.
 
-### Data we collect
+### AI Feedback with Ratings
+
+You can provide feedback on Zed's AI features by rating specific AI responses in Zed and sharing details related to those conversations with Zed. Each share is opt-in, and sharing once will not cause future content or data to be shared again.
+
+> **Rating = Data Sharing:** When you rate a response, your entire conversation thread is sent to Zed. This includes messages, AI responses, and thread metadata.
+> **_If you don't want data persisted on Zed's servers, don't rate_**. We will not collect data for improving our AI features without you explicitly rating responses.
 
-For prompts you have explicitly shared with us, Zed may store copies of those prompts and other data about the specific use of the Agent Panel.
+### Data Collected (AI Feedback)
 
-This data includes:
+For conversations you have explicitly shared with us via rating, Zed may store:
 
-- The prompt given to the Agent
-- Any commentary you include
-- Product telemetry about the agentic thread
+- All messages in the thread (your prompts and AI responses)
+- Any commentary you include with your rating
+- Thread metadata (model used, token counts, timestamps)
 - Metadata about your Zed installation
 
-### Data Handling
+If you do not rate responses, Zed will not store Customer Data (code, conversations, responses) related to your usage of the AI features.
+
+Telemetry related to Zed's AI features is collected. This includes metadata such as the AI feature being used and high-level interactions with the feature to understand performance (e.g., Agent response time, edit acceptance/rejection in the Agent panel or edit completions). You can read more in Zed's [telemetry](../telemetry.md) documentation.
 
 Collected data is stored in Snowflake, a private database. We periodically review this data to refine the agent's system prompt and tool use. All data is anonymized and stripped of sensitive information (access tokens, user IDs, email addresses).
 
 ## Edit Predictions
 
-By default, when using Zed Edit Predictions, Zed does not persistently store user content or use user content for training of its models.
+Edit predictions can be powered by **Zed's Zeta model** or by **third-party providers** like GitHub Copilot.
+
+### Zed's Zeta Model (Default)
+
+Zed sends a limited context window to the model to generate predictions:
+
+- A code excerpt around your cursor (not the full file)
+- Recent edits as diffs
+- Relevant excerpts from related open files
 
-### Opt-in
+This data is processed transiently to generate predictions and is not retained afterward.
 
-Users who are working on open source licensed projects may optionally opt-in to providing model improvement feedback. This opt-in occurs on a per-project basis. If you work on multiple open source projects and wish to provide model improvement feedback you will have to opt-in for each individual project.
+### Third-Party Providers
 
-When working on other projects where you haven't opted-in, Zed will not persistently store user content or use user content for training of its models.
+When using third-party providers like GitHub Copilot, **Zed does not control how your data is handled** by that provider. You should consult their Terms and Conditions directly.
 
-You can see exactly how Zed detects open source licenses in: [license_detection.rs](https://github.com/zed-industries/zed/blob/main/crates/edit_prediction/src/license_detection.rs).
+Note: Zed's `disabled_globs` settings will prevent predictions from being requested, but third-party providers may receive file content when files are opened.
 
-### Exclusions
+### Training Data: Opt-In for Open Source Projects
 
-Zed will intentionally exclude certain files from Predictive Edits entirely, even when you have opted-in to model improvement feedback.
+Zed does not collect training data for our edit prediction model unless the following conditions are met:
 
-You can inspect this exclusion list by opening `zed: open default settings` from the command palette:
+1. **You opt in** – Toggle "Training Data Collection" under the **Privacy** section of the edit prediction status bar menu (click the edit prediction icon in the status bar).
+2. **The project is open source** — detected via LICENSE file ([see detection logic](https://github.com/zed-industries/zed/blob/main/crates/edit_prediction/src/license_detection.rs))
+3. **The file isn't excluded** — via `disabled_globs`
+
+### File Exclusions
+
+Certain files are always excluded from edit predictions—regardless of opt-in status:
 
 ```json [settings]
 {
   "edit_predictions": {
-    // A list of globs representing files that edit predictions should be disabled for.
-    // There's a sensible default list of globs already included.
-    // Any addition to this list will be merged with the default list.
     "disabled_globs": [
       "**/.env*",
       "**/*.pem",
@@ -92,22 +118,17 @@ Users may explicitly exclude additional paths and/or file extensions by adding t
 }
 ```
 
-### Data we collect
-
-For open source projects where you have opted-in, Zed may store copies of requests and responses to the Zed AI Prediction service.
-
-This data includes:
+### Data Collected (Edit Prediction Training Data)
 
-- sampled edit prediction examples (cursor context + recent diffs/edits) for offline evaluation
-- the edit prediction
-- a portion of the buffer content around the cursor
-- a few recent edits
-- the current buffer outline
-- diagnostics (errors, warnings, etc) from language servers
+For open source projects where you've opted in, Zed may collect:
 
-### Data Handling
+- Code excerpt around your cursor
+- Recent edit diffs
+- The generated prediction
+- Repository URL and git revision
+- Buffer outline and diagnostics
 
-Collected data is stored in Snowflake, a private database. We periodically select training samples from this data. All data is anonymized and stripped of sensitive information (access tokens, user IDs, email addresses). The training dataset is publicly available at [huggingface.co/datasets/zed-industries/zeta](https://huggingface.co/datasets/zed-industries/zeta).
+Collected data is stored in Snowflake. We periodically review this data to select training samples for inclusion in our model training dataset. We ensure any included data is anonymized and contains no sensitive information (access tokens, user IDs, email addresses, etc). This training dataset is publicly available at [huggingface.co/datasets/zed-industries/zeta](https://huggingface.co/datasets/zed-industries/zeta).
 
 ### Model Output
 
@@ -115,4 +136,4 @@ We then use this training dataset to fine-tune [Qwen2.5-Coder-7B](https://huggin
 
 ## Applicable terms
 
-Please see the [Zed Terms of Service](https://zed.dev/terms-of-service) for more.
+Please see the [Zed Terms of Service](https://zed.dev/terms) for more.

docs/src/ai/privacy-and-security.md

@@ -7,15 +7,17 @@ description: Zed's approach to AI privacy: opt-in data sharing by default, zero-
 
 ## Philosophy
 
-Zed aims to collect only the minimum data necessary to serve and improve our product.
+Zed collects minimal data necessary to serve and improve our product. Features that could share data, like AI and telemetry, are either opt-in or can be disabled.
 
-Data sharing is opt-in by default. Privacy is not a setting to toggle—it's the baseline.
+- **Telemetry**: Zed collects only the data necessary to understand usage and fix issues. Client-side telemetry can be disabled in settings.
 
-As an open-source product, we believe in maximal transparency, and invite you to examine our codebase. If you find issues, we encourage you to share them with us.
+- **AI**: Data sharing for AI improvement is opt-in, and each share is a one-time action; it does not grant permission for future data collection. You can use Zed's AI features without sharing any data with Zed and without authenticating.
 
-Zed, including AI features, works without sharing data with us and without authentication.
+- **Open-Source**: Zed's codebase is public. You can inspect exactly what data is collected and how it's handled. If you find issues, we encourage you to report them.
 
-## Documentation
+- **Secure-by-default**: Designing Zed and our Service with "secure-by-default" as an objective is of utmost importance to us. We take your security and ours very seriously and strive to follow industry best-practice in order to uphold that principle.
+
+## Related Documentation
 
 - [Tool Permissions](./tool-permissions.md): Configure granular rules to control which agent actions are auto-approved, blocked, or require confirmation.
 
@@ -23,16 +25,15 @@ Zed, including AI features, works without sharing data with us and without authe
 
 - [Telemetry](../telemetry.md): How Zed collects general telemetry data.
 
-- [AI Improvement](./ai-improvement.md): Zed's opt-in-only approach to data collection for AI improvement, whether our Agentic offering or Edit Predictions.
+- [Zed AI Features and Privacy](./ai-improvement.md): An overview of Zed's AI features, your data when using AI in Zed, and how to opt-in and help Zed improve these features.
 
 - [Accounts](../authentication.md): When and why you'd need to authenticate into Zed, how to do so, and what scope we need from you.
 
-- [Collab](https://zed.dev/faq#data-and-privacy): How Zed's live collaboration works, and how data flows to provide the experience (we don't store your code).
+- [Collab](https://zed.dev/faq#data-and-privacy): How Zed's live collaboration works and how data flows. Zed does not store your code.
 
 ## Legal Links
 
-- [Terms of Service](https://zed.dev/terms-of-service)
-- [Terms of Use](https://zed.dev/terms)
+- [Terms of Service](https://zed.dev/terms)
 - [Privacy Policy](https://zed.dev/privacy-policy)
 - [Zed's Contributor License and Feedback Agreement](https://zed.dev/cla)
 - [Subprocessors](https://zed.dev/subprocessors)

docs/src/telemetry.md

@@ -5,7 +5,12 @@ description: "What data Zed collects and how to control telemetry settings."
 
 # Telemetry in Zed
 
-Zed collects anonymous telemetry data to help the team understand how people are using the application and to see what sort of issues they are experiencing.
+Zed collects anonymous telemetry to understand usage patterns and diagnose issues.
+
+Telemetry falls into two categories:
+
+- **Client-side**: Usage metrics and crash reports. You can disable these in settings.
+- **Server-side**: Collected when using hosted services like AI or Collaboration. Required for these features to function.
 
 ## Configuring Telemetry Settings
 
@@ -21,7 +26,7 @@ To enable or disable some or all telemetry types, open Settings ({#kb zed::OpenS
 
 ## Dataflow
 
-Telemetry is sent from the application to our servers. Data is proxied through our servers to enable us to easily switch analytics services. We currently use:
+Telemetry is sent from the application to our servers every 5 minutes (or when 50 events accumulate), then routed to the appropriate service. We currently use:
 
 - [Sentry](https://sentry.io): Crash-monitoring service - stores diagnostic events
 - [Snowflake](https://snowflake.com): Data warehouse - stores both diagnostic and metric events
@@ -32,33 +37,33 @@ Telemetry is sent from the application to our servers. Data is proxied through o
 
 ### Diagnostics
 
-Crash reports consist of a [minidump](https://learn.microsoft.com/en-us/windows/win32/debug/minidump-files) and some extra debug information. Reports are sent on the first application launch after the crash occurred. We've built dashboards that allow us to visualize the frequency and severity of issues experienced by users. Having these reports sent automatically allows us to begin implementing fixes without the user needing to file a report in our issue tracker. The plots in the dashboards also give us an informal measurement of the stability of Zed.
+Crash reports consist of a [minidump](https://learn.microsoft.com/en-us/windows/win32/debug/minidump-files) and debug metadata. Reports are sent on the next launch after a crash, allowing Zed to identify and fix issues without requiring you to file a bug report.
 
-You can see what extra data is sent alongside the minidump in the `Panic` struct in [crates/telemetry_events/src/telemetry_events.rs](https://github.com/zed-industries/zed/blob/main/crates/telemetry_events/src/telemetry_events.rs) in the Zed repo. You can find additional information in the [Debugging Crashes](./development/debugging-crashes.md) documentation.
+You can inspect what data is sent in the `Panic` struct in [crates/telemetry_events/src/telemetry_events.rs](https://github.com/zed-industries/zed/blob/main/crates/telemetry_events/src/telemetry_events.rs). See also: [Debugging Crashes](./development/debugging-crashes.md).
 
-### Client-Side Usage Data {#client-metrics}
+### Client-Side Metrics
 
-To improve Zed and understand how it is being used in the wild, Zed optionally collects usage data like the following:
+Client-side telemetry includes:
 
-- (a) file extensions of opened files;
-- (b) features and tools You use within the Editor;
-- (c) project statistics (e.g., number of files); and
-- (d) frameworks detected in Your projects
+- File extensions of opened files
+- Features and tools used within the editor
+- Project statistics (e.g., number of files)
+- Frameworks detected in your projects
 
-Usage Data does not include any of Your software code or sensitive project details. Metric events are reported over HTTPS, and requests are rate-limited to avoid using significant network bandwidth.
+This data does not include your code or sensitive project details. Events are sent over HTTPS and rate-limited.
 
-Usage Data is associated with a secure random telemetry ID which may be linked to Your email address. This linkage currently serves two purposes: (1) it allows Zed to analyze usage patterns over time while maintaining Your privacy; and (2) it enables Zed to reach out to specific user groups for feedback and improvement suggestions.
+Usage data is tied to a random telemetry ID. If you've authenticated, this ID may be linked to your email so Zed can analyze patterns over time and reach out for feedback.
 
-You can audit the metrics data that Zed has reported by running the command {#action zed::OpenTelemetryLog} from the command palette, or clicking `Help > View Telemetry Log` in the application menu.
+To audit what Zed has reported, run {#action zed::OpenTelemetryLog} from the command palette or click `Help > View Telemetry Log`.
 
-You can see the full list of the event types and exactly the data sent for each by inspecting the `Event` enum and the associated structs in [crates/telemetry_events/src/telemetry_events.rs](https://github.com/zed-industries/zed/blob/main/crates/telemetry_events/src/telemetry_events.rs) in the Zed repository.
+For the full list of event types, see the `Event` enum in [telemetry_events.rs](https://github.com/zed-industries/zed/blob/main/crates/telemetry_events/src/telemetry_events.rs).
 
-### Server-Side Usage Data {#metrics}
+### Server-Side Metrics
 
-When using Zed's hosted services, we may collect, generate, and Process data to allow us to support users and improve our hosted offering. Examples include metadata around rate limiting and billing metrics/token usage. Zed does not persistently store user content or use user content to evaluate and/or improve our AI features, unless it is explicitly shared with Zed, and we have a zero-data retention agreement with Anthropic.
+When using Zed's hosted services, we collect metadata for rate limiting and billing (e.g., token usage). Zed does not store your prompts or code unless you explicitly share them via feedback ratings.
 
-You can see more about our stance on data collection (and that any prompt data shared with Zed is explicitly opt-in) at [AI Improvement](./ai/ai-improvement.md).
+For details on AI data handling, see [Zed AI Features and Privacy](./ai/ai-improvement.md).
 
 ## Concerns and Questions
 
-If you have concerns about telemetry, please feel free to [open an issue](https://github.com/zed-industries/zed/issues/new/choose).
+If you have concerns about telemetry, you can [open an issue](https://github.com/zed-industries/zed/issues/new/choose) or email hi@zed.dev.
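
Reviewer note on the Dataflow change in docs/src/telemetry.md: the new wording says telemetry is sent "every 5 minutes (or when 50 events accumulate)". The sketch below shows one way such a flush rule could behave. It is a minimal illustration, not Zed's client code: only the two thresholds come from the diff, while `TelemetryQueue`, `record`, `tick`, and `sent_batches` are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field

# Thresholds taken from the updated telemetry docs.
FLUSH_INTERVAL_SECONDS = 5 * 60  # "every 5 minutes"
MAX_QUEUED_EVENTS = 50           # "or when 50 events accumulate"


@dataclass
class TelemetryQueue:
    """Hypothetical in-memory event queue (an assumption, not Zed's implementation)."""

    last_flush: float = 0.0
    events: list = field(default_factory=list)
    sent_batches: list = field(default_factory=list)  # stands in for the network send

    def should_flush(self, now: float) -> bool:
        # Never send an empty batch; otherwise flush on size or on elapsed time.
        if not self.events:
            return False
        too_many = len(self.events) >= MAX_QUEUED_EVENTS
        too_old = now - self.last_flush >= FLUSH_INTERVAL_SECONDS
        return too_many or too_old

    def record(self, event: dict, now: float) -> None:
        # Queue the event, then check whether a threshold has been reached.
        self.events.append(event)
        self.tick(now)

    def tick(self, now: float) -> None:
        # Called on each new event and, in a real client, by a periodic timer.
        if self.should_flush(now):
            self.sent_batches.append(self.events)
            self.events = []
            self.last_flush = now
```

In this sketch a burst of activity is sent as soon as the 50th event is queued, while a quiet session is flushed by the timer once five minutes have passed since the previous flush.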