# Zed Model Improvement

## Zed Assistant

When using the Zed Assistant, Zed does not persistently store user content or use user content to train its models.

When using upstream services through Zed AI, we require similar assurances from our service providers. For example, usage of Anthropic Claude 3.5 via Zed AI in the Assistant is governed by the [Anthropic Commercial Terms](https://www.anthropic.com/legal/commercial-terms), which include the following:

> "Anthropic may not train models on Customer Content from paid Services."

When you connect the Zed Assistant directly to a service other than Zed AI (e.g. via an API key), Zed does not have access to your user content. Refer to your agreement with that service provider to understand which terms and conditions apply.

## Zed Edit Predictions

By default, when using Zed Edit Predictions, Zed does not persistently store user content or use user content to train its models.

### Opt-in

Users who are working on open source licensed projects may opt in to providing model improvement feedback. Opting in happens on a per-project basis: if you work on multiple open source projects and wish to provide model improvement feedback, you must opt in for each project individually.

When working on projects where you haven't opted in, Zed will not persistently store user content or use user content to train its models.

You can see exactly how Zed detects open source licenses in [license_detection.rs](https://github.com/zed-industries/zed/blob/main/crates/zeta/src/license_detection.rs).

### Exclusions

Zed intentionally excludes certain files from Edit Predictions entirely, even when you have opted in to model improvement feedback.

You can inspect this exclusion list by running `zed: open default settings` from the command palette:

```json
{
  "edit_predictions": {
    // A list of globs representing files that edit predictions should be disabled for.
    // There's a sensible default list of globs already included.
    // Any addition to this list will be merged with the default list.
    "disabled_globs": [
      "**/.env*",
      "**/*.pem",
      "**/*.key",
      "**/*.cert",
      "**/*.crt",
      "**/secrets.yml"
    ]
  }
}
```

Users may explicitly exclude additional paths and/or file extensions by adding them to [`edit_predictions.disabled_globs`](https://zed.dev/docs/configuring-zed#edit-predictions) in their Zed `settings.json`:

```json
{
  "edit_predictions": {
    "disabled_globs": ["secret_dir/*", "**/*.log"]
  }
}
```

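Because additions are merged with the default list rather than replacing it, the two snippets above combine. As a rough illustration (not something you need to write yourself), the effective list would be the union of both; the ordering shown here is an assumption:

```json
{
  "edit_predictions": {
    // Illustration only: the default globs followed by the user's additions.
    // The order is assumed; what matters is that both sets apply.
    "disabled_globs": [
      "**/.env*",
      "**/*.pem",
      "**/*.key",
      "**/*.cert",
      "**/*.crt",
      "**/secrets.yml",
      "secret_dir/*",
      "**/*.log"
    ]
  }
}
```
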
### Data we collect

For open source projects where you have opted in, Zed may store copies of requests and responses to the Zed AI Prediction service.

This data includes:

- the edit prediction
- a portion of the buffer content around the cursor
- a few recent edits
- the current buffer outline
- diagnostics (errors, warnings, etc.) from language servers

### Data Handling

Collected data is stored in Snowflake, a private database where we track other metrics. We periodically review this data to select training samples for inclusion in our model training dataset. We ensure any included data is anonymized and contains no sensitive information (access tokens, user IDs, email addresses, etc.). This training dataset is publicly available at [huggingface.co/datasets/zed-industries/zeta](https://huggingface.co/datasets/zed-industries/zeta).

### Model Output

We then use this training dataset to fine-tune [Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) and make the resulting model available at [huggingface.co/zed-industries/zeta](https://huggingface.co/zed-industries/zeta).

## Applicable terms

Please see the [Zed Terms of Service](https://zed.dev/terms-of-service) for more.