# Language Extensions

Language support in Zed has several components:

- Language metadata and configuration
- Grammar
- Queries
- Language servers

## Language Metadata

Each language supported by Zed must be defined in a subdirectory inside the `languages` directory of your extension.

This subdirectory must contain a file called `config.toml` with the following structure:

```toml
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]
```

- `name` is the human-readable name that will show up in the Select Language dropdown.
- `grammar` is the name of a grammar. Grammars are registered separately, as described below.
- `path_suffixes` (optional) is an array of file suffixes that should be associated with this language. This supports glob patterns like `config/**/*.toml` where `**` matches 0 or more directories and `*` matches 0 or more characters.
- `line_comments` (optional) is an array of strings that are used to identify line comments in the language.

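As a rough sketch of a fuller `config.toml`, the example below adds a few optional keys that Zed's bundled languages use: `tab_size`, `hard_tabs`, `autoclose_before`, and `brackets`. The directory name and all values are illustrative assumptions for a hypothetical language, not recommended defaults, and the comments describe the assumed meaning of each key.

```toml
# languages/my-language/config.toml (sketch; directory name and values are assumptions)
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]

# Indentation defaults for this language.
tab_size = 4
hard_tabs = false

# Characters before which an opening bracket may be auto-closed (assumed meaning).
autoclose_before = ";:.,=}])>"

# Bracket pairs. `close` toggles auto-closing; `newline` toggles inserting
# a newline between the pair when pressing Enter (assumed meaning).
brackets = [
  { start = "{", end = "}", close = true, newline = true },
  { start = "\"", end = "\"", close = true, newline = false, not_in = ["string"] },
]
```
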
<!--
TBD: Document `language_name/config.toml` keys

- line_comments, block_comment
- autoclose_before
- brackets (start, end, close, newline, not_in: ["comment", "string"])
- tab_size, hard_tabs
- word_characters
- prettier_parser_name
- opt_into_language_servers
- first_line_pattern
- code_fence_block_name
- scope_opt_in_language_servers
- increase_indent_pattern, decrease_indent_pattern
- collapsed_placeholder
-->

## Grammar

Zed uses the [Tree-sitter](https://tree-sitter.github.io) parsing library to provide built-in language-specific features. There are grammars available for many languages, and you can also [develop your own grammar](https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar). A growing list of Zed features is built using pattern matching over syntax trees with Tree-sitter queries.

As mentioned above, every language that is defined in an extension must specify the name of a Tree-sitter grammar that is used for parsing. These grammars are then registered separately in the extension's `extension.toml` file, like this:

```toml
[grammars.gleam]
repository = "https://github.com/gleam-lang/tree-sitter-gleam"
commit = "58b7cac8fc14c92b0677c542610d8738c373fa81"
```

The `repository` field must specify the repository from which the Tree-sitter grammar should be loaded, and the `commit` field must contain the SHA of the Git commit to use. An extension can provide multiple grammars by referencing multiple Tree-sitter repositories.

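To illustrate registering more than one grammar, here is a minimal sketch with two `[grammars.<name>]` tables. The grammar names, repository URLs, and commit values are hypothetical placeholders. The sketch assumes that the table name (for example `my-language`) is what a language's `config.toml` refers to in its `grammar` field.

```toml
# extension.toml (sketch; names, URLs, and commits are placeholders)
[grammars.my-language]
repository = "https://github.com/example/tree-sitter-my-language"
commit = "<full commit SHA>"

[grammars.my-language-inline]
repository = "https://github.com/example/tree-sitter-my-language-inline"
commit = "<full commit SHA>"
```
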
## Tree-sitter Queries

Zed uses the syntax tree produced by the [Tree-sitter](https://tree-sitter.github.io) parser, together with the Tree-sitter query language, to implement several features:

- Syntax highlighting
- Bracket matching
- Code outline/structure
- Auto-indentation
- Code injections
- Syntax overrides
- Text redactions
- Runnable code detection

The following sections elaborate on how [Tree-sitter queries](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax) enable these features in Zed, using [JSON syntax](https://www.json.org/json-en.html) as a guiding example.

### Syntax highlighting

In Tree-sitter, the `highlights.scm` file defines syntax highlighting rules for a particular syntax.

Here's an example from a `highlights.scm` for JSON:

```scheme
(string) @string

(pair
  key: (string) @property.json_key)

(number) @number
```

This query marks strings, object keys, and numbers for highlighting. The following is a comprehensive list of captures supported by themes:

| Capture                  | Description                            |
| ------------------------ | -------------------------------------- |
| @attribute               | Captures attributes                    |
| @boolean                 | Captures boolean values                |
| @comment                 | Captures comments                      |
| @comment.doc             | Captures documentation comments        |
| @constant                | Captures constants                     |
| @constructor             | Captures constructors                  |
| @embedded                | Captures embedded content              |
| @emphasis                | Captures emphasized text               |
| @emphasis.strong         | Captures strongly emphasized text      |
| @enum                    | Captures enumerations                  |
| @function                | Captures functions                     |
| @hint                    | Captures hints                         |
| @keyword                 | Captures keywords                      |
| @label                   | Captures labels                        |
| @link_text               | Captures link text                     |
| @link_uri                | Captures link URIs                     |
| @number                  | Captures numeric values                |
| @operator                | Captures operators                     |
| @predictive              | Captures predictive text               |
| @preproc                 | Captures preprocessor directives       |
| @primary                 | Captures primary elements              |
| @property                | Captures properties                    |
| @punctuation             | Captures punctuation                   |
| @punctuation.bracket     | Captures brackets                      |
| @punctuation.delimiter   | Captures delimiters                    |
| @punctuation.list_marker | Captures list markers                  |
| @punctuation.special     | Captures special punctuation           |
| @string                  | Captures string literals               |
| @string.escape           | Captures escaped characters in strings |
| @string.regex            | Captures regular expressions           |
| @string.special          | Captures special strings               |
| @string.special.symbol   | Captures special symbols               |
| @tag                     | Captures tags                          |
| @tag.doctype             | Captures doctypes (e.g., in HTML)      |
| @text.literal            | Captures literal text                  |
| @title                   | Captures titles                        |
| @type                    | Captures types                         |
| @variable                | Captures variables                     |
| @variable.special        | Captures special variables             |
| @variant                 | Captures variants                      |

### Bracket matching

The `brackets.scm` file defines matching brackets.

Here's an example from a `brackets.scm` file for JSON:

```scheme
("[" @open "]" @close)
("{" @open "}" @close)
("\"" @open "\"" @close)
```

This query identifies opening and closing brackets, braces, and quotation marks.

| Capture | Description                                   |
| ------- | --------------------------------------------- |
| @open   | Captures opening brackets, braces, and quotes |
| @close  | Captures closing brackets, braces, and quotes |

### Code outline/structure

The `outline.scm` file defines the structure for the code outline.

Here's an example from an `outline.scm` file for JSON:

```scheme
(pair
  key: (string (string_content) @name)) @item
```

This query captures object keys for the outline structure.

| Capture        | Description                                                                          |
| -------------- | ------------------------------------------------------------------------------------ |
| @name          | Captures the content of object keys                                                  |
| @item          | Captures the entire key-value pair                                                   |
| @context       | Captures elements that provide context for the outline item                          |
| @context.extra | Captures additional contextual information for the outline item                      |
| @annotation    | Captures nodes annotating an outline item (doc comments, attributes, decorators)[^1] |

[^1]: These annotations are used by Assistant when generating code modification steps.

### Auto-indentation

The `indents.scm` file defines indentation rules.

Here's an example from an `indents.scm` file for JSON:

```scheme
(array "]" @end) @indent
(object "}" @end) @indent
```

This query marks the end of arrays and objects for indentation purposes.

| Capture | Description                                        |
| ------- | -------------------------------------------------- |
| @end    | Captures closing brackets and braces               |
| @indent | Captures entire arrays and objects for indentation |

### Code injections

The `injections.scm` file defines rules for embedding one language within another, such as code blocks in Markdown or SQL queries in Python strings.

Here's an example from an `injections.scm` file for Markdown:

```scheme
(fenced_code_block
  (info_string
    (language) @language)
  (code_fence_content) @content)

((inline) @content
 (#set! "language" "markdown-inline"))
```

This query identifies fenced code blocks, capturing the language specified in the info string and the content within the block. It also captures inline content and sets its language to "markdown-inline".

| Capture   | Description                                                |
| --------- | ---------------------------------------------------------- |
| @language | Captures the language identifier for a code block          |
| @content  | Captures the content to be treated as a different language |

Note that we couldn't use JSON as an example here because it doesn't support language injections.

### Syntax overrides

The `overrides.scm` file defines syntactic _scopes_ that can be used to override certain editor settings within specific language constructs.

For example, there is a language-specific setting called `word_characters` that controls which non-alphabetic characters are considered part of a word, for filtering autocomplete suggestions. In JavaScript, "$" and "#" are considered word characters. But when your cursor is within a _string_ in JavaScript, "-" is _also_ considered a word character. To achieve this, the JavaScript `overrides.scm` file contains the following pattern:

```scheme
[
  (string)
  (template_string)
] @string
```

And the JavaScript `config.toml` contains this setting:

```toml
word_characters = ["#", "$"]

[overrides.string]
word_characters = ["-"]
```

You can also disable certain auto-closing brackets in a specific scope. For example, to prevent auto-closing `'` within strings, you could put the following in the JavaScript `config.toml`:

```toml
brackets = [
  { start = "'", end = "'", close = true, newline = false, not_in = ["string"] },
  # other pairs...
]
```

#### Range inclusivity

By default, the ranges defined in `overrides.scm` are _exclusive_. So in the case above, if your cursor were _outside_ the quotation marks delimiting the string, the `string` scope would not take effect. Sometimes, you may want to make the range _inclusive_. You can do this by adding the `.inclusive` suffix to the capture name in the query.

For example, in JavaScript, we also disable auto-closing of single quotes within comments, and the comment scope must extend all the way to the newline after a line comment. To achieve this, the JavaScript `overrides.scm` contains the following pattern:

```scheme
(comment) @comment.inclusive
```

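On the `config.toml` side, the matching `brackets` entry might look like the sketch below. It is modeled on the earlier single-quote example, with `comment` added to `not_in`. It assumes the scope is referred to as `comment`, without the `.inclusive` suffix.

```toml
brackets = [
  # Assumption: `comment` and `string` name the scopes captured in overrides.scm.
  { start = "'", end = "'", close = true, newline = false, not_in = ["comment", "string"] },
  # other pairs...
]
```
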
### Text redactions

The `redactions.scm` file defines text redaction rules. When you are collaborating and sharing your screen, it ensures that certain syntax nodes are rendered in a redacted mode so that their contents do not leak.

Here's an example from a `redactions.scm` file for JSON:

```scheme
(pair value: (number) @redact)
(pair value: (string) @redact)
(array (number) @redact)
(array (string) @redact)
```

This query marks number and string values in key-value pairs and arrays for redaction.

| Capture | Description                    |
| ------- | ------------------------------ |
| @redact | Captures values to be redacted |

### Runnable code detection

The `runnables.scm` file defines rules for detecting runnable code.

Here's an example from a `runnables.scm` file for JSON:

```scheme
(
    (document
        (object
            (pair
                key: (string
                    (string_content) @_name
                    (#eq? @_name "scripts")
                )
                value: (object
                    (pair
                        key: (string (string_content) @run @script)
                    )
                )
            )
        )
    )
    (#set! tag package-script)
    (#set! tag composer-script)
)
```

This query detects runnable scripts in `package.json` and `composer.json` files.

The `@run` capture specifies where the run button should appear in the editor. Other captures, except those prefixed with an underscore, are exposed as environment variables whose names take the form `ZED_CUSTOM_$(capture_name)` when running the code. For example, the `@script` capture above is exposed as `ZED_CUSTOM_script`.

| Capture | Description                                            |
| ------- | ------------------------------------------------------ |
| @\_name | Captures the "scripts" key                             |
| @run    | Captures the script name                               |
| @script | Also captures the script name (for different purposes) |

<!--
TBD: `#set! tag`
-->

## Language Servers

Zed uses the [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) to provide advanced language support.

An extension may provide any number of language servers. To provide a language server from your extension, add an entry to your `extension.toml` with the name of your language server and the language it applies to:

```toml
[language_servers.my-language]
name = "My Language LSP"
language = "My Language"
```

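Since an extension may provide any number of language servers, a second server would presumably be declared with another table of the same shape. In this sketch the second server's id and name are hypothetical:

```toml
# extension.toml (sketch; the second server's id and name are hypothetical)
[language_servers.my-language]
name = "My Language LSP"
language = "My Language"

[language_servers.my-language-linter]
name = "My Language Linter"
language = "My Language"
```
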
Then, in the Rust code for your extension, implement the `language_server_command` method on your extension:

```rust
// Imports used by this snippet, from the `zed_extension_api` crate.
use zed_extension_api::{self as zed, LanguageServerId, Result};

impl zed::Extension for MyExtension {
    // Other trait methods (such as `new`) are omitted here.

    // Returns the command Zed runs to start the language server.
    fn language_server_command(
        &mut self,
        language_server_id: &LanguageServerId,
        worktree: &zed::Worktree,
    ) -> Result<zed::Command> {
        // The three `get_*` helpers are placeholders for your own logic.
        Ok(zed::Command {
            command: get_path_to_language_server_executable()?,
            args: get_args_for_language_server()?,
            env: get_env_for_language_server()?,
        })
    }
}
```

You can customize the handling of the language server using several optional methods in the `Extension` trait. For example, you can control how completions are styled using the `label_for_completion` method. For a complete list of methods, see the [API docs for the Zed extension API](https://docs.rs/zed_extension_api).