# Language Extensions

Language support in Zed has several components:

- Language metadata and configuration
- Grammar
- Queries
- Language servers

## Language Metadata

Each language supported by Zed must be defined in a subdirectory inside the `languages` directory of your extension.

This subdirectory must contain a file called `config.toml` with the following structure:

```toml
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]
```

- `name` (required) is the human-readable name that will show up in the Select Language dropdown.
- `grammar` (required) is the name of a grammar. Grammars are registered separately, as described below.
- `path_suffixes` is an array of file suffixes that should be associated with this language. Unlike `file_types` in settings, this does not support glob patterns.
- `line_comments` is an array of strings that identify line comments in the language. Zed uses them for the `editor::ToggleComments` action ({#kb editor::ToggleComments}), which toggles comments on lines of code.
- `tab_size` defines the indentation/tab size used for this language (default is `4`).
- `hard_tabs` controls whether to indent with tabs (`true`) or spaces (`false`, the default).
- `first_line_pattern` is a regular expression that, in addition to `path_suffixes` (above) or `file_types` in settings, can be used to match files that should use this language. For example, Zed uses this to identify shell scripts by matching [shebang lines](https://github.com/zed-industries/zed/blob/main/crates/languages/src/bash/config.toml) in the first line of a script.
- `debuggers` is an array of strings that name the debuggers available for this language. When you launch a debugger from the `New Process Modal`, Zed orders the available debuggers by the order of entries in this array.

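To make the interplay between `path_suffixes` and `first_line_pattern` concrete, here is a minimal Rust sketch of the matching rule as described above. It is not Zed's implementation. `LanguageConfig` and `matches_file` are hypothetical names, and to stay dependency-free the sketch replaces the `first_line_pattern` regular expression with a literal prefix check.

```rust
/// Hypothetical, simplified view of the file-matching keys in `config.toml`.
struct LanguageConfig {
    /// File suffixes, compared literally (no glob patterns).
    path_suffixes: Vec<&'static str>,
    /// Stand-in for `first_line_pattern`: a literal prefix instead of a regex.
    first_line_prefix: Option<&'static str>,
}

impl LanguageConfig {
    /// True if the file's extension or its first line selects this language.
    fn matches_file(&self, path: &str, first_line: &str) -> bool {
        // Look only at the final path component, then at the text after its last '.'.
        let file_name = path.rsplit('/').next().unwrap_or(path);
        let suffix_match = file_name
            .rsplit_once('.')
            .map_or(false, |(_, ext)| self.path_suffixes.iter().any(|s| *s == ext));
        // Either rule on its own is enough to select the language.
        let first_line_match = self
            .first_line_prefix
            .map_or(false, |prefix| first_line.starts_with(prefix));
        suffix_match || first_line_match
    }
}
```

In this sketch, a file named `build` whose first line is `#!/bin/bash` is still matched through the first-line check even though it has no suffix. This is the shebang case mentioned above.
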
<!--
TBD: Document `language_name/config.toml` keys

- autoclose_before
- brackets (start, end, close, newline, not_in: ["comment", "string"])
- word_characters
- prettier_parser_name
- opt_into_language_servers
- code_fence_block_name
- scope_opt_in_language_servers
- increase_indent_pattern, decrease_indent_pattern
- collapsed_placeholder
- auto_indent_on_paste, auto_indent_using_last_non_empty_line
- overrides: `[overrides.element]`, `[overrides.string]`
-->

## Grammar

Zed uses the [Tree-sitter](https://tree-sitter.github.io) parsing library to provide built-in language-specific features. Grammars are available for many languages, and you can also [develop your own grammar](https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar). A growing number of Zed features are built using pattern matching over syntax trees with Tree-sitter queries. As mentioned above, every language defined in an extension must specify the name of the Tree-sitter grammar used to parse it. These grammars are then registered separately in the extension's `extension.toml` file, like this:

```toml
[grammars.gleam]
repository = "https://github.com/gleam-lang/tree-sitter-gleam"
rev = "58b7cac8fc14c92b0677c542610d8738c373fa81"
```

The `repository` field must specify the repository from which the Tree-sitter grammar should be loaded, and the `rev` field must contain a Git revision to use, such as the SHA of a Git commit. If you're developing an extension locally and want to load a grammar from the local filesystem, you can use a `file://` URL for `repository`. An extension can provide multiple grammars by referencing multiple Tree-sitter repositories.

## Tree-sitter Queries

Zed runs [Tree-sitter](https://tree-sitter.github.io) queries against the syntax tree produced by a language's grammar to implement several features:

- Syntax highlighting
- Bracket matching
- Code outline/structure
- Auto-indentation
- Code injections
- Syntax overrides
- Text redactions
- Runnable code detection
- Selecting classes, functions, etc.

The following sections explain how [Tree-sitter queries](https://tree-sitter.github.io/tree-sitter/using-parsers/queries/index.html) enable these features in Zed. Where possible, they use [JSON syntax](https://www.json.org/json-en.html) as a guiding example.

### Syntax highlighting

In Tree-sitter, the `highlights.scm` file defines syntax highlighting rules for a particular language.

Here's an example from a `highlights.scm` file for JSON:

```scheme
(string) @string

(pair
  key: (string) @property.json_key)

(number) @number
```

This query marks strings, object keys, and numbers for highlighting. The following is a comprehensive list of captures supported by themes:

| Capture                  | Description                            |
| ------------------------ | -------------------------------------- |
| @attribute               | Captures attributes                    |
| @boolean                 | Captures boolean values                |
| @comment                 | Captures comments                      |
| @comment.doc             | Captures documentation comments        |
| @constant                | Captures constants                     |
| @constant.builtin        | Captures built-in constants            |
| @constructor             | Captures constructors                  |
| @embedded                | Captures embedded content              |
| @emphasis                | Captures emphasized text               |
| @emphasis.strong         | Captures strongly emphasized text      |
| @enum                    | Captures enumerations                  |
| @function                | Captures functions                     |
| @hint                    | Captures hints                         |
| @keyword                 | Captures keywords                      |
| @label                   | Captures labels                        |
| @link_text               | Captures link text                     |
| @link_uri                | Captures link URIs                     |
| @number                  | Captures numeric values                |
| @operator                | Captures operators                     |
| @predictive              | Captures predictive text               |
| @preproc                 | Captures preprocessor directives       |
| @primary                 | Captures primary elements              |
| @property                | Captures properties                    |
| @punctuation             | Captures punctuation                   |
| @punctuation.bracket     | Captures brackets                      |
| @punctuation.delimiter   | Captures delimiters                    |
| @punctuation.list_marker | Captures list markers                  |
| @punctuation.special     | Captures special punctuation           |
| @string                  | Captures string literals               |
| @string.escape           | Captures escaped characters in strings |
| @string.regex            | Captures regular expressions           |
| @string.special          | Captures special strings               |
| @string.special.symbol   | Captures special symbols               |
| @tag                     | Captures tags                          |
| @tag.doctype             | Captures doctypes (e.g., in HTML)      |
| @text.literal            | Captures literal text                  |
| @title                   | Captures titles                        |
| @type                    | Captures types                         |
| @type.builtin            | Captures built-in types                |
| @variable                | Captures variables                     |
| @variable.special        | Captures special variables             |
| @variable.parameter      | Captures function/method parameters    |
| @variant                 | Captures variants                      |

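A query capture can be more specific than any key a theme defines. For example, `@property.json_key` above does not appear in the table. One plausible way to resolve such names is to drop trailing dot-separated components until the theme has a match. This rule is an assumption made for illustration and is not taken from Zed's source. `resolve_capture` is a hypothetical helper:

```rust
use std::collections::HashMap;

/// Hypothetical helper: resolves a capture name (without the leading `@`) to a theme style.
/// Assumed rule: trim trailing `.component`s until the theme defines a style.
fn resolve_capture<'a>(theme: &'a HashMap<String, String>, capture: &str) -> Option<&'a str> {
    let mut name = capture;
    loop {
        if let Some(style) = theme.get(name) {
            return Some(style.as_str());
        }
        // Fall back to the parent capture, e.g. `property.json_key` -> `property`.
        match name.rsplit_once('.') {
            Some((parent, _)) => name = parent,
            None => return None,
        }
    }
}
```

Under this assumed rule, a theme that only styles `property` would still color JSON keys. A theme that also defines `property.json_key` could style them differently.
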
### Bracket matching

The `brackets.scm` file defines matching brackets.

Here's an example from a `brackets.scm` file for JSON:

```scheme
("[" @open "]" @close)
("{" @open "}" @close)
("\"" @open "\"" @close)
```

This query identifies opening and closing brackets, braces, and quotation marks.

| Capture | Description                                   |
| ------- | --------------------------------------------- |
| @open   | Captures opening brackets, braces, and quotes |
| @close  | Captures closing brackets, braces, and quotes |

Zed uses these captures to highlight matching brackets. It paints each bracket pair with a different color ("rainbow brackets") and highlights a bracket pair when the cursor is inside it.

To exclude a bracket pair from rainbow colorization, add `(#set! rainbow.exclude)` to the corresponding `brackets.scm` entry:

```scheme
(("\"" @open "\"" @close) (#set! rainbow.exclude))
```

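Rainbow colorization can be pictured as cycling through a color palette by nesting depth. The sketch below illustrates that idea and is not Zed's algorithm. `rainbow_depths` is a hypothetical helper that scans raw characters. In Zed, the brackets being colored come from the `@open`/`@close` captures above.

```rust
/// Hypothetical helper: assigns each bracket a palette index based on its nesting depth.
/// Returns (byte_offset, bracket, color_index) triples.
fn rainbow_depths(text: &str, palette_len: usize) -> Vec<(usize, char, usize)> {
    assert!(palette_len > 0, "palette must not be empty");
    let mut depth = 0usize;
    let mut out = Vec::new();
    for (offset, ch) in text.char_indices() {
        match ch {
            '(' | '[' | '{' => {
                out.push((offset, ch, depth % palette_len));
                depth += 1;
            }
            ')' | ']' | '}' => {
                // A closing bracket gets the same color as its opening bracket.
                depth = depth.saturating_sub(1);
                out.push((offset, ch, depth % palette_len));
            }
            _ => {}
        }
    }
    out
}
```

With a two-color palette, `{[()]}` alternates colors at each level, and each closing bracket shares the color of its opening bracket.
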
### Code outline/structure

The `outline.scm` file defines the structure for the code outline.

Here's an example from an `outline.scm` file for JSON:

```scheme
(pair
  key: (string (string_content) @name)) @item
```

This query captures object keys for the outline structure.

| Capture        | Description                                                                          |
| -------------- | ------------------------------------------------------------------------------------ |
| @name          | Captures the content of object keys                                                  |
| @item          | Captures the entire key-value pair                                                   |
| @context       | Captures elements that provide context for the outline item                          |
| @context.extra | Captures additional contextual information for the outline item                      |
| @annotation    | Captures nodes annotating an outline item (doc comments, attributes, decorators)[^1] |

[^1]: These annotations are used by the Assistant when generating code modification steps.

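Outline captures produce a flat list of `@item` ranges, so the nesting has to be derived from the ranges themselves. One plausible approach is to nest each item under every earlier item whose range contains it. This is an assumption made for illustration, not Zed's actual code. `outline_depths` is a hypothetical helper that expects the items to be sorted by start offset:

```rust
use std::ops::Range;

/// Hypothetical helper: computes the nesting depth of each outline item from its byte range.
/// Items must be sorted by start offset. Containment is treated as nesting.
fn outline_depths(items: &[Range<usize>]) -> Vec<usize> {
    // Stack of currently open ancestors, innermost last.
    let mut stack: Vec<&Range<usize>> = Vec::new();
    let mut depths = Vec::with_capacity(items.len());
    for item in items {
        // Pop ancestors that do not contain this item.
        while let Some(top) = stack.last() {
            if top.start <= item.start && item.end <= top.end {
                break;
            }
            stack.pop();
        }
        depths.push(stack.len());
        stack.push(item);
    }
    depths
}
```

For `{"a": {"b": 1}, "c": 2}`, the `"b"` pair lies inside the `"a"` pair and is shown one level deeper, while `"c"` returns to the top level.
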
### Auto-indentation

The `indents.scm` file defines indentation rules.

Here's an example from an `indents.scm` file for JSON:

```scheme
(array "]" @end) @indent
(object "}" @end) @indent
```

This query treats each array and object as an indented block. Lines inside it are indented one level, and the closing bracket or brace (`@end`) returns to the outer indentation level.

| Capture | Description                                        |
| ------- | -------------------------------------------------- |
| @end    | Captures closing brackets and braces               |
| @indent | Captures entire arrays and objects for indentation |

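The combined effect of `@indent` and `@end` can be approximated line by line. A line is indented one level for each `@indent` node that started on an earlier line and whose `@end` lies on a later line. The Rust sketch below is a simplification made for illustration, not Zed's indentation engine. `IndentCapture` and `indent_level` are hypothetical, and rows are zero-based line numbers.

```rust
/// Hypothetical: one `@indent` node, reduced to the row where it starts
/// and the row of its `@end` capture.
struct IndentCapture {
    start_row: usize,
    end_row: usize,
}

/// Hypothetical helper: suggested indent level for `row`. Counts the nodes opened
/// on an earlier row whose `@end` is still on a later row. The `@end` row itself
/// is not indented, so closing brackets line up with the line that opened them.
fn indent_level(captures: &[IndentCapture], row: usize) -> usize {
    captures
        .iter()
        .filter(|c| c.start_row < row && row < c.end_row)
        .count()
}
```

Take a JSON object on rows 0 to 4 that contains an array on rows 1 to 3. This gives levels 0, 1, 2, 1, 0, which matches the usual formatting.
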
### Code injections

The `injections.scm` file defines rules for embedding one language within another, such as code blocks in Markdown or SQL queries in Python strings.

Here's an example from an `injections.scm` file for Markdown:

```scheme
(fenced_code_block
  (info_string
    (language) @injection.language)
  (code_fence_content) @injection.content)

((inline) @injection.content
 (#set! injection.language "markdown-inline"))
```

This query identifies fenced code blocks, capturing the language specified in the info string and the content within the block. It also captures inline content and sets its language to "markdown-inline".

| Capture             | Description                                                |
| ------------------- | ---------------------------------------------------------- |
| @injection.language | Captures the language identifier for a code block          |
| @injection.content  | Captures the content to be treated as a different language |

JSON isn't used as the example here because it has no language injections.

### Syntax overrides

The `overrides.scm` file defines syntactic _scopes_ that can be used to override certain editor settings within specific language constructs.

For example, the language-specific setting `word_characters` controls which non-alphanumeric characters are considered part of a word. This matters when you double-click to select a variable, for instance. In JavaScript, "$" and "#" are considered word characters.

The language-specific setting `completion_query_characters` controls which characters trigger autocomplete suggestions. In JavaScript, "-" should be considered a completion query character when your cursor is within a _string_. To achieve this, the JavaScript `overrides.scm` file contains the following pattern:

```scheme
[
  (string)
  (template_string)
] @string
```

And the JavaScript `config.toml` contains this setting:

```toml
word_characters = ["#", "$"]

[overrides.string]
completion_query_characters = ["-"]
```

You can also disable certain auto-closing brackets in a specific scope. For example, to prevent auto-closing `'` within strings, you could put the following in the JavaScript `config.toml`:

```toml
brackets = [
  { start = "'", end = "'", close = true, newline = false, not_in = ["string"] },
  # other pairs...
]
```

#### Range inclusivity

By default, the ranges defined in `overrides.scm` are _exclusive_. In the case above, the `string` scope does not take effect if your cursor is _outside_ the quotation marks delimiting the string. Sometimes you may want the range to be _inclusive_ instead. To do this, add the `.inclusive` suffix to the capture name in the query.

For example, JavaScript also disables auto-closing of single quotes within comments, and the comment scope must extend all the way to the newline after a line comment. To achieve this, the JavaScript `overrides.scm` contains the following pattern:

```scheme
(comment) @comment.inclusive
```

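The difference between the default exclusive ranges and `.inclusive` captures comes down to how a cursor offset is compared with the node's endpoints. Below is a minimal sketch that assumes byte offsets. `scope_contains` is a hypothetical helper, and Zed's own comparison may differ in details.

```rust
use std::ops::Range;

/// Hypothetical helper: is a cursor at `offset` inside an override scope covering `range`?
/// Exclusive scopes ignore a cursor sitting exactly on either boundary.
/// `.inclusive` scopes count the boundaries too.
fn scope_contains(range: &Range<usize>, offset: usize, inclusive: bool) -> bool {
    if inclusive {
        range.start <= offset && offset <= range.end
    } else {
        range.start < offset && offset < range.end
    }
}
```

Consider a string spanning offsets 0 to 5. A cursor at offset 5, just after the closing quote, is outside the exclusive scope but inside an inclusive one. The same boundary rule lets a `@comment.inclusive` scope cover a cursor at the end of a line comment.
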
### Text objects

The `textobjects.scm` file defines rules for navigating by text objects. This was added in Zed v0.165 and is currently used only in Vim mode.

Vim provides two levels of granularity for navigating around files: section by section with `[[`, `]]`, etc., and method by method with `[m`, `]m`, etc. Even languages that don't support functions and classes can work well by defining similar concepts. For example, CSS defines a rule-set as a method and a media query as a class.

In languages with closures, closures typically should not count as functions in Zed. This is best-effort, however, because languages like JavaScript do not syntactically differentiate between closures and top-level function declarations.

For languages with declarations like C, provide queries that match `@class.around` or `@function.around`. The `if` and `ic` text objects fall back to these when there is no corresponding `.inside` capture.

If you are not sure what to put in `textobjects.scm`, both [nvim-treesitter-textobjects](https://github.com/nvim-treesitter/nvim-treesitter-textobjects) and the [Helix editor](https://github.com/helix-editor/helix) have queries for many languages. You can refer to Zed's [built-in languages](https://github.com/zed-industries/zed/tree/main/crates/languages/src) to see how to adapt these.

| Capture          | Description                                                             | Vim mode                                         |
| ---------------- | ----------------------------------------------------------------------- | ------------------------------------------------ |
| @function.around | An entire function definition or equivalent small section of a file.    | `[m`, `]m`, `[M`, `]M` motions. `af` text object |
| @function.inside | The function body (the stuff within the braces).                        | `if` text object                                 |
| @class.around    | An entire class definition or equivalent large section of a file.       | `[[`, `]]`, `[]`, `][` motions. `ac` text object |
| @class.inside    | The contents of a class definition.                                     | `ic` text object                                 |
| @comment.around  | An entire comment (e.g. all adjacent line comments, or a block comment) | `gc` text object                                 |
| @comment.inside  | The contents of a comment                                               | `igc` text object (rarely supported)             |

For example:

```scheme
; include only the content of the method in the function
(method_definition
    body: (_
        "{"
        (_)* @function.inside
        "}")) @function.around

; match function.around for declarations with no body
(function_signature_item) @function.around

; join all adjacent comments into one
(comment)+ @comment.around
```

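The fallback for `if` and `ic` amounts to preferring the `.inside` range and using the `.around` range when a query provides no inner capture. Here is a minimal sketch with hypothetical names (`TextObject`, `inner_range`). It is not Zed's Vim implementation.

```rust
use std::ops::Range;

/// Hypothetical: ranges captured for one function or class by `textobjects.scm`.
struct TextObject {
    around: Range<usize>,
    inside: Option<Range<usize>>,
}

/// Hypothetical helper: the range selected by `if`/`ic`. Uses the `.inside` capture
/// if present, otherwise the `.around` capture (e.g. a C declaration with no body).
fn inner_range(object: &TextObject) -> Range<usize> {
    object.inside.clone().unwrap_or_else(|| object.around.clone())
}
```
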
### Text redactions

The `redactions.scm` file defines text redaction rules. When you collaborate and share your screen, Zed renders the matching syntax nodes in a redacted form to keep them from leaking.

Here's an example from a `redactions.scm` file for JSON:

```scheme
(pair value: (number) @redact)
(pair value: (string) @redact)
(array (number) @redact)
(array (string) @redact)
```

This query marks number and string values in key-value pairs and arrays for redaction.

| Capture | Description                    |
| ------- | ------------------------------ |
| @redact | Captures values to be redacted |

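Conceptually, `@redact` captures mark byte ranges whose characters are hidden when the buffer is shared. The sketch below masks those ranges in a string. It assumes sorted, non-overlapping ranges that fall on character boundaries. `redact` is a hypothetical helper, and the sketch does not model how Zed actually draws redacted text.

```rust
use std::ops::Range;

/// Hypothetical helper: replaces every character inside the given byte ranges with '•'.
/// Ranges must be sorted, non-overlapping, and on char boundaries.
fn redact(text: &str, ranges: &[Range<usize>]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for range in ranges {
        // Copy the visible text before this range, then mask the range itself.
        out.push_str(&text[cursor..range.start]);
        out.extend(text[range.clone()].chars().map(|_| '•'));
        cursor = range.end;
    }
    out.push_str(&text[cursor..]);
    out
}
```
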
### Runnable code detection

The `runnables.scm` file defines rules for detecting runnable code.

Here's an example from a `runnables.scm` file for JSON:

```scheme
(
    (document
        (object
            (pair
                key: (string
                    (string_content) @_name
                    (#eq? @_name "scripts")
                )
                value: (object
                    (pair
                        key: (string (string_content) @run @script)
                    )
                )
            )
        )
    )
    (#set! tag package-script)
    (#set! tag composer-script)
)
```

This query detects runnable scripts in `package.json` and `composer.json` files.

The `@run` capture specifies where the run button should appear in the editor. All other captures, except those prefixed with an underscore, are exposed as environment variables named `ZED_CUSTOM_$(capture_name)` when the code is run. For example, `@script` becomes `ZED_CUSTOM_script`.

| Capture | Description                                            |
| ------- | ------------------------------------------------------ |
| @\_name | Captures the "scripts" key                             |
| @run    | Captures the script name                               |
| @script | Also captures the script name (for different purposes) |

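The naming rule for these environment variables can be sketched as a small function. It illustrates the convention as described above and is not Zed's code. `capture_env_vars` is hypothetical. It assumes that `@run` only positions the button, so the sketch skips it along with underscore-prefixed captures.

```rust
/// Hypothetical helper: turns runnable query captures (name, captured text) into
/// `ZED_CUSTOM_<capture_name>` environment variables.
/// Assumption: `@run` only positions the button, so it is skipped like `@_` helpers.
fn capture_env_vars(captures: &[(&str, &str)]) -> Vec<(String, String)> {
    captures
        .iter()
        .filter(|(name, _)| *name != "run" && !name.starts_with('_'))
        .map(|(name, text)| (format!("ZED_CUSTOM_{name}"), text.to_string()))
        .collect()
}
```

With the JSON query above, only the `@script` capture survives. Under these assumptions, the command that runs the script would receive the script name as `ZED_CUSTOM_script`.
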
<!--
TBD: `#set! tag`
-->

## Language Servers

Zed uses the [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) to provide advanced language support.

An extension may provide any number of language servers. To provide a language server from your extension, add an entry to your `extension.toml` with the name of your language server and the language(s) it applies to. Each entry in the `languages` list must match the `name` field from the `config.toml` file for that language:

```toml
[language_servers.my-language-server]
name = "My Language LSP"
languages = ["My Language"]
```

Then, in the Rust code for your extension, implement the `language_server_command` method on your extension:

```rust
use zed_extension_api::{self as zed, LanguageServerId, Result};

struct MyExtension;

impl zed::Extension for MyExtension {
    fn new() -> Self {
        MyExtension
    }

    fn language_server_command(
        &mut self,
        language_server_id: &LanguageServerId,
        worktree: &zed::Worktree,
    ) -> Result<zed::Command> {
        // The `get_*` helpers are placeholders for your own logic, for example
        // locating the binary with `worktree.which("my-language-server")`.
        Ok(zed::Command {
            command: get_path_to_language_server_executable()?,
            args: get_args_for_language_server()?,
            env: get_env_for_language_server()?,
        })
    }
}

zed::register_extension!(MyExtension);
```

You can customize the handling of the language server using several optional methods in the `Extension` trait. For example, you can control how completions are styled using the `label_for_completion` method. For a complete list of methods, see the [API docs for the Zed extension API](https://docs.rs/zed_extension_api).

### Syntax Highlighting with Semantic Tokens

Zed supports syntax highlighting using semantic tokens from the attached language servers. This is currently disabled by default, but it can be enabled in your settings file:

```json [settings]
{
  // Enable semantic tokens globally, alongside tree-sitter highlights for each language:
  "semantic_tokens": "combined",
  // Or, specify per-language:
  "languages": {
    "Rust": {
      // No tree-sitter, only LSP semantic tokens:
      "semantic_tokens": "full"
    }
  }
}
```

The `semantic_tokens` setting accepts the following values:

- `"off"` (default): Do not request semantic tokens from language servers.
- `"combined"`: Use LSP semantic tokens together with tree-sitter highlighting.
- `"full"`: Use LSP semantic tokens exclusively, replacing tree-sitter highlighting.

#### Customizing Semantic Token Styles

Zed supports customizing the styles used for semantic tokens. You can define rules in your settings file that control how semantic tokens are mapped to styles in your theme.

```json [settings]
{
  "global_lsp_settings": {
    "semantic_token_rules": [
      {
        // Highlight macros as keywords.
        "token_type": "macro",
        "style": ["syntax.keyword"]
      },
      {
        // Highlight unresolved references in bold red.
        "token_type": "unresolvedReference",
        "foreground_color": "#c93f3f",
        "font_weight": "bold"
      },
      {
        // Underline all mutable variables/references/etc.
        "token_modifiers": ["mutable"],
        "underline": true
      }
    ]
  }
}
```

All rules that match a given `token_type` and `token_modifiers` are applied, and earlier rules take precedence. If no rules match, the token is not highlighted. User-defined rules take priority over the default rules.

Each rule in the `semantic_token_rules` array is defined as follows:

- `token_type`: The semantic token type as defined by the [LSP specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens). If omitted, the rule matches all token types.
- `token_modifiers`: A list of semantic token modifiers to match. The token must have all of them for the rule to match.
- `style`: A list of styles from the current syntax theme to use. The first style found in the theme is used, and any of the properties below override that style.
- `foreground_color`: The foreground color to use for matching tokens, in hex format (e.g., `"#ff0000"`).
- `background_color`: The background color to use for matching tokens, in hex format (e.g., `"#ff0000"`).
- `underline`: Either a boolean or a hex color to underline with. If `true`, the token is underlined with the text color.
- `strikethrough`: Either a boolean or a hex color for the strikethrough. If `true`, the token is struck through with the text color.
- `font_weight`: One of `"normal"`, `"bold"`.
- `font_style`: One of `"normal"`, `"italic"`.

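The matching and precedence rules above can be summarized in code. The sketch below is an illustration, not Zed's implementation. `TokenRule` and `Style` are hypothetical and mirror only a few of the fields listed above. `resolve_style` applies every matching rule in order and keeps any property that an earlier rule already set.

```rust
/// Hypothetical mirror of a few `semantic_token_rules` fields.
#[derive(Default)]
struct TokenRule {
    token_type: Option<&'static str>,
    token_modifiers: Vec<&'static str>,
    foreground_color: Option<&'static str>,
    underline: Option<bool>,
}

/// Hypothetical resolved styling for one token.
#[derive(Debug, Default, PartialEq)]
struct Style {
    foreground_color: Option<&'static str>,
    underline: Option<bool>,
}

fn rule_matches(rule: &TokenRule, token_type: &str, modifiers: &[&str]) -> bool {
    // An omitted `token_type` matches every type, and all listed modifiers must be present.
    rule.token_type.map_or(true, |t| t == token_type)
        && rule.token_modifiers.iter().all(|m| modifiers.contains(m))
}

/// Applies all matching rules in order. A property set by an earlier rule is kept.
/// Returns `None` when no rule matches (the token is left unhighlighted).
fn resolve_style(rules: &[TokenRule], token_type: &str, modifiers: &[&str]) -> Option<Style> {
    let mut style = Style::default();
    let mut matched = false;
    for rule in rules.iter().filter(|r| rule_matches(r, token_type, modifiers)) {
        matched = true;
        style.foreground_color = style.foreground_color.or(rule.foreground_color);
        style.underline = style.underline.or(rule.underline);
    }
    matched.then_some(style)
}
```

Suppose a first rule colors `variable` tokens and a second rule colors and underlines `mutable` ones. A mutable variable then gets the color from the first rule, because it comes earlier, and the underline from the second rule, because the first rule does not set one.
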
### Multi-Language Support

If your language server supports additional languages, you can use `language_ids` to map Zed `languages` to the desired [LSP-specific `languageId`](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocumentItem) identifiers:

```toml
[language_servers.my-language-server]
name = "Whatever LSP"
languages = ["JavaScript", "TSX", "HTML", "CSS"]

[language_servers.my-language-server.language_ids]
"JavaScript" = "javascript"
"TSX" = "typescriptreact"
"HTML" = "html"
"CSS" = "css"
```
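
Conceptually, `language_ids` is a lookup table from Zed language names to the `languageId` sent to the server. Below is a minimal sketch with a hypothetical `lsp_language_id` helper. This page does not say what Zed sends for a language missing from the table, so the sketch simply returns `None` in that case.

```rust
use std::collections::HashMap;

/// Hypothetical helper: looks up the LSP `languageId` for a Zed language name
/// in a `language_ids` table. Unmapped languages yield `None`.
fn lsp_language_id<'a>(language_ids: &'a HashMap<String, String>, zed_language: &str) -> Option<&'a str> {
    language_ids.get(zed_language).map(String::as_str)
}
```
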