# Language Extensions

Language support in Zed has several components:

- Language metadata and configuration
- Grammar
- Queries
- Language servers

## Language Metadata

Each language supported by Zed must be defined in a subdirectory inside the `languages` directory of your extension.

This subdirectory must contain a file called `config.toml` with the following structure:

```toml
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]
```

- `name` (required) is the human-readable name that will show up in the Select Language dropdown.
- `grammar` (required) is the name of a grammar. Grammars are registered separately, as described below.
- `path_suffixes` is an array of file suffixes that should be associated with this language. Unlike `file_types` in settings, this does not support glob patterns.
- `line_comments` is an array of strings that are used to identify line comments in the language. This is used by the `editor::ToggleComments` action ({#kb editor::ToggleComments}) to comment and uncomment lines of code.
- `tab_size` defines the indentation/tab size used for this language (default is `4`).
- `hard_tabs` controls whether to indent with tabs (`true`) or spaces (`false`, the default).
- `first_line_pattern` is a regular expression that, in addition to `path_suffixes` (above) or `file_types` in settings, can be used to match files that should use this language. For example, Zed uses this to identify shell scripts by matching the [shebang line](https://github.com/zed-industries/zed/blob/main/crates/languages/src/bash/config.toml) at the top of a script.
- `modeline_aliases` is an array of additional Emacs modes or Vim filetypes whose modeline settings should map to this Zed language.
- `debuggers` is an array of strings that identify debuggers for the language. When launching a debugger's `New Process Modal`, Zed will order available debuggers by the order of entries in this array.
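
For illustration, a `config.toml` that also sets some of the optional keys might look like the following. The values, including the shebang pattern, are made up for a hypothetical language and are not taken from an existing extension:

```toml
# config.toml for a hypothetical language
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]

# Indent with two spaces instead of the default four.
tab_size = 2
hard_tabs = false

# Hypothetical pattern: also match files whose first line is a
# shebang that mentions `myl`, even without the `.myl` suffix.
first_line_pattern = '^#!.*\bmyl\b'
```

The single-quoted TOML string keeps the backslashes in the regular expression literal, so they do not need to be doubled.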

<!--
TBD: Document `language_name/config.toml` keys

- autoclose_before
- brackets (start, end, close, newline, not_in: ["comment", "string"])
- word_characters
- prettier_parser_name
- opt_into_language_servers
- code_fence_block_name
- scope_opt_in_language_servers
- increase_indent_pattern, decrease_indent_pattern
- collapsed_placeholder
- auto_indent_on_paste, auto_indent_using_last_non_empty_line
- overrides: `[overrides.element]`, `[overrides.string]`
-->

## Grammar

Zed uses the [Tree-sitter](https://tree-sitter.github.io) parsing library to provide built-in language-specific features. There are grammars available for many languages, and you can also [develop your own grammar](https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar). A growing list of Zed features is built using pattern matching over syntax trees with Tree-sitter queries. As mentioned above, every language that is defined in an extension must specify the name of a Tree-sitter grammar that is used for parsing. These grammars are then registered separately in the extension's `extension.toml` file, like this:

```toml
[grammars.gleam]
repository = "https://github.com/gleam-lang/tree-sitter-gleam"
rev = "58b7cac8fc14c92b0677c542610d8738c373fa81"
```

The `repository` field must specify a repository where the Tree-sitter grammar should be loaded from, and the `rev` field must contain a Git revision to use, such as the SHA of a Git commit. If you're developing an extension locally and want to load a grammar from the local filesystem, you can use a `file://` URL for `repository`. An extension can provide multiple grammars by referencing multiple Tree-sitter repositories.
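
As a sketch of the last two points, an `extension.toml` could register one grammar from a remote repository and a second one from a local checkout. The `my-language` grammar name, the local path, and the revision placeholder are hypothetical. Whether `rev` may be omitted for a `file://` repository is not stated here, so the sketch keeps it:

```toml
# Grammar pinned to a commit in a remote repository.
[grammars.gleam]
repository = "https://github.com/gleam-lang/tree-sitter-gleam"
rev = "58b7cac8fc14c92b0677c542610d8738c373fa81"

# Second grammar loaded from a local checkout during development.
# Replace the placeholder path and revision with your own values.
[grammars.my-language]
repository = "file:///path/to/tree-sitter-my-language"
rev = "<commit SHA in the local repository>"
```

Each `[grammars.<name>]` table name is what a language's `config.toml` refers to in its `grammar` field.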

## Tree-sitter Queries

Zed runs [Tree-sitter](https://tree-sitter.github.io) queries against the syntax tree produced by the grammar to implement several features:

- Syntax highlighting
- Bracket matching
- Code outline/structure
- Auto-indentation
- Code injections
- Syntax overrides
- Text redactions
- Runnable code detection
- Selecting classes, functions, etc.

The following sections elaborate on how [Tree-sitter queries](https://tree-sitter.github.io/tree-sitter/using-parsers/queries/index.html) enable these features in Zed, using [JSON syntax](https://www.json.org/json-en.html) as a guiding example.

### Syntax highlighting

In Tree-sitter, the `highlights.scm` file defines syntax highlighting rules for a particular syntax.

Here's an example from a `highlights.scm` for JSON:

```scheme
(string) @string

(pair
  key: (string) @property.json_key)

(number) @number
```

This query marks strings, object keys, and numbers for highlighting. The following is a comprehensive list of captures supported by themes:

| Capture                  | Description                            |
| ------------------------ | -------------------------------------- |
| @attribute               | Captures attributes                    |
| @boolean                 | Captures boolean values                |
| @comment                 | Captures comments                      |
| @comment.doc             | Captures documentation comments        |
| @constant                | Captures constants                     |
| @constructor             | Captures constructors                  |
| @embedded                | Captures embedded content              |
| @emphasis                | Captures emphasized text               |
| @emphasis.strong         | Captures strongly emphasized text      |
| @enum                    | Captures enumerations                  |
| @function                | Captures functions                     |
| @hint                    | Captures hints                         |
| @keyword                 | Captures keywords                      |
| @label                   | Captures labels                        |
| @link_text               | Captures link text                     |
| @link_uri                | Captures link URIs                     |
| @number                  | Captures numeric values                |
| @operator                | Captures operators                     |
| @predictive              | Captures predictive text               |
| @preproc                 | Captures preprocessor directives       |
| @primary                 | Captures primary elements              |
| @property                | Captures properties                    |
| @punctuation             | Captures punctuation                   |
| @punctuation.bracket     | Captures brackets                      |
| @punctuation.delimiter   | Captures delimiters                    |
| @punctuation.list_marker | Captures list markers                  |
| @punctuation.special     | Captures special punctuation           |
| @string                  | Captures string literals               |
| @string.escape           | Captures escaped characters in strings |
| @string.regex            | Captures regular expressions           |
| @string.special          | Captures special strings               |
| @string.special.symbol   | Captures special symbols               |
| @tag                     | Captures tags                          |
| @tag.doctype             | Captures doctypes (e.g., in HTML)      |
| @text.literal            | Captures literal text                  |
| @title                   | Captures titles                        |
| @type                    | Captures types                         |
| @variable                | Captures variables                     |
| @variable.special        | Captures special variables             |
| @variant                 | Captures variants                      |

### Bracket matching

The `brackets.scm` file defines matching brackets.

Here's an example from a `brackets.scm` file for JSON:

```scheme
("[" @open "]" @close)
("{" @open "}" @close)
("\"" @open "\"" @close)
```

This query identifies opening and closing brackets, braces, and quotation marks.

| Capture | Description                                   |
| ------- | --------------------------------------------- |
| @open   | Captures opening brackets, braces, and quotes |
| @close  | Captures closing brackets, braces, and quotes |

Zed uses these to highlight matching brackets: painting each bracket pair with a different color ("rainbow brackets") and highlighting the brackets if the cursor is inside the bracket pair.

To opt out of rainbow brackets colorization, add the following to the corresponding `brackets.scm` entry:

```scheme
(("\"" @open "\"" @close) (#set! rainbow.exclude))
```

### Code outline/structure

The `outline.scm` file defines the structure for the code outline.

Here's an example from an `outline.scm` file for JSON:

```scheme
(pair
  key: (string (string_content) @name)) @item
```

This query captures object keys for the outline structure.

| Capture        | Description                                                                              |
| -------------- | ---------------------------------------------------------------------------------------- |
| @name          | Captures the content of object keys                                                      |
| @item          | Captures the entire key-value pair                                                       |
| @context       | Captures elements that provide context for the outline item                              |
| @context.extra | Captures additional contextual information for the outline item                          |
| @annotation    | Captures nodes that annotate the outline item (doc comments, attributes, decorators)[^1] |

[^1]: These annotations are used by Assistant when generating code modification steps.

### Auto-indentation

The `indents.scm` file defines indentation rules.

Here's an example from an `indents.scm` file for JSON:

```scheme
(array "]" @end) @indent
(object "}" @end) @indent
```

This query marks the end of arrays and objects for indentation purposes.

| Capture | Description                                        |
| ------- | -------------------------------------------------- |
| @end    | Captures closing brackets and braces               |
| @indent | Captures entire arrays and objects for indentation |

### Code injections

The `injections.scm` file defines rules for embedding one language within another, such as code blocks in Markdown or SQL queries in Python strings.

Here's an example from an `injections.scm` file for Markdown:

```scheme
(fenced_code_block
  (info_string
    (language) @injection.language)
  (code_fence_content) @injection.content)

((inline) @injection.content
 (#set! injection.language "markdown-inline"))
```

This query identifies fenced code blocks, capturing the language specified in the info string and the content within the block. It also captures inline content and sets its language to "markdown-inline".

| Capture             | Description                                                |
| ------------------- | ---------------------------------------------------------- |
| @injection.language | Captures the language identifier for a code block          |
| @injection.content  | Captures the content to be treated as a different language |

Note that we couldn't use JSON as an example here because it doesn't support language injections.

### Syntax overrides

The `overrides.scm` file defines syntactic _scopes_ that can be used to override certain editor settings within specific language constructs.

For example, there is a language-specific setting called `word_characters` that controls which non-alphabetic characters are considered part of a word, such as when you double-click to select a variable. In JavaScript, "$" and "#" are considered word characters.

There is also a language-specific setting called `completion_query_characters` that controls which characters trigger autocomplete suggestions. In JavaScript, when your cursor is within a _string_, "-" should be considered a completion query character. To achieve this, the JavaScript `overrides.scm` file contains the following pattern:

```scheme
[
  (string)
  (template_string)
] @string
```

And the JavaScript `config.toml` contains this setting:

```toml
word_characters = ["#", "$"]

[overrides.string]
completion_query_characters = ["-"]
```

You can also disable certain auto-closing brackets in a specific scope. For example, to prevent auto-closing `'` within strings, you could put the following in the JavaScript `config.toml`:

```toml
brackets = [
  { start = "'", end = "'", close = true, newline = false, not_in = ["string"] },
  # other pairs...
]
```

#### Range inclusivity

By default, the ranges defined in `overrides.scm` are _exclusive_. So in the case above, if your cursor were _outside_ the quotation marks delimiting the string, the `string` scope would not take effect. Sometimes, you may want to make the range _inclusive_. You can do this by adding the `.inclusive` suffix to the capture name in the query.

For example, in JavaScript, we also disable auto-closing of single quotes within comments, and the comment scope must extend all the way to the newline after a line comment. To achieve this, the JavaScript `overrides.scm` contains the following pattern:

```scheme
(comment) @comment.inclusive
```
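
As a sketch of how the comment scope might then be used, the `brackets` entry from the earlier example could list both scopes in `not_in`. This is assembled from the settings described in this section and assumes the scope is referred to as `comment` (without the `.inclusive` suffix). It is not copied from Zed's actual JavaScript `config.toml`:

```toml
brackets = [
  # Do not auto-close a single quote inside strings or comments.
  { start = "'", end = "'", close = true, newline = false, not_in = ["string", "comment"] },
  # other pairs...
]
```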

### Text objects

The `textobjects.scm` file defines rules for navigating by text objects. This was added in Zed v0.165 and is currently used only in Vim mode.

Vim provides two levels of granularity for navigating around files: section-by-section with `[]` etc., and method-by-method with `]m` etc. Even languages that don't support functions and classes can work well by defining similar concepts. For example, CSS defines a rule-set as a method, and a media-query as a class.

For languages with closures, these typically should not count as functions in Zed. This is best-effort, however, as languages like JavaScript do not syntactically differentiate between closures and top-level function declarations.

For languages with declarations like C, provide queries that match `@class.around` or `@function.around`. The `if` and `ic` text objects will default to these if there is no `inside` capture.

If you are not sure what to put in `textobjects.scm`, both [nvim-treesitter-textobjects](https://github.com/nvim-treesitter/nvim-treesitter-textobjects) and the [Helix editor](https://github.com/helix-editor/helix) have queries for many languages. You can refer to the Zed [built-in languages](https://github.com/zed-industries/zed/tree/main/crates/languages/src) to see how to adapt these.

| Capture          | Description                                                             | Vim mode                                         |
| ---------------- | ----------------------------------------------------------------------- | ------------------------------------------------ |
| @function.around | An entire function definition or equivalent small section of a file.    | `[m`, `]m`, `[M`, `]M` motions. `af` text object |
| @function.inside | The function body (the stuff within the braces).                        | `if` text object                                 |
| @class.around    | An entire class definition or equivalent large section of a file.       | `[[`, `]]`, `[]`, `][` motions. `ac` text object |
| @class.inside    | The contents of a class definition.                                     | `ic` text object                                 |
| @comment.around  | An entire comment (e.g. all adjacent line comments, or a block comment) | `gc` text object                                 |
| @comment.inside  | The contents of a comment                                               | `igc` text object (rarely supported)             |

For example:

```scheme
; include only the content of the method in the function
(method_definition
    body: (_
        "{"
        (_)* @function.inside
        "}")) @function.around

; match function.around for declarations with no body
(function_signature_item) @function.around

; join all adjacent comments into one
(comment)+ @comment.around
```

### Text redactions

The `redactions.scm` file defines text redaction rules. When you are collaborating and sharing your screen, it makes sure that certain syntax nodes are rendered in a redacted mode to prevent them from leaking.

Here's an example from a `redactions.scm` file for JSON:

```scheme
(pair value: (number) @redact)
(pair value: (string) @redact)
(array (number) @redact)
(array (string) @redact)
```

This query marks number and string values in key-value pairs and arrays for redaction.

| Capture | Description                    |
| ------- | ------------------------------ |
| @redact | Captures values to be redacted |

### Runnable code detection

The `runnables.scm` file defines rules for detecting runnable code.

Here's an example from a `runnables.scm` file for JSON:

```scheme
(
    (document
        (object
            (pair
                key: (string
                    (string_content) @_name
                    (#eq? @_name "scripts")
                )
                value: (object
                    (pair
                        key: (string (string_content) @run @script)
                    )
                )
            )
        )
    )
    (#set! tag package-script)
    (#set! tag composer-script)
)
```

This query detects runnable scripts in `package.json` and `composer.json` files.

The `@run` capture specifies where the run button should appear in the editor. Other captures, except those prefixed with an underscore, are exposed as environment variables named `ZED_CUSTOM_$(capture_name)` when running the code. In the query above, for example, the `@script` capture becomes `ZED_CUSTOM_script`.

| Capture | Description                                            |
| ------- | ------------------------------------------------------ |
| @\_name | Captures the "scripts" key                             |
| @run    | Captures the script name                               |
| @script | Also captures the script name (for different purposes) |

<!--
TBD: `#set! tag`
-->

## Language Servers

Zed uses the [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) to provide advanced language support.

An extension may provide any number of language servers. To provide a language server from your extension, add an entry to your `extension.toml` with the name of your language server and the language(s) it applies to. Each entry in the list of `languages` has to match the `name` field from the `config.toml` file for that language:

```toml
[language_servers.my-language-server]
name = "My Language LSP"
languages = ["My Language"]
```

Then, in the Rust code for your extension, implement the `language_server_command` method on your extension:

```rust
use zed_extension_api::{self as zed, LanguageServerId, Result};

impl zed::Extension for MyExtension {
    fn language_server_command(
        &mut self,
        language_server_id: &LanguageServerId,
        worktree: &zed::Worktree,
    ) -> Result<zed::Command> {
        // The `get_*` helpers are placeholders for logic you write yourself.
        Ok(zed::Command {
            command: get_path_to_language_server_executable()?,
            args: get_args_for_language_server()?,
            env: get_env_for_language_server()?,
        })
    }
}
```

You can customize the handling of the language server using several optional methods in the `Extension` trait. For example, you can control how completions are styled using the `label_for_completion` method. For a complete list of methods, see the [API docs for the Zed extension API](https://docs.rs/zed_extension_api).

### Multi-Language Support

If your language server supports additional languages, you can use `language_ids` to map Zed `languages` to the desired [LSP-specific `languageId`](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocumentItem) identifiers:

```toml
[language_servers.my-language-server]
name = "Whatever LSP"
languages = ["JavaScript", "HTML", "CSS"]

[language_servers.my-language-server.language_ids]
"JavaScript" = "javascript"
"TSX" = "typescriptreact"
"HTML" = "html"
"CSS" = "css"
```
420"CSS" = "css"
421```