# Language Extensions

Language support in Zed has several components:

- Language metadata and configuration
- Grammar
- Queries
- Language servers

## Language Metadata

Each language supported by Zed must be defined in a subdirectory inside the `languages` directory of your extension.

This subdirectory must contain a file called `config.toml` with the following structure:

```toml
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]
```

- `name` (required) is the human-readable name that will show up in the Select Language dropdown.
- `grammar` (required) is the name of a grammar. Grammars are registered separately, as described below.
- `path_suffixes` is an array of file suffixes that should be associated with this language. Unlike `file_types` in settings, this does not support glob patterns.
- `line_comments` is an array of strings that are used to identify line comments in the language. This is used by the `editor::ToggleComments` keybind (`{#kb editor::ToggleComments}`) to toggle comments on lines of code.
- `tab_size` defines the indentation/tab size used for this language (default is `4`).
- `hard_tabs` controls whether to indent with tabs (`true`) or spaces (`false`, the default).
- `first_line_pattern` is a regular expression that can be used, in addition to `path_suffixes` (above) or `file_types` in settings, to match files that should use this language. For example, Zed uses this to identify shell scripts by matching the [shebang line](https://github.com/zed-industries/zed/blob/main/crates/languages/src/bash/config.toml) on the first line of a script.
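
As a sketch of how the optional keys fit together, a fuller `config.toml` might look like the following. The values are illustrative assumptions rather than settings taken from a real extension; in particular, the `first_line_pattern` below assumes a hypothetical `myl` interpreter named in the shebang line.

```toml
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]
# Indent with two spaces instead of the default four.
tab_size = 2
hard_tabs = false
# Hypothetical pattern: also match extensionless scripts whose first line
# is a shebang that mentions `myl`, e.g. `#!/usr/bin/env myl`.
first_line_pattern = '^#!.*\bmyl\b'
```

The pattern is written as a TOML literal string (single quotes) so that the backslashes reach the regular expression unescaped.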

<!--
TBD: Document `language_name/config.toml` keys

- autoclose_before
- brackets (start, end, close, newline, not_in: ["comment", "string"])
- word_characters
- prettier_parser_name
- opt_into_language_servers
- code_fence_block_name
- scope_opt_in_language_servers
- increase_indent_pattern, decrease_indent_pattern
- collapsed_placeholder
- auto_indent_on_paste, auto_indent_using_last_non_empty_line
- overrides: `[overrides.element]`, `[overrides.string]`
-->

## Grammar

Zed uses the [Tree-sitter](https://tree-sitter.github.io) parsing library to provide built-in language-specific features. There are grammars available for many languages, and you can also [develop your own grammar](https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar). A growing list of Zed features is built using pattern matching over syntax trees with Tree-sitter queries.

As mentioned above, every language that is defined in an extension must specify the name of a Tree-sitter grammar that is used for parsing. These grammars are then registered separately in the extension's `extension.toml` file, like this:

```toml
[grammars.gleam]
repository = "https://github.com/gleam-lang/tree-sitter-gleam"
rev = "58b7cac8fc14c92b0677c542610d8738c373fa81"
```

The `repository` field must specify a repository where the Tree-sitter grammar should be loaded from, and the `rev` field must contain a Git revision to use, such as the SHA of a Git commit. An extension can provide multiple grammars by referencing multiple Tree-sitter repositories.
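
For instance, an extension that ships two languages could register both grammars side by side. This is only a sketch: the repository URLs and revisions are placeholders, and the table names (`my-language`, `my-language-template`) are hypothetical, chosen to match what the `grammar` key in each language's `config.toml` would contain.

```toml
# Placeholder repositories and revisions; substitute your own.
[grammars.my-language]
repository = "https://github.com/example/tree-sitter-my-language"
rev = "0000000000000000000000000000000000000000"

[grammars.my-language-template]
repository = "https://github.com/example/tree-sitter-my-language-template"
rev = "0000000000000000000000000000000000000000"
```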

## Tree-sitter Queries

Zed uses the syntax tree produced by the [Tree-sitter](https://tree-sitter.github.io) parser, together with the Tree-sitter query language, to implement several features:

- Syntax highlighting
- Bracket matching
- Code outline/structure
- Auto-indentation
- Code injections
- Syntax overrides
- Text redactions
- Runnable code detection
- Selecting classes, functions, etc.

The following sections elaborate on how [Tree-sitter queries](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax) enable these features in Zed, using [JSON syntax](https://www.json.org/json-en.html) as a guiding example.

### Syntax highlighting

In Tree-sitter, the `highlights.scm` file defines syntax highlighting rules for a particular language.

Here's an example from a `highlights.scm` for JSON:

```scheme
(string) @string

(pair
  key: (string) @property.json_key)

(number) @number
```

This query marks strings, object keys, and numbers for highlighting. The following is a comprehensive list of captures supported by themes:

| Capture                  | Description                            |
| ------------------------ | -------------------------------------- |
| @attribute               | Captures attributes                    |
| @boolean                 | Captures boolean values                |
| @comment                 | Captures comments                      |
| @comment.doc             | Captures documentation comments        |
| @constant                | Captures constants                     |
| @constructor             | Captures constructors                  |
| @embedded                | Captures embedded content              |
| @emphasis                | Captures emphasized text               |
| @emphasis.strong         | Captures strongly emphasized text      |
| @enum                    | Captures enumerations                  |
| @function                | Captures functions                     |
| @hint                    | Captures hints                         |
| @keyword                 | Captures keywords                      |
| @label                   | Captures labels                        |
| @link_text               | Captures link text                     |
| @link_uri                | Captures link URIs                     |
| @number                  | Captures numeric values                |
| @operator                | Captures operators                     |
| @predictive              | Captures predictive text               |
| @preproc                 | Captures preprocessor directives       |
| @primary                 | Captures primary elements              |
| @property                | Captures properties                    |
| @punctuation             | Captures punctuation                   |
| @punctuation.bracket     | Captures brackets                      |
| @punctuation.delimiter   | Captures delimiters                    |
| @punctuation.list_marker | Captures list markers                  |
| @punctuation.special     | Captures special punctuation           |
| @string                  | Captures string literals               |
| @string.escape           | Captures escaped characters in strings |
| @string.regex            | Captures regular expressions           |
| @string.special          | Captures special strings               |
| @string.special.symbol   | Captures special symbols               |
| @tag                     | Captures tags                          |
| @tag.doctype             | Captures doctypes (e.g., in HTML)      |
| @text.literal            | Captures literal text                  |
| @title                   | Captures titles                        |
| @type                    | Captures types                         |
| @variable                | Captures variables                     |
| @variable.special        | Captures special variables             |
| @variant                 | Captures variants                      |

### Bracket matching

The `brackets.scm` file defines matching brackets.

Here's an example from a `brackets.scm` file for JSON:

```scheme
("[" @open "]" @close)
("{" @open "}" @close)
("\"" @open "\"" @close)
```

This query identifies opening and closing brackets, braces, and quotation marks.

| Capture | Description                                   |
| ------- | --------------------------------------------- |
| @open   | Captures opening brackets, braces, and quotes |
| @close  | Captures closing brackets, braces, and quotes |

### Code outline/structure

The `outline.scm` file defines the structure for the code outline.

Here's an example from an `outline.scm` file for JSON:

```scheme
(pair
  key: (string (string_content) @name)) @item
```

This query captures object keys for the outline structure.

| Capture        | Description                                                                          |
| -------------- | ------------------------------------------------------------------------------------ |
| @name          | Captures the content of object keys                                                  |
| @item          | Captures the entire key-value pair                                                   |
| @context       | Captures elements that provide context for the outline item                          |
| @context.extra | Captures additional contextual information for the outline item                      |
| @annotation    | Captures nodes that annotate outline item (doc comments, attributes, decorators)[^1] |

[^1]: These annotations are used by Assistant when generating code modification steps.

### Auto-indentation

The `indents.scm` file defines indentation rules.

Here's an example from an `indents.scm` file for JSON:

```scheme
(array "]" @end) @indent
(object "}" @end) @indent
```

This query marks the end of arrays and objects for indentation purposes.

| Capture | Description                                        |
| ------- | -------------------------------------------------- |
| @end    | Captures closing brackets and braces               |
| @indent | Captures entire arrays and objects for indentation |

### Code injections

The `injections.scm` file defines rules for embedding one language within another, such as code blocks in Markdown or SQL queries in Python strings.

Here's an example from an `injections.scm` file for Markdown:

```scheme
(fenced_code_block
  (info_string
    (language) @injection.language)
  (code_fence_content) @injection.content)

((inline) @injection.content
 (#set! injection.language "markdown-inline"))
```

This query identifies fenced code blocks, capturing the language specified in the info string and the content within the block. It also captures inline content and sets its language to "markdown-inline".

| Capture             | Description                                                |
| ------------------- | ---------------------------------------------------------- |
| @injection.language | Captures the language identifier for a code block          |
| @injection.content  | Captures the content to be treated as a different language |

Note that we couldn't use JSON as an example here because it doesn't support language injections.

### Syntax overrides

The `overrides.scm` file defines syntactic _scopes_ that can be used to override certain editor settings within specific language constructs.

For example, there is a language-specific setting called `word_characters` that controls which non-alphabetic characters are considered part of a word, for filtering autocomplete suggestions. In JavaScript, "$" and "#" are considered word characters. But when your cursor is within a _string_ in JavaScript, "-" is _also_ considered a word character. To achieve this, the JavaScript `overrides.scm` file contains the following pattern:

```scheme
[
  (string)
  (template_string)
] @string
```

And the JavaScript `config.toml` contains this setting:

```toml
word_characters = ["#", "$"]

[overrides.string]
word_characters = ["-"]
```

You can also disable certain auto-closing brackets in a specific scope. For example, to prevent auto-closing `'` within strings, you could put the following in the JavaScript `config.toml`:

```toml
brackets = [
  { start = "'", end = "'", close = true, newline = false, not_in = ["string"] },
  # other pairs...
]
```

#### Range inclusivity

By default, the ranges defined in `overrides.scm` are _exclusive_. So in the case above, if your cursor were _outside_ the quotation marks delimiting the string, the `string` scope would not take effect. Sometimes, you may want to make the range _inclusive_. You can do this by adding the `.inclusive` suffix to the capture name in the query.

For example, in JavaScript, we also disable auto-closing of single quotes within comments, and the comment scope must extend all the way to the newline after a line comment. To achieve this, the JavaScript `overrides.scm` contains the following pattern:

```scheme
(comment) @comment.inclusive
```
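
Combining the two scopes, the single-quote entry in the `brackets` list could exclude both strings and comments. This is a sketch that assumes the scope is referred to by the capture name without its `.inclusive` suffix:

```toml
brackets = [
  # Do not auto-close `'` inside the `string` or `comment` scopes.
  { start = "'", end = "'", close = true, newline = false, not_in = ["string", "comment"] },
  # other pairs...
]
```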

### Text objects

The `textobjects.scm` file defines rules for navigating by text objects. This was added in Zed v0.165 and is currently used only in Vim mode.

Vim provides two levels of granularity for navigating around files: section-by-section with `[]` etc., and method-by-method with `]m` etc. Even languages that don't support functions and classes can work well by defining similar concepts. For example, CSS defines a rule-set as a method and a media-query as a class.

For languages with closures, these typically should not count as functions in Zed. This is best-effort, however, as languages like JavaScript do not syntactically differentiate between closures and top-level function declarations.

For languages with declarations like C, provide queries that match `@class.around` or `@function.around`. The `if` and `ic` text objects will default to these if there is no corresponding `inside` capture.

If you are not sure what to put in `textobjects.scm`, both [nvim-treesitter-textobjects](https://github.com/nvim-treesitter/nvim-treesitter-textobjects) and the [Helix editor](https://github.com/helix-editor/helix) have queries for many languages. You can refer to the Zed [built-in languages](https://github.com/zed-industries/zed/tree/main/crates/languages/src) to see how to adapt these.

| Capture          | Description                                                             | Vim mode                                         |
| ---------------- | ----------------------------------------------------------------------- | ------------------------------------------------ |
| @function.around | An entire function definition or equivalent small section of a file.    | `[m`, `]m`, `[M`, `]M` motions. `af` text object |
| @function.inside | The function body (the stuff within the braces).                        | `if` text object                                 |
| @class.around    | An entire class definition or equivalent large section of a file.       | `[[`, `]]`, `[]`, `][` motions. `ac` text object |
| @class.inside    | The contents of a class definition.                                     | `ic` text object                                 |
| @comment.around  | An entire comment (e.g. all adjacent line comments, or a block comment) | `gc` text object                                 |
| @comment.inside  | The contents of a comment                                               | `igc` text object (rarely supported)             |

For example:

```scheme
; include only the content of the method in the function
(method_definition
    body: (_
        "{"
        (_)* @function.inside
        "}")) @function.around

; match function.around for declarations with no body
(function_signature_item) @function.around

; join all adjacent comments into one
(comment)+ @comment.around
```

### Text redactions

The `redactions.scm` file defines text redaction rules. When collaborating and sharing your screen, it makes sure that certain syntax nodes are rendered in a redacted mode to prevent them from leaking.

Here's an example from a `redactions.scm` file for JSON:

```scheme
(pair value: (number) @redact)
(pair value: (string) @redact)
(array (number) @redact)
(array (string) @redact)
```

This query marks number and string values in key-value pairs and arrays for redaction.

| Capture | Description                    |
| ------- | ------------------------------ |
| @redact | Captures values to be redacted |

### Runnable code detection

The `runnables.scm` file defines rules for detecting runnable code.

Here's an example from a `runnables.scm` file for JSON:

```scheme
(
    (document
        (object
            (pair
                key: (string
                    (string_content) @_name
                    (#eq? @_name "scripts")
                )
                value: (object
                    (pair
                        key: (string (string_content) @run @script)
                    )
                )
            )
        )
    )
    (#set! tag package-script)
    (#set! tag composer-script)
)
```

This query detects runnable scripts in `package.json` and `composer.json` files.

The `@run` capture specifies where the run button should appear in the editor. Other captures, except those prefixed with an underscore, are exposed as environment variables named `ZED_CUSTOM_$(capture_name)` when running the code.

| Capture | Description                                            |
| ------- | ------------------------------------------------------ |
| @\_name | Captures the "scripts" key                             |
| @run    | Captures the script name                               |
| @script | Also captures the script name (for different purposes) |

<!--
TBD: `#set! tag`
-->

## Language Servers

Zed uses the [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) to provide advanced language support.

An extension may provide any number of language servers. To provide a language server from your extension, add an entry to your `extension.toml` with the name of your language server and the language(s) it applies to:

```toml
[language_servers.my-language]
name = "My Language LSP"
languages = ["My Language"]
```
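
Because an extension may provide any number of language servers, several entries can sit next to each other. The names below are hypothetical and only illustrate the shape of the configuration:

```toml
# A server that handles two languages defined by this extension.
[language_servers.my-language]
name = "My Language LSP"
languages = ["My Language", "My Language Template"]

# A second, independent server for one of those languages.
[language_servers.my-language-linter]
name = "My Language Linter"
languages = ["My Language"]
```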

Then, in the Rust code for your extension, implement the `language_server_command` method on your extension:

```rust
use zed_extension_api::{self as zed, LanguageServerId, Result};

impl zed::Extension for MyExtension {
    fn language_server_command(
        &mut self,
        language_server_id: &LanguageServerId,
        worktree: &zed::Worktree,
    ) -> Result<zed::Command> {
        // The three helper functions below are placeholders that your
        // extension is expected to implement.
        Ok(zed::Command {
            command: get_path_to_language_server_executable()?,
            args: get_args_for_language_server()?,
            env: get_env_for_language_server()?,
        })
    }
}
```

You can customize the handling of the language server using several optional methods in the `Extension` trait. For example, you can control how completions are styled using the `label_for_completion` method. For a complete list of methods, see the [API docs for the Zed extension API](https://docs.rs/zed_extension_api).