# Language Extensions

Language support in Zed has several components:

- Language metadata and configuration
- Grammar
- Queries
- Language servers

## Language Metadata

Each language supported by Zed must be defined in a subdirectory inside the `languages` directory of your extension.

This subdirectory must contain a file called `config.toml` with the following structure:

```toml
name = "My Language"
grammar = "my-language"
path_suffixes = ["myl"]
line_comments = ["# "]
```

- `name` (required) is the human-readable name that will show up in the Select Language dropdown.
- `grammar` (required) is the name of a grammar. Grammars are registered separately, as described below.
- `path_suffixes` is an array of file suffixes that should be associated with this language. Unlike `file_types` in settings, this does not support glob patterns.
- `line_comments` is an array of strings that are used to identify line comments in the language. This is used by the `editor::ToggleComments` action (`{#kb editor::ToggleComments}`) to comment and uncomment lines of code.
- `tab_size` defines the indentation/tab size used for this language (default is `4`).
- `hard_tabs` controls whether to indent with tabs (`true`) or spaces (`false`, the default).
- `first_line_pattern` is a regular expression that can be used, in addition to `path_suffixes` (above) or `file_types` in settings, to match files which should use this language. For example, Zed uses this to identify shell scripts by matching the [shebang line](https://github.com/zed-industries/zed/blob/main/crates/languages/src/bash/config.toml) in the first line of a script.
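
As a rough illustration of how `path_suffixes` can associate a file with a language, the following Rust sketch compares each suffix literally against a file's extension and its full file name. It is not Zed's actual implementation: the `LanguageConfig` struct, the `language_for_path` function, and the rule that a suffix may also match a whole file name are assumptions made for this example.

```rust
use std::path::Path;

/// Hypothetical, simplified stand-in for one language's `config.toml`.
struct LanguageConfig {
    name: &'static str,
    path_suffixes: &'static [&'static str],
}

/// Returns the name of the first language whose `path_suffixes` contains the
/// file's extension or its full file name (the latter is an assumption).
/// Suffixes are compared literally, so a glob such as "*.myl" never matches.
fn language_for_path(path: &Path, languages: &[LanguageConfig]) -> Option<&'static str> {
    let extension = path.extension().and_then(|ext| ext.to_str());
    let file_name = path.file_name().and_then(|name| name.to_str());
    languages
        .iter()
        .find(|language| {
            language
                .path_suffixes
                .iter()
                .any(|suffix| Some(*suffix) == extension || Some(*suffix) == file_name)
        })
        .map(|language| language.name)
}
```

With a config whose `path_suffixes` is `["myl"]`, calling `language_for_path(Path::new("src/main.myl"), &languages)` would return `Some("My Language")`, while a suffix written as `"*.myl"` would match nothing because it is compared character for character.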

<!--
TBD: Document `language_name/config.toml` keys

- autoclose_before
- brackets (start, end, close, newline, not_in: ["comment", "string"])
- word_characters
- prettier_parser_name
- opt_into_language_servers
- code_fence_block_name
- scope_opt_in_language_servers
- increase_indent_pattern, decrease_indent_pattern
- collapsed_placeholder
- auto_indent_on_paste, auto_indent_using_last_non_empty_line
- overrides: `[overrides.element]`, `[overrides.string]`
-->

## Grammar

Zed uses the [Tree-sitter](https://tree-sitter.github.io) parsing library to provide built-in language-specific features. There are grammars available for many languages, and you can also [develop your own grammar](https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar). A growing list of Zed features is built using pattern matching over syntax trees with Tree-sitter queries.

As mentioned above, every language that is defined in an extension must specify the name of a Tree-sitter grammar that is used for parsing. These grammars are then registered separately in the extension's `extension.toml` file, like this:

```toml
[grammars.gleam]
repository = "https://github.com/gleam-lang/tree-sitter-gleam"
commit = "58b7cac8fc14c92b0677c542610d8738c373fa81"
```

The `repository` field must specify a repository where the Tree-sitter grammar should be loaded from, and the `commit` field must contain the SHA of the Git commit to use. An extension can provide multiple grammars by referencing multiple Tree-sitter repositories.

## Tree-sitter Queries

Zed runs [Tree-sitter](https://tree-sitter.github.io) queries against the syntax tree produced by the grammar to implement several features:

- Syntax highlighting
- Bracket matching
- Code outline/structure
- Auto-indentation
- Code injections
- Syntax overrides
- Text redactions
- Runnable code detection

The following sections elaborate on how [Tree-sitter queries](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax) enable these features in Zed, using [JSON syntax](https://www.json.org/json-en.html) as a guiding example.

### Syntax highlighting

In Tree-sitter, the `highlights.scm` file defines syntax highlighting rules for a particular language.

Here's an example from a `highlights.scm` for JSON:

```scheme
(string) @string

(pair
  key: (string) @property.json_key)

(number) @number
```

This query marks strings, object keys, and numbers for highlighting. The following is a comprehensive list of captures supported by themes:

| Capture                  | Description                            |
| ------------------------ | -------------------------------------- |
| @attribute               | Captures attributes                    |
| @boolean                 | Captures boolean values                |
| @comment                 | Captures comments                      |
| @comment.doc             | Captures documentation comments        |
| @constant                | Captures constants                     |
| @constructor             | Captures constructors                  |
| @embedded                | Captures embedded content              |
| @emphasis                | Captures emphasized text               |
| @emphasis.strong         | Captures strongly emphasized text      |
| @enum                    | Captures enumerations                  |
| @function                | Captures functions                     |
| @hint                    | Captures hints                         |
| @keyword                 | Captures keywords                      |
| @label                   | Captures labels                        |
| @link_text               | Captures link text                     |
| @link_uri                | Captures link URIs                     |
| @number                  | Captures numeric values                |
| @operator                | Captures operators                     |
| @predictive              | Captures predictive text               |
| @preproc                 | Captures preprocessor directives       |
| @primary                 | Captures primary elements              |
| @property                | Captures properties                    |
| @punctuation             | Captures punctuation                   |
| @punctuation.bracket     | Captures brackets                      |
| @punctuation.delimiter   | Captures delimiters                    |
| @punctuation.list_marker | Captures list markers                  |
| @punctuation.special     | Captures special punctuation           |
| @string                  | Captures string literals               |
| @string.escape           | Captures escaped characters in strings |
| @string.regex            | Captures regular expressions           |
| @string.special          | Captures special strings               |
| @string.special.symbol   | Captures special symbols               |
| @tag                     | Captures tags                          |
| @tag.doctype             | Captures doctypes (e.g., in HTML)      |
| @text.literal            | Captures literal text                  |
| @title                   | Captures titles                        |
| @type                    | Captures types                         |
| @variable                | Captures variables                     |
| @variable.special        | Captures special variables             |
| @variant                 | Captures variants                      |

### Bracket matching

The `brackets.scm` file defines matching brackets.

Here's an example from a `brackets.scm` file for JSON:

```scheme
("[" @open "]" @close)
("{" @open "}" @close)
("\"" @open "\"" @close)
```

This query identifies opening and closing brackets, braces, and quotation marks.

| Capture | Description                                   |
| ------- | --------------------------------------------- |
| @open   | Captures opening brackets, braces, and quotes |
| @close  | Captures closing brackets, braces, and quotes |

### Code outline/structure

The `outline.scm` file defines the structure for the code outline.

Here's an example from an `outline.scm` file for JSON:

```scheme
(pair
  key: (string (string_content) @name)) @item
```

This query captures object keys for the outline structure.

| Capture        | Description                                                                          |
| -------------- | ------------------------------------------------------------------------------------ |
| @name          | Captures the content of object keys                                                  |
| @item          | Captures the entire key-value pair                                                   |
| @context       | Captures elements that provide context for the outline item                          |
| @context.extra | Captures additional contextual information for the outline item                      |
| @annotation    | Captures nodes that annotate outline item (doc comments, attributes, decorators)[^1] |

[^1]: These annotations are used by Assistant when generating code modification steps.

### Auto-indentation

The `indents.scm` file defines indentation rules.

Here's an example from an `indents.scm` file for JSON:

```scheme
(array "]" @end) @indent
(object "}" @end) @indent
```

This query marks the end of arrays and objects for indentation purposes.

| Capture | Description                                        |
| ------- | -------------------------------------------------- |
| @end    | Captures closing brackets and braces               |
| @indent | Captures entire arrays and objects for indentation |

### Code injections

The `injections.scm` file defines rules for embedding one language within another, such as code blocks in Markdown or SQL queries in Python strings.

Here's an example from an `injections.scm` file for Markdown:

```scheme
(fenced_code_block
  (info_string
    (language) @language)
  (code_fence_content) @content)

((inline) @content
 (#set! "language" "markdown-inline"))
```

This query identifies fenced code blocks, capturing the language specified in the info string and the content within the block. It also captures inline content and sets its language to "markdown-inline".

| Capture   | Description                                                |
| --------- | ---------------------------------------------------------- |
| @language | Captures the language identifier for a code block          |
| @content  | Captures the content to be treated as a different language |

Note that we couldn't use JSON as an example here because it doesn't support language injections.

### Syntax overrides

The `overrides.scm` file defines syntactic _scopes_ that can be used to override certain editor settings within specific language constructs.

For example, there is a language-specific setting called `word_characters` that controls which non-alphabetic characters are considered part of a word, for filtering autocomplete suggestions. In JavaScript, "$" and "#" are considered word characters. But when your cursor is within a _string_ in JavaScript, "-" is _also_ considered a word character. To achieve this, the JavaScript `overrides.scm` file contains the following pattern:

```scheme
[
  (string)
  (template_string)
] @string
```

And the JavaScript `config.toml` contains this setting:

```toml
word_characters = ["#", "$"]

[overrides.string]
word_characters = ["-"]
```

You can also disable certain auto-closing brackets in a specific scope. For example, to prevent auto-closing `'` within strings, you could put the following in the JavaScript `config.toml`:

```toml
brackets = [
  { start = "'", end = "'", close = true, newline = false, not_in = ["string"] },
  # other pairs...
]
```
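
The effect of `not_in` can be pictured with a small Rust sketch. This is an illustration only, not Zed's code: `BracketPair` and `auto_close_text` are hypothetical names, the struct models just a subset of the keys shown above, and the scope at the cursor is assumed to arrive as a plain string such as `"string"`.

```rust
/// Hypothetical mirror of one entry in the `brackets` array of `config.toml`
/// (the `newline` key is left out to keep the sketch short).
struct BracketPair {
    start: &'static str,
    end: &'static str,
    close: bool,
    not_in: &'static [&'static str],
}

/// Returns the text to insert automatically after the user types `typed`,
/// or `None` when nothing should be auto-closed. `scope` is the override
/// scope at the cursor, or `None` when no scope applies.
fn auto_close_text(
    pairs: &[BracketPair],
    typed: &str,
    scope: Option<&str>,
) -> Option<&'static str> {
    // Find the pair that starts with what was just typed.
    let pair = pairs.iter().find(|pair| pair.start == typed)?;
    // A pair is suppressed when the current scope is listed in `not_in`.
    let suppressed = scope.is_some_and(|scope| pair.not_in.iter().any(|name| *name == scope));
    (pair.close && !suppressed).then_some(pair.end)
}
```

With the single-quote pair from the config above, typing `'` outside any scope yields the closing `'`, while typing it with the `string` scope active yields nothing.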

#### Range inclusivity

By default, the ranges defined in `overrides.scm` are _exclusive_. So in the case above, if your cursor were _outside_ the quotation marks delimiting the string, the `string` scope would not take effect. Sometimes, you may want to make the range _inclusive_. You can do this by adding the `.inclusive` suffix to the capture name in the query.

For example, in JavaScript, we also disable auto-closing of single quotes within comments, and the comment scope must extend all the way to the newline after a line comment. To achieve this, the JavaScript `overrides.scm` contains the following pattern:

```scheme
(comment) @comment.inclusive
```
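
One way to picture the difference is as a comparison of the cursor offset against the captured node's range. The sketch below is a simplified model, not Zed's implementation; `scope_applies` and its parameters are hypothetical, and offsets are assumed to be byte positions in the buffer.

```rust
use std::ops::Range;

/// Returns true when an override scope captured over `range` applies to a
/// cursor placed at `cursor`.
fn scope_applies(range: &Range<usize>, cursor: usize, inclusive: bool) -> bool {
    if inclusive {
        // Inclusive: the cursor may sit on either boundary of the node.
        range.start <= cursor && cursor <= range.end
    } else {
        // Exclusive: the cursor must be strictly inside the node.
        range.start < cursor && cursor < range.end
    }
}
```

For a line comment spanning offsets `0..7`, a cursor at offset 7 (the end of the line) falls outside the exclusive range but inside the inclusive one, which is the situation the `.inclusive` suffix is meant to cover.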

### Text redactions

The `redactions.scm` file defines text redaction rules. When you collaborate or share your screen, these rules ensure that certain syntax nodes are rendered in a redacted form so that their contents do not leak.

Here's an example from a `redactions.scm` file for JSON:

```scheme
(pair value: (number) @redact)
(pair value: (string) @redact)
(array (number) @redact)
(array (string) @redact)
```

This query marks number and string values in key-value pairs and arrays for redaction.

| Capture | Description                    |
| ------- | ------------------------------ |
| @redact | Captures values to be redacted |
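
To make the idea of redacted rendering concrete, here is a minimal Rust sketch that masks the text covered by `@redact` captures. It is only a model: `render_redacted` is a hypothetical function, the captures are assumed to be given as byte ranges, and the `•` placeholder character is an assumption rather than a description of how Zed draws redacted text.

```rust
use std::ops::Range;

/// Returns `text` with every character that starts inside one of the
/// `redactions` byte ranges replaced by a placeholder character.
fn render_redacted(text: &str, redactions: &[Range<usize>]) -> String {
    text.char_indices()
        .map(|(offset, ch)| {
            if redactions.iter().any(|range| range.contains(&offset)) {
                '•'
            } else {
                ch
            }
        })
        .collect()
}
```

In this model the key of a pair stays readable and only the captured value is masked, which mirrors the intent of the JSON query above.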

### Runnable code detection

The `runnables.scm` file defines rules for detecting runnable code.

Here's an example from a `runnables.scm` file for JSON:

```scheme
(
    (document
        (object
            (pair
                key: (string
                    (string_content) @_name
                    (#eq? @_name "scripts")
                )
                value: (object
                    (pair
                        key: (string (string_content) @run @script)
                    )
                )
            )
        )
    )
    (#set! tag package-script)
    (#set! tag composer-script)
)
```

This query detects runnable scripts in `package.json` and `composer.json` files.

The `@run` capture specifies where the run button should appear in the editor. Other captures, except those prefixed with an underscore, are exposed as environment variables with names of the form `ZED_CUSTOM_$(capture_name)` when running the code.

| Capture | Description                                            |
| ------- | ------------------------------------------------------ |
| @\_name | Captures the "scripts" key                             |
| @run    | Captures the script name                               |
| @script | Also captures the script name (for different purposes) |
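
The naming rule can be sketched in Rust as follows. This is an illustrative model rather than Zed's implementation: `runnable_env` is a hypothetical helper, captures are assumed to arrive as (name, text) pairs, and keeping the capture name's original casing in the variable name is an assumption.

```rust
use std::collections::BTreeMap;

/// Builds the environment for a runnable from its query captures, given as
/// (capture name, captured text) pairs.
fn runnable_env(captures: &[(&str, &str)]) -> BTreeMap<String, String> {
    captures
        .iter()
        // `@run` only positions the run button, and captures that start with
        // an underscore are private to the query, so neither is exported.
        .filter(|(name, _)| *name != "run" && !name.starts_with('_'))
        .map(|(name, text)| (format!("ZED_CUSTOM_{name}"), (*text).to_string()))
        .collect()
}
```

For the query above, a script named `build` would yield a single variable, `ZED_CUSTOM_script`, set to `build`; `@_name` and `@run` are left out.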

<!--
TBD: `#set! tag`
-->

## Language Servers

Zed uses the [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) to provide advanced language support.

An extension may provide any number of language servers. To provide a language server from your extension, add an entry to your `extension.toml` with the name of your language server and the language it applies to:

```toml
[language_servers.my-language]
name = "My Language LSP"
language = "My Language"
```

Then, in the Rust code for your extension, implement the `language_server_command` method on your extension:

```rust
use zed_extension_api::{self as zed, LanguageServerId, Result};

impl zed::Extension for MyExtension {
    // Tells Zed which command to run to start the language server.
    fn language_server_command(
        &mut self,
        language_server_id: &LanguageServerId,
        worktree: &zed::Worktree,
    ) -> Result<zed::Command> {
        // The three `get_*` helpers are placeholders for functions you write yourself.
        Ok(zed::Command {
            command: get_path_to_language_server_executable()?,
            args: get_args_for_language_server()?,
            env: get_env_for_language_server()?,
        })
    }
}
```

You can customize the handling of the language server using several optional methods in the `Extension` trait. For example, you can control how completions are styled using the `label_for_completion` method. For a complete list of methods, see the [API docs for the Zed extension API](https://docs.rs/zed_extension_api).