Extract `language_core` and `grammars` crates from `language` (#52238)

Created by Nathan Sobo, Agus Zubiaga, and Tom Houlé

This extracts a `language_core` crate from the existing `language`
crate and creates a `grammars` data crate. The goal is to separate
tree-sitter grammar infrastructure, language configuration, and LSP
adapter types from the heavier buffer/editor integration layer in
`language`.

## Motivation

The `language` crate pulls in `text`, `theme`, `settings`, `rpc`,
`task`, `fs`, `clock`, `sum_tree`, and `fuzzy` — all of which are needed
for buffer integration (`Buffer`, `SyntaxMap`, `Outline`,
`DiagnosticSet`) but not for grammar parsing or language configuration.
Extracting the core types lets downstream consumers depend on
`language_core` without pulling in the full integration stack.

## Dependency graph after extraction

```
language_core   ← gpui, lsp, tree-sitter, util, collections
grammars        ← language_core, rust_embed, tree-sitter-{rust,python,...}
language        ← language_core, text, theme, settings, rpc, task, fs, ...
languages       ← language, grammars
```
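To make the split concrete, a downstream crate that only needs grammar and language-configuration types could depend on `language_core` alone. This Cargo.toml fragment is illustrative, not taken from the PR:

```toml
# Illustrative: a hypothetical downstream crate that needs only the
# grammar and language-configuration types.
[dependencies]
language_core = { path = "../language_core" }

# Only crates that need buffer/editor integration would also pull in:
# language = { path = "../language" }
```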

## What moved to `language_core`

- `Grammar`, `GrammarId`, and all query config/builder types
- `LanguageConfig`, `LanguageMatcher`, bracket/comment/indent config
types
- `HighlightMap`, `HighlightId` (theme-dependent free functions
`highlight_style` and `highlight_name` stay in `language`)
- `LanguageName`, `LanguageId`
- `LanguageQueries`, `QUERY_FILENAME_PREFIXES`
- `CodeLabel`, `CodeLabelBuilder`, `Symbol`
- `Diagnostic`, `DiagnosticSourceKind`
- `Toolchain`, `ToolchainScope`, `ToolchainList`, `ToolchainMetadata`
- `ManifestName`
- `SoftWrap`
- LSP data types: `BinaryStatus`, `ServerHealth`,
`LanguageServerStatusUpdate`, `PromptResponseContext`, `ToLspPosition`

## What stays in `language`

- `Buffer`, `BufferSnapshot`, `SyntaxMap`, `Outline`, `DiagnosticSet`,
`LanguageScope`
- `LspAdapter`, `CachedLspAdapter`, `LspAdapterDelegate` (reference
`Arc<Language>` and `WorktreeId`)
- `ToolchainLister`, `LanguageToolchainStore` (reference `task` and
`settings` types)
- `ManifestQuery`, `ManifestProvider`, `ManifestDelegate` (reference
`WorktreeId`)
- Parser/query cursor pools, `PLAIN_TEXT`, point conversion functions

## What the `grammars` crate provides

- Embedded `.scm` query files and `config.toml` files for all built-in
languages (via `rust_embed`)
- `load_queries(name)`, `load_config(name)`,
`load_config_for_feature(name, grammars_loaded)`, and `get_file(path)`
functions
- `native_grammars()` for tree-sitter grammar registration (behind
`load-grammars` feature)

## Pre-cleanup (also in this PR)

- Removed unused `Option<&Buffer>` from
`LspAdapter::process_diagnostics`
- Removed unused `&App` from `LspAdapter::retain_old_diagnostic`
- Removed `fs: &dyn Fs` from `ToolchainLister` trait methods
(`PythonToolchainProvider` captures `fs` at construction time instead)
- Moved `Diagnostic`/`DiagnosticSourceKind` out of `buffer.rs` into
their own module

## Backward compatibility

The `language` crate re-exports everything from `language_core`, so
existing imports such as `use language::Grammar` continue to work
unchanged. The only downstream change required is importing
`CodeLabelExt` where `.fallback_for_completion()` is called on the
now-foreign `CodeLabel` type.

Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <agus@zed.dev>
Co-authored-by: Tom Houlé <tom@tomhoule.com>

Change summary

Cargo.lock                                                          |   62 
Cargo.toml                                                          |    4 
crates/debugger_ui/src/tests/inline_values.rs                       |   10 
crates/edit_prediction_cli/src/filter_languages.rs                  |    6 
crates/edit_prediction_context/src/edit_prediction_context_tests.rs |    5 
crates/editor/src/display_map.rs                                    |    9 
crates/editor/src/editor.rs                                         |    7 
crates/editor/src/signature_help.rs                                 |    3 
crates/grammars/Cargo.toml                                          |   60 
crates/grammars/LICENSE-GPL                                         |    1 
crates/grammars/src/bash/brackets.scm                               |    0 
crates/grammars/src/bash/config.toml                                |    0 
crates/grammars/src/bash/highlights.scm                             |    0 
crates/grammars/src/bash/indents.scm                                |    0 
crates/grammars/src/bash/injections.scm                             |    0 
crates/grammars/src/bash/overrides.scm                              |    0 
crates/grammars/src/bash/redactions.scm                             |    0 
crates/grammars/src/bash/runnables.scm                              |    0 
crates/grammars/src/bash/textobjects.scm                            |    0 
crates/grammars/src/c/brackets.scm                                  |    0 
crates/grammars/src/c/config.toml                                   |    0 
crates/grammars/src/c/highlights.scm                                |    0 
crates/grammars/src/c/imports.scm                                   |    0 
crates/grammars/src/c/indents.scm                                   |    0 
crates/grammars/src/c/injections.scm                                |    0 
crates/grammars/src/c/outline.scm                                   |    0 
crates/grammars/src/c/overrides.scm                                 |    0 
crates/grammars/src/c/runnables.scm                                 |    0 
crates/grammars/src/c/textobjects.scm                               |    0 
crates/grammars/src/cpp/brackets.scm                                |    0 
crates/grammars/src/cpp/config.toml                                 |    0 
crates/grammars/src/cpp/highlights.scm                              |    0 
crates/grammars/src/cpp/imports.scm                                 |    0 
crates/grammars/src/cpp/indents.scm                                 |    0 
crates/grammars/src/cpp/injections.scm                              |    0 
crates/grammars/src/cpp/outline.scm                                 |    0 
crates/grammars/src/cpp/overrides.scm                               |    0 
crates/grammars/src/cpp/semantic_token_rules.json                   |    0 
crates/grammars/src/cpp/textobjects.scm                             |    0 
crates/grammars/src/css/brackets.scm                                |    0 
crates/grammars/src/css/config.toml                                 |    0 
crates/grammars/src/css/highlights.scm                              |    0 
crates/grammars/src/css/indents.scm                                 |    0 
crates/grammars/src/css/injections.scm                              |    0 
crates/grammars/src/css/outline.scm                                 |    0 
crates/grammars/src/css/overrides.scm                               |    0 
crates/grammars/src/css/textobjects.scm                             |    0 
crates/grammars/src/diff/config.toml                                |    0 
crates/grammars/src/diff/highlights.scm                             |    0 
crates/grammars/src/diff/injections.scm                             |    0 
crates/grammars/src/gitcommit/config.toml                           |    0 
crates/grammars/src/gitcommit/highlights.scm                        |    0 
crates/grammars/src/gitcommit/injections.scm                        |    0 
crates/grammars/src/go/brackets.scm                                 |    0 
crates/grammars/src/go/config.toml                                  |    0 
crates/grammars/src/go/debugger.scm                                 |    0 
crates/grammars/src/go/highlights.scm                               |    0 
crates/grammars/src/go/imports.scm                                  |    0 
crates/grammars/src/go/indents.scm                                  |    0 
crates/grammars/src/go/injections.scm                               |    0 
crates/grammars/src/go/outline.scm                                  |    0 
crates/grammars/src/go/overrides.scm                                |    0 
crates/grammars/src/go/runnables.scm                                |    0 
crates/grammars/src/go/semantic_token_rules.json                    |    0 
crates/grammars/src/go/textobjects.scm                              |    0 
crates/grammars/src/gomod/config.toml                               |    0 
crates/grammars/src/gomod/highlights.scm                            |    0 
crates/grammars/src/gomod/injections.scm                            |    0 
crates/grammars/src/gomod/structure.scm                             |    0 
crates/grammars/src/gowork/config.toml                              |    0 
crates/grammars/src/gowork/highlights.scm                           |    0 
crates/grammars/src/gowork/injections.scm                           |    0 
crates/grammars/src/grammars.rs                                     |  108 
crates/grammars/src/javascript/brackets.scm                         |    0 
crates/grammars/src/javascript/config.toml                          |    0 
crates/grammars/src/javascript/debugger.scm                         |    0 
crates/grammars/src/javascript/highlights.scm                       |    0 
crates/grammars/src/javascript/imports.scm                          |    0 
crates/grammars/src/javascript/indents.scm                          |    0 
crates/grammars/src/javascript/injections.scm                       |    0 
crates/grammars/src/javascript/outline.scm                          |    0 
crates/grammars/src/javascript/overrides.scm                        |    0 
crates/grammars/src/javascript/runnables.scm                        |    0 
crates/grammars/src/javascript/textobjects.scm                      |    0 
crates/grammars/src/jsdoc/brackets.scm                              |    0 
crates/grammars/src/jsdoc/config.toml                               |    0 
crates/grammars/src/jsdoc/highlights.scm                            |    0 
crates/grammars/src/json/brackets.scm                               |    0 
crates/grammars/src/json/config.toml                                |    0 
crates/grammars/src/json/highlights.scm                             |    0 
crates/grammars/src/json/indents.scm                                |    0 
crates/grammars/src/json/outline.scm                                |    0 
crates/grammars/src/json/overrides.scm                              |    0 
crates/grammars/src/json/redactions.scm                             |    0 
crates/grammars/src/json/runnables.scm                              |    0 
crates/grammars/src/json/textobjects.scm                            |    0 
crates/grammars/src/jsonc/brackets.scm                              |    0 
crates/grammars/src/jsonc/config.toml                               |    0 
crates/grammars/src/jsonc/highlights.scm                            |    0 
crates/grammars/src/jsonc/indents.scm                               |    0 
crates/grammars/src/jsonc/injections.scm                            |    0 
crates/grammars/src/jsonc/outline.scm                               |    0 
crates/grammars/src/jsonc/overrides.scm                             |    0 
crates/grammars/src/jsonc/redactions.scm                            |    0 
crates/grammars/src/jsonc/textobjects.scm                           |    0 
crates/grammars/src/markdown-inline/config.toml                     |    0 
crates/grammars/src/markdown-inline/highlights.scm                  |    0 
crates/grammars/src/markdown-inline/injections.scm                  |    0 
crates/grammars/src/markdown/brackets.scm                           |    0 
crates/grammars/src/markdown/config.toml                            |    0 
crates/grammars/src/markdown/highlights.scm                         |    0 
crates/grammars/src/markdown/indents.scm                            |    0 
crates/grammars/src/markdown/injections.scm                         |    0 
crates/grammars/src/markdown/outline.scm                            |    0 
crates/grammars/src/markdown/textobjects.scm                        |    0 
crates/grammars/src/python/brackets.scm                             |    0 
crates/grammars/src/python/config.toml                              |    0 
crates/grammars/src/python/debugger.scm                             |    0 
crates/grammars/src/python/highlights.scm                           |    0 
crates/grammars/src/python/imports.scm                              |    0 
crates/grammars/src/python/indents.scm                              |    0 
crates/grammars/src/python/injections.scm                           |    0 
crates/grammars/src/python/outline.scm                              |    0 
crates/grammars/src/python/overrides.scm                            |    0 
crates/grammars/src/python/runnables.scm                            |    0 
crates/grammars/src/python/semantic_token_rules.json                |    0 
crates/grammars/src/python/textobjects.scm                          |    0 
crates/grammars/src/regex/brackets.scm                              |    0 
crates/grammars/src/regex/config.toml                               |    0 
crates/grammars/src/regex/highlights.scm                            |    0 
crates/grammars/src/rust/brackets.scm                               |    0 
crates/grammars/src/rust/config.toml                                |    0 
crates/grammars/src/rust/debugger.scm                               |    0 
crates/grammars/src/rust/highlights.scm                             |    0 
crates/grammars/src/rust/imports.scm                                |    0 
crates/grammars/src/rust/indents.scm                                |    0 
crates/grammars/src/rust/injections.scm                             |    0 
crates/grammars/src/rust/outline.scm                                |    0 
crates/grammars/src/rust/overrides.scm                              |    0 
crates/grammars/src/rust/runnables.scm                              |    0 
crates/grammars/src/rust/semantic_token_rules.json                  |    0 
crates/grammars/src/rust/textobjects.scm                            |    0 
crates/grammars/src/tsx/brackets.scm                                |    0 
crates/grammars/src/tsx/config.toml                                 |    0 
crates/grammars/src/tsx/debugger.scm                                |    0 
crates/grammars/src/tsx/highlights.scm                              |    0 
crates/grammars/src/tsx/imports.scm                                 |    0 
crates/grammars/src/tsx/indents.scm                                 |    0 
crates/grammars/src/tsx/injections.scm                              |    0 
crates/grammars/src/tsx/outline.scm                                 |    0 
crates/grammars/src/tsx/overrides.scm                               |    0 
crates/grammars/src/tsx/runnables.scm                               |    0 
crates/grammars/src/tsx/textobjects.scm                             |    0 
crates/grammars/src/typescript/brackets.scm                         |    0 
crates/grammars/src/typescript/config.toml                          |    0 
crates/grammars/src/typescript/debugger.scm                         |    0 
crates/grammars/src/typescript/highlights.scm                       |    0 
crates/grammars/src/typescript/imports.scm                          |    0 
crates/grammars/src/typescript/indents.scm                          |    0 
crates/grammars/src/typescript/injections.scm                       |    0 
crates/grammars/src/typescript/outline.scm                          |    0 
crates/grammars/src/typescript/overrides.scm                        |    0 
crates/grammars/src/typescript/runnables.scm                        |    0 
crates/grammars/src/typescript/textobjects.scm                      |    0 
crates/grammars/src/yaml/brackets.scm                               |    0 
crates/grammars/src/yaml/config.toml                                |    0 
crates/grammars/src/yaml/highlights.scm                             |    0 
crates/grammars/src/yaml/injections.scm                             |    0 
crates/grammars/src/yaml/outline.scm                                |    0 
crates/grammars/src/yaml/overrides.scm                              |    0 
crates/grammars/src/yaml/redactions.scm                             |    0 
crates/grammars/src/yaml/textobjects.scm                            |    0 
crates/grammars/src/zed-keybind-context/brackets.scm                |    0 
crates/grammars/src/zed-keybind-context/config.toml                 |    0 
crates/grammars/src/zed-keybind-context/highlights.scm              |    0 
crates/keymap_editor/src/keymap_editor.rs                           |    4 
crates/language/Cargo.toml                                          |    2 
crates/language/benches/highlight_map.rs                            |   10 
crates/language/src/buffer.rs                                       |   86 
crates/language/src/diagnostic.rs                                   |    1 
crates/language/src/highlight_map.rs                                |   98 
crates/language/src/language.rs                                     | 1312 
crates/language/src/language_registry.rs                            |  155 
crates/language/src/manifest.rs                                     |   39 
crates/language/src/syntax_map.rs                                   |    4 
crates/language/src/syntax_map/syntax_map_tests.rs                  |    2 
crates/language/src/toolchain.rs                                    |  126 
crates/language_core/Cargo.toml                                     |   29 
crates/language_core/LICENSE-GPL                                    |    1 
crates/language_core/src/code_label.rs                              |  122 
crates/language_core/src/diagnostic.rs                              |   76 
crates/language_core/src/grammar.rs                                 |  821 
crates/language_core/src/highlight_map.rs                           |   52 
crates/language_core/src/language_config.rs                         |  539 
crates/language_core/src/language_core.rs                           |   39 
crates/language_core/src/language_name.rs                           |  109 
crates/language_core/src/lsp_adapter.rs                             |   44 
crates/language_core/src/manifest.rs                                |   36 
crates/language_core/src/queries.rs                                 |   33 
crates/language_core/src/toolchain.rs                               |  124 
crates/language_tools/src/highlights_tree_view.rs                   |    7 
crates/languages/Cargo.toml                                         |   38 
crates/languages/src/c.rs                                           |    2 
crates/languages/src/cpp.rs                                         |    4 
crates/languages/src/go.rs                                          |    4 
crates/languages/src/lib.rs                                         |   87 
crates/languages/src/python.rs                                      |   20 
crates/languages/src/rust.rs                                        |   12 
crates/markdown/src/markdown.rs                                     |    4 
crates/markdown_preview/src/markdown_elements.rs                    |    3 
crates/markdown_preview/src/markdown_renderer.rs                    |    6 
crates/outline_panel/src/outline_panel.rs                           |    4 
crates/project/src/lsp_store.rs                                     |   39 
crates/project/src/project.rs                                       |    1 
crates/project/src/toolchain_store.rs                               |   22 
crates/project/tests/integration/project_tests.rs                   |    2 
crates/remote_server/src/headless_project.rs                        |    1 
crates/theme/src/styles/syntax.rs                                   |    7 
crates/vim/src/state.rs                                             |    5 
219 files changed, 2,475 insertions(+), 1,932 deletions(-)

Detailed changes

Cargo.lock 🔗

@@ -7884,6 +7884,35 @@ dependencies = [
  "zed-scap",
 ]
 
+[[package]]
+name = "grammars"
+version = "0.1.0"
+dependencies = [
+ "anyhow",
+ "language_core",
+ "rust-embed",
+ "toml 0.8.23",
+ "tree-sitter",
+ "tree-sitter-bash",
+ "tree-sitter-c",
+ "tree-sitter-cpp",
+ "tree-sitter-css",
+ "tree-sitter-diff",
+ "tree-sitter-gitcommit",
+ "tree-sitter-go",
+ "tree-sitter-gomod",
+ "tree-sitter-gowork",
+ "tree-sitter-jsdoc",
+ "tree-sitter-json",
+ "tree-sitter-md",
+ "tree-sitter-python",
+ "tree-sitter-regex",
+ "tree-sitter-rust",
+ "tree-sitter-typescript",
+ "tree-sitter-yaml",
+ "util",
+]
+
 [[package]]
 name = "grid"
 version = "0.18.0"
@@ -9345,6 +9374,7 @@ dependencies = [
  "imara-diff",
  "indoc",
  "itertools 0.14.0",
+ "language_core",
  "log",
  "lsp",
  "parking_lot",
@@ -9353,7 +9383,6 @@ dependencies = [
  "rand 0.9.2",
  "regex",
  "rpc",
- "schemars",
  "semver",
  "serde",
  "serde_json",
@@ -9388,6 +9417,25 @@ dependencies = [
  "ztracing",
 ]
 
+[[package]]
+name = "language_core"
+version = "0.1.0"
+dependencies = [
+ "anyhow",
+ "collections",
+ "gpui",
+ "log",
+ "lsp",
+ "parking_lot",
+ "regex",
+ "schemars",
+ "serde",
+ "serde_json",
+ "toml 0.8.23",
+ "tree-sitter",
+ "util",
+]
+
 [[package]]
 name = "language_extension"
 version = "0.1.0"
@@ -9580,9 +9628,11 @@ dependencies = [
  "async-trait",
  "chrono",
  "collections",
+ "fs",
  "futures 0.3.31",
  "globset",
  "gpui",
+ "grammars",
  "http_client",
  "itertools 0.14.0",
  "json_schema_store",
@@ -9602,7 +9652,6 @@ dependencies = [
  "project",
  "regex",
  "rope",
- "rust-embed",
  "semver",
  "serde",
  "serde_json",
@@ -9614,25 +9663,16 @@ dependencies = [
  "task",
  "terminal",
  "theme",
- "toml 0.8.23",
  "tree-sitter",
  "tree-sitter-bash",
  "tree-sitter-c",
  "tree-sitter-cpp",
  "tree-sitter-css",
- "tree-sitter-diff",
  "tree-sitter-gitcommit",
  "tree-sitter-go",
- "tree-sitter-gomod",
- "tree-sitter-gowork",
- "tree-sitter-jsdoc",
- "tree-sitter-json",
- "tree-sitter-md",
  "tree-sitter-python",
- "tree-sitter-regex",
  "tree-sitter-rust",
  "tree-sitter-typescript",
- "tree-sitter-yaml",
  "unindent",
  "url",
  "util",

Cargo.toml 🔗

@@ -87,6 +87,7 @@ members = [
     "crates/git_ui",
     "crates/go_to_line",
     "crates/google_ai",
+    "crates/grammars",
     "crates/gpui",
     "crates/gpui_linux",
     "crates/gpui_macos",
@@ -108,6 +109,7 @@ members = [
     "crates/json_schema_store",
     "crates/keymap_editor",
     "crates/language",
+    "crates/language_core",
     "crates/language_extension",
     "crates/language_model",
     "crates/language_models",
@@ -330,6 +332,7 @@ git_hosting_providers = { path = "crates/git_hosting_providers" }
 git_ui = { path = "crates/git_ui" }
 go_to_line = { path = "crates/go_to_line" }
 google_ai = { path = "crates/google_ai" }
+grammars = { path = "crates/grammars" }
 gpui = { path = "crates/gpui", default-features = false }
 gpui_linux = { path = "crates/gpui_linux", default-features = false }
 gpui_macos = { path = "crates/gpui_macos", default-features = false }
@@ -354,6 +357,7 @@ journal = { path = "crates/journal" }
 json_schema_store = { path = "crates/json_schema_store" }
 keymap_editor = { path = "crates/keymap_editor" }
 language = { path = "crates/language" }
+language_core = { path = "crates/language_core" }
 language_extension = { path = "crates/language_extension" }
 language_model = { path = "crates/language_model" }
 language_models = { path = "crates/language_models" }

crates/debugger_ui/src/tests/inline_values.rs 🔗

@@ -1826,7 +1826,7 @@ def process_data(untyped_param, typed_param: int, another_typed: str):
 }
 
 fn python_lang() -> Language {
-    let debug_variables_query = include_str!("../../../languages/src/python/debugger.scm");
+    let debug_variables_query = include_str!("../../../grammars/src/python/debugger.scm");
     Language::new(
         LanguageConfig {
             name: "Python".into(),
@@ -1843,7 +1843,7 @@ fn python_lang() -> Language {
 }
 
 fn go_lang() -> Arc<Language> {
-    let debug_variables_query = include_str!("../../../languages/src/go/debugger.scm");
+    let debug_variables_query = include_str!("../../../grammars/src/go/debugger.scm");
     Arc::new(
         Language::new(
             LanguageConfig {
@@ -2262,7 +2262,7 @@ fn main() {
 }
 
 fn javascript_lang() -> Arc<Language> {
-    let debug_variables_query = include_str!("../../../languages/src/javascript/debugger.scm");
+    let debug_variables_query = include_str!("../../../grammars/src/javascript/debugger.scm");
     Arc::new(
         Language::new(
             LanguageConfig {
@@ -2281,7 +2281,7 @@ fn javascript_lang() -> Arc<Language> {
 }
 
 fn typescript_lang() -> Arc<Language> {
-    let debug_variables_query = include_str!("../../../languages/src/typescript/debugger.scm");
+    let debug_variables_query = include_str!("../../../grammars/src/typescript/debugger.scm");
     Arc::new(
         Language::new(
             LanguageConfig {
@@ -2300,7 +2300,7 @@ fn typescript_lang() -> Arc<Language> {
 }
 
 fn tsx_lang() -> Arc<Language> {
-    let debug_variables_query = include_str!("../../../languages/src/tsx/debugger.scm");
+    let debug_variables_query = include_str!("../../../grammars/src/tsx/debugger.scm");
     Arc::new(
         Language::new(
             LanguageConfig {

crates/edit_prediction_cli/src/filter_languages.rs 🔗

@@ -13,7 +13,7 @@
 //!
 //! Language is detected based on file extension of the `cursor_path` field.
 //! The extension-to-language mapping is built from the embedded language
-//! config files in the `languages` crate.
+//! config files in the `grammars` crate.
 
 use anyhow::{Context as _, Result, bail};
 use clap::Args;
@@ -29,7 +29,7 @@ mod language_configs_embedded {
     use rust_embed::RustEmbed;
 
     #[derive(RustEmbed)]
-    #[folder = "../languages/src/"]
+    #[folder = "../grammars/src/"]
     #[include = "*/config.toml"]
     pub struct LanguageConfigs;
 }
@@ -123,7 +123,7 @@ fn build_extension_to_language_map() -> HashMap<String, String> {
 
 #[cfg(feature = "dynamic_prompts")]
 fn build_extension_to_language_map() -> HashMap<String, String> {
-    const LANGUAGES_SRC_DIR: &str = concat!(env!("CARGO_MANIFEST_DIR"), "/../languages/src");
+    const LANGUAGES_SRC_DIR: &str = concat!(env!("CARGO_MANIFEST_DIR"), "/../grammars/src");
 
     let mut map = HashMap::default();
 

crates/edit_prediction_context/src/edit_prediction_context_tests.rs 🔗

@@ -160,7 +160,7 @@ async fn test_edit_prediction_context(cx: &mut TestAppContext) {
 }
 
 #[gpui::test]
-fn test_assemble_excerpts(cx: &mut TestAppContext) {
+async fn test_assemble_excerpts(cx: &mut TestAppContext) {
     let table = [
         (
             indoc! {r#"
@@ -289,6 +289,9 @@ fn test_assemble_excerpts(cx: &mut TestAppContext) {
     for (input, expected_output) in table {
         let (input, ranges) = marked_text_ranges(&input, false);
         let buffer = cx.new(|cx| Buffer::local(input, cx).with_language(rust_lang(), cx));
+        buffer
+            .read_with(cx, |buffer, _| buffer.parsing_idle())
+            .await;
         buffer.read_with(cx, |buffer, _cx| {
             let ranges: Vec<(Range<Point>, usize)> = ranges
                 .into_iter()

crates/editor/src/display_map.rs 🔗

@@ -101,6 +101,7 @@ use language::{
     Point, Subscription as BufferSubscription,
     language_settings::{AllLanguageSettings, LanguageSettings},
 };
+
 use multi_buffer::{
     Anchor, AnchorRangeExt, ExcerptId, MultiBuffer, MultiBufferOffset, MultiBufferOffsetUtf16,
     MultiBufferPoint, MultiBufferRow, MultiBufferSnapshot, RowInfo, ToOffset, ToPoint,
@@ -1905,7 +1906,7 @@ impl DisplaySnapshot {
         .flat_map(|chunk| {
             let syntax_highlight_style = chunk
                 .syntax_highlight_id
-                .and_then(|id| id.style(&editor_style.syntax));
+                .and_then(|id| editor_style.syntax.get(id).cloned());
 
             let chunk_highlight = chunk.highlight_style.map(|chunk_highlight| {
                 HighlightStyle {
@@ -1999,7 +2000,8 @@ impl DisplaySnapshot {
 
             let syntax_style = chunk
                 .syntax_highlight_id
-                .and_then(|id| id.style(syntax_theme));
+                .and_then(|id| syntax_theme.get(id).cloned());
+
             let overlay_style = chunk.highlight_style;
 
             let combined = match (syntax_style, overlay_style) {
@@ -4015,7 +4017,8 @@ pub mod tests {
         for chunk in snapshot.chunks(rows, true, HighlightStyles::default()) {
             let syntax_color = chunk
                 .syntax_highlight_id
-                .and_then(|id| id.style(theme)?.color);
+                .and_then(|id| theme.get(id)?.color);
+
             let highlight_color = chunk.highlight_style.and_then(|style| style.color);
             if let Some((last_chunk, last_syntax_color, last_highlight_color)) = chunks.last_mut()
                 && syntax_color == *last_syntax_color

crates/editor/src/editor.rs 🔗

@@ -19160,7 +19160,7 @@ impl Editor {
                                 move |cx: &mut BlockContext| {
                                     let mut text_style = cx.editor_style.text.clone();
                                     if let Some(highlight_style) = old_highlight_id
-                                        .and_then(|h| h.style(&cx.editor_style.syntax))
+                                        .and_then(|h| cx.editor_style.syntax.get(h).cloned())
                                     {
                                         text_style = text_style.highlight(highlight_style);
                                     }
@@ -25039,7 +25039,8 @@ impl Editor {
         for chunk in chunks {
             let highlight = chunk
                 .syntax_highlight_id
-                .and_then(|id| id.name(&style.syntax));
+                .and_then(|id| style.syntax.get_capture_name(id));
+
             let mut chunk_lines = chunk.text.split('\n').peekable();
             while let Some(text) = chunk_lines.next() {
                 let mut merged_with_last_token = false;
@@ -28863,7 +28864,7 @@ pub fn styled_runs_for_code_label<'a>(
                     background_color: Some(local_player.selection),
                     ..Default::default()
                 }
-            } else if let Some(style) = highlight_id.style(syntax_theme) {
+            } else if let Some(style) = syntax_theme.get(*highlight_id).cloned() {
                 style
             } else {
                 return Default::default();

crates/editor/src/signature_help.rs 🔗

@@ -6,6 +6,7 @@ use gpui::{
     TextStyle, Window, combine_highlights,
 };
 use language::BufferSnapshot;
+
 use markdown::{Markdown, MarkdownElement};
 use multi_buffer::{Anchor, MultiBufferOffset, ToOffset};
 use settings::Settings;
@@ -236,7 +237,7 @@ impl Editor {
                                     .highlight_text(&text, 0..signature.label.len())
                                     .into_iter()
                                     .flat_map(|(range, highlight_id)| {
-                                        Some((range, highlight_id.style(cx.theme().syntax())?))
+                                        Some((range, *cx.theme().syntax().get(highlight_id)?))
                                     });
                                 signature.highlights =
                                     combine_highlights(signature.highlights.clone(), highlights)

crates/grammars/Cargo.toml 🔗

@@ -0,0 +1,60 @@
+[package]
+name = "grammars"
+version = "0.1.0"
+edition = "2024"
+publish = false
+
+[lints]
+workspace = true
+
+[lib]
+path = "src/grammars.rs"
+
+[dependencies]
+language_core.workspace = true
+rust-embed.workspace = true
+anyhow.workspace = true
+toml.workspace = true
+util.workspace = true
+
+tree-sitter = { workspace = true, optional = true }
+tree-sitter-bash = { workspace = true, optional = true }
+tree-sitter-c = { workspace = true, optional = true }
+tree-sitter-cpp = { workspace = true, optional = true }
+tree-sitter-css = { workspace = true, optional = true }
+tree-sitter-diff = { workspace = true, optional = true }
+tree-sitter-gitcommit = { workspace = true, optional = true }
+tree-sitter-go = { workspace = true, optional = true }
+tree-sitter-go-mod = { workspace = true, optional = true }
+tree-sitter-gowork = { workspace = true, optional = true }
+tree-sitter-jsdoc = { workspace = true, optional = true }
+tree-sitter-json = { workspace = true, optional = true }
+tree-sitter-md = { workspace = true, optional = true }
+tree-sitter-python = { workspace = true, optional = true }
+tree-sitter-regex = { workspace = true, optional = true }
+tree-sitter-rust = { workspace = true, optional = true }
+tree-sitter-typescript = { workspace = true, optional = true }
+tree-sitter-yaml = { workspace = true, optional = true }
+
+[features]
+load-grammars = [
+    "tree-sitter",
+    "tree-sitter-bash",
+    "tree-sitter-c",
+    "tree-sitter-cpp",
+    "tree-sitter-css",
+    "tree-sitter-diff",
+    "tree-sitter-gitcommit",
+    "tree-sitter-go",
+    "tree-sitter-go-mod",
+    "tree-sitter-gowork",
+    "tree-sitter-jsdoc",
+    "tree-sitter-json",
+    "tree-sitter-md",
+    "tree-sitter-python",
+    "tree-sitter-regex",
+    "tree-sitter-rust",
+    "tree-sitter-typescript",
+    "tree-sitter-yaml",
+]
+test-support = ["load-grammars"]

crates/grammars/src/grammars.rs 🔗

@@ -0,0 +1,108 @@
+use anyhow::Context as _;
+use language_core::{LanguageConfig, LanguageQueries, QUERY_FILENAME_PREFIXES};
+use rust_embed::RustEmbed;
+use util::asset_str;
+
+#[derive(RustEmbed)]
+#[folder = "src/"]
+#[exclude = "*.rs"]
+struct GrammarDir;
+
+/// All built-in native tree-sitter grammars, as `(name, tree_sitter::Language)` pairs.
+///
+/// Callers are expected to register these with a language registry before
+/// loading language configs and queries.
+#[cfg(feature = "load-grammars")]
+pub fn native_grammars() -> Vec<(&'static str, tree_sitter::Language)> {
+    vec![
+        ("bash", tree_sitter_bash::LANGUAGE.into()),
+        ("c", tree_sitter_c::LANGUAGE.into()),
+        ("cpp", tree_sitter_cpp::LANGUAGE.into()),
+        ("css", tree_sitter_css::LANGUAGE.into()),
+        ("diff", tree_sitter_diff::LANGUAGE.into()),
+        ("go", tree_sitter_go::LANGUAGE.into()),
+        ("gomod", tree_sitter_go_mod::LANGUAGE.into()),
+        ("gowork", tree_sitter_gowork::LANGUAGE.into()),
+        ("jsdoc", tree_sitter_jsdoc::LANGUAGE.into()),
+        ("json", tree_sitter_json::LANGUAGE.into()),
+        ("jsonc", tree_sitter_json::LANGUAGE.into()),
+        ("markdown", tree_sitter_md::LANGUAGE.into()),
+        ("markdown-inline", tree_sitter_md::INLINE_LANGUAGE.into()),
+        ("python", tree_sitter_python::LANGUAGE.into()),
+        ("regex", tree_sitter_regex::LANGUAGE.into()),
+        ("rust", tree_sitter_rust::LANGUAGE.into()),
+        ("tsx", tree_sitter_typescript::LANGUAGE_TSX.into()),
+        (
+            "typescript",
+            tree_sitter_typescript::LANGUAGE_TYPESCRIPT.into(),
+        ),
+        ("yaml", tree_sitter_yaml::LANGUAGE.into()),
+        ("gitcommit", tree_sitter_gitcommit::LANGUAGE.into()),
+    ]
+}
+
+/// Load and parse the `config.toml` for a given language name, panicking if it is missing or invalid.
+pub fn load_config(name: &str) -> LanguageConfig {
+    let config_toml = String::from_utf8(
+        GrammarDir::get(&format!("{}/config.toml", name))
+            .unwrap_or_else(|| panic!("missing config for language {:?}", name))
+            .data
+            .to_vec(),
+    )
+    .unwrap();
+
+    let config: LanguageConfig = ::toml::from_str(&config_toml)
+        .with_context(|| format!("failed to load config.toml for language {name:?}"))
+        .unwrap();
+
+    config
+}
+
+/// Load and parse the `config.toml` for a given language name, stripping fields
+/// that require grammar support when grammars are not loaded.
+pub fn load_config_for_feature(name: &str, grammars_loaded: bool) -> LanguageConfig {
+    let config = load_config(name);
+
+    if grammars_loaded {
+        config
+    } else {
+        LanguageConfig {
+            name: config.name,
+            matcher: config.matcher,
+            jsx_tag_auto_close: config.jsx_tag_auto_close,
+            ..Default::default()
+        }
+    }
+}
+
+/// Get a raw embedded file by path (relative to `src/`).
+///
+/// Returns the embedded file, or `None` if no file exists at that path.
+pub fn get_file(path: &str) -> Option<rust_embed::EmbeddedFile> {
+    GrammarDir::get(path)
+}
+
+/// Load all `.scm` query files for a given language name into a `LanguageQueries`.
+///
+/// Multiple `.scm` files with the same prefix (e.g. `highlights.scm` and
+/// `highlights_extra.scm`) are concatenated in iteration order.
+pub fn load_queries(name: &str) -> LanguageQueries {
+    let mut result = LanguageQueries::default();
+    for path in GrammarDir::iter() {
+        if let Some(remainder) = path.strip_prefix(name).and_then(|p| p.strip_prefix('/')) {
+            if !remainder.ends_with(".scm") {
+                continue;
+            }
+            for (prefix, query) in QUERY_FILENAME_PREFIXES {
+                if remainder.starts_with(prefix) {
+                    let contents = asset_str::<GrammarDir>(path.as_ref());
+                    match query(&mut result) {
+                        None => *query(&mut result) = Some(contents),
+                        Some(existing) => existing.to_mut().push_str(contents.as_ref()),
+                    }
+                }
+            }
+        }
+    }
+    result
+}
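
The prefix-matched concatenation in `load_queries` can be sketched independently of the embedded assets. The following is a minimal, self-contained sketch: the two-slot `Queries` struct, the accessor functions, and the plain `(filename, contents)` list are hypothetical stand-ins for `LanguageQueries`, `QUERY_FILENAME_PREFIXES`, and the `RustEmbed` directory iterator.

```rust
// Stand-in for LanguageQueries: one optional source-text slot per query kind.
#[derive(Default, Debug)]
struct Queries {
    highlights: Option<String>,
    indents: Option<String>,
}

// Stand-ins for the accessor half of QUERY_FILENAME_PREFIXES entries.
fn highlights(q: &mut Queries) -> &mut Option<String> {
    &mut q.highlights
}

fn indents(q: &mut Queries) -> &mut Option<String> {
    &mut q.indents
}

/// Walk `(filename, contents)` pairs; every `.scm` file whose name starts with
/// a known query prefix is appended to that query's slot, so `highlights.scm`
/// and `highlights_extra.scm` end up concatenated in iteration order.
fn load_queries(files: &[(&str, &str)]) -> Queries {
    let prefixes: [(&str, fn(&mut Queries) -> &mut Option<String>); 2] =
        [("highlights", highlights), ("indents", indents)];
    let mut result = Queries::default();
    for (name, contents) in files {
        if !name.ends_with(".scm") {
            continue; // non-query files (config.toml, etc.) are skipped
        }
        for (prefix, accessor) in prefixes {
            if name.starts_with(prefix) {
                accessor(&mut result)
                    .get_or_insert_with(String::new)
                    .push_str(contents);
            }
        }
    }
    result
}

fn main() {
    let queries = load_queries(&[
        ("highlights.scm", "(a)\n"),
        ("highlights_extra.scm", "(b)\n"),
        ("indents.scm", "(c)\n"),
        ("config.toml", "not a query file"),
    ]);
    println!("{queries:?}");
}
```

The real implementation uses the same shape but with `Cow<str>` slots and `asset_str` over the embedded directory; the double-call pattern on `query(&mut result)` in the diff exists to satisfy the borrow checker across the `None`/`Some` arms.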

crates/keymap_editor/src/keymap_editor.rs 🔗

@@ -24,6 +24,7 @@ use gpui::{
     actions, anchored, deferred, div,
 };
 use language::{Language, LanguageConfig, ToOffset as _};
+
 use notifications::status_toast::{StatusToast, ToastIcon};
 use project::{CompletionDisplayOptions, Project};
 use settings::{
@@ -2405,9 +2406,10 @@ impl RenderOnce for SyntaxHighlightedText {
             }
 
             let mut run_style = text_style.clone();
-            if let Some(highlight_style) = highlight_id.style(syntax_theme) {
+            if let Some(highlight_style) = syntax_theme.get(highlight_id).cloned() {
                 run_style = run_style.highlight(highlight_style);
             }
+
             // add the highlighted range
             runs.push(run_style.to_run(highlight_range.len()));
             offset = highlight_range.end;

crates/language/Cargo.toml 🔗

@@ -40,6 +40,7 @@ globset.workspace = true
 gpui.workspace = true
 http_client.workspace = true
 imara-diff.workspace = true
+language_core.workspace = true
 itertools.workspace = true
 log.workspace = true
 lsp.workspace = true
@@ -48,7 +49,6 @@ postage.workspace = true
 rand = { workspace = true, optional = true }
 regex.workspace = true
 rpc.workspace = true
-schemars.workspace = true
 semver.workspace = true
 serde.workspace = true
 serde_json.workspace = true

crates/language/benches/highlight_map.rs 🔗

@@ -1,6 +1,6 @@
 use criterion::{BenchmarkId, Criterion, black_box, criterion_group, criterion_main};
 use gpui::rgba;
-use language::HighlightMap;
+use language::build_highlight_map;
 use theme::SyntaxTheme;
 
 fn syntax_theme(highlight_names: &[&str]) -> SyntaxTheme {
@@ -115,8 +115,8 @@ static LARGE_CAPTURE_NAMES: &[&str] = &[
     "variable.parameter",
 ];
 
-fn bench_highlight_map_new(c: &mut Criterion) {
-    let mut group = c.benchmark_group("HighlightMap::new");
+fn bench_build_highlight_map(c: &mut Criterion) {
+    let mut group = c.benchmark_group("build_highlight_map");
 
     for (capture_label, capture_names) in [
         ("small_captures", SMALL_CAPTURE_NAMES as &[&str]),
@@ -131,7 +131,7 @@ fn bench_highlight_map_new(c: &mut Criterion) {
                 BenchmarkId::new(capture_label, theme_label),
                 &(capture_names, &theme),
                 |b, (capture_names, theme)| {
-                    b.iter(|| HighlightMap::new(black_box(capture_names), black_box(theme)));
+                    b.iter(|| build_highlight_map(black_box(capture_names), black_box(theme)));
                 },
             );
         }
@@ -140,5 +140,5 @@ fn bench_highlight_map_new(c: &mut Criterion) {
     group.finish();
 }
 
-criterion_group!(benches, bench_highlight_map_new);
+criterion_group!(benches, bench_build_highlight_map);
 criterion_main!(benches);

crates/language/src/buffer.rs 🔗

@@ -16,11 +16,10 @@ use crate::{
     unified_diff_with_offsets,
 };
 pub use crate::{
-    Grammar, Language, LanguageRegistry,
-    diagnostic_set::DiagnosticSet,
-    highlight_map::{HighlightId, HighlightMap},
+    Grammar, HighlightId, HighlightMap, Language, LanguageRegistry, diagnostic_set::DiagnosticSet,
     proto,
 };
+
 use anyhow::{Context as _, Result};
 use clock::Lamport;
 pub use clock::ReplicaId;
@@ -33,10 +32,8 @@ use gpui::{
     Task, TextStyle,
 };
 
-use lsp::{LanguageServerId, NumberOrString};
+use lsp::LanguageServerId;
 use parking_lot::Mutex;
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
 use settings::WorktreeId;
 use smallvec::SmallVec;
 use smol::future::yield_now;
@@ -252,57 +249,6 @@ struct SelectionSet {
     lamport_timestamp: clock::Lamport,
 }
 
-/// A diagnostic associated with a certain range of a buffer.
-#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
-pub struct Diagnostic {
-    /// The name of the service that produced this diagnostic.
-    pub source: Option<String>,
-    /// The ID provided by the dynamic registration that produced this diagnostic.
-    pub registration_id: Option<SharedString>,
-    /// A machine-readable code that identifies this diagnostic.
-    pub code: Option<NumberOrString>,
-    pub code_description: Option<lsp::Uri>,
-    /// Whether this diagnostic is a hint, warning, or error.
-    pub severity: DiagnosticSeverity,
-    /// The human-readable message associated with this diagnostic.
-    pub message: String,
-    /// The human-readable message (in markdown format)
-    pub markdown: Option<String>,
-    /// An id that identifies the group to which this diagnostic belongs.
-    ///
-    /// When a language server produces a diagnostic with
-    /// one or more associated diagnostics, those diagnostics are all
-    /// assigned a single group ID.
-    pub group_id: usize,
-    /// Whether this diagnostic is the primary diagnostic for its group.
-    ///
-    /// In a given group, the primary diagnostic is the top-level diagnostic
-    /// returned by the language server. The non-primary diagnostics are the
-    /// associated diagnostics.
-    pub is_primary: bool,
-    /// Whether this diagnostic is considered to originate from an analysis of
-    /// files on disk, as opposed to any unsaved buffer contents. This is a
-    /// property of a given diagnostic source, and is configured for a given
-    /// language server via the [`LspAdapter::disk_based_diagnostic_sources`](crate::LspAdapter::disk_based_diagnostic_sources) method
-    /// for the language server.
-    pub is_disk_based: bool,
-    /// Whether this diagnostic marks unnecessary code.
-    pub is_unnecessary: bool,
-    /// Quick separation of diagnostics groups based by their source.
-    pub source_kind: DiagnosticSourceKind,
-    /// Data from language server that produced this diagnostic. Passed back to the LS when we request code actions for this diagnostic.
-    pub data: Option<Value>,
-    /// Whether to underline the corresponding text range in the editor.
-    pub underline: bool,
-}
-
-#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
-pub enum DiagnosticSourceKind {
-    Pulled,
-    Pushed,
-    Other,
-}
-
 /// An operation used to synchronize this buffer with its other replicas.
 #[derive(Clone, Debug, PartialEq)]
 pub enum Operation {
@@ -749,7 +695,7 @@ impl HighlightedTextBuilder {
 
             if let Some(highlight_style) = chunk
                 .syntax_highlight_id
-                .and_then(|id| id.style(syntax_theme))
+                .and_then(|id| syntax_theme.get(id).cloned())
             {
                 let highlight_style = override_style.map_or(highlight_style, |override_style| {
                     highlight_style.highlight(override_style)
@@ -4551,7 +4497,8 @@ impl BufferSnapshot {
                 let style = chunk
                     .syntax_highlight_id
                     .zip(theme)
-                    .and_then(|(highlight, theme)| highlight.style(theme));
+                    .and_then(|(highlight, theme)| theme.get(highlight).cloned());
+
                 if let Some(style) = style {
                     let start = text.len();
                     let end = start + chunk.text.len();
@@ -5836,27 +5783,6 @@ impl operation_queue::Operation for Operation {
     }
 }
 
-impl Default for Diagnostic {
-    fn default() -> Self {
-        Self {
-            source: Default::default(),
-            source_kind: DiagnosticSourceKind::Other,
-            code: None,
-            code_description: None,
-            severity: DiagnosticSeverity::ERROR,
-            message: Default::default(),
-            markdown: None,
-            group_id: 0,
-            is_primary: false,
-            is_disk_based: false,
-            is_unnecessary: false,
-            underline: true,
-            data: None,
-            registration_id: None,
-        }
-    }
-}
-
 impl IndentSize {
     /// Returns an [`IndentSize`] representing the given spaces.
     pub fn spaces(len: u32) -> Self {

crates/language/src/highlight_map.rs 🔗

@@ -1,98 +0,0 @@
-use gpui::HighlightStyle;
-use std::sync::Arc;
-use theme::SyntaxTheme;
-
-#[derive(Clone, Debug)]
-pub struct HighlightMap(Arc<[HighlightId]>);
-
-#[derive(Clone, Copy, Debug, PartialEq, Eq)]
-pub struct HighlightId(pub u32);
-
-const DEFAULT_SYNTAX_HIGHLIGHT_ID: HighlightId = HighlightId(u32::MAX);
-
-impl HighlightMap {
-    pub fn new(capture_names: &[&str], theme: &SyntaxTheme) -> Self {
-        // For each capture name in the highlight query, find the longest
-        // key in the theme's syntax styles that matches all of the
-        // dot-separated components of the capture name.
-        HighlightMap(
-            capture_names
-                .iter()
-                .map(|capture_name| {
-                    theme
-                        .highlight_id(capture_name)
-                        .map_or(DEFAULT_SYNTAX_HIGHLIGHT_ID, HighlightId)
-                })
-                .collect(),
-        )
-    }
-
-    pub fn get(&self, capture_id: u32) -> HighlightId {
-        self.0
-            .get(capture_id as usize)
-            .copied()
-            .unwrap_or(DEFAULT_SYNTAX_HIGHLIGHT_ID)
-    }
-}
-
-impl HighlightId {
-    pub const TABSTOP_INSERT_ID: HighlightId = HighlightId(u32::MAX - 1);
-    pub const TABSTOP_REPLACE_ID: HighlightId = HighlightId(u32::MAX - 2);
-
-    pub(crate) fn is_default(&self) -> bool {
-        *self == DEFAULT_SYNTAX_HIGHLIGHT_ID
-    }
-
-    pub fn style(&self, theme: &SyntaxTheme) -> Option<HighlightStyle> {
-        theme.get(self.0 as usize).cloned()
-    }
-
-    pub fn name<'a>(&self, theme: &'a SyntaxTheme) -> Option<&'a str> {
-        theme.get_capture_name(self.0 as usize)
-    }
-}
-
-impl Default for HighlightMap {
-    fn default() -> Self {
-        Self(Arc::new([]))
-    }
-}
-
-impl Default for HighlightId {
-    fn default() -> Self {
-        DEFAULT_SYNTAX_HIGHLIGHT_ID
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use gpui::rgba;
-
-    #[test]
-    fn test_highlight_map() {
-        let theme = SyntaxTheme::new(
-            [
-                ("function", rgba(0x100000ff)),
-                ("function.method", rgba(0x200000ff)),
-                ("function.async", rgba(0x300000ff)),
-                ("variable.builtin.self.rust", rgba(0x400000ff)),
-                ("variable.builtin", rgba(0x500000ff)),
-                ("variable", rgba(0x600000ff)),
-            ]
-            .iter()
-            .map(|(name, color)| (name.to_string(), (*color).into())),
-        );
-
-        let capture_names = &[
-            "function.special",
-            "function.async.rust",
-            "variable.builtin.self",
-        ];
-
-        let map = HighlightMap::new(capture_names, &theme);
-        assert_eq!(map.get(0).name(&theme), Some("function"));
-        assert_eq!(map.get(1).name(&theme), Some("function.async"));
-        assert_eq!(map.get(2).name(&theme), Some("variable.builtin"));
-    }
-}
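
The matching rule the removed test encodes — each capture name resolves to the theme key covering the longest dot-separated prefix of that capture name — can be sketched on its own. `best_match` and the inline key list below are illustrative stand-ins, not the `SyntaxTheme` API that now backs this in `language_core`.

```rust
/// Return the index of the theme key whose dot-separated components form the
/// longest prefix of `capture_name`'s components, if any key matches at all.
fn best_match(theme_keys: &[&str], capture_name: &str) -> Option<usize> {
    let mut best: Option<(usize, usize)> = None; // (components matched, key index)
    for (ix, key) in theme_keys.iter().enumerate() {
        let mut capture_parts = capture_name.split('.');
        let mut matched = 0;
        for key_part in key.split('.') {
            if capture_parts.next() == Some(key_part) {
                matched += 1;
            } else {
                // The key is not a component-wise prefix of the capture name.
                matched = 0;
                break;
            }
        }
        if matched > 0 && best.map_or(true, |(m, _)| matched > m) {
            best = Some((matched, ix));
        }
    }
    best.map(|(_, ix)| ix)
}

fn main() {
    let keys = [
        "function",
        "function.method",
        "function.async",
        "variable.builtin",
        "variable",
    ];
    // "function.async.rust" matches "function" (1 component) and
    // "function.async" (2 components); the longer match wins.
    assert_eq!(best_match(&keys, "function.async.rust"), Some(2));
    assert_eq!(best_match(&keys, "variable.builtin.self"), Some(3));
    assert_eq!(best_match(&keys, "string"), None);
}
```

This reproduces the expectations in the deleted `test_highlight_map`: `"function.special"` falls back to `"function"`, and `"variable.builtin.self"` resolves to `"variable.builtin"` rather than the longer key `"variable.builtin.self.rust"`, which is not a prefix of the capture.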

crates/language/src/language.rs 🔗

@@ -7,9 +7,10 @@
 //!
 //! Notably we do *not* assign a single language to a single file; in real world a single file can consist of multiple programming languages - HTML is a good example of that - and `language` crate tends to reflect that status quo in its API.
 mod buffer;
+mod diagnostic;
 mod diagnostic_set;
-mod highlight_map;
 mod language_registry;
+
 pub mod language_settings;
 mod manifest;
 pub mod modeline;
@@ -23,17 +24,30 @@ mod toolchain;
 #[cfg(test)]
 pub mod buffer_tests;
 
-use crate::language_settings::SoftWrap;
 pub use crate::language_settings::{AutoIndentMode, EditPredictionsMode, IndentGuideSettings};
 use anyhow::{Context as _, Result};
 use async_trait::async_trait;
-use collections::{HashMap, HashSet, IndexSet};
+use collections::{HashMap, HashSet};
 use futures::Future;
 use futures::future::LocalBoxFuture;
 use futures::lock::OwnedMutexGuard;
-use gpui::{App, AsyncApp, Entity, SharedString};
-pub use highlight_map::HighlightMap;
+use gpui::{App, AsyncApp, Entity};
 use http_client::HttpClient;
+
+pub use language_core::highlight_map::{HighlightId, HighlightMap};
+
+pub use language_core::{
+    BlockCommentConfig, BracketPair, BracketPairConfig, BracketPairContent, BracketsConfig,
+    BracketsPatternConfig, CodeLabel, CodeLabelBuilder, DebugVariablesConfig, DebuggerTextObject,
+    DecreaseIndentConfig, Grammar, GrammarId, HighlightsConfig, ImportsConfig, IndentConfig,
+    InjectionConfig, InjectionPatternConfig, JsxTagAutoCloseConfig, LanguageConfig,
+    LanguageConfigOverride, LanguageId, LanguageMatcher, OrderedListConfig, OutlineConfig,
+    Override, OverrideConfig, OverrideEntry, PromptResponseContext, RedactionConfig,
+    RunnableCapture, RunnableConfig, SoftWrap, Symbol, TaskListConfig, TextObject,
+    TextObjectConfig, ToLspPosition, WrapCharactersConfig,
+    auto_indent_using_last_non_empty_line_default, deserialize_regex, deserialize_regex_vec,
+    regex_json_schema, regex_vec_json_schema, serialize_regex,
+};
 pub use language_registry::{
     LanguageName, LanguageServerStatusUpdate, LoadedLanguage, ServerHealth,
 };
@@ -44,13 +58,10 @@ pub use manifest::{ManifestDelegate, ManifestName, ManifestProvider, ManifestQue
 pub use modeline::{ModelineSettings, parse_modeline};
 use parking_lot::Mutex;
 use regex::Regex;
-use schemars::{JsonSchema, SchemaGenerator, json_schema};
 use semver::Version;
-use serde::{Deserialize, Deserializer, Serialize, Serializer, de};
 use serde_json::Value;
 use settings::WorktreeId;
 use smol::future::FutureExt as _;
-use std::num::NonZeroU32;
 use std::{
     ffi::OsStr,
     fmt::Debug,
@@ -59,10 +70,7 @@ use std::{
     ops::{DerefMut, Range},
     path::{Path, PathBuf},
     str,
-    sync::{
-        Arc, LazyLock,
-        atomic::{AtomicUsize, Ordering::SeqCst},
-    },
+    sync::{Arc, LazyLock},
 };
 use syntax_map::{QueryCursorHandle, SyntaxSnapshot};
 use task::RunnableTag;
@@ -77,12 +85,12 @@ pub use toolchain::{
     LanguageToolchainStore, LocalLanguageToolchainStore, Toolchain, ToolchainList, ToolchainLister,
     ToolchainMetadata, ToolchainScope,
 };
-use tree_sitter::{self, Query, QueryCursor, WasmStore, wasmtime};
+use tree_sitter::{self, QueryCursor, WasmStore, wasmtime};
 use util::rel_path::RelPath;
-use util::serde::default_true;
 
 pub use buffer::Operation;
 pub use buffer::*;
+pub use diagnostic::{Diagnostic, DiagnosticSourceKind};
 pub use diagnostic_set::{DiagnosticEntry, DiagnosticEntryRef, DiagnosticGroup};
 pub use language_registry::{
     AvailableLanguage, BinaryStatus, LanguageNotFound, LanguageQueries, LanguageRegistry,
@@ -96,6 +104,16 @@ pub use syntax_map::{
 pub use text::{AnchorRangeExt, LineEnding};
 pub use tree_sitter::{Node, Parser, Tree, TreeCursor};
 
+pub(crate) fn to_settings_soft_wrap(value: language_core::SoftWrap) -> settings::SoftWrap {
+    match value {
+        language_core::SoftWrap::None => settings::SoftWrap::None,
+        language_core::SoftWrap::PreferLine => settings::SoftWrap::PreferLine,
+        language_core::SoftWrap::EditorWidth => settings::SoftWrap::EditorWidth,
+        language_core::SoftWrap::PreferredLineLength => settings::SoftWrap::PreferredLineLength,
+        language_core::SoftWrap::Bounded => settings::SoftWrap::Bounded,
+    }
+}
+
 static QUERY_CURSORS: Mutex<Vec<QueryCursor>> = Mutex::new(vec![]);
 static PARSERS: Mutex<Vec<Parser>> = Mutex::new(vec![]);
 
@@ -125,8 +143,6 @@ where
     func(cursor.deref_mut())
 }
 
-static NEXT_LANGUAGE_ID: AtomicUsize = AtomicUsize::new(0);
-static NEXT_GRAMMAR_ID: AtomicUsize = AtomicUsize::new(0);
 static WASM_ENGINE: LazyLock<wasmtime::Engine> = LazyLock::new(|| {
     wasmtime::Engine::new(&wasmtime::Config::new()).expect("Failed to create Wasmtime engine")
 });
@@ -188,26 +204,12 @@ pub static PLAIN_TEXT: LazyLock<Arc<Language>> = LazyLock::new(|| {
     ))
 });
 
-/// Types that represent a position in a buffer, and can be converted into
-/// an LSP position, to send to a language server.
-pub trait ToLspPosition {
-    /// Converts the value into an LSP position.
-    fn to_lsp_position(self) -> lsp::Position;
-}
-
 #[derive(Debug, Clone, PartialEq, Eq, Hash)]
 pub struct Location {
     pub buffer: Entity<Buffer>,
     pub range: Range<Anchor>,
 }
 
-#[derive(Debug, Clone)]
-pub struct Symbol {
-    pub name: String,
-    pub kind: lsp::SymbolKind,
-    pub container_name: Option<String>,
-}
-
 type ServerBinaryCache = futures::lock::Mutex<Option<(bool, LanguageServerBinary)>>;
 type DownloadableLanguageServerBinary = LocalBoxFuture<'static, Result<LanguageServerBinary>>;
 pub type LanguageServerBinaryLocations = LocalBoxFuture<
@@ -292,14 +294,12 @@ impl CachedLspAdapter {
         &self,
         params: &mut lsp::PublishDiagnosticsParams,
         server_id: LanguageServerId,
-        existing_diagnostics: Option<&'_ Buffer>,
     ) {
-        self.adapter
-            .process_diagnostics(params, server_id, existing_diagnostics)
+        self.adapter.process_diagnostics(params, server_id)
     }
 
-    pub fn retain_old_diagnostic(&self, previous_diagnostic: &Diagnostic, cx: &App) -> bool {
-        self.adapter.retain_old_diagnostic(previous_diagnostic, cx)
+    pub fn retain_old_diagnostic(&self, previous_diagnostic: &Diagnostic) -> bool {
+        self.adapter.retain_old_diagnostic(previous_diagnostic)
     }
 
     pub fn underline_diagnostic(&self, diagnostic: &lsp::Diagnostic) -> bool {
@@ -397,31 +397,14 @@ pub trait LspAdapterDelegate: Send + Sync {
     async fn try_exec(&self, binary: LanguageServerBinary) -> Result<()>;
 }
 
-/// Context provided to LSP adapters when a user responds to a ShowMessageRequest prompt.
-/// This allows adapters to intercept preference selections (like "Always" or "Never")
-/// and potentially persist them to Zed's settings.
-#[derive(Debug, Clone)]
-pub struct PromptResponseContext {
-    /// The original message shown to the user
-    pub message: String,
-    /// The action (button) the user selected
-    pub selected_action: lsp::MessageActionItem,
-}
-
 #[async_trait(?Send)]
 pub trait LspAdapter: 'static + Send + Sync + DynLspInstaller {
     fn name(&self) -> LanguageServerName;
 
-    fn process_diagnostics(
-        &self,
-        _: &mut lsp::PublishDiagnosticsParams,
-        _: LanguageServerId,
-        _: Option<&'_ Buffer>,
-    ) {
-    }
+    fn process_diagnostics(&self, _: &mut lsp::PublishDiagnosticsParams, _: LanguageServerId) {}
 
     /// When processing new `lsp::PublishDiagnosticsParams` diagnostics, whether to retain previous one(s) or not.
-    fn retain_old_diagnostic(&self, _previous_diagnostic: &Diagnostic, _cx: &App) -> bool {
+    fn retain_old_diagnostic(&self, _previous_diagnostic: &Diagnostic) -> bool {
         false
     }
 
@@ -812,300 +795,6 @@ where
     }
 }
 
-#[derive(Clone, Debug, Default, PartialEq, Eq)]
-pub struct CodeLabel {
-    /// The text to display.
-    pub text: String,
-    /// Syntax highlighting runs.
-    pub runs: Vec<(Range<usize>, HighlightId)>,
-    /// The portion of the text that should be used in fuzzy filtering.
-    pub filter_range: Range<usize>,
-}
-
-#[derive(Clone, Debug, Default, PartialEq, Eq)]
-pub struct CodeLabelBuilder {
-    /// The text to display.
-    text: String,
-    /// Syntax highlighting runs.
-    runs: Vec<(Range<usize>, HighlightId)>,
-    /// The portion of the text that should be used in fuzzy filtering.
-    filter_range: Range<usize>,
-}
-
-#[derive(Clone, Deserialize, JsonSchema, Debug)]
-pub struct LanguageConfig {
-    /// Human-readable name of the language.
-    pub name: LanguageName,
-    /// The name of this language for a Markdown code fence block
-    pub code_fence_block_name: Option<Arc<str>>,
-    /// Alternative language names that Jupyter kernels may report for this language.
-    /// Used when a kernel's `language` field differs from Zed's language name.
-    /// For example, the Nu extension would set this to `["nushell"]`.
-    #[serde(default)]
-    pub kernel_language_names: Vec<Arc<str>>,
-    // The name of the grammar in a WASM bundle (experimental).
-    pub grammar: Option<Arc<str>>,
-    /// The criteria for matching this language to a given file.
-    #[serde(flatten)]
-    pub matcher: LanguageMatcher,
-    /// List of bracket types in a language.
-    #[serde(default)]
-    pub brackets: BracketPairConfig,
-    /// If set to true, auto indentation uses last non empty line to determine
-    /// the indentation level for a new line.
-    #[serde(default = "auto_indent_using_last_non_empty_line_default")]
-    pub auto_indent_using_last_non_empty_line: bool,
-    // Whether indentation of pasted content should be adjusted based on the context.
-    #[serde(default)]
-    pub auto_indent_on_paste: Option<bool>,
-    /// A regex that is used to determine whether the indentation level should be
-    /// increased in the following line.
-    #[serde(default, deserialize_with = "deserialize_regex")]
-    #[schemars(schema_with = "regex_json_schema")]
-    pub increase_indent_pattern: Option<Regex>,
-    /// A regex that is used to determine whether the indentation level should be
-    /// decreased in the following line.
-    #[serde(default, deserialize_with = "deserialize_regex")]
-    #[schemars(schema_with = "regex_json_schema")]
-    pub decrease_indent_pattern: Option<Regex>,
-    /// A list of rules for decreasing indentation. Each rule pairs a regex with a set of valid
-    /// "block-starting" tokens. When a line matches a pattern, its indentation is aligned with
-    /// the most recent line that began with a corresponding token. This enables context-aware
-    /// outdenting, like aligning an `else` with its `if`.
-    #[serde(default)]
-    pub decrease_indent_patterns: Vec<DecreaseIndentConfig>,
-    /// A list of characters that trigger the automatic insertion of a closing
-    /// bracket when they immediately precede the point where an opening
-    /// bracket is inserted.
-    #[serde(default)]
-    pub autoclose_before: String,
-    /// A placeholder used internally by Semantic Index.
-    #[serde(default)]
-    pub collapsed_placeholder: String,
-    /// A line comment string that is inserted in e.g. `toggle comments` action.
-    /// A language can have multiple flavours of line comments. All of the provided line comments are
-    /// used for comment continuations on the next line, but only the first one is used for Editor::ToggleComments.
-    #[serde(default)]
-    pub line_comments: Vec<Arc<str>>,
-    /// Delimiters and configuration for recognizing and formatting block comments.
-    #[serde(default)]
-    pub block_comment: Option<BlockCommentConfig>,
-    /// Delimiters and configuration for recognizing and formatting documentation comments.
-    #[serde(default, alias = "documentation")]
-    pub documentation_comment: Option<BlockCommentConfig>,
-    /// List markers that are inserted unchanged on newline (e.g., `- `, `* `, `+ `).
-    #[serde(default)]
-    pub unordered_list: Vec<Arc<str>>,
-    /// Configuration for ordered lists with auto-incrementing numbers on newline (e.g., `1. ` becomes `2. `).
-    #[serde(default)]
-    pub ordered_list: Vec<OrderedListConfig>,
-    /// Configuration for task lists where multiple markers map to a single continuation prefix (e.g., `- [x] ` continues as `- [ ] `).
-    #[serde(default)]
-    pub task_list: Option<TaskListConfig>,
-    /// A list of additional regex patterns that should be treated as prefixes
-    /// for creating boundaries during rewrapping, ensuring content from one
-    /// prefixed section doesn't merge with another (e.g., markdown list items).
-    /// By default, Zed treats as paragraph and comment prefixes as boundaries.
-    #[serde(default, deserialize_with = "deserialize_regex_vec")]
-    #[schemars(schema_with = "regex_vec_json_schema")]
-    pub rewrap_prefixes: Vec<Regex>,
-    /// A list of language servers that are allowed to run on subranges of a given language.
-    #[serde(default)]
-    pub scope_opt_in_language_servers: Vec<LanguageServerName>,
-    #[serde(default)]
-    pub overrides: HashMap<String, LanguageConfigOverride>,
-    /// A list of characters that Zed should treat as word characters for the
-    /// purpose of features that operate on word boundaries, like 'move to next word end'
-    /// or a whole-word search in buffer search.
-    #[serde(default)]
-    pub word_characters: HashSet<char>,
-    /// Whether to indent lines using tab characters, as opposed to multiple
-    /// spaces.
-    #[serde(default)]
-    pub hard_tabs: Option<bool>,
-    /// How many columns a tab should occupy.
-    #[serde(default)]
-    #[schemars(range(min = 1, max = 128))]
-    pub tab_size: Option<NonZeroU32>,
-    /// How to soft-wrap long lines of text.
-    #[serde(default)]
-    pub soft_wrap: Option<SoftWrap>,
-    /// When set, selections can be wrapped using prefix/suffix pairs on both sides.
-    #[serde(default)]
-    pub wrap_characters: Option<WrapCharactersConfig>,
-    /// The name of a Prettier parser that will be used for this language when no file path is available.
-    /// If there's a parser name in the language settings, that will be used instead.
-    #[serde(default)]
-    pub prettier_parser_name: Option<String>,
-    /// If true, this language is only for syntax highlighting via an injection into other
-    /// languages, but should not appear to the user as a distinct language.
-    #[serde(default)]
-    pub hidden: bool,
-    /// If configured, this language contains JSX style tags, and should support auto-closing of those tags.
-    #[serde(default)]
-    pub jsx_tag_auto_close: Option<JsxTagAutoCloseConfig>,
-    /// A list of characters that Zed should treat as word characters for completion queries.
-    #[serde(default)]
-    pub completion_query_characters: HashSet<char>,
-    /// A list of characters that Zed should treat as word characters for linked edit operations.
-    #[serde(default)]
-    pub linked_edit_characters: HashSet<char>,
-    /// A list of preferred debuggers for this language.
-    #[serde(default)]
-    pub debuggers: IndexSet<SharedString>,
-    /// A list of import namespace segments that aren't expected to appear in file paths. For
-    /// example, "super" and "crate" in Rust.
-    #[serde(default)]
-    pub ignored_import_segments: HashSet<Arc<str>>,
-    /// Regular expression that matches substrings to omit from import paths, to make the paths more
-    /// similar to how they are specified when imported. For example, "/mod\.rs$" or "/__init__\.py$".
-    #[serde(default, deserialize_with = "deserialize_regex")]
-    #[schemars(schema_with = "regex_json_schema")]
-    pub import_path_strip_regex: Option<Regex>,
-}
-
-impl LanguageConfig {
-    pub const FILE_NAME: &str = "config.toml";
-
-    pub fn load(config_path: impl AsRef<Path>) -> Result<Self> {
-        let config = std::fs::read_to_string(config_path.as_ref())?;
-        toml::from_str(&config).map_err(Into::into)
-    }
-}
-
-#[derive(Clone, Debug, Deserialize, Default, JsonSchema)]
-pub struct DecreaseIndentConfig {
-    #[serde(default, deserialize_with = "deserialize_regex")]
-    #[schemars(schema_with = "regex_json_schema")]
-    pub pattern: Option<Regex>,
-    #[serde(default)]
-    pub valid_after: Vec<String>,
-}
-
-/// Configuration for continuing ordered lists with auto-incrementing numbers.
-#[derive(Clone, Debug, Deserialize, JsonSchema)]
-pub struct OrderedListConfig {
-    /// A regex pattern with a capture group for the number portion (e.g., `(\\d+)\\. `).
-    pub pattern: String,
-    /// A format string where `{1}` is replaced with the incremented number (e.g., `{1}. `).
-    pub format: String,
-}
-
-/// Configuration for continuing task lists on newline.
-#[derive(Clone, Debug, Deserialize, JsonSchema)]
-pub struct TaskListConfig {
-    /// The list markers to match (e.g., `- [ ] `, `- [x] `).
-    pub prefixes: Vec<Arc<str>>,
-    /// The marker to insert when continuing the list on a new line (e.g., `- [ ] `).
-    pub continuation: Arc<str>,
-}
-
-#[derive(Clone, Debug, Serialize, Deserialize, Default, JsonSchema)]
-pub struct LanguageMatcher {
-    /// Given a list of `LanguageConfig`s, the language of a file can be determined based on the path extension matching any of the `path_suffixes`.
-    #[serde(default)]
-    pub path_suffixes: Vec<String>,
-    /// A regex pattern that determines whether the language should be assigned to a file or not.
-    #[serde(
-        default,
-        serialize_with = "serialize_regex",
-        deserialize_with = "deserialize_regex"
-    )]
-    #[schemars(schema_with = "regex_json_schema")]
-    pub first_line_pattern: Option<Regex>,
-    /// Alternative names for this language used in vim/emacs modelines.
-    /// These are matched case-insensitively against the `mode` (emacs) or
-    /// `filetype`/`ft` (vim) specified in the modeline.
-    #[serde(default)]
-    pub modeline_aliases: Vec<String>,
-}
-
-/// The configuration for JSX tag auto-closing.
-#[derive(Clone, Deserialize, JsonSchema, Debug)]
-pub struct JsxTagAutoCloseConfig {
-    /// The name of the node for an opening tag
-    pub open_tag_node_name: String,
-    /// The name of the node for a closing tag
-    pub close_tag_node_name: String,
-    /// The name of the node for a complete element, with child nodes for the opening and closing tags
-    pub jsx_element_node_name: String,
-    /// The name of the node found within both opening and closing
-    /// tags that describes the tag name
-    pub tag_name_node_name: String,
-    /// Alternate node names for tag names.
-    /// Specifically needed as TSX represents the name in `<Foo.Bar>`
-    /// as `member_expression` rather than `identifier` as usual
-    #[serde(default)]
-    pub tag_name_node_name_alternates: Vec<String>,
-    /// Some grammars are smart enough to detect a closing tag that is
-    /// invalid, i.e. one that doesn't match its corresponding opening
-    /// tag or has no corresponding opening tag at all.
-    /// This should be set to the name of the node for invalid closing
-    /// tags if the grammar contains such a node; otherwise, detecting
-    /// already-closed tags will not work properly.
-    #[serde(default)]
-    pub erroneous_close_tag_node_name: Option<String>,
-    /// See `erroneous_close_tag_node_name` above for details.
-    /// This should be set if the node used for the tag name within
-    /// erroneous closing tags differs from the normal tag name node
-    /// name.
-    #[serde(default)]
-    pub erroneous_close_tag_name_node_name: Option<String>,
-}
-
-/// The configuration for block comments for this language.
-#[derive(Clone, Debug, JsonSchema, PartialEq)]
-pub struct BlockCommentConfig {
-    /// The start tag of a block comment.
-    pub start: Arc<str>,
-    /// The end tag of a block comment.
-    pub end: Arc<str>,
-    /// The prefix to insert when a new line is added inside a block comment.
-    pub prefix: Arc<str>,
-    /// The indentation to apply to the prefix and end tag on new lines.
-    #[schemars(range(min = 1, max = 128))]
-    pub tab_size: u32,
-}
-
-impl<'de> Deserialize<'de> for BlockCommentConfig {
-    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
-    where
-        D: Deserializer<'de>,
-    {
-        #[derive(Deserialize)]
-        #[serde(untagged)]
-        enum BlockCommentConfigHelper {
-            New {
-                start: Arc<str>,
-                end: Arc<str>,
-                prefix: Arc<str>,
-                tab_size: u32,
-            },
-            Old([Arc<str>; 2]),
-        }
-
-        match BlockCommentConfigHelper::deserialize(deserializer)? {
-            BlockCommentConfigHelper::New {
-                start,
-                end,
-                prefix,
-                tab_size,
-            } => Ok(BlockCommentConfig {
-                start,
-                end,
-                prefix,
-                tab_size,
-            }),
-            BlockCommentConfigHelper::Old([start, end]) => Ok(BlockCommentConfig {
-                start,
-                end,
-                prefix: "".into(),
-                tab_size: 0,
-            }),
-        }
-    }
-}
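The removed `Deserialize` impl above accepts two shapes for `block_comment`: the new table form with `start`/`end`/`prefix`/`tab_size`, and the legacy two-element array, which is padded with an empty prefix and a zero tab size. A plain-Rust sketch of that fallback mapping (the `Helper` enum and `resolve` function are illustrative stand-ins for the serde untagged helper, not the crate's API):

```rust
#[derive(Debug, PartialEq)]
struct BlockCommentConfig {
    start: String,
    end: String,
    prefix: String,
    tab_size: u32,
}

// Stand-in for the untagged serde helper: either the new table form
// or the legacy `[start, end]` array.
enum Helper {
    New { start: String, end: String, prefix: String, tab_size: u32 },
    Old([String; 2]),
}

fn resolve(helper: Helper) -> BlockCommentConfig {
    match helper {
        Helper::New { start, end, prefix, tab_size } => {
            BlockCommentConfig { start, end, prefix, tab_size }
        }
        // Legacy configs carry no prefix or indent information, so those
        // fields default to empty / zero.
        Helper::Old([start, end]) => BlockCommentConfig {
            start,
            end,
            prefix: String::new(),
            tab_size: 0,
        },
    }
}

fn main() {
    let legacy = resolve(Helper::Old(["/*".into(), "*/".into()]));
    assert_eq!(legacy.prefix, "");
    assert_eq!(legacy.tab_size, 0);
    println!("{:?}", legacy);
}
```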
-
 /// Represents a language for the given range. Some languages (e.g. HTML)
 /// interleave several languages together, thus a single buffer might actually contain
 /// several nested scopes.
@@ -1115,148 +804,6 @@ pub struct LanguageScope {
     override_id: Option<u32>,
 }
 
-#[derive(Clone, Deserialize, Default, Debug, JsonSchema)]
-pub struct LanguageConfigOverride {
-    #[serde(default)]
-    pub line_comments: Override<Vec<Arc<str>>>,
-    #[serde(default)]
-    pub block_comment: Override<BlockCommentConfig>,
-    #[serde(skip)]
-    pub disabled_bracket_ixs: Vec<u16>,
-    #[serde(default)]
-    pub word_characters: Override<HashSet<char>>,
-    #[serde(default)]
-    pub completion_query_characters: Override<HashSet<char>>,
-    #[serde(default)]
-    pub linked_edit_characters: Override<HashSet<char>>,
-    #[serde(default)]
-    pub opt_into_language_servers: Vec<LanguageServerName>,
-    #[serde(default)]
-    pub prefer_label_for_snippet: Option<bool>,
-}
-
-#[derive(Clone, Deserialize, Debug, Serialize, JsonSchema)]
-#[serde(untagged)]
-pub enum Override<T> {
-    Remove { remove: bool },
-    Set(T),
-}
-
-impl<T> Default for Override<T> {
-    fn default() -> Self {
-        Override::Remove { remove: false }
-    }
-}
-
-impl<T> Override<T> {
-    fn as_option<'a>(this: Option<&'a Self>, original: Option<&'a T>) -> Option<&'a T> {
-        match this {
-            Some(Self::Set(value)) => Some(value),
-            Some(Self::Remove { remove: true }) => None,
-            Some(Self::Remove { remove: false }) | None => original,
-        }
-    }
-}
-
-impl Default for LanguageConfig {
-    fn default() -> Self {
-        Self {
-            name: LanguageName::new_static(""),
-            code_fence_block_name: None,
-            kernel_language_names: Default::default(),
-            grammar: None,
-            matcher: LanguageMatcher::default(),
-            brackets: Default::default(),
-            auto_indent_using_last_non_empty_line: auto_indent_using_last_non_empty_line_default(),
-            auto_indent_on_paste: None,
-            increase_indent_pattern: Default::default(),
-            decrease_indent_pattern: Default::default(),
-            decrease_indent_patterns: Default::default(),
-            autoclose_before: Default::default(),
-            line_comments: Default::default(),
-            block_comment: Default::default(),
-            documentation_comment: Default::default(),
-            unordered_list: Default::default(),
-            ordered_list: Default::default(),
-            task_list: Default::default(),
-            rewrap_prefixes: Default::default(),
-            scope_opt_in_language_servers: Default::default(),
-            overrides: Default::default(),
-            word_characters: Default::default(),
-            collapsed_placeholder: Default::default(),
-            hard_tabs: None,
-            tab_size: None,
-            soft_wrap: None,
-            wrap_characters: None,
-            prettier_parser_name: None,
-            hidden: false,
-            jsx_tag_auto_close: None,
-            completion_query_characters: Default::default(),
-            linked_edit_characters: Default::default(),
-            debuggers: Default::default(),
-            ignored_import_segments: Default::default(),
-            import_path_strip_regex: None,
-        }
-    }
-}
-
-#[derive(Clone, Debug, Deserialize, JsonSchema)]
-pub struct WrapCharactersConfig {
-    /// Opening token split into a prefix and suffix. The first caret goes
-    /// after the prefix (i.e., between prefix and suffix).
-    pub start_prefix: String,
-    pub start_suffix: String,
-    /// Closing token split into a prefix and suffix. The second caret goes
-    /// after the prefix (i.e., between prefix and suffix).
-    pub end_prefix: String,
-    pub end_suffix: String,
-}
-
-fn auto_indent_using_last_non_empty_line_default() -> bool {
-    true
-}
-
-fn deserialize_regex<'de, D: Deserializer<'de>>(d: D) -> Result<Option<Regex>, D::Error> {
-    let source = Option::<String>::deserialize(d)?;
-    if let Some(source) = source {
-        Ok(Some(regex::Regex::new(&source).map_err(de::Error::custom)?))
-    } else {
-        Ok(None)
-    }
-}
-
-fn regex_json_schema(_: &mut schemars::SchemaGenerator) -> schemars::Schema {
-    json_schema!({
-        "type": "string"
-    })
-}
-
-fn serialize_regex<S>(regex: &Option<Regex>, serializer: S) -> Result<S::Ok, S::Error>
-where
-    S: Serializer,
-{
-    match regex {
-        Some(regex) => serializer.serialize_str(regex.as_str()),
-        None => serializer.serialize_none(),
-    }
-}
-
-fn deserialize_regex_vec<'de, D: Deserializer<'de>>(d: D) -> Result<Vec<Regex>, D::Error> {
-    let sources = Vec::<String>::deserialize(d)?;
-    sources
-        .into_iter()
-        .map(|source| regex::Regex::new(&source))
-        .collect::<Result<_, _>>()
-        .map_err(de::Error::custom)
-}
-
-fn regex_vec_json_schema(_: &mut SchemaGenerator) -> schemars::Schema {
-    json_schema!({
-        "type": "array",
-        "items": { "type": "string" }
-    })
-}
-
 #[doc(hidden)]
 #[cfg(any(test, feature = "test-support"))]
 pub struct FakeLspAdapter {
@@ -1279,79 +826,6 @@ pub struct FakeLspAdapter {
     >,
 }
 
-/// Configuration of handling bracket pairs for a given language.
-///
-/// This struct includes settings for defining which pairs of characters are considered brackets and
-/// also specifies any language-specific scopes where these pairs should be ignored for bracket matching purposes.
-#[derive(Clone, Debug, Default, JsonSchema)]
-#[schemars(with = "Vec::<BracketPairContent>")]
-pub struct BracketPairConfig {
-    /// A list of character pairs that should be treated as brackets in the context of a given language.
-    pub pairs: Vec<BracketPair>,
-    /// A list of tree-sitter scopes for which a given bracket should not be active.
-    /// The n-th entry in [`Self::disabled_scopes_by_bracket_ix`] contains the disabled scopes for the n-th entry in [`Self::pairs`].
-    pub disabled_scopes_by_bracket_ix: Vec<Vec<String>>,
-}
-
-impl BracketPairConfig {
-    pub fn is_closing_brace(&self, c: char) -> bool {
-        self.pairs.iter().any(|pair| pair.end.starts_with(c))
-    }
-}
-
-#[derive(Deserialize, JsonSchema)]
-pub struct BracketPairContent {
-    #[serde(flatten)]
-    pub bracket_pair: BracketPair,
-    #[serde(default)]
-    pub not_in: Vec<String>,
-}
-
-impl<'de> Deserialize<'de> for BracketPairConfig {
-    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
-    where
-        D: Deserializer<'de>,
-    {
-        let result = Vec::<BracketPairContent>::deserialize(deserializer)?;
-        let (brackets, disabled_scopes_by_bracket_ix) = result
-            .into_iter()
-            .map(|entry| (entry.bracket_pair, entry.not_in))
-            .unzip();
-
-        Ok(BracketPairConfig {
-            pairs: brackets,
-            disabled_scopes_by_bracket_ix,
-        })
-    }
-}
-
-/// Describes a single bracket pair and how an editor should react to e.g. inserting
-/// an opening bracket or to a newline character insertion in between `start` and `end` characters.
-#[derive(Clone, Debug, Default, Deserialize, PartialEq, JsonSchema)]
-pub struct BracketPair {
-    /// Starting substring for a bracket.
-    pub start: String,
-    /// Ending substring for a bracket.
-    pub end: String,
-    /// True if `end` should be automatically inserted right after `start` characters.
-    pub close: bool,
-    /// True if selected text should be surrounded by `start` and `end` characters.
-    #[serde(default = "default_true")]
-    pub surround: bool,
-    /// True if an extra newline should be inserted while the cursor is in the middle
-    /// of that bracket pair.
-    pub newline: bool,
-}
-
-#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
-pub struct LanguageId(usize);
-
-impl LanguageId {
-    pub(crate) fn new() -> Self {
-        Self(NEXT_LANGUAGE_ID.fetch_add(1, SeqCst))
-    }
-}
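`LanguageId::new` above (like `GrammarId::new` further down) allocates IDs from a process-wide atomic counter via `fetch_add`. A minimal sketch of that pattern; `NEXT_ID` here is a stand-in for the crate's `NEXT_LANGUAGE_ID` static:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
struct LanguageId(usize);

impl LanguageId {
    fn new() -> Self {
        // `fetch_add` returns the previous value, so IDs are dense and
        // unique even when languages are registered from multiple threads.
        Self(NEXT_ID.fetch_add(1, SeqCst))
    }
}

fn main() {
    let a = LanguageId::new();
    let b = LanguageId::new();
    assert!(a < b);
    println!("{:?} {:?}", a, b);
}
```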
-
 pub struct Language {
     pub(crate) id: LanguageId,
     pub(crate) config: LanguageConfig,
@@ -1361,184 +835,6 @@ pub struct Language {
     pub(crate) manifest_name: Option<ManifestName>,
 }
 
-#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
-pub struct GrammarId(pub usize);
-
-impl GrammarId {
-    pub(crate) fn new() -> Self {
-        Self(NEXT_GRAMMAR_ID.fetch_add(1, SeqCst))
-    }
-}
-
-pub struct Grammar {
-    id: GrammarId,
-    pub ts_language: tree_sitter::Language,
-    pub(crate) error_query: Option<Query>,
-    pub highlights_config: Option<HighlightsConfig>,
-    pub(crate) brackets_config: Option<BracketsConfig>,
-    pub(crate) redactions_config: Option<RedactionConfig>,
-    pub(crate) runnable_config: Option<RunnableConfig>,
-    pub(crate) indents_config: Option<IndentConfig>,
-    pub outline_config: Option<OutlineConfig>,
-    pub text_object_config: Option<TextObjectConfig>,
-    pub(crate) injection_config: Option<InjectionConfig>,
-    pub(crate) override_config: Option<OverrideConfig>,
-    pub(crate) debug_variables_config: Option<DebugVariablesConfig>,
-    pub(crate) imports_config: Option<ImportsConfig>,
-    pub(crate) highlight_map: Mutex<HighlightMap>,
-}
-
-pub struct HighlightsConfig {
-    pub query: Query,
-    pub identifier_capture_indices: Vec<u32>,
-}
-
-struct IndentConfig {
-    query: Query,
-    indent_capture_ix: u32,
-    start_capture_ix: Option<u32>,
-    end_capture_ix: Option<u32>,
-    outdent_capture_ix: Option<u32>,
-    suffixed_start_captures: HashMap<u32, SharedString>,
-}
-
-pub struct OutlineConfig {
-    pub query: Query,
-    pub item_capture_ix: u32,
-    pub name_capture_ix: u32,
-    pub context_capture_ix: Option<u32>,
-    pub extra_context_capture_ix: Option<u32>,
-    pub open_capture_ix: Option<u32>,
-    pub close_capture_ix: Option<u32>,
-    pub annotation_capture_ix: Option<u32>,
-}
-
-#[derive(Debug, Clone, Copy, PartialEq)]
-pub enum DebuggerTextObject {
-    Variable,
-    Scope,
-}
-
-impl DebuggerTextObject {
-    pub fn from_capture_name(name: &str) -> Option<DebuggerTextObject> {
-        match name {
-            "debug-variable" => Some(DebuggerTextObject::Variable),
-            "debug-scope" => Some(DebuggerTextObject::Scope),
-            _ => None,
-        }
-    }
-}
-
-#[derive(Debug, Clone, Copy, PartialEq)]
-pub enum TextObject {
-    InsideFunction,
-    AroundFunction,
-    InsideClass,
-    AroundClass,
-    InsideComment,
-    AroundComment,
-}
-
-impl TextObject {
-    pub fn from_capture_name(name: &str) -> Option<TextObject> {
-        match name {
-            "function.inside" => Some(TextObject::InsideFunction),
-            "function.around" => Some(TextObject::AroundFunction),
-            "class.inside" => Some(TextObject::InsideClass),
-            "class.around" => Some(TextObject::AroundClass),
-            "comment.inside" => Some(TextObject::InsideComment),
-            "comment.around" => Some(TextObject::AroundComment),
-            _ => None,
-        }
-    }
-
-    pub fn around(&self) -> Option<Self> {
-        match self {
-            TextObject::InsideFunction => Some(TextObject::AroundFunction),
-            TextObject::InsideClass => Some(TextObject::AroundClass),
-            TextObject::InsideComment => Some(TextObject::AroundComment),
-            _ => None,
-        }
-    }
-}
-
-pub struct TextObjectConfig {
-    pub query: Query,
-    pub text_objects_by_capture_ix: Vec<(u32, TextObject)>,
-}
-
-struct InjectionConfig {
-    query: Query,
-    content_capture_ix: u32,
-    language_capture_ix: Option<u32>,
-    patterns: Vec<InjectionPatternConfig>,
-}
-
-struct RedactionConfig {
-    pub query: Query,
-    pub redaction_capture_ix: u32,
-}
-
-#[derive(Clone, Debug, PartialEq)]
-enum RunnableCapture {
-    Named(SharedString),
-    Run,
-}
-
-struct RunnableConfig {
-    pub query: Query,
-    /// A mapping from capture index to capture kind
-    pub extra_captures: Vec<RunnableCapture>,
-}
-
-struct OverrideConfig {
-    query: Query,
-    values: HashMap<u32, OverrideEntry>,
-}
-
-#[derive(Debug)]
-struct OverrideEntry {
-    name: String,
-    range_is_inclusive: bool,
-    value: LanguageConfigOverride,
-}
-
-#[derive(Default, Clone)]
-struct InjectionPatternConfig {
-    language: Option<Box<str>>,
-    combined: bool,
-}
-
-#[derive(Debug)]
-struct BracketsConfig {
-    query: Query,
-    open_capture_ix: u32,
-    close_capture_ix: u32,
-    patterns: Vec<BracketsPatternConfig>,
-}
-
-#[derive(Clone, Debug, Default)]
-struct BracketsPatternConfig {
-    newline_only: bool,
-    rainbow_exclude: bool,
-}
-
-pub struct DebugVariablesConfig {
-    pub query: Query,
-    pub objects_by_capture_ix: Vec<(u32, DebuggerTextObject)>,
-}
-
-pub struct ImportsConfig {
-    pub query: Query,
-    pub import_ix: u32,
-    pub name_ix: Option<u32>,
-    pub namespace_ix: Option<u32>,
-    pub source_ix: Option<u32>,
-    pub list_ix: Option<u32>,
-    pub wildcard_ix: Option<u32>,
-    pub alias_ix: Option<u32>,
-}
-
 impl Language {
     pub fn new(config: LanguageConfig, ts_language: Option<tree_sitter::Language>) -> Self {
         Self::new_with_id(LanguageId::new(), config, ts_language)
@@ -1556,25 +852,7 @@ impl Language {
         Self {
             id,
             config,
-            grammar: ts_language.map(|ts_language| {
-                Arc::new(Grammar {
-                    id: GrammarId::new(),
-                    highlights_config: None,
-                    brackets_config: None,
-                    outline_config: None,
-                    text_object_config: None,
-                    indents_config: None,
-                    injection_config: None,
-                    override_config: None,
-                    redactions_config: None,
-                    runnable_config: None,
-                    error_query: Query::new(&ts_language, "(ERROR) @error").ok(),
-                    debug_variables_config: None,
-                    imports_config: None,
-                    ts_language,
-                    highlight_map: Default::default(),
-                })
-            }),
+            grammar: ts_language.map(|ts_language| Arc::new(Grammar::new(ts_language))),
             context_provider: None,
             toolchain: None,
             manifest_name: None,
@@ -1597,493 +875,99 @@ impl Language {
     }
 
     pub fn with_queries(mut self, queries: LanguageQueries) -> Result<Self> {
-        if let Some(query) = queries.highlights {
-            self = self
-                .with_highlights_query(query.as_ref())
-                .context("Error loading highlights query")?;
-        }
-        if let Some(query) = queries.brackets {
-            self = self
-                .with_brackets_query(query.as_ref())
-                .context("Error loading brackets query")?;
-        }
-        if let Some(query) = queries.indents {
-            self = self
-                .with_indents_query(query.as_ref())
-                .context("Error loading indents query")?;
-        }
-        if let Some(query) = queries.outline {
-            self = self
-                .with_outline_query(query.as_ref())
-                .context("Error loading outline query")?;
-        }
-        if let Some(query) = queries.injections {
-            self = self
-                .with_injection_query(query.as_ref())
-                .context("Error loading injection query")?;
-        }
-        if let Some(query) = queries.overrides {
-            self = self
-                .with_override_query(query.as_ref())
-                .context("Error loading override query")?;
-        }
-        if let Some(query) = queries.redactions {
-            self = self
-                .with_redaction_query(query.as_ref())
-                .context("Error loading redaction query")?;
-        }
-        if let Some(query) = queries.runnables {
-            self = self
-                .with_runnable_query(query.as_ref())
-                .context("Error loading runnables query")?;
-        }
-        if let Some(query) = queries.text_objects {
-            self = self
-                .with_text_object_query(query.as_ref())
-                .context("Error loading textobject query")?;
-        }
-        if let Some(query) = queries.debugger {
-            self = self
-                .with_debug_variables_query(query.as_ref())
-                .context("Error loading debug variables query")?;
-        }
-        if let Some(query) = queries.imports {
-            self = self
-                .with_imports_query(query.as_ref())
-                .context("Error loading imports query")?;
+        if let Some(grammar) = self.grammar.take() {
+            let grammar =
+                Arc::try_unwrap(grammar).map_err(|_| anyhow::anyhow!("cannot mutate grammar"))?;
+            let grammar = grammar.with_queries(queries, &mut self.config)?;
+            self.grammar = Some(Arc::new(grammar));
         }
         Ok(self)
     }
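The new `with_queries` body takes the grammar out of its `Arc`, mutates it by value via `Arc::try_unwrap`, and re-wraps it; this only works while the `Arc` has a single owner, which is why queries must be attached before the language is shared. A sketch of the pattern under simplified stand-in types (`attach_queries` and the `Grammar` struct here are illustrative):

```rust
use std::sync::Arc;

#[derive(Debug)]
struct Grammar {
    queries_loaded: bool,
}

impl Grammar {
    // Stand-in for `Grammar::with_queries`: consumes and returns self.
    fn with_queries(mut self) -> Result<Self, String> {
        self.queries_loaded = true;
        Ok(self)
    }
}

fn attach_queries(slot: &mut Option<Arc<Grammar>>) -> Result<(), String> {
    if let Some(grammar) = slot.take() {
        // `try_unwrap` fails (returning the Arc) if another clone is alive.
        let grammar =
            Arc::try_unwrap(grammar).map_err(|_| "cannot mutate grammar".to_string())?;
        *slot = Some(Arc::new(grammar.with_queries()?));
    }
    Ok(())
}

fn main() {
    // Single owner: mutation by value succeeds.
    let mut slot = Some(Arc::new(Grammar { queries_loaded: false }));
    attach_queries(&mut slot).unwrap();
    assert!(slot.as_ref().unwrap().queries_loaded);

    // A second live clone makes the unwrap fail.
    let mut shared = Some(Arc::new(Grammar { queries_loaded: false }));
    let _clone = shared.clone();
    assert!(attach_queries(&mut shared).is_err());
}
```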
 
-    pub fn with_highlights_query(mut self, source: &str) -> Result<Self> {
-        let grammar = self.grammar_mut()?;
-        let query = Query::new(&grammar.ts_language, source)?;
-
-        let mut identifier_capture_indices = Vec::new();
-        for name in [
-            "variable",
-            "constant",
-            "constructor",
-            "function",
-            "function.method",
-            "function.method.call",
-            "function.special",
-            "property",
-            "type",
-            "type.interface",
-        ] {
-            identifier_capture_indices.extend(query.capture_index_for_name(name));
-        }
-
-        grammar.highlights_config = Some(HighlightsConfig {
-            query,
-            identifier_capture_indices,
-        });
-
-        Ok(self)
+    pub fn with_highlights_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query(|grammar| grammar.with_highlights_query(source))
     }
 
-    pub fn with_runnable_query(mut self, source: &str) -> Result<Self> {
-        let grammar = self.grammar_mut()?;
-
-        let query = Query::new(&grammar.ts_language, source)?;
-        let extra_captures: Vec<_> = query
-            .capture_names()
-            .iter()
-            .map(|&name| match name {
-                "run" => RunnableCapture::Run,
-                name => RunnableCapture::Named(name.to_string().into()),
-            })
-            .collect();
-
-        grammar.runnable_config = Some(RunnableConfig {
-            extra_captures,
-            query,
-        });
-
-        Ok(self)
+    pub fn with_runnable_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query(|grammar| grammar.with_runnable_query(source))
     }
 
-    pub fn with_outline_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-        let mut item_capture_ix = 0;
-        let mut name_capture_ix = 0;
-        let mut context_capture_ix = None;
-        let mut extra_context_capture_ix = None;
-        let mut open_capture_ix = None;
-        let mut close_capture_ix = None;
-        let mut annotation_capture_ix = None;
-        if populate_capture_indices(
-            &query,
-            &self.config.name,
-            "outline",
-            &[],
-            &mut [
-                Capture::Required("item", &mut item_capture_ix),
-                Capture::Required("name", &mut name_capture_ix),
-                Capture::Optional("context", &mut context_capture_ix),
-                Capture::Optional("context.extra", &mut extra_context_capture_ix),
-                Capture::Optional("open", &mut open_capture_ix),
-                Capture::Optional("close", &mut close_capture_ix),
-                Capture::Optional("annotation", &mut annotation_capture_ix),
-            ],
-        ) {
-            self.grammar_mut()?.outline_config = Some(OutlineConfig {
-                query,
-                item_capture_ix,
-                name_capture_ix,
-                context_capture_ix,
-                extra_context_capture_ix,
-                open_capture_ix,
-                close_capture_ix,
-                annotation_capture_ix,
-            });
-        }
-        Ok(self)
+    pub fn with_outline_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| grammar.with_outline_query(source, name))
     }
 
-    pub fn with_text_object_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-
-        let mut text_objects_by_capture_ix = Vec::new();
-        for (ix, name) in query.capture_names().iter().enumerate() {
-            if let Some(text_object) = TextObject::from_capture_name(name) {
-                text_objects_by_capture_ix.push((ix as u32, text_object));
-            } else {
-                log::warn!(
-                    "unrecognized capture name '{}' in {} textobjects TreeSitter query",
-                    name,
-                    self.config.name,
-                );
-            }
-        }
-
-        self.grammar_mut()?.text_object_config = Some(TextObjectConfig {
-            query,
-            text_objects_by_capture_ix,
-        });
-        Ok(self)
+    pub fn with_text_object_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| {
+            grammar.with_text_object_query(source, name)
+        })
     }
 
-    pub fn with_debug_variables_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-
-        let mut objects_by_capture_ix = Vec::new();
-        for (ix, name) in query.capture_names().iter().enumerate() {
-            if let Some(text_object) = DebuggerTextObject::from_capture_name(name) {
-                objects_by_capture_ix.push((ix as u32, text_object));
-            } else {
-                log::warn!(
-                    "unrecognized capture name '{}' in {} debugger TreeSitter query",
-                    name,
-                    self.config.name,
-                );
-            }
-        }
+    pub fn with_debug_variables_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| {
+            grammar.with_debug_variables_query(source, name)
+        })
+    }
 
-        self.grammar_mut()?.debug_variables_config = Some(DebugVariablesConfig {
-            query,
-            objects_by_capture_ix,
-        });
-        Ok(self)
+    pub fn with_imports_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| grammar.with_imports_query(source, name))
     }
 
-    pub fn with_imports_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-
-        let mut import_ix = 0;
-        let mut name_ix = None;
-        let mut namespace_ix = None;
-        let mut source_ix = None;
-        let mut list_ix = None;
-        let mut wildcard_ix = None;
-        let mut alias_ix = None;
-        if populate_capture_indices(
-            &query,
-            &self.config.name,
-            "imports",
-            &[],
-            &mut [
-                Capture::Required("import", &mut import_ix),
-                Capture::Optional("name", &mut name_ix),
-                Capture::Optional("namespace", &mut namespace_ix),
-                Capture::Optional("source", &mut source_ix),
-                Capture::Optional("list", &mut list_ix),
-                Capture::Optional("wildcard", &mut wildcard_ix),
-                Capture::Optional("alias", &mut alias_ix),
-            ],
-        ) {
-            self.grammar_mut()?.imports_config = Some(ImportsConfig {
-                query,
-                import_ix,
-                name_ix,
-                namespace_ix,
-                source_ix,
-                list_ix,
-                wildcard_ix,
-                alias_ix,
-            });
-        }
-        return Ok(self);
-    }
-
-    pub fn with_brackets_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-        let mut open_capture_ix = 0;
-        let mut close_capture_ix = 0;
-        if populate_capture_indices(
-            &query,
-            &self.config.name,
-            "brackets",
-            &[],
-            &mut [
-                Capture::Required("open", &mut open_capture_ix),
-                Capture::Required("close", &mut close_capture_ix),
-            ],
-        ) {
-            let patterns = (0..query.pattern_count())
-                .map(|ix| {
-                    let mut config = BracketsPatternConfig::default();
-                    for setting in query.property_settings(ix) {
-                        let setting_key = setting.key.as_ref();
-                        if setting_key == "newline.only" {
-                            config.newline_only = true
-                        }
-                        if setting_key == "rainbow.exclude" {
-                            config.rainbow_exclude = true
-                        }
-                    }
-                    config
-                })
-                .collect();
-            self.grammar_mut()?.brackets_config = Some(BracketsConfig {
-                query,
-                open_capture_ix,
-                close_capture_ix,
-                patterns,
-            });
-        }
-        Ok(self)
+    pub fn with_brackets_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| grammar.with_brackets_query(source, name))
     }
 
-    pub fn with_indents_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-        let mut indent_capture_ix = 0;
-        let mut start_capture_ix = None;
-        let mut end_capture_ix = None;
-        let mut outdent_capture_ix = None;
-        if populate_capture_indices(
-            &query,
-            &self.config.name,
-            "indents",
-            &["start."],
-            &mut [
-                Capture::Required("indent", &mut indent_capture_ix),
-                Capture::Optional("start", &mut start_capture_ix),
-                Capture::Optional("end", &mut end_capture_ix),
-                Capture::Optional("outdent", &mut outdent_capture_ix),
-            ],
-        ) {
-            let mut suffixed_start_captures = HashMap::default();
-            for (ix, name) in query.capture_names().iter().enumerate() {
-                if let Some(suffix) = name.strip_prefix("start.") {
-                    suffixed_start_captures.insert(ix as u32, suffix.to_owned().into());
-                }
-            }
+    pub fn with_indents_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| grammar.with_indents_query(source, name))
+    }
 
-            self.grammar_mut()?.indents_config = Some(IndentConfig {
-                query,
-                indent_capture_ix,
-                start_capture_ix,
-                end_capture_ix,
-                outdent_capture_ix,
-                suffixed_start_captures,
-            });
-        }
-        Ok(self)
+    pub fn with_injection_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| grammar.with_injection_query(source, name))
     }
 
-    pub fn with_injection_query(mut self, source: &str) -> Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-        let mut language_capture_ix = None;
-        let mut injection_language_capture_ix = None;
-        let mut content_capture_ix = None;
-        let mut injection_content_capture_ix = None;
-        if populate_capture_indices(
-            &query,
-            &self.config.name,
-            "injections",
-            &[],
-            &mut [
-                Capture::Optional("language", &mut language_capture_ix),
-                Capture::Optional("injection.language", &mut injection_language_capture_ix),
-                Capture::Optional("content", &mut content_capture_ix),
-                Capture::Optional("injection.content", &mut injection_content_capture_ix),
-            ],
-        ) {
-            language_capture_ix = match (language_capture_ix, injection_language_capture_ix) {
-                (None, Some(ix)) => Some(ix),
-                (Some(_), Some(_)) => {
-                    anyhow::bail!("both language and injection.language captures are present");
-                }
-                _ => language_capture_ix,
-            };
-            content_capture_ix = match (content_capture_ix, injection_content_capture_ix) {
-                (None, Some(ix)) => Some(ix),
-                (Some(_), Some(_)) => {
-                    anyhow::bail!("both content and injection.content captures are present")
-                }
-                _ => content_capture_ix,
-            };
-            let patterns = (0..query.pattern_count())
-                .map(|ix| {
-                    let mut config = InjectionPatternConfig::default();
-                    for setting in query.property_settings(ix) {
-                        match setting.key.as_ref() {
-                            "language" | "injection.language" => {
-                                config.language.clone_from(&setting.value);
-                            }
-                            "combined" | "injection.combined" => {
-                                config.combined = true;
-                            }
-                            _ => {}
-                        }
-                    }
-                    config
-                })
-                .collect();
-            if let Some(content_capture_ix) = content_capture_ix {
-                self.grammar_mut()?.injection_config = Some(InjectionConfig {
-                    query,
-                    language_capture_ix,
-                    content_capture_ix,
-                    patterns,
-                });
-            } else {
-                log::error!(
-                    "missing required capture in injections {} TreeSitter query: \
-                    content or injection.content",
-                    &self.config.name,
-                );
-            }
+    pub fn with_override_query(mut self, source: &str) -> Result<Self> {
+        if let Some(grammar_arc) = self.grammar.take() {
+            let grammar = Arc::try_unwrap(grammar_arc)
+                .map_err(|_| anyhow::anyhow!("cannot mutate grammar"))?;
+            let grammar = grammar.with_override_query(
+                source,
+                &self.config.name,
+                &self.config.overrides,
+                &mut self.config.brackets,
+                &self.config.scope_opt_in_language_servers,
+            )?;
+            self.grammar = Some(Arc::new(grammar));
         }
         Ok(self)
     }
 
-    pub fn with_override_query(mut self, source: &str) -> anyhow::Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-
-        let mut override_configs_by_id = HashMap::default();
-        for (ix, mut name) in query.capture_names().iter().copied().enumerate() {
-            let mut range_is_inclusive = false;
-            if name.starts_with('_') {
-                continue;
-            }
-            if let Some(prefix) = name.strip_suffix(".inclusive") {
-                name = prefix;
-                range_is_inclusive = true;
-            }
-
-            let value = self.config.overrides.get(name).cloned().unwrap_or_default();
-            for server_name in &value.opt_into_language_servers {
-                if !self
-                    .config
-                    .scope_opt_in_language_servers
-                    .contains(server_name)
-                {
-                    util::debug_panic!(
-                        "Server {server_name:?} has been opted-in by scope {name:?} but has not been marked as an opt-in server"
-                    );
-                }
-            }
-
-            override_configs_by_id.insert(
-                ix as u32,
-                OverrideEntry {
-                    name: name.to_string(),
-                    range_is_inclusive,
-                    value,
-                },
-            );
-        }
-
-        let referenced_override_names = self.config.overrides.keys().chain(
-            self.config
-                .brackets
-                .disabled_scopes_by_bracket_ix
-                .iter()
-                .flatten(),
-        );
-
-        for referenced_name in referenced_override_names {
-            if !override_configs_by_id
-                .values()
-                .any(|entry| entry.name == *referenced_name)
-            {
-                anyhow::bail!(
-                    "language {:?} has overrides in config not in query: {referenced_name:?}",
-                    self.config.name
-                );
-            }
-        }
+    pub fn with_redaction_query(self, source: &str) -> Result<Self> {
+        self.with_grammar_query_and_name(|grammar, name| grammar.with_redaction_query(source, name))
+    }
 
-        for entry in override_configs_by_id.values_mut() {
-            entry.value.disabled_bracket_ixs = self
-                .config
-                .brackets
-                .disabled_scopes_by_bracket_ix
-                .iter()
-                .enumerate()
-                .filter_map(|(ix, disabled_scope_names)| {
-                    if disabled_scope_names.contains(&entry.name) {
-                        Some(ix as u16)
-                    } else {
-                        None
-                    }
-                })
-                .collect();
+    fn with_grammar_query(
+        mut self,
+        build: impl FnOnce(Grammar) -> Result<Grammar>,
+    ) -> Result<Self> {
+        if let Some(grammar_arc) = self.grammar.take() {
+            let grammar = Arc::try_unwrap(grammar_arc)
+                .map_err(|_| anyhow::anyhow!("cannot mutate grammar"))?;
+            self.grammar = Some(Arc::new(build(grammar)?));
         }
-
-        self.config.brackets.disabled_scopes_by_bracket_ix.clear();
-
-        let grammar = self.grammar_mut()?;
-        grammar.override_config = Some(OverrideConfig {
-            query,
-            values: override_configs_by_id,
-        });
         Ok(self)
     }
 
-    pub fn with_redaction_query(mut self, source: &str) -> anyhow::Result<Self> {
-        let query = Query::new(&self.expect_grammar()?.ts_language, source)?;
-        let mut redaction_capture_ix = 0;
-        if populate_capture_indices(
-            &query,
-            &self.config.name,
-            "redactions",
-            &[],
-            &mut [Capture::Required("redact", &mut redaction_capture_ix)],
-        ) {
-            self.grammar_mut()?.redactions_config = Some(RedactionConfig {
-                query,
-                redaction_capture_ix,
-            });
+    fn with_grammar_query_and_name(
+        mut self,
+        build: impl FnOnce(Grammar, &LanguageName) -> Result<Grammar>,
+    ) -> Result<Self> {
+        if let Some(grammar_arc) = self.grammar.take() {
+            let grammar = Arc::try_unwrap(grammar_arc)
+                .map_err(|_| anyhow::anyhow!("cannot mutate grammar"))?;
+            self.grammar = Some(Arc::new(build(grammar, &self.config.name)?));
         }
         Ok(self)
     }
 
-    fn expect_grammar(&self) -> Result<&Grammar> {
-        self.grammar
-            .as_ref()
-            .map(|grammar| grammar.as_ref())
-            .context("no grammar for language")
-    }
-
-    fn grammar_mut(&mut self) -> Result<&mut Grammar> {
-        Arc::get_mut(self.grammar.as_mut().context("no grammar for language")?)
-            .context("cannot mutate grammar")
-    }
-
     pub fn name(&self) -> LanguageName {
         self.config.name.clone()
     }
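
A minimal sketch of the `Arc::try_unwrap` pattern the new `with_grammar_query*` helpers rely on (the `Grammar` struct and `with_query` function here are hypothetical stand-ins, not the real types — the actual `Grammar` holds tree-sitter queries and is not `Clone`, which is why the builder unwraps and rebuilds the `Arc` rather than using `Arc::make_mut`):

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real `Grammar`, which is not Clone.
struct Grammar {
    brackets_query: Option<String>,
}

// Mutation succeeds only while the builder holds the sole reference;
// any outstanding clone makes `Arc::try_unwrap` fail.
fn with_query(grammar: Arc<Grammar>, source: &str) -> Result<Arc<Grammar>, &'static str> {
    let mut grammar = Arc::try_unwrap(grammar).map_err(|_| "cannot mutate grammar")?;
    grammar.brackets_query = Some(source.to_string());
    Ok(Arc::new(grammar))
}

fn main() {
    let g = Arc::new(Grammar { brackets_query: None });
    let g = with_query(g, "(block)").unwrap();
    assert_eq!(g.brackets_query.as_deref(), Some("(block)"));

    // A second reference is alive, so the unwrap-and-rebuild fails.
    let g2 = g.clone();
    assert!(with_query(g, "(other)").is_err());
    drop(g2);
}
```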

crates/language/src/language_registry.rs

@@ -5,6 +5,10 @@ use crate::{
 };
 use anyhow::{Context as _, Result, anyhow};
 use collections::{FxHashMap, HashMap, HashSet, hash_map};
+pub use language_core::{
+    BinaryStatus, LanguageName, LanguageQueries, LanguageServerStatusUpdate,
+    QUERY_FILENAME_PREFIXES, ServerHealth,
+};
 use settings::{AllLanguageSettingsContent, LanguageSettingsContent};
 
 use futures::{
@@ -12,15 +16,13 @@ use futures::{
     channel::{mpsc, oneshot},
 };
 use globset::GlobSet;
-use gpui::{App, BackgroundExecutor, SharedString};
+use gpui::{App, BackgroundExecutor};
 use lsp::LanguageServerId;
 use parking_lot::{Mutex, RwLock};
 use postage::watch;
-use schemars::JsonSchema;
-use serde::{Deserialize, Serialize};
+
 use smallvec::SmallVec;
 use std::{
-    borrow::{Borrow, Cow},
     cell::LazyCell,
     ffi::OsStr,
     ops::Not,
@@ -33,91 +35,6 @@ use theme::Theme;
 use unicase::UniCase;
 use util::{ResultExt, maybe, post_inc};
 
-#[derive(
-    Debug, Clone, Hash, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize, JsonSchema,
-)]
-pub struct LanguageName(pub SharedString);
-
-impl LanguageName {
-    pub fn new(s: &str) -> Self {
-        Self(SharedString::new(s))
-    }
-
-    pub fn new_static(s: &'static str) -> Self {
-        Self(SharedString::new_static(s))
-    }
-
-    pub fn from_proto(s: String) -> Self {
-        Self(SharedString::from(s))
-    }
-
-    pub fn to_proto(&self) -> String {
-        self.0.to_string()
-    }
-
-    pub fn lsp_id(&self) -> String {
-        match self.0.as_ref() {
-            "Plain Text" => "plaintext".to_string(),
-            language_name => language_name.to_lowercase(),
-        }
-    }
-}
-
-impl From<LanguageName> for SharedString {
-    fn from(value: LanguageName) -> Self {
-        value.0
-    }
-}
-
-impl From<SharedString> for LanguageName {
-    fn from(value: SharedString) -> Self {
-        LanguageName(value)
-    }
-}
-
-impl AsRef<str> for LanguageName {
-    fn as_ref(&self) -> &str {
-        self.0.as_ref()
-    }
-}
-
-impl Borrow<str> for LanguageName {
-    fn borrow(&self) -> &str {
-        self.0.as_ref()
-    }
-}
-
-impl PartialEq<str> for LanguageName {
-    fn eq(&self, other: &str) -> bool {
-        self.0.as_ref() == other
-    }
-}
-
-impl PartialEq<&str> for LanguageName {
-    fn eq(&self, other: &&str) -> bool {
-        self.0.as_ref() == *other
-    }
-}
-
-impl std::fmt::Display for LanguageName {
-    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
-        write!(f, "{}", self.0)
-    }
-}
-
-impl From<&'static str> for LanguageName {
-    fn from(str: &'static str) -> Self {
-        Self(SharedString::new_static(str))
-    }
-}
-
-impl From<LanguageName> for String {
-    fn from(value: LanguageName) -> Self {
-        let value: &str = &value.0;
-        Self::from(value)
-    }
-}
-
 pub struct LanguageRegistry {
     state: RwLock<LanguageRegistryState>,
     language_server_download_dir: Option<Arc<Path>>,
@@ -153,31 +70,6 @@ pub struct FakeLanguageServerEntry {
     pub _server: Option<lsp::FakeLanguageServer>,
 }
 
-#[derive(Clone, Debug, PartialEq, Eq)]
-pub enum LanguageServerStatusUpdate {
-    Binary(BinaryStatus),
-    Health(ServerHealth, Option<SharedString>),
-}
-
-#[derive(Debug, PartialEq, Eq, Deserialize, Serialize, Clone, Copy)]
-#[serde(rename_all = "camelCase")]
-pub enum ServerHealth {
-    Ok,
-    Warning,
-    Error,
-}
-
-#[derive(Clone, Debug, PartialEq, Eq)]
-pub enum BinaryStatus {
-    None,
-    CheckingForUpdate,
-    Downloading,
-    Starting,
-    Stopping,
-    Stopped,
-    Failed { error: String },
-}
-
 #[derive(Clone)]
 pub struct AvailableLanguage {
     id: LanguageId,
@@ -232,39 +124,6 @@ impl std::fmt::Display for LanguageNotFound {
     }
 }
 
-pub const QUERY_FILENAME_PREFIXES: &[(
-    &str,
-    fn(&mut LanguageQueries) -> &mut Option<Cow<'static, str>>,
-)] = &[
-    ("highlights", |q| &mut q.highlights),
-    ("brackets", |q| &mut q.brackets),
-    ("outline", |q| &mut q.outline),
-    ("indents", |q| &mut q.indents),
-    ("injections", |q| &mut q.injections),
-    ("overrides", |q| &mut q.overrides),
-    ("redactions", |q| &mut q.redactions),
-    ("runnables", |q| &mut q.runnables),
-    ("debugger", |q| &mut q.debugger),
-    ("textobjects", |q| &mut q.text_objects),
-    ("imports", |q| &mut q.imports),
-];
-
-/// Tree-sitter language queries for a given language.
-#[derive(Debug, Default)]
-pub struct LanguageQueries {
-    pub highlights: Option<Cow<'static, str>>,
-    pub brackets: Option<Cow<'static, str>>,
-    pub indents: Option<Cow<'static, str>>,
-    pub outline: Option<Cow<'static, str>>,
-    pub injections: Option<Cow<'static, str>>,
-    pub overrides: Option<Cow<'static, str>>,
-    pub redactions: Option<Cow<'static, str>>,
-    pub runnables: Option<Cow<'static, str>>,
-    pub text_objects: Option<Cow<'static, str>>,
-    pub debugger: Option<Cow<'static, str>>,
-    pub imports: Option<Cow<'static, str>>,
-}
-
 #[derive(Clone, Default)]
 struct ServerStatusSender {
     txs: Arc<Mutex<Vec<mpsc::UnboundedSender<(LanguageServerName, BinaryStatus)>>>>,
@@ -1261,7 +1120,7 @@ impl LanguageRegistryState {
             LanguageSettingsContent {
                 tab_size: language.config.tab_size,
                 hard_tabs: language.config.hard_tabs,
-                soft_wrap: language.config.soft_wrap,
+                soft_wrap: language.config.soft_wrap.map(crate::to_settings_soft_wrap),
                 auto_indent_on_paste: language.config.auto_indent_on_paste,
                 ..Default::default()
             },

crates/language/src/manifest.rs

@@ -1,43 +1,12 @@
-use std::{borrow::Borrow, sync::Arc};
+use std::sync::Arc;
 
-use gpui::SharedString;
 use settings::WorktreeId;
 use util::rel_path::RelPath;
 
-#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
-pub struct ManifestName(SharedString);
+// Re-export ManifestName from language_core.
+pub use language_core::ManifestName;
 
-impl Borrow<SharedString> for ManifestName {
-    fn borrow(&self) -> &SharedString {
-        &self.0
-    }
-}
-
-impl Borrow<str> for ManifestName {
-    fn borrow(&self) -> &str {
-        &self.0
-    }
-}
-
-impl From<SharedString> for ManifestName {
-    fn from(value: SharedString) -> Self {
-        Self(value)
-    }
-}
-
-impl From<ManifestName> for SharedString {
-    fn from(value: ManifestName) -> Self {
-        value.0
-    }
-}
-
-impl AsRef<SharedString> for ManifestName {
-    fn as_ref(&self) -> &SharedString {
-        &self.0
-    }
-}
-
-/// Represents a manifest query; given a path to a file, [ManifestSearcher] is tasked with finding a path to the directory containing the manifest for that file.
+/// Represents a manifest query; given a path to a file, the manifest provider is tasked with finding a path to the directory containing the manifest for that file.
 ///
 /// Since parts of the path might have already been explored, there's an additional `depth` parameter that indicates to what ancestry level a given path should be explored.
 /// For example, given a path like `foo/bar/baz`, a depth of 2 would explore `foo/bar/baz` and `foo/bar`, but not `foo`.
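
The depth-limited ancestor walk the doc comment above describes can be sketched as follows (a hypothetical helper using only the standard library, not the actual manifest-provider API):

```rust
use std::path::{Path, PathBuf};

// Yield up to `depth` ancestry levels, starting from the path itself.
// For "foo/bar/baz" with depth 2: "foo/bar/baz" and "foo/bar", but not "foo".
fn candidate_dirs(path: &Path, depth: usize) -> Vec<PathBuf> {
    path.ancestors().take(depth).map(Path::to_path_buf).collect()
}

fn main() {
    let dirs = candidate_dirs(Path::new("foo/bar/baz"), 2);
    assert_eq!(dirs, [PathBuf::from("foo/bar/baz"), PathBuf::from("foo/bar")]);
}
```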

crates/language/src/syntax_map.rs

@@ -1121,7 +1121,7 @@ impl<'a> SyntaxMapCaptures<'a> {
             let grammar_index = result
                 .grammars
                 .iter()
-                .position(|g| g.id == grammar.id())
+                .position(|g| g.id() == grammar.id())
                 .unwrap_or_else(|| {
                     result.grammars.push(grammar);
                     result.grammars.len() - 1
@@ -1265,7 +1265,7 @@ impl<'a> SyntaxMapMatches<'a> {
             let grammar_index = result
                 .grammars
                 .iter()
-                .position(|g| g.id == grammar.id())
+                .position(|g| g.id() == grammar.id())
                 .unwrap_or_else(|| {
                     result.grammars.push(grammar);
                     result.grammars.len() - 1

crates/language/src/syntax_map/syntax_map_tests.rs

@@ -1492,7 +1492,7 @@ fn python_lang() -> Language {
     )
     .with_queries(LanguageQueries {
         injections: Some(Cow::from(include_str!(
-            "../../../languages/src/python/injections.scm"
+            "../../../grammars/src/python/injections.scm"
         ))),
         ..Default::default()
     })

crates/language/src/toolchain.rs

@@ -4,95 +4,21 @@
 //! which is a set of tools used to interact with the projects written in said language.
 //! For example, a Python project can have an associated virtual environment; a Rust project can have a toolchain override.
 
-use std::{
-    path::{Path, PathBuf},
-    sync::Arc,
-};
+use std::{path::PathBuf, sync::Arc};
 
 use async_trait::async_trait;
 use collections::HashMap;
-use fs::Fs;
+
 use futures::future::BoxFuture;
-use gpui::{App, AsyncApp, SharedString};
+use gpui::{App, AsyncApp};
 use settings::WorktreeId;
 use task::ShellKind;
 use util::rel_path::RelPath;
 
-use crate::{LanguageName, ManifestName};
-
-/// Represents a single toolchain.
-#[derive(Clone, Eq, Debug)]
-pub struct Toolchain {
-    /// User-facing label
-    pub name: SharedString,
-    /// Absolute path
-    pub path: SharedString,
-    pub language_name: LanguageName,
-    /// Full toolchain data (including language-specific details)
-    pub as_json: serde_json::Value,
-}
-
-/// Declares a scope of a toolchain added by user.
-///
-/// When the user adds a toolchain, we give them an option to see that toolchain in:
-/// - All of their projects
-/// - A project they're currently in.
-/// - Only in the subproject they're currently in.
-#[derive(Clone, Debug, Eq, PartialEq, Ord, PartialOrd)]
-pub enum ToolchainScope {
-    Subproject(Arc<Path>, Arc<RelPath>),
-    Project,
-    /// Available in all projects on this box. It wouldn't make sense to show suggestions across machines.
-    Global,
-}
-
-impl ToolchainScope {
-    pub fn label(&self) -> &'static str {
-        match self {
-            ToolchainScope::Subproject(_, _) => "Subproject",
-            ToolchainScope::Project => "Project",
-            ToolchainScope::Global => "Global",
-        }
-    }
-
-    pub fn description(&self) -> &'static str {
-        match self {
-            ToolchainScope::Subproject(_, _) => {
-                "Available only in the subproject you're currently in."
-            }
-            ToolchainScope::Project => "Available in all locations in your current project.",
-            ToolchainScope::Global => "Available in all of your projects on this machine.",
-        }
-    }
-}
-
-impl std::hash::Hash for Toolchain {
-    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
-        let Self {
-            name,
-            path,
-            language_name,
-            as_json: _,
-        } = self;
-        name.hash(state);
-        path.hash(state);
-        language_name.hash(state);
-    }
-}
+use crate::LanguageName;
 
-impl PartialEq for Toolchain {
-    fn eq(&self, other: &Self) -> bool {
-        let Self {
-            name,
-            path,
-            language_name,
-            as_json: _,
-        } = self;
-        // Do not use as_json for comparisons; it shouldn't impact equality, as it's not user-surfaced.
-        // Thus, there could be multiple entries that look the same in the UI.
-        (name, path, language_name).eq(&(&other.name, &other.path, &other.language_name))
-    }
-}
+// Re-export core data types from language_core.
+pub use language_core::{Toolchain, ToolchainList, ToolchainMetadata, ToolchainScope};
 
 #[async_trait]
 pub trait ToolchainLister: Send + Sync + 'static {
@@ -102,7 +28,6 @@ pub trait ToolchainLister: Send + Sync + 'static {
         worktree_root: PathBuf,
         subroot_relative_path: Arc<RelPath>,
         project_env: Option<HashMap<String, String>>,
-        fs: &dyn Fs,
     ) -> ToolchainList;
 
     /// Given a user-created toolchain, resolve lister-specific details.
@@ -111,7 +36,6 @@ pub trait ToolchainLister: Send + Sync + 'static {
         &self,
         path: PathBuf,
         project_env: Option<HashMap<String, String>>,
-        fs: &dyn Fs,
     ) -> anyhow::Result<Toolchain>;
 
     fn activation_script(
@@ -125,16 +49,6 @@ pub trait ToolchainLister: Send + Sync + 'static {
     fn meta(&self) -> ToolchainMetadata;
 }
 
-#[derive(Clone, PartialEq, Eq, Hash)]
-pub struct ToolchainMetadata {
-    /// Returns a term which we should use in UI to refer to toolchains produced by a given `[ToolchainLister]`.
-    pub term: SharedString,
-    /// A user-facing placeholder describing the semantic meaning of a path to a new toolchain.
-    pub new_toolchain_placeholder: SharedString,
-    /// The name of the manifest file for this toolchain.
-    pub manifest_name: ManifestName,
-}
-
 #[async_trait(?Send)]
 pub trait LanguageToolchainStore: Send + Sync + 'static {
     async fn active_toolchain(
@@ -168,31 +82,3 @@ impl<T: LocalLanguageToolchainStore> LanguageToolchainStore for T {
         self.active_toolchain(worktree_id, &relative_path, language_name, cx)
     }
 }
-
-type DefaultIndex = usize;
-#[derive(Default, Clone, Debug)]
-pub struct ToolchainList {
-    pub toolchains: Vec<Toolchain>,
-    pub default: Option<DefaultIndex>,
-    pub groups: Box<[(usize, SharedString)]>,
-}
-
-impl ToolchainList {
-    pub fn toolchains(&self) -> &[Toolchain] {
-        &self.toolchains
-    }
-    pub fn default_toolchain(&self) -> Option<Toolchain> {
-        self.default.and_then(|ix| self.toolchains.get(ix)).cloned()
-    }
-    pub fn group_for_index(&self, index: usize) -> Option<(usize, SharedString)> {
-        if index >= self.toolchains.len() {
-            return None;
-        }
-        let first_equal_or_greater = self
-            .groups
-            .partition_point(|(group_lower_bound, _)| group_lower_bound <= &index);
-        self.groups
-            .get(first_equal_or_greater.checked_sub(1)?)
-            .cloned()
-    }
-}
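
The `ToolchainList::group_for_index` method moved above uses `partition_point` over `(start_index, label)` pairs sorted by start index; a standalone sketch of that lookup logic (a simplified free function, not the real method):

```rust
// `groups` holds (start_index, label) pairs sorted by start index; each
// group covers toolchain indices from its start up to the next group's start.
fn group_for_index<'a>(
    groups: &[(usize, &'a str)],
    toolchain_count: usize,
    index: usize,
) -> Option<(usize, &'a str)> {
    if index >= toolchain_count {
        return None;
    }
    // First group whose start index is strictly greater than `index`…
    let first_greater = groups.partition_point(|(start, _)| *start <= index);
    // …so the containing group, if any, is the one just before it.
    groups.get(first_greater.checked_sub(1)?).copied()
}

fn main() {
    // Three toolchains: indices 0..2 in "venv", index 2 in "global".
    let groups = [(0, "venv"), (2, "global")];
    assert_eq!(group_for_index(&groups, 3, 1), Some((0, "venv")));
    assert_eq!(group_for_index(&groups, 3, 2), Some((2, "global")));
    assert_eq!(group_for_index(&groups, 3, 5), None);
}
```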

crates/language_core/Cargo.toml

@@ -0,0 +1,29 @@
+[package]
+name = "language_core"
+version = "0.1.0"
+edition = "2024"
+publish = false
+
+[lib]
+path = "src/language_core.rs"
+
+[dependencies]
+anyhow.workspace = true
+collections.workspace = true
+gpui.workspace = true
+log.workspace = true
+lsp.workspace = true
+parking_lot.workspace = true
+regex.workspace = true
+schemars.workspace = true
+serde.workspace = true
+serde_json.workspace = true
+toml.workspace = true
+tree-sitter.workspace = true
+util.workspace = true
+
+[dev-dependencies]
+gpui = { workspace = true, features = ["test-support"] }
+
+[features]
+test-support = []

crates/language_core/src/code_label.rs

@@ -0,0 +1,122 @@
+use crate::highlight_map::HighlightId;
+use std::ops::Range;
+
+#[derive(Debug, Clone)]
+pub struct Symbol {
+    pub name: String,
+    pub kind: lsp::SymbolKind,
+    pub container_name: Option<String>,
+}
+
+#[derive(Clone, Debug, Default, PartialEq, Eq)]
+pub struct CodeLabel {
+    /// The text to display.
+    pub text: String,
+    /// Syntax highlighting runs.
+    pub runs: Vec<(Range<usize>, HighlightId)>,
+    /// The portion of the text that should be used in fuzzy filtering.
+    pub filter_range: Range<usize>,
+}
+
+#[derive(Clone, Debug, Default, PartialEq, Eq)]
+pub struct CodeLabelBuilder {
+    /// The text to display.
+    text: String,
+    /// Syntax highlighting runs.
+    runs: Vec<(Range<usize>, HighlightId)>,
+    /// The portion of the text that should be used in fuzzy filtering.
+    filter_range: Range<usize>,
+}
+
+impl CodeLabel {
+    pub fn plain(text: String, filter_text: Option<&str>) -> Self {
+        Self::filtered(text.clone(), text.len(), filter_text, Vec::new())
+    }
+
+    pub fn filtered(
+        text: String,
+        label_len: usize,
+        filter_text: Option<&str>,
+        runs: Vec<(Range<usize>, HighlightId)>,
+    ) -> Self {
+        assert!(label_len <= text.len());
+        let filter_range = filter_text
+            .and_then(|filter| text.find(filter).map(|index| index..index + filter.len()))
+            .unwrap_or(0..label_len);
+        Self::new(text, filter_range, runs)
+    }
+
+    pub fn new(
+        text: String,
+        filter_range: Range<usize>,
+        runs: Vec<(Range<usize>, HighlightId)>,
+    ) -> Self {
+        assert!(
+            text.get(filter_range.clone()).is_some(),
+            "invalid filter range"
+        );
+        runs.iter().for_each(|(range, _)| {
+            assert!(
+                text.get(range.clone()).is_some(),
+                "invalid run range with inputs. Requested range {range:?} in text '{text}'",
+            );
+        });
+        Self {
+            runs,
+            filter_range,
+            text,
+        }
+    }
+
+    pub fn text(&self) -> &str {
+        self.text.as_str()
+    }
+
+    pub fn filter_text(&self) -> &str {
+        &self.text[self.filter_range.clone()]
+    }
+}
+
+impl From<String> for CodeLabel {
+    fn from(value: String) -> Self {
+        Self::plain(value, None)
+    }
+}
+
+impl From<&str> for CodeLabel {
+    fn from(value: &str) -> Self {
+        Self::plain(value.to_string(), None)
+    }
+}
+
+impl CodeLabelBuilder {
+    pub fn respan_filter_range(&mut self, filter_text: Option<&str>) {
+        self.filter_range = filter_text
+            .and_then(|filter| {
+                self.text
+                    .find(filter)
+                    .map(|index| index..index + filter.len())
+            })
+            .unwrap_or(0..self.text.len());
+    }
+
+    pub fn push_str(&mut self, text: &str, highlight: Option<HighlightId>) {
+        let start_index = self.text.len();
+        self.text.push_str(text);
+        if let Some(highlight) = highlight {
+            let end_index = self.text.len();
+            self.runs.push((start_index..end_index, highlight));
+        }
+    }
+
+    pub fn build(mut self) -> CodeLabel {
+        if self.filter_range.end == 0 {
+            self.respan_filter_range(None);
+        }
+        CodeLabel {
+            text: self.text,
+            runs: self.runs,
+            filter_range: self.filter_range,
+        }
+    }
+}
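
The filter-range derivation that `CodeLabel::filtered` and `CodeLabelBuilder::respan_filter_range` share can be illustrated in isolation (a simplified free function mirroring the logic above, not the real API):

```rust
use std::ops::Range;

// Find the filter text within the display text; if absent or not given,
// fall back to fuzzy-matching against the leading `label_len` bytes.
fn filter_range(text: &str, label_len: usize, filter_text: Option<&str>) -> Range<usize> {
    filter_text
        .and_then(|filter| text.find(filter).map(|i| i..i + filter.len()))
        .unwrap_or(0..label_len)
}

fn main() {
    // Completion label "fn main()" filtered by the function name "main".
    assert_eq!(filter_range("fn main()", 9, Some("main")), 3..7);
    // No filter text: fall back to the label prefix.
    assert_eq!(filter_range("fn main()", 2, None), 0..2);
}
```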

crates/language_core/src/diagnostic.rs

@@ -0,0 +1,76 @@
+use gpui::SharedString;
+use lsp::{DiagnosticSeverity, NumberOrString};
+use serde::{Deserialize, Serialize};
+use serde_json::Value;
+
+/// A diagnostic associated with a certain range of a buffer.
+#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
+pub struct Diagnostic {
+    /// The name of the service that produced this diagnostic.
+    pub source: Option<String>,
+    /// The ID provided by the dynamic registration that produced this diagnostic.
+    pub registration_id: Option<SharedString>,
+    /// A machine-readable code that identifies this diagnostic.
+    pub code: Option<NumberOrString>,
+    pub code_description: Option<lsp::Uri>,
+    /// Whether this diagnostic is a hint, warning, or error.
+    pub severity: DiagnosticSeverity,
+    /// The human-readable message associated with this diagnostic.
+    pub message: String,
+    /// The human-readable message (in markdown format)
+    pub markdown: Option<String>,
+    /// An id that identifies the group to which this diagnostic belongs.
+    ///
+    /// When a language server produces a diagnostic with
+    /// one or more associated diagnostics, those diagnostics are all
+    /// assigned a single group ID.
+    pub group_id: usize,
+    /// Whether this diagnostic is the primary diagnostic for its group.
+    ///
+    /// In a given group, the primary diagnostic is the top-level diagnostic
+    /// returned by the language server. The non-primary diagnostics are the
+    /// associated diagnostics.
+    pub is_primary: bool,
+    /// Whether this diagnostic is considered to originate from an analysis of
+    /// files on disk, as opposed to any unsaved buffer contents. This is a
+    /// property of a given diagnostic source, and is configured for a given
+    /// language server via the `LspAdapter::disk_based_diagnostic_sources` method
+    /// for the language server.
+    pub is_disk_based: bool,
+    /// Whether this diagnostic marks unnecessary code.
+    pub is_unnecessary: bool,
+    /// Quick separation of diagnostic groups by their source.
+    pub source_kind: DiagnosticSourceKind,
+    /// Data from language server that produced this diagnostic. Passed back to the LS when we request code actions for this diagnostic.
+    pub data: Option<Value>,
+    /// Whether to underline the corresponding text range in the editor.
+    pub underline: bool,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
+pub enum DiagnosticSourceKind {
+    Pulled,
+    Pushed,
+    Other,
+}
+
+impl Default for Diagnostic {
+    fn default() -> Self {
+        Self {
+            source: Default::default(),
+            source_kind: DiagnosticSourceKind::Other,
+            code: None,
+            code_description: None,
+            severity: DiagnosticSeverity::ERROR,
+            message: Default::default(),
+            markdown: None,
+            group_id: 0,
+            is_primary: false,
+            is_disk_based: false,
+            is_unnecessary: false,
+            underline: true,
+            data: None,
+            registration_id: None,
+        }
+    }
+}

crates/language_core/src/grammar.rs

@@ -0,0 +1,821 @@
+use crate::{
+    HighlightId, HighlightMap, LanguageConfig, LanguageConfigOverride, LanguageName,
+    LanguageQueries, language_config::BracketPairConfig,
+};
+use anyhow::{Context as _, Result};
+use collections::HashMap;
+use gpui::SharedString;
+use lsp::LanguageServerName;
+use parking_lot::Mutex;
+use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
+use tree_sitter::Query;
+
+pub static NEXT_GRAMMAR_ID: AtomicUsize = AtomicUsize::new(0);
+
+#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
+pub struct GrammarId(pub usize);
+
+impl GrammarId {
+    pub fn new() -> Self {
+        Self(NEXT_GRAMMAR_ID.fetch_add(1, SeqCst))
+    }
+}
+
+impl Default for GrammarId {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+pub struct Grammar {
+    id: GrammarId,
+    pub ts_language: tree_sitter::Language,
+    pub error_query: Option<Query>,
+    pub highlights_config: Option<HighlightsConfig>,
+    pub brackets_config: Option<BracketsConfig>,
+    pub redactions_config: Option<RedactionConfig>,
+    pub runnable_config: Option<RunnableConfig>,
+    pub indents_config: Option<IndentConfig>,
+    pub outline_config: Option<OutlineConfig>,
+    pub text_object_config: Option<TextObjectConfig>,
+    pub injection_config: Option<InjectionConfig>,
+    pub override_config: Option<OverrideConfig>,
+    pub debug_variables_config: Option<DebugVariablesConfig>,
+    pub imports_config: Option<ImportsConfig>,
+    pub highlight_map: Mutex<HighlightMap>,
+}
+
+pub struct HighlightsConfig {
+    pub query: Query,
+    pub identifier_capture_indices: Vec<u32>,
+}
+
+pub struct IndentConfig {
+    pub query: Query,
+    pub indent_capture_ix: u32,
+    pub start_capture_ix: Option<u32>,
+    pub end_capture_ix: Option<u32>,
+    pub outdent_capture_ix: Option<u32>,
+    pub suffixed_start_captures: HashMap<u32, SharedString>,
+}
+
+pub struct OutlineConfig {
+    pub query: Query,
+    pub item_capture_ix: u32,
+    pub name_capture_ix: u32,
+    pub context_capture_ix: Option<u32>,
+    pub extra_context_capture_ix: Option<u32>,
+    pub open_capture_ix: Option<u32>,
+    pub close_capture_ix: Option<u32>,
+    pub annotation_capture_ix: Option<u32>,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum DebuggerTextObject {
+    Variable,
+    Scope,
+}
+
+impl DebuggerTextObject {
+    pub fn from_capture_name(name: &str) -> Option<DebuggerTextObject> {
+        match name {
+            "debug-variable" => Some(DebuggerTextObject::Variable),
+            "debug-scope" => Some(DebuggerTextObject::Scope),
+            _ => None,
+        }
+    }
+}
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum TextObject {
+    InsideFunction,
+    AroundFunction,
+    InsideClass,
+    AroundClass,
+    InsideComment,
+    AroundComment,
+}
+
+impl TextObject {
+    pub fn from_capture_name(name: &str) -> Option<TextObject> {
+        match name {
+            "function.inside" => Some(TextObject::InsideFunction),
+            "function.around" => Some(TextObject::AroundFunction),
+            "class.inside" => Some(TextObject::InsideClass),
+            "class.around" => Some(TextObject::AroundClass),
+            "comment.inside" => Some(TextObject::InsideComment),
+            "comment.around" => Some(TextObject::AroundComment),
+            _ => None,
+        }
+    }
+
+    pub fn around(&self) -> Option<Self> {
+        match self {
+            TextObject::InsideFunction => Some(TextObject::AroundFunction),
+            TextObject::InsideClass => Some(TextObject::AroundClass),
+            TextObject::InsideComment => Some(TextObject::AroundComment),
+            _ => None,
+        }
+    }
+}
+
+pub struct TextObjectConfig {
+    pub query: Query,
+    pub text_objects_by_capture_ix: Vec<(u32, TextObject)>,
+}
+
+pub struct InjectionConfig {
+    pub query: Query,
+    pub content_capture_ix: u32,
+    pub language_capture_ix: Option<u32>,
+    pub patterns: Vec<InjectionPatternConfig>,
+}
+
+pub struct RedactionConfig {
+    pub query: Query,
+    pub redaction_capture_ix: u32,
+}
+
+#[derive(Clone, Debug, PartialEq)]
+pub enum RunnableCapture {
+    Named(SharedString),
+    Run,
+}
+
+pub struct RunnableConfig {
+    pub query: Query,
+    /// A mapping from capture index to capture kind
+    pub extra_captures: Vec<RunnableCapture>,
+}
+
+pub struct OverrideConfig {
+    pub query: Query,
+    pub values: HashMap<u32, OverrideEntry>,
+}
+
+#[derive(Debug)]
+pub struct OverrideEntry {
+    pub name: String,
+    pub range_is_inclusive: bool,
+    pub value: LanguageConfigOverride,
+}
+
+#[derive(Default, Clone)]
+pub struct InjectionPatternConfig {
+    pub language: Option<Box<str>>,
+    pub combined: bool,
+}
+
+#[derive(Debug)]
+pub struct BracketsConfig {
+    pub query: Query,
+    pub open_capture_ix: u32,
+    pub close_capture_ix: u32,
+    pub patterns: Vec<BracketsPatternConfig>,
+}
+
+#[derive(Clone, Debug, Default)]
+pub struct BracketsPatternConfig {
+    pub newline_only: bool,
+    pub rainbow_exclude: bool,
+}
+
+pub struct DebugVariablesConfig {
+    pub query: Query,
+    pub objects_by_capture_ix: Vec<(u32, DebuggerTextObject)>,
+}
+
+pub struct ImportsConfig {
+    pub query: Query,
+    pub import_ix: u32,
+    pub name_ix: Option<u32>,
+    pub namespace_ix: Option<u32>,
+    pub source_ix: Option<u32>,
+    pub list_ix: Option<u32>,
+    pub wildcard_ix: Option<u32>,
+    pub alias_ix: Option<u32>,
+}
+
+enum Capture<'a> {
+    Required(&'static str, &'a mut u32),
+    Optional(&'static str, &'a mut Option<u32>),
+}
+
+fn populate_capture_indices(
+    query: &Query,
+    language_name: &LanguageName,
+    query_type: &str,
+    expected_prefixes: &[&str],
+    captures: &mut [Capture<'_>],
+) -> bool {
+    let mut found_required_indices = Vec::new();
+    'outer: for (ix, name) in query.capture_names().iter().enumerate() {
+        for (required_ix, capture) in captures.iter_mut().enumerate() {
+            match capture {
+                Capture::Required(capture_name, index) if capture_name == name => {
+                    **index = ix as u32;
+                    found_required_indices.push(required_ix);
+                    continue 'outer;
+                }
+                Capture::Optional(capture_name, index) if capture_name == name => {
+                    **index = Some(ix as u32);
+                    continue 'outer;
+                }
+                _ => {}
+            }
+        }
+        if !name.starts_with("_")
+            && !expected_prefixes
+                .iter()
+                .any(|&prefix| name.starts_with(prefix))
+        {
+            log::warn!(
+                "unrecognized capture name '{}' in {} {} TreeSitter query \
+                (suppress this warning by prefixing with '_')",
+                name,
+                language_name,
+                query_type
+            );
+        }
+    }
+    let mut missing_required_captures = Vec::new();
+    for (capture_ix, capture) in captures.iter().enumerate() {
+        if let Capture::Required(capture_name, _) = capture
+            && !found_required_indices.contains(&capture_ix)
+        {
+            missing_required_captures.push(*capture_name);
+        }
+    }
+    let success = missing_required_captures.is_empty();
+    if !success {
+        log::error!(
+            "missing required capture(s) in {} {} TreeSitter query: {}",
+            language_name,
+            query_type,
+            missing_required_captures.join(", ")
+        );
+    }
+    success
+}
+
+impl Grammar {
+    pub fn new(ts_language: tree_sitter::Language) -> Self {
+        Self {
+            id: GrammarId::new(),
+            highlights_config: None,
+            brackets_config: None,
+            outline_config: None,
+            text_object_config: None,
+            indents_config: None,
+            injection_config: None,
+            override_config: None,
+            redactions_config: None,
+            runnable_config: None,
+            error_query: Query::new(&ts_language, "(ERROR) @error").ok(),
+            debug_variables_config: None,
+            imports_config: None,
+            ts_language,
+            highlight_map: Default::default(),
+        }
+    }
+
+    pub fn id(&self) -> GrammarId {
+        self.id
+    }
+
+    pub fn highlight_map(&self) -> HighlightMap {
+        self.highlight_map.lock().clone()
+    }
+
+    pub fn highlight_id_for_name(&self, name: &str) -> Option<HighlightId> {
+        let capture_id = self
+            .highlights_config
+            .as_ref()?
+            .query
+            .capture_index_for_name(name)?;
+        Some(self.highlight_map.lock().get(capture_id))
+    }
+
+    pub fn debug_variables_config(&self) -> Option<&DebugVariablesConfig> {
+        self.debug_variables_config.as_ref()
+    }
+
+    pub fn imports_config(&self) -> Option<&ImportsConfig> {
+        self.imports_config.as_ref()
+    }
+
+    /// Load all queries from `LanguageQueries` into this grammar, mutating the
+    /// associated `LanguageConfig` (the override query clears
+    /// `brackets.disabled_scopes_by_bracket_ix`).
+    pub fn with_queries(
+        mut self,
+        queries: LanguageQueries,
+        config: &mut LanguageConfig,
+    ) -> Result<Self> {
+        let name = &config.name;
+        if let Some(query) = queries.highlights {
+            self = self
+                .with_highlights_query(query.as_ref())
+                .context("Error loading highlights query")?;
+        }
+        if let Some(query) = queries.brackets {
+            self = self
+                .with_brackets_query(query.as_ref(), name)
+                .context("Error loading brackets query")?;
+        }
+        if let Some(query) = queries.indents {
+            self = self
+                .with_indents_query(query.as_ref(), name)
+                .context("Error loading indents query")?;
+        }
+        if let Some(query) = queries.outline {
+            self = self
+                .with_outline_query(query.as_ref(), name)
+                .context("Error loading outline query")?;
+        }
+        if let Some(query) = queries.injections {
+            self = self
+                .with_injection_query(query.as_ref(), name)
+                .context("Error loading injection query")?;
+        }
+        if let Some(query) = queries.overrides {
+            self = self
+                .with_override_query(
+                    query.as_ref(),
+                    name,
+                    &config.overrides,
+                    &mut config.brackets,
+                    &config.scope_opt_in_language_servers,
+                )
+                .context("Error loading override query")?;
+        }
+        if let Some(query) = queries.redactions {
+            self = self
+                .with_redaction_query(query.as_ref(), name)
+                .context("Error loading redaction query")?;
+        }
+        if let Some(query) = queries.runnables {
+            self = self
+                .with_runnable_query(query.as_ref())
+                .context("Error loading runnables query")?;
+        }
+        if let Some(query) = queries.text_objects {
+            self = self
+                .with_text_object_query(query.as_ref(), name)
+                .context("Error loading textobject query")?;
+        }
+        if let Some(query) = queries.debugger {
+            self = self
+                .with_debug_variables_query(query.as_ref(), name)
+                .context("Error loading debug variables query")?;
+        }
+        if let Some(query) = queries.imports {
+            self = self
+                .with_imports_query(query.as_ref(), name)
+                .context("Error loading imports query")?;
+        }
+        Ok(self)
+    }
+
+    pub fn with_highlights_query(mut self, source: &str) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+
+        let mut identifier_capture_indices = Vec::new();
+        for name in [
+            "variable",
+            "constant",
+            "constructor",
+            "function",
+            "function.method",
+            "function.method.call",
+            "function.special",
+            "property",
+            "type",
+            "type.interface",
+        ] {
+            identifier_capture_indices.extend(query.capture_index_for_name(name));
+        }
+
+        self.highlights_config = Some(HighlightsConfig {
+            query,
+            identifier_capture_indices,
+        });
+
+        Ok(self)
+    }
+
+    pub fn with_runnable_query(mut self, source: &str) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+        let extra_captures: Vec<_> = query
+            .capture_names()
+            .iter()
+            .map(|&name| match name {
+                "run" => RunnableCapture::Run,
+                name => RunnableCapture::Named(name.to_string().into()),
+            })
+            .collect();
+
+        self.runnable_config = Some(RunnableConfig {
+            extra_captures,
+            query,
+        });
+
+        Ok(self)
+    }
+
+    pub fn with_outline_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+        let mut item_capture_ix = 0;
+        let mut name_capture_ix = 0;
+        let mut context_capture_ix = None;
+        let mut extra_context_capture_ix = None;
+        let mut open_capture_ix = None;
+        let mut close_capture_ix = None;
+        let mut annotation_capture_ix = None;
+        if populate_capture_indices(
+            &query,
+            language_name,
+            "outline",
+            &[],
+            &mut [
+                Capture::Required("item", &mut item_capture_ix),
+                Capture::Required("name", &mut name_capture_ix),
+                Capture::Optional("context", &mut context_capture_ix),
+                Capture::Optional("context.extra", &mut extra_context_capture_ix),
+                Capture::Optional("open", &mut open_capture_ix),
+                Capture::Optional("close", &mut close_capture_ix),
+                Capture::Optional("annotation", &mut annotation_capture_ix),
+            ],
+        ) {
+            self.outline_config = Some(OutlineConfig {
+                query,
+                item_capture_ix,
+                name_capture_ix,
+                context_capture_ix,
+                extra_context_capture_ix,
+                open_capture_ix,
+                close_capture_ix,
+                annotation_capture_ix,
+            });
+        }
+        Ok(self)
+    }
+
+    pub fn with_text_object_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+
+        let mut text_objects_by_capture_ix = Vec::new();
+        for (ix, name) in query.capture_names().iter().enumerate() {
+            if let Some(text_object) = TextObject::from_capture_name(name) {
+                text_objects_by_capture_ix.push((ix as u32, text_object));
+            } else {
+                log::warn!(
+                    "unrecognized capture name '{}' in {} textobjects TreeSitter query",
+                    name,
+                    language_name,
+                );
+            }
+        }
+
+        self.text_object_config = Some(TextObjectConfig {
+            query,
+            text_objects_by_capture_ix,
+        });
+        Ok(self)
+    }
+
+    pub fn with_debug_variables_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+
+        let mut objects_by_capture_ix = Vec::new();
+        for (ix, name) in query.capture_names().iter().enumerate() {
+            if let Some(text_object) = DebuggerTextObject::from_capture_name(name) {
+                objects_by_capture_ix.push((ix as u32, text_object));
+            } else {
+                log::warn!(
+                    "unrecognized capture name '{}' in {} debugger TreeSitter query",
+                    name,
+                    language_name,
+                );
+            }
+        }
+
+        self.debug_variables_config = Some(DebugVariablesConfig {
+            query,
+            objects_by_capture_ix,
+        });
+        Ok(self)
+    }
+
+    pub fn with_imports_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+
+        let mut import_ix = 0;
+        let mut name_ix = None;
+        let mut namespace_ix = None;
+        let mut source_ix = None;
+        let mut list_ix = None;
+        let mut wildcard_ix = None;
+        let mut alias_ix = None;
+        if populate_capture_indices(
+            &query,
+            language_name,
+            "imports",
+            &[],
+            &mut [
+                Capture::Required("import", &mut import_ix),
+                Capture::Optional("name", &mut name_ix),
+                Capture::Optional("namespace", &mut namespace_ix),
+                Capture::Optional("source", &mut source_ix),
+                Capture::Optional("list", &mut list_ix),
+                Capture::Optional("wildcard", &mut wildcard_ix),
+                Capture::Optional("alias", &mut alias_ix),
+            ],
+        ) {
+            self.imports_config = Some(ImportsConfig {
+                query,
+                import_ix,
+                name_ix,
+                namespace_ix,
+                source_ix,
+                list_ix,
+                wildcard_ix,
+                alias_ix,
+            });
+        }
+        Ok(self)
+    }
+
+    pub fn with_brackets_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+        let mut open_capture_ix = 0;
+        let mut close_capture_ix = 0;
+        if populate_capture_indices(
+            &query,
+            language_name,
+            "brackets",
+            &[],
+            &mut [
+                Capture::Required("open", &mut open_capture_ix),
+                Capture::Required("close", &mut close_capture_ix),
+            ],
+        ) {
+            let patterns = (0..query.pattern_count())
+                .map(|ix| {
+                    let mut config = BracketsPatternConfig::default();
+                    for setting in query.property_settings(ix) {
+                        let setting_key = setting.key.as_ref();
+                        if setting_key == "newline.only" {
+                            config.newline_only = true
+                        }
+                        if setting_key == "rainbow.exclude" {
+                            config.rainbow_exclude = true
+                        }
+                    }
+                    config
+                })
+                .collect();
+            self.brackets_config = Some(BracketsConfig {
+                query,
+                open_capture_ix,
+                close_capture_ix,
+                patterns,
+            });
+        }
+        Ok(self)
+    }
+
+    pub fn with_indents_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+        let mut indent_capture_ix = 0;
+        let mut start_capture_ix = None;
+        let mut end_capture_ix = None;
+        let mut outdent_capture_ix = None;
+        if populate_capture_indices(
+            &query,
+            language_name,
+            "indents",
+            &["start."],
+            &mut [
+                Capture::Required("indent", &mut indent_capture_ix),
+                Capture::Optional("start", &mut start_capture_ix),
+                Capture::Optional("end", &mut end_capture_ix),
+                Capture::Optional("outdent", &mut outdent_capture_ix),
+            ],
+        ) {
+            let mut suffixed_start_captures = HashMap::default();
+            for (ix, name) in query.capture_names().iter().enumerate() {
+                if let Some(suffix) = name.strip_prefix("start.") {
+                    suffixed_start_captures.insert(ix as u32, suffix.to_owned().into());
+                }
+            }
+
+            self.indents_config = Some(IndentConfig {
+                query,
+                indent_capture_ix,
+                start_capture_ix,
+                end_capture_ix,
+                outdent_capture_ix,
+                suffixed_start_captures,
+            });
+        }
+        Ok(self)
+    }
+
+    pub fn with_injection_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+        let mut language_capture_ix = None;
+        let mut injection_language_capture_ix = None;
+        let mut content_capture_ix = None;
+        let mut injection_content_capture_ix = None;
+        if populate_capture_indices(
+            &query,
+            language_name,
+            "injections",
+            &[],
+            &mut [
+                Capture::Optional("language", &mut language_capture_ix),
+                Capture::Optional("injection.language", &mut injection_language_capture_ix),
+                Capture::Optional("content", &mut content_capture_ix),
+                Capture::Optional("injection.content", &mut injection_content_capture_ix),
+            ],
+        ) {
+            language_capture_ix = match (language_capture_ix, injection_language_capture_ix) {
+                (None, Some(ix)) => Some(ix),
+                (Some(_), Some(_)) => {
+                    anyhow::bail!("both language and injection.language captures are present");
+                }
+                _ => language_capture_ix,
+            };
+            content_capture_ix = match (content_capture_ix, injection_content_capture_ix) {
+                (None, Some(ix)) => Some(ix),
+                (Some(_), Some(_)) => {
+                    anyhow::bail!("both content and injection.content captures are present")
+                }
+                _ => content_capture_ix,
+            };
+            let patterns = (0..query.pattern_count())
+                .map(|ix| {
+                    let mut config = InjectionPatternConfig::default();
+                    for setting in query.property_settings(ix) {
+                        match setting.key.as_ref() {
+                            "language" | "injection.language" => {
+                                config.language.clone_from(&setting.value);
+                            }
+                            "combined" | "injection.combined" => {
+                                config.combined = true;
+                            }
+                            _ => {}
+                        }
+                    }
+                    config
+                })
+                .collect();
+            if let Some(content_capture_ix) = content_capture_ix {
+                self.injection_config = Some(InjectionConfig {
+                    query,
+                    language_capture_ix,
+                    content_capture_ix,
+                    patterns,
+                });
+            } else {
+                log::error!(
+                    "missing required capture in {} injections TreeSitter query: \
+                    content or injection.content",
+                    language_name,
+                );
+            }
+        }
+        Ok(self)
+    }
+
+    pub fn with_override_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+        overrides: &HashMap<String, LanguageConfigOverride>,
+        brackets: &mut BracketPairConfig,
+        scope_opt_in_language_servers: &[LanguageServerName],
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+
+        let mut override_configs_by_id = HashMap::default();
+        for (ix, mut name) in query.capture_names().iter().copied().enumerate() {
+            let mut range_is_inclusive = false;
+            if name.starts_with('_') {
+                continue;
+            }
+            if let Some(prefix) = name.strip_suffix(".inclusive") {
+                name = prefix;
+                range_is_inclusive = true;
+            }
+
+            let value = overrides.get(name).cloned().unwrap_or_default();
+            for server_name in &value.opt_into_language_servers {
+                if !scope_opt_in_language_servers.contains(server_name) {
+                    util::debug_panic!(
+                        "Server {server_name:?} has been opted-in by scope {name:?} but has not been marked as an opt-in server"
+                    );
+                }
+            }
+
+            override_configs_by_id.insert(
+                ix as u32,
+                OverrideEntry {
+                    name: name.to_string(),
+                    range_is_inclusive,
+                    value,
+                },
+            );
+        }
+
+        let referenced_override_names = overrides
+            .keys()
+            .chain(brackets.disabled_scopes_by_bracket_ix.iter().flatten());
+
+        for referenced_name in referenced_override_names {
+            if !override_configs_by_id
+                .values()
+                .any(|entry| entry.name == *referenced_name)
+            {
+                anyhow::bail!(
+                    "language {:?} has overrides in config not in query: {referenced_name:?}",
+                    language_name
+                );
+            }
+        }
+
+        for entry in override_configs_by_id.values_mut() {
+            entry.value.disabled_bracket_ixs = brackets
+                .disabled_scopes_by_bracket_ix
+                .iter()
+                .enumerate()
+                .filter_map(|(ix, disabled_scope_names)| {
+                    if disabled_scope_names.contains(&entry.name) {
+                        Some(ix as u16)
+                    } else {
+                        None
+                    }
+                })
+                .collect();
+        }
+
+        brackets.disabled_scopes_by_bracket_ix.clear();
+
+        self.override_config = Some(OverrideConfig {
+            query,
+            values: override_configs_by_id,
+        });
+        Ok(self)
+    }
+
+    pub fn with_redaction_query(
+        mut self,
+        source: &str,
+        language_name: &LanguageName,
+    ) -> Result<Self> {
+        let query = Query::new(&self.ts_language, source)?;
+        let mut redaction_capture_ix = 0;
+        if populate_capture_indices(
+            &query,
+            language_name,
+            "redactions",
+            &[],
+            &mut [Capture::Required("redact", &mut redaction_capture_ix)],
+        ) {
+            self.redactions_config = Some(RedactionConfig {
+                query,
+                redaction_capture_ix,
+            });
+        }
+        Ok(self)
+    }
+}
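Reviewer note, not part of the diff: `populate_capture_indices` above resolves each query capture name against a mixed list of required and optional slots, and reports success only when every required slot was filled. A std-only miniature of that contract, using plain slices in place of `tree_sitter::Query` and hypothetical capture names:

```rust
// Illustration only: mirrors the required/optional slot-filling of
// `populate_capture_indices`, minus logging and tree-sitter types.
fn resolve(
    capture_names: &[&str],
    required: &mut [(&str, &mut u32)],
    optional: &mut [(&str, &mut Option<u32>)],
) -> bool {
    let mut missing: Vec<String> = required.iter().map(|(n, _)| n.to_string()).collect();
    for (ix, name) in capture_names.iter().enumerate() {
        if let Some(slot) = required.iter_mut().find(|(n, _)| n == name) {
            *slot.1 = ix as u32;
            missing.retain(|m| m != name);
        } else if let Some(slot) = optional.iter_mut().find(|(n, _)| n == name) {
            *slot.1 = Some(ix as u32);
        }
    }
    // Success only if every required capture appeared in the query.
    missing.is_empty()
}

fn main() {
    let (mut item_ix, mut name_ix) = (0u32, 0u32);
    let (mut context_ix, mut open_ix) = (None, None);
    let ok = resolve(
        &["name", "item", "context", "_helper"],
        &mut [("item", &mut item_ix), ("name", &mut name_ix)],
        &mut [("context", &mut context_ix), ("open", &mut open_ix)],
    );
    assert!(ok);
    assert_eq!((item_ix, name_ix), (1, 0));
    assert_eq!(context_ix, Some(2));
    assert_eq!(open_ix, None); // optional captures may stay unset
}
```

The real function additionally warns about unrecognized capture names unless they start with `_` or one of the `expected_prefixes`.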

crates/language_core/src/highlight_map.rs 🔗

@@ -0,0 +1,52 @@
+use std::sync::Arc;
+
+#[derive(Clone, Debug)]
+pub struct HighlightMap(Arc<[HighlightId]>);
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub struct HighlightId(pub u32);
+
+const DEFAULT_SYNTAX_HIGHLIGHT_ID: HighlightId = HighlightId(u32::MAX);
+
+impl HighlightMap {
+    #[inline]
+    pub fn from_ids(highlight_ids: impl IntoIterator<Item = HighlightId>) -> Self {
+        Self(highlight_ids.into_iter().collect())
+    }
+
+    #[inline]
+    pub fn get(&self, capture_id: u32) -> HighlightId {
+        self.0
+            .get(capture_id as usize)
+            .copied()
+            .unwrap_or(DEFAULT_SYNTAX_HIGHLIGHT_ID)
+    }
+}
+
+impl HighlightId {
+    pub const TABSTOP_INSERT_ID: HighlightId = HighlightId(u32::MAX - 1);
+    pub const TABSTOP_REPLACE_ID: HighlightId = HighlightId(u32::MAX - 2);
+
+    #[inline]
+    pub fn is_default(&self) -> bool {
+        *self == DEFAULT_SYNTAX_HIGHLIGHT_ID
+    }
+}
+
+impl Default for HighlightMap {
+    fn default() -> Self {
+        Self(Arc::new([]))
+    }
+}
+
+impl Default for HighlightId {
+    fn default() -> Self {
+        DEFAULT_SYNTAX_HIGHLIGHT_ID
+    }
+}
+
+impl From<HighlightId> for usize {
+    fn from(value: HighlightId) -> Self {
+        value.0 as usize
+    }
+}

crates/language_core/src/language_config.rs 🔗

@@ -0,0 +1,539 @@
+use crate::LanguageName;
+use collections::{HashMap, HashSet, IndexSet};
+use gpui::SharedString;
+use lsp::LanguageServerName;
+use regex::Regex;
+use schemars::{JsonSchema, SchemaGenerator, json_schema};
+use serde::{Deserialize, Deserializer, Serialize, Serializer, de};
+use std::{num::NonZeroU32, path::Path, sync::Arc};
+use util::serde::default_true;
+
+/// Controls the soft-wrapping behavior in the editor.
+#[derive(Copy, Clone, Debug, Serialize, Deserialize, PartialEq, Eq, JsonSchema)]
+#[serde(rename_all = "snake_case")]
+pub enum SoftWrap {
+    /// Prefer a single line generally, unless an overly long line is encountered.
+    None,
+    /// Deprecated: use None instead. Left to avoid breaking existing users' configs.
+    /// Prefer a single line generally, unless an overly long line is encountered.
+    PreferLine,
+    /// Soft wrap lines that exceed the editor width.
+    EditorWidth,
+    /// Soft wrap lines at the preferred line length.
+    PreferredLineLength,
+    /// Soft wrap lines at the preferred line length or the editor width (whichever is smaller).
+    Bounded,
+}
+
+/// Top-level configuration for a language, typically loaded from a `config.toml`
+/// shipped alongside the grammar.
+#[derive(Clone, Debug, Deserialize, JsonSchema)]
+pub struct LanguageConfig {
+    /// Human-readable name of the language.
+    pub name: LanguageName,
+    /// The name of this language for a Markdown code fence block
+    pub code_fence_block_name: Option<Arc<str>>,
+    /// Alternative language names that Jupyter kernels may report for this language.
+    /// Used when a kernel's `language` field differs from Zed's language name.
+    /// For example, the Nu extension would set this to `["nushell"]`.
+    #[serde(default)]
+    pub kernel_language_names: Vec<Arc<str>>,
+    /// The name of the grammar in a WASM bundle (experimental).
+    pub grammar: Option<Arc<str>>,
+    /// The criteria for matching this language to a given file.
+    #[serde(flatten)]
+    pub matcher: LanguageMatcher,
+    /// List of bracket types in a language.
+    #[serde(default)]
+    pub brackets: BracketPairConfig,
+    /// If set to true, auto indentation uses last non empty line to determine
+    /// the indentation level for a new line.
+    #[serde(default = "auto_indent_using_last_non_empty_line_default")]
+    pub auto_indent_using_last_non_empty_line: bool,
+    /// Whether indentation of pasted content should be adjusted based on the context.
+    #[serde(default)]
+    pub auto_indent_on_paste: Option<bool>,
+    /// A regex that is used to determine whether the indentation level should be
+    /// increased in the following line.
+    #[serde(default, deserialize_with = "deserialize_regex")]
+    #[schemars(schema_with = "regex_json_schema")]
+    pub increase_indent_pattern: Option<Regex>,
+    /// A regex that is used to determine whether the indentation level should be
+    /// decreased in the following line.
+    #[serde(default, deserialize_with = "deserialize_regex")]
+    #[schemars(schema_with = "regex_json_schema")]
+    pub decrease_indent_pattern: Option<Regex>,
+    /// A list of rules for decreasing indentation. Each rule pairs a regex with a set of valid
+    /// "block-starting" tokens. When a line matches a pattern, its indentation is aligned with
+    /// the most recent line that began with a corresponding token. This enables context-aware
+    /// outdenting, like aligning an `else` with its `if`.
+    #[serde(default)]
+    pub decrease_indent_patterns: Vec<DecreaseIndentConfig>,
+    /// A list of characters that trigger the automatic insertion of a closing
+    /// bracket when they immediately precede the point where an opening
+    /// bracket is inserted.
+    #[serde(default)]
+    pub autoclose_before: String,
+    /// A placeholder used internally by Semantic Index.
+    #[serde(default)]
+    pub collapsed_placeholder: String,
+    /// Line comment strings inserted by e.g. the `toggle comments` action.
+    /// A language can have multiple flavours of line comments. All of the provided line comments are
+    /// used for comment continuations on the next line, but only the first one is used for Editor::ToggleComments.
+    #[serde(default)]
+    pub line_comments: Vec<Arc<str>>,
+    /// Delimiters and configuration for recognizing and formatting block comments.
+    #[serde(default)]
+    pub block_comment: Option<BlockCommentConfig>,
+    /// Delimiters and configuration for recognizing and formatting documentation comments.
+    #[serde(default, alias = "documentation")]
+    pub documentation_comment: Option<BlockCommentConfig>,
+    /// List markers that are inserted unchanged on newline (e.g., `- `, `* `, `+ `).
+    #[serde(default)]
+    pub unordered_list: Vec<Arc<str>>,
+    /// Configuration for ordered lists with auto-incrementing numbers on newline (e.g., `1. ` becomes `2. `).
+    #[serde(default)]
+    pub ordered_list: Vec<OrderedListConfig>,
+    /// Configuration for task lists where multiple markers map to a single continuation prefix (e.g., `- [x] ` continues as `- [ ] `).
+    #[serde(default)]
+    pub task_list: Option<TaskListConfig>,
+    /// A list of additional regex patterns that should be treated as prefixes
+    /// for creating boundaries during rewrapping, ensuring content from one
+    /// prefixed section doesn't merge with another (e.g., markdown list items).
+    /// By default, Zed treats paragraph and comment prefixes as boundaries.
+    #[serde(default, deserialize_with = "deserialize_regex_vec")]
+    #[schemars(schema_with = "regex_vec_json_schema")]
+    pub rewrap_prefixes: Vec<Regex>,
+    /// A list of language servers that are allowed to run on subranges of a given language.
+    #[serde(default)]
+    pub scope_opt_in_language_servers: Vec<LanguageServerName>,
+    #[serde(default)]
+    pub overrides: HashMap<String, LanguageConfigOverride>,
+    /// A list of characters that Zed should treat as word characters for the
+    /// purpose of features that operate on word boundaries, like 'move to next word end'
+    /// or a whole-word search in buffer search.
+    #[serde(default)]
+    pub word_characters: HashSet<char>,
+    /// Whether to indent lines using tab characters, as opposed to multiple
+    /// spaces.
+    #[serde(default)]
+    pub hard_tabs: Option<bool>,
+    /// How many columns a tab should occupy.
+    #[serde(default)]
+    #[schemars(range(min = 1, max = 128))]
+    pub tab_size: Option<NonZeroU32>,
+    /// How to soft-wrap long lines of text.
+    #[serde(default)]
+    pub soft_wrap: Option<SoftWrap>,
+    /// When set, selections can be wrapped using prefix/suffix pairs on both sides.
+    #[serde(default)]
+    pub wrap_characters: Option<WrapCharactersConfig>,
+    /// The name of a Prettier parser that will be used for this language when no file path is available.
+    /// If there's a parser name in the language settings, that will be used instead.
+    #[serde(default)]
+    pub prettier_parser_name: Option<String>,
+    /// If true, this language is only for syntax highlighting via an injection into other
+    /// languages, but should not appear to the user as a distinct language.
+    #[serde(default)]
+    pub hidden: bool,
+    /// If configured, this language contains JSX style tags, and should support auto-closing of those tags.
+    #[serde(default)]
+    pub jsx_tag_auto_close: Option<JsxTagAutoCloseConfig>,
+    /// A list of characters that Zed should treat as word characters for completion queries.
+    #[serde(default)]
+    pub completion_query_characters: HashSet<char>,
+    /// A list of characters that Zed should treat as word characters for linked edit operations.
+    #[serde(default)]
+    pub linked_edit_characters: HashSet<char>,
+    /// A list of preferred debuggers for this language.
+    #[serde(default)]
+    pub debuggers: IndexSet<SharedString>,
+    /// A list of import namespace segments that aren't expected to appear in file paths. For
+    /// example, "super" and "crate" in Rust.
+    #[serde(default)]
+    pub ignored_import_segments: HashSet<Arc<str>>,
+    /// Regular expression that matches substrings to omit from import paths, to make the paths more
+    /// similar to how they are specified when imported. For example, "/mod\.rs$" or "/__init__\.py$".
+    #[serde(default, deserialize_with = "deserialize_regex")]
+    #[schemars(schema_with = "regex_json_schema")]
+    pub import_path_strip_regex: Option<Regex>,
+}
+
+impl LanguageConfig {
+    pub const FILE_NAME: &str = "config.toml";
+
+    pub fn load(config_path: impl AsRef<Path>) -> anyhow::Result<Self> {
+        let config = std::fs::read_to_string(config_path.as_ref())?;
+        toml::from_str(&config).map_err(Into::into)
+    }
+}
+
+impl Default for LanguageConfig {
+    fn default() -> Self {
+        Self {
+            name: LanguageName::new_static(""),
+            code_fence_block_name: None,
+            kernel_language_names: Default::default(),
+            grammar: None,
+            matcher: LanguageMatcher::default(),
+            brackets: Default::default(),
+            auto_indent_using_last_non_empty_line: auto_indent_using_last_non_empty_line_default(),
+            auto_indent_on_paste: None,
+            increase_indent_pattern: Default::default(),
+            decrease_indent_pattern: Default::default(),
+            decrease_indent_patterns: Default::default(),
+            autoclose_before: Default::default(),
+            line_comments: Default::default(),
+            block_comment: Default::default(),
+            documentation_comment: Default::default(),
+            unordered_list: Default::default(),
+            ordered_list: Default::default(),
+            task_list: Default::default(),
+            rewrap_prefixes: Default::default(),
+            scope_opt_in_language_servers: Default::default(),
+            overrides: Default::default(),
+            word_characters: Default::default(),
+            collapsed_placeholder: Default::default(),
+            hard_tabs: None,
+            tab_size: None,
+            soft_wrap: None,
+            wrap_characters: None,
+            prettier_parser_name: None,
+            hidden: false,
+            jsx_tag_auto_close: None,
+            completion_query_characters: Default::default(),
+            linked_edit_characters: Default::default(),
+            debuggers: Default::default(),
+            ignored_import_segments: Default::default(),
+            import_path_strip_regex: None,
+        }
+    }
+}
+
+#[derive(Clone, Debug, Deserialize, Default, JsonSchema)]
+pub struct DecreaseIndentConfig {
+    #[serde(default, deserialize_with = "deserialize_regex")]
+    #[schemars(schema_with = "regex_json_schema")]
+    pub pattern: Option<Regex>,
+    #[serde(default)]
+    pub valid_after: Vec<String>,
+}
+
+/// Configuration for continuing ordered lists with auto-incrementing numbers.
+#[derive(Clone, Debug, Deserialize, JsonSchema)]
+pub struct OrderedListConfig {
+    /// A regex pattern with a capture group for the number portion (e.g., `(\\d+)\\. `).
+    pub pattern: String,
+    /// A format string where `{1}` is replaced with the incremented number (e.g., `{1}. `).
+    pub format: String,
+}
+
+/// Configuration for continuing task lists on newline.
+#[derive(Clone, Debug, Deserialize, JsonSchema)]
+pub struct TaskListConfig {
+    /// The list markers to match (e.g., `- [ ] `, `- [x] `).
+    pub prefixes: Vec<Arc<str>>,
+    /// The marker to insert when continuing the list on a new line (e.g., `- [ ] `).
+    pub continuation: Arc<str>,
+}
+
+#[derive(Clone, Debug, Serialize, Deserialize, Default, JsonSchema)]
+pub struct LanguageMatcher {
+    /// Given a list of `LanguageConfig`s, the language of a file can be determined based on the path suffix matching any of the `path_suffixes`.
+    #[serde(default)]
+    pub path_suffixes: Vec<String>,
+    /// A regex pattern that determines whether the language should be assigned to a file or not.
+    #[serde(
+        default,
+        serialize_with = "serialize_regex",
+        deserialize_with = "deserialize_regex"
+    )]
+    #[schemars(schema_with = "regex_json_schema")]
+    pub first_line_pattern: Option<Regex>,
+    /// Alternative names for this language used in vim/emacs modelines.
+    /// These are matched case-insensitively against the `mode` (emacs) or
+    /// `filetype`/`ft` (vim) specified in the modeline.
+    #[serde(default)]
+    pub modeline_aliases: Vec<String>,
+}
+
+impl Ord for LanguageMatcher {
+    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
+        self.path_suffixes
+            .cmp(&other.path_suffixes)
+            .then_with(|| {
+                self.first_line_pattern
+                    .as_ref()
+                    .map(Regex::as_str)
+                    .cmp(&other.first_line_pattern.as_ref().map(Regex::as_str))
+            })
+            .then_with(|| self.modeline_aliases.cmp(&other.modeline_aliases))
+    }
+}
+
+impl PartialOrd for LanguageMatcher {
+    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl Eq for LanguageMatcher {}
+
+impl PartialEq for LanguageMatcher {
+    fn eq(&self, other: &Self) -> bool {
+        self.path_suffixes == other.path_suffixes
+            && self.first_line_pattern.as_ref().map(Regex::as_str)
+                == other.first_line_pattern.as_ref().map(Regex::as_str)
+            && self.modeline_aliases == other.modeline_aliases
+    }
+}
+
+/// The configuration for JSX tag auto-closing.
+#[derive(Clone, Deserialize, JsonSchema, Debug)]
+pub struct JsxTagAutoCloseConfig {
+    /// The name of the node for an opening tag
+    pub open_tag_node_name: String,
+    /// The name of the node for a closing tag
+    pub close_tag_node_name: String,
+    /// The name of the node for a complete element with children for open and close tags
+    pub jsx_element_node_name: String,
+    /// The name of the node found within both opening and closing
+    /// tags that describes the tag name
+    pub tag_name_node_name: String,
+    /// Alternate node names for tag names.
+    /// Needed specifically because TSX represents the name in `<Foo.Bar>`
+    /// as `member_expression` rather than the usual `identifier`.
+    #[serde(default)]
+    pub tag_name_node_name_alternates: Vec<String>,
+    /// Some grammars are smart enough to detect a closing tag
+    /// that is not valid, i.e. one that doesn't match its corresponding
+    /// opening tag or has no corresponding opening tag at all.
+    /// This should be set to the name of the node for invalid
+    /// closing tags if the grammar contains such a node; otherwise,
+    /// detecting already-closed tags will not work properly.
+    #[serde(default)]
+    pub erroneous_close_tag_node_name: Option<String>,
+    /// See `erroneous_close_tag_node_name` above for details.
+    /// This should be set if the node used for the tag name
+    /// within erroneous closing tags differs from the
+    /// normal tag name node name.
+    #[serde(default)]
+    pub erroneous_close_tag_name_node_name: Option<String>,
+}
+
+/// The configuration for block comments for this language.
+#[derive(Clone, Debug, JsonSchema, PartialEq)]
+pub struct BlockCommentConfig {
+    /// The start delimiter of a block comment.
+    pub start: Arc<str>,
+    /// The end delimiter of a block comment.
+    pub end: Arc<str>,
+    /// The prefix to insert when a new line is added inside a block comment.
+    pub prefix: Arc<str>,
+    /// The indent to apply to the prefix and the end line upon a new line.
+    #[schemars(range(min = 1, max = 128))]
+    pub tab_size: u32,
+}
+
+impl<'de> Deserialize<'de> for BlockCommentConfig {
+    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
+    where
+        D: Deserializer<'de>,
+    {
+        #[derive(Deserialize)]
+        #[serde(untagged)]
+        enum BlockCommentConfigHelper {
+            New {
+                start: Arc<str>,
+                end: Arc<str>,
+                prefix: Arc<str>,
+                tab_size: u32,
+            },
+            Old([Arc<str>; 2]),
+        }
+
+        match BlockCommentConfigHelper::deserialize(deserializer)? {
+            BlockCommentConfigHelper::New {
+                start,
+                end,
+                prefix,
+                tab_size,
+            } => Ok(BlockCommentConfig {
+                start,
+                end,
+                prefix,
+                tab_size,
+            }),
+            BlockCommentConfigHelper::Old([start, end]) => Ok(BlockCommentConfig {
+                start,
+                end,
+                prefix: "".into(),
+                tab_size: 0,
+            }),
+        }
+    }
+}
+
+#[derive(Clone, Deserialize, Default, Debug, JsonSchema)]
+pub struct LanguageConfigOverride {
+    #[serde(default)]
+    pub line_comments: Override<Vec<Arc<str>>>,
+    #[serde(default)]
+    pub block_comment: Override<BlockCommentConfig>,
+    #[serde(skip)]
+    pub disabled_bracket_ixs: Vec<u16>,
+    #[serde(default)]
+    pub word_characters: Override<HashSet<char>>,
+    #[serde(default)]
+    pub completion_query_characters: Override<HashSet<char>>,
+    #[serde(default)]
+    pub linked_edit_characters: Override<HashSet<char>>,
+    #[serde(default)]
+    pub opt_into_language_servers: Vec<LanguageServerName>,
+    #[serde(default)]
+    pub prefer_label_for_snippet: Option<bool>,
+}
+
+#[derive(Clone, Deserialize, Debug, Serialize, JsonSchema)]
+#[serde(untagged)]
+pub enum Override<T> {
+    Remove { remove: bool },
+    Set(T),
+}
+
+impl<T> Default for Override<T> {
+    fn default() -> Self {
+        Override::Remove { remove: false }
+    }
+}
+
+impl<T> Override<T> {
+    pub fn as_option<'a>(this: Option<&'a Self>, original: Option<&'a T>) -> Option<&'a T> {
+        match this {
+            Some(Self::Set(value)) => Some(value),
+            Some(Self::Remove { remove: true }) => None,
+            Some(Self::Remove { remove: false }) | None => original,
+        }
+    }
+}
+
+/// Configuration of handling bracket pairs for a given language.
+///
+/// This struct includes settings for defining which pairs of characters are considered brackets and
+/// also specifies any language-specific scopes where these pairs should be ignored for bracket matching purposes.
+#[derive(Clone, Debug, Default, JsonSchema)]
+#[schemars(with = "Vec::<BracketPairContent>")]
+pub struct BracketPairConfig {
+    /// A list of character pairs that should be treated as brackets in the context of a given language.
+    pub pairs: Vec<BracketPair>,
+    /// A list of tree-sitter scopes for which a given bracket should not be active.
+    /// The n-th entry in [`Self::disabled_scopes_by_bracket_ix`] contains a list of disabled scopes for the n-th entry in [`Self::pairs`].
+    pub disabled_scopes_by_bracket_ix: Vec<Vec<String>>,
+}
+
+impl BracketPairConfig {
+    pub fn is_closing_brace(&self, c: char) -> bool {
+        self.pairs.iter().any(|pair| pair.end.starts_with(c))
+    }
+}
+
+#[derive(Deserialize, JsonSchema)]
+pub struct BracketPairContent {
+    #[serde(flatten)]
+    pub bracket_pair: BracketPair,
+    #[serde(default)]
+    pub not_in: Vec<String>,
+}
+
+impl<'de> Deserialize<'de> for BracketPairConfig {
+    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
+    where
+        D: Deserializer<'de>,
+    {
+        let result = Vec::<BracketPairContent>::deserialize(deserializer)?;
+        let (brackets, disabled_scopes_by_bracket_ix) = result
+            .into_iter()
+            .map(|entry| (entry.bracket_pair, entry.not_in))
+            .unzip();
+
+        Ok(BracketPairConfig {
+            pairs: brackets,
+            disabled_scopes_by_bracket_ix,
+        })
+    }
+}
+
+/// Describes a single bracket pair and how an editor should react to e.g. inserting
+/// an opening bracket or inserting a newline between the `start` and `end` characters.
+#[derive(Clone, Debug, Default, Deserialize, PartialEq, JsonSchema)]
+pub struct BracketPair {
+    /// Starting substring for a bracket.
+    pub start: String,
+    /// Ending substring for a bracket.
+    pub end: String,
+    /// True if `end` should be automatically inserted right after `start` characters.
+    pub close: bool,
+    /// True if selected text should be surrounded by `start` and `end` characters.
+    #[serde(default = "default_true")]
+    pub surround: bool,
+    /// True if an extra newline should be inserted while the cursor is in the middle
+    /// of that bracket pair.
+    pub newline: bool,
+}
+
+#[derive(Clone, Debug, Deserialize, JsonSchema)]
+pub struct WrapCharactersConfig {
+    /// Opening token split into a prefix and suffix. The first caret goes
+    /// after the prefix (i.e., between prefix and suffix).
+    pub start_prefix: String,
+    pub start_suffix: String,
+    /// Closing token split into a prefix and suffix. The second caret goes
+    /// after the prefix (i.e., between prefix and suffix).
+    pub end_prefix: String,
+    pub end_suffix: String,
+}
+
+pub fn auto_indent_using_last_non_empty_line_default() -> bool {
+    true
+}
+
+pub fn deserialize_regex<'de, D: Deserializer<'de>>(d: D) -> Result<Option<Regex>, D::Error> {
+    let source = Option::<String>::deserialize(d)?;
+    if let Some(source) = source {
+        Ok(Some(regex::Regex::new(&source).map_err(de::Error::custom)?))
+    } else {
+        Ok(None)
+    }
+}
+
+pub fn regex_json_schema(_: &mut schemars::SchemaGenerator) -> schemars::Schema {
+    json_schema!({
+        "type": "string"
+    })
+}
+
+pub fn serialize_regex<S>(regex: &Option<Regex>, serializer: S) -> Result<S::Ok, S::Error>
+where
+    S: Serializer,
+{
+    match regex {
+        Some(regex) => serializer.serialize_str(regex.as_str()),
+        None => serializer.serialize_none(),
+    }
+}
+
+pub fn deserialize_regex_vec<'de, D: Deserializer<'de>>(d: D) -> Result<Vec<Regex>, D::Error> {
+    let sources = Vec::<String>::deserialize(d)?;
+    sources
+        .into_iter()
+        .map(|source| regex::Regex::new(&source))
+        .collect::<Result<_, _>>()
+        .map_err(de::Error::custom)
+}
+
+pub fn regex_vec_json_schema(_: &mut SchemaGenerator) -> schemars::Schema {
+    json_schema!({
+        "type": "array",
+        "items": { "type": "string" }
+    })
+}
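A note on the `Override<T>` resolution semantics above: `Override::as_option` merges a scope-level override with the language-level value. The sketch below reproduces that logic standalone (the enum is copied from the moved code with the serde/schemars derives stripped; the literal values are illustrative only):

```rust
// Sketch of the `Override<T>` resolution logic from `language_config.rs`,
// reproduced without the serde/schemars derives for a standalone demo.
enum Override<T> {
    Remove { remove: bool },
    Set(T),
}

impl<T> Override<T> {
    // `Set` wins, `Remove { remove: true }` clears the value, and
    // `Remove { remove: false }` (the `Default`) falls back to the original.
    fn as_option<'a>(this: Option<&'a Self>, original: Option<&'a T>) -> Option<&'a T> {
        match this {
            Some(Self::Set(value)) => Some(value),
            Some(Self::Remove { remove: true }) => None,
            Some(Self::Remove { remove: false }) | None => original,
        }
    }
}

fn main() {
    let base = "//";

    // No override in scope: the language-level value applies.
    let none: Option<&Override<&str>> = None;
    assert_eq!(Override::as_option(none, Some(&base)), Some(&base));

    // An explicit `Set` replaces the original.
    let set = Override::Set("#");
    assert_eq!(Override::as_option(Some(&set), Some(&base)), Some(&"#"));

    // `remove: true` clears the value entirely for this scope.
    let clear: Override<&str> = Override::Remove { remove: true };
    assert_eq!(Override::as_option(Some(&clear), Some(&base)), None);

    // `remove: false` (the `Default`) keeps the original.
    let keep: Override<&str> = Override::Remove { remove: false };
    assert_eq!(Override::as_option(Some(&keep), Some(&base)), Some(&base));
}
```

This is why `Override`'s `Default` is `Remove { remove: false }`: an absent override is indistinguishable from "keep the original".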

crates/language_core/src/language_core.rs 🔗

@@ -0,0 +1,39 @@
+// language_core: tree-sitter grammar infrastructure, LSP adapter traits,
+// language configuration, and highlight mapping.
+
+pub mod diagnostic;
+pub mod grammar;
+pub mod highlight_map;
+pub mod language_config;
+
+pub use diagnostic::{Diagnostic, DiagnosticSourceKind};
+pub use grammar::{
+    BracketsConfig, BracketsPatternConfig, DebugVariablesConfig, DebuggerTextObject, Grammar,
+    GrammarId, HighlightsConfig, ImportsConfig, IndentConfig, InjectionConfig,
+    InjectionPatternConfig, NEXT_GRAMMAR_ID, OutlineConfig, OverrideConfig, OverrideEntry,
+    RedactionConfig, RunnableCapture, RunnableConfig, TextObject, TextObjectConfig,
+};
+pub use highlight_map::{HighlightId, HighlightMap};
+pub use language_config::{
+    BlockCommentConfig, BracketPair, BracketPairConfig, BracketPairContent, DecreaseIndentConfig,
+    JsxTagAutoCloseConfig, LanguageConfig, LanguageConfigOverride, LanguageMatcher,
+    OrderedListConfig, Override, SoftWrap, TaskListConfig, WrapCharactersConfig,
+    auto_indent_using_last_non_empty_line_default, deserialize_regex, deserialize_regex_vec,
+    regex_json_schema, regex_vec_json_schema, serialize_regex,
+};
+
+pub mod code_label;
+pub mod language_name;
+pub mod lsp_adapter;
+pub mod manifest;
+pub mod queries;
+pub mod toolchain;
+
+pub use code_label::{CodeLabel, CodeLabelBuilder, Symbol};
+pub use language_name::{LanguageId, LanguageName};
+pub use lsp_adapter::{
+    BinaryStatus, LanguageServerStatusUpdate, PromptResponseContext, ServerHealth, ToLspPosition,
+};
+pub use manifest::ManifestName;
+pub use queries::{LanguageQueries, QUERY_FILENAME_PREFIXES};
+pub use toolchain::{Toolchain, ToolchainList, ToolchainMetadata, ToolchainScope};

crates/language_core/src/language_name.rs 🔗

@@ -0,0 +1,109 @@
+use gpui::SharedString;
+use schemars::JsonSchema;
+use serde::{Deserialize, Serialize};
+use std::{
+    borrow::Borrow,
+    sync::atomic::{AtomicUsize, Ordering::SeqCst},
+};
+
+static NEXT_LANGUAGE_ID: AtomicUsize = AtomicUsize::new(0);
+
+#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
+pub struct LanguageId(usize);
+
+impl LanguageId {
+    pub fn new() -> Self {
+        Self(NEXT_LANGUAGE_ID.fetch_add(1, SeqCst))
+    }
+}
+
+impl Default for LanguageId {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+#[derive(
+    Debug, Clone, Hash, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize, JsonSchema,
+)]
+pub struct LanguageName(pub SharedString);
+
+impl LanguageName {
+    pub fn new(s: &str) -> Self {
+        Self(SharedString::new(s))
+    }
+
+    pub fn new_static(s: &'static str) -> Self {
+        Self(SharedString::new_static(s))
+    }
+
+    pub fn from_proto(s: String) -> Self {
+        Self(SharedString::from(s))
+    }
+
+    pub fn to_proto(&self) -> String {
+        self.0.to_string()
+    }
+
+    pub fn lsp_id(&self) -> String {
+        match self.0.as_ref() {
+            "Plain Text" => "plaintext".to_string(),
+            language_name => language_name.to_lowercase(),
+        }
+    }
+}
+
+impl From<LanguageName> for SharedString {
+    fn from(value: LanguageName) -> Self {
+        value.0
+    }
+}
+
+impl From<SharedString> for LanguageName {
+    fn from(value: SharedString) -> Self {
+        LanguageName(value)
+    }
+}
+
+impl AsRef<str> for LanguageName {
+    fn as_ref(&self) -> &str {
+        self.0.as_ref()
+    }
+}
+
+impl Borrow<str> for LanguageName {
+    fn borrow(&self) -> &str {
+        self.0.as_ref()
+    }
+}
+
+impl PartialEq<str> for LanguageName {
+    fn eq(&self, other: &str) -> bool {
+        self.0.as_ref() == other
+    }
+}
+
+impl PartialEq<&str> for LanguageName {
+    fn eq(&self, other: &&str) -> bool {
+        self.0.as_ref() == *other
+    }
+}
+
+impl std::fmt::Display for LanguageName {
+    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+        write!(f, "{}", self.0)
+    }
+}
+
+impl From<&'static str> for LanguageName {
+    fn from(str: &'static str) -> Self {
+        Self(SharedString::new_static(str))
+    }
+}
+
+impl From<LanguageName> for String {
+    fn from(value: LanguageName) -> Self {
+        let value: &str = &value.0;
+        Self::from(value)
+    }
+}
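For reviewers unfamiliar with `lsp_id`: LSP language identifiers are lowercase strings, and the only special case Zed applies is mapping "Plain Text" to "plaintext". A minimal free-function sketch of the rule (the example language names are illustrative):

```rust
// Standalone sketch of `LanguageName::lsp_id`: lowercase the language name,
// except for Zed's "Plain Text", which maps to the LSP identifier "plaintext".
fn lsp_id(name: &str) -> String {
    match name {
        "Plain Text" => "plaintext".to_string(),
        other => other.to_lowercase(),
    }
}

fn main() {
    assert_eq!(lsp_id("Plain Text"), "plaintext");
    assert_eq!(lsp_id("Rust"), "rust");
    assert_eq!(lsp_id("Markdown"), "markdown");
}
```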

crates/language_core/src/lsp_adapter.rs 🔗

@@ -0,0 +1,44 @@
+use gpui::SharedString;
+use serde::{Deserialize, Serialize};
+
+/// Converts a value into an LSP position.
+pub trait ToLspPosition {
+    /// Converts the value into an LSP position.
+    fn to_lsp_position(self) -> lsp::Position;
+}
+
+/// Context provided to LSP adapters when a user responds to a ShowMessageRequest prompt.
+/// This allows adapters to intercept preference selections (like "Always" or "Never")
+/// and potentially persist them to Zed's settings.
+#[derive(Debug, Clone)]
+pub struct PromptResponseContext {
+    /// The original message shown to the user
+    pub message: String,
+    /// The action (button) the user selected
+    pub selected_action: lsp::MessageActionItem,
+}
+
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub enum LanguageServerStatusUpdate {
+    Binary(BinaryStatus),
+    Health(ServerHealth, Option<SharedString>),
+}
+
+#[derive(Debug, PartialEq, Eq, Deserialize, Serialize, Clone, Copy)]
+#[serde(rename_all = "camelCase")]
+pub enum ServerHealth {
+    Ok,
+    Warning,
+    Error,
+}
+
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub enum BinaryStatus {
+    None,
+    CheckingForUpdate,
+    Downloading,
+    Starting,
+    Stopping,
+    Stopped,
+    Failed { error: String },
+}

crates/language_core/src/manifest.rs 🔗

@@ -0,0 +1,36 @@
+use std::borrow::Borrow;
+
+use gpui::SharedString;
+
+#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
+pub struct ManifestName(SharedString);
+
+impl Borrow<SharedString> for ManifestName {
+    fn borrow(&self) -> &SharedString {
+        &self.0
+    }
+}
+
+impl Borrow<str> for ManifestName {
+    fn borrow(&self) -> &str {
+        &self.0
+    }
+}
+
+impl From<SharedString> for ManifestName {
+    fn from(value: SharedString) -> Self {
+        Self(value)
+    }
+}
+
+impl From<ManifestName> for SharedString {
+    fn from(value: ManifestName) -> Self {
+        value.0
+    }
+}
+
+impl AsRef<SharedString> for ManifestName {
+    fn as_ref(&self) -> &SharedString {
+        &self.0
+    }
+}

crates/language_core/src/queries.rs 🔗

@@ -0,0 +1,33 @@
+use std::borrow::Cow;
+
+pub type QueryFieldAccessor = fn(&mut LanguageQueries) -> &mut Option<Cow<'static, str>>;
+
+pub const QUERY_FILENAME_PREFIXES: &[(&str, QueryFieldAccessor)] = &[
+    ("highlights", |q| &mut q.highlights),
+    ("brackets", |q| &mut q.brackets),
+    ("outline", |q| &mut q.outline),
+    ("indents", |q| &mut q.indents),
+    ("injections", |q| &mut q.injections),
+    ("overrides", |q| &mut q.overrides),
+    ("redactions", |q| &mut q.redactions),
+    ("runnables", |q| &mut q.runnables),
+    ("debugger", |q| &mut q.debugger),
+    ("textobjects", |q| &mut q.text_objects),
+    ("imports", |q| &mut q.imports),
+];
+
+/// Tree-sitter language queries for a given language.
+#[derive(Debug, Default)]
+pub struct LanguageQueries {
+    pub highlights: Option<Cow<'static, str>>,
+    pub brackets: Option<Cow<'static, str>>,
+    pub indents: Option<Cow<'static, str>>,
+    pub outline: Option<Cow<'static, str>>,
+    pub injections: Option<Cow<'static, str>>,
+    pub overrides: Option<Cow<'static, str>>,
+    pub redactions: Option<Cow<'static, str>>,
+    pub runnables: Option<Cow<'static, str>>,
+    pub text_objects: Option<Cow<'static, str>>,
+    pub debugger: Option<Cow<'static, str>>,
+    pub imports: Option<Cow<'static, str>>,
+}

crates/language_core/src/toolchain.rs 🔗

@@ -0,0 +1,124 @@
+//! Provides core data types for language toolchains.
+//!
+//! A language can have associated toolchains: sets of tools used to
+//! interact with projects written in that language.
+//! For example, a Python project can have an associated virtual environment; a Rust project can have a toolchain override.
+
+use std::{path::Path, sync::Arc};
+
+use gpui::SharedString;
+use util::rel_path::RelPath;
+
+use crate::{LanguageName, ManifestName};
+
+/// Represents a single toolchain.
+#[derive(Clone, Eq, Debug)]
+pub struct Toolchain {
+    /// User-facing label
+    pub name: SharedString,
+    /// Absolute path
+    pub path: SharedString,
+    pub language_name: LanguageName,
+    /// Full toolchain data (including language-specific details)
+    pub as_json: serde_json::Value,
+}
+
+impl std::hash::Hash for Toolchain {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        let Self {
+            name,
+            path,
+            language_name,
+            as_json: _,
+        } = self;
+        name.hash(state);
+        path.hash(state);
+        language_name.hash(state);
+    }
+}
+
+impl PartialEq for Toolchain {
+    fn eq(&self, other: &Self) -> bool {
+        let Self {
+            name,
+            path,
+            language_name,
+            as_json: _,
+        } = self;
+        // Do not use `as_json` for comparisons: it isn't user-surfaced, so if it
+        // affected equality, the UI could show multiple seemingly identical entries.
+        (name, path, language_name).eq(&(&other.name, &other.path, &other.language_name))
+    }
+}
+
+/// Declares a scope of a toolchain added by user.
+///
+/// When the user adds a toolchain, we give them an option to see that toolchain in:
+/// - All of their projects
+/// - A project they're currently in.
+/// - Only in the subproject they're currently in.
+#[derive(Clone, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub enum ToolchainScope {
+    Subproject(Arc<Path>, Arc<RelPath>),
+    Project,
+    /// Available in all projects on this box. It wouldn't make sense to show suggestions across machines.
+    Global,
+}
+
+impl ToolchainScope {
+    pub fn label(&self) -> &'static str {
+        match self {
+            ToolchainScope::Subproject(_, _) => "Subproject",
+            ToolchainScope::Project => "Project",
+            ToolchainScope::Global => "Global",
+        }
+    }
+
+    pub fn description(&self) -> &'static str {
+        match self {
+            ToolchainScope::Subproject(_, _) => {
+                "Available only in the subproject you're currently in."
+            }
+            ToolchainScope::Project => "Available in all locations in your current project.",
+            ToolchainScope::Global => "Available in all of your projects on this machine.",
+        }
+    }
+}
+
+#[derive(Clone, PartialEq, Eq, Hash)]
+pub struct ToolchainMetadata {
+    /// The term to use in the UI to refer to toolchains produced by a given `ToolchainLister`.
+    pub term: SharedString,
+    /// A user-facing placeholder describing the semantic meaning of a path to a new toolchain.
+    pub new_toolchain_placeholder: SharedString,
+    /// The name of the manifest file for this toolchain.
+    pub manifest_name: ManifestName,
+}
+
+type DefaultIndex = usize;
+#[derive(Default, Clone, Debug)]
+pub struct ToolchainList {
+    pub toolchains: Vec<Toolchain>,
+    pub default: Option<DefaultIndex>,
+    pub groups: Box<[(usize, SharedString)]>,
+}
+
+impl ToolchainList {
+    pub fn toolchains(&self) -> &[Toolchain] {
+        &self.toolchains
+    }
+    pub fn default_toolchain(&self) -> Option<Toolchain> {
+        self.default.and_then(|ix| self.toolchains.get(ix)).cloned()
+    }
+    pub fn group_for_index(&self, index: usize) -> Option<(usize, SharedString)> {
+        if index >= self.toolchains.len() {
+            return None;
+        }
+        let first_equal_or_greater = self
+            .groups
+            .partition_point(|(group_lower_bound, _)| group_lower_bound <= &index);
+        self.groups
+            .get(first_equal_or_greater.checked_sub(1)?)
+            .cloned()
+    }
+}
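The `group_for_index` lookup above relies on `groups` holding `(start_index, name)` pairs sorted by start index; `partition_point` then finds the last group whose lower bound does not exceed the queried toolchain index. A standalone sketch of that logic (the group names and lengths are invented for the demo):

```rust
// Standalone sketch of `ToolchainList::group_for_index`: `groups` stores
// (start_index, group_name) pairs sorted by start index, and `partition_point`
// locates the group containing the queried toolchain index.
fn group_for_index(
    groups: &[(usize, &'static str)],
    toolchain_count: usize,
    index: usize,
) -> Option<(usize, &'static str)> {
    if index >= toolchain_count {
        return None;
    }
    // First position whose lower bound exceeds `index`...
    let first_greater =
        groups.partition_point(|(group_lower_bound, _)| *group_lower_bound <= index);
    // ...so the containing group is the entry just before it, if any.
    groups.get(first_greater.checked_sub(1)?).copied()
}

fn main() {
    // Five toolchains: indices 0..=1 in "venv", 2..=4 in "system".
    let groups = [(0, "venv"), (2, "system")];
    assert_eq!(group_for_index(&groups, 5, 1), Some((0, "venv")));
    assert_eq!(group_for_index(&groups, 5, 2), Some((2, "system")));
    assert_eq!(group_for_index(&groups, 5, 4), Some((2, "system")));
    // Out-of-range indices yield no group.
    assert_eq!(group_for_index(&groups, 5, 5), None);
}
```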

crates/language_tools/src/highlights_tree_view.rs 🔗

@@ -9,6 +9,7 @@ use gpui::{
     Task, UniformListScrollHandle, WeakEntity, Window, actions, div, rems, uniform_list,
 };
 use language::ToOffset;
+
 use menu::{SelectNext, SelectPrevious};
 use std::{mem, ops::Range};
 use theme::ActiveTheme;
@@ -419,12 +420,12 @@ impl HighlightsTreeView {
 
             for capture in captures {
                 let highlight_id = highlight_maps[capture.grammar_index].get(capture.index);
-                let Some(style) = highlight_id.style(&syntax_theme) else {
+                let Some(style) = syntax_theme.get(highlight_id).cloned() else {
                     continue;
                 };
 
-                let theme_key = highlight_id
-                    .name(&syntax_theme)
+                let theme_key = syntax_theme
+                    .get_capture_name(highlight_id)
                     .map(|theme_key| SharedString::from(theme_key.to_string()));
 
                 let capture_name = grammars[capture.grammar_index]

crates/languages/Cargo.toml

@@ -13,24 +13,9 @@ test-support = [
     "load-grammars"
 ]
 load-grammars = [
+    "grammars/load-grammars",
     "tree-sitter",
-    "tree-sitter-bash",
-    "tree-sitter-c",
-    "tree-sitter-cpp",
-    "tree-sitter-css",
-    "tree-sitter-diff",
     "tree-sitter-gitcommit",
-    "tree-sitter-go",
-    "tree-sitter-go-mod",
-    "tree-sitter-gowork",
-    "tree-sitter-jsdoc",
-    "tree-sitter-json",
-    "tree-sitter-md",
-    "tree-sitter-python",
-    "tree-sitter-regex",
-    "tree-sitter-rust",
-    "tree-sitter-typescript",
-    "tree-sitter-yaml",
 ]
 
 [dependencies]
@@ -44,6 +29,7 @@ collections.workspace = true
 futures.workspace = true
 globset.workspace = true
 gpui.workspace = true
+grammars.workspace = true
 http_client.workspace = true
 itertools.workspace = true
 json_schema_store.workspace = true
@@ -62,7 +48,6 @@ pet.workspace = true
 project.workspace = true
 regex.workspace = true
 rope.workspace = true
-rust-embed.workspace = true
 serde.workspace = true
 serde_json.workspace = true
 serde_json_lenient.workspace = true
@@ -74,29 +59,13 @@ snippet.workspace = true
 task.workspace = true
 terminal.workspace = true
 theme.workspace = true
-toml.workspace = true
 tree-sitter = { workspace = true, optional = true }
-tree-sitter-bash = { workspace = true, optional = true }
-tree-sitter-c = { workspace = true, optional = true }
-tree-sitter-cpp = { workspace = true, optional = true }
-tree-sitter-css = { workspace = true, optional = true }
-tree-sitter-diff = { workspace = true, optional = true }
 tree-sitter-gitcommit = { workspace = true, optional = true }
-tree-sitter-go = { workspace = true, optional = true }
-tree-sitter-go-mod = { workspace = true, optional = true }
-tree-sitter-gowork = { workspace = true, optional = true }
-tree-sitter-jsdoc = { workspace = true, optional = true }
-tree-sitter-json = { workspace = true, optional = true }
-tree-sitter-md = { workspace = true, optional = true }
-tree-sitter-python = { workspace = true, optional = true }
-tree-sitter-regex = { workspace = true, optional = true }
-tree-sitter-rust = { workspace = true, optional = true }
-tree-sitter-typescript = { workspace = true, optional = true }
-tree-sitter-yaml = { workspace = true, optional = true }
 url.workspace = true
 util.workspace = true
 
 [dev-dependencies]
+fs = { workspace = true, features = ["test-support"] }
 pretty_assertions.workspace = true
 theme = { workspace = true, features = ["test-support"] }
 tree-sitter-bash.workspace = true
@@ -105,6 +74,7 @@ tree-sitter-cpp.workspace = true
 tree-sitter-css.workspace = true
 tree-sitter-go.workspace = true
 tree-sitter-python.workspace = true
+tree-sitter-rust.workspace = true
 tree-sitter-typescript.workspace = true
 tree-sitter.workspace = true
 unindent.workspace = true

crates/languages/src/c.rs

@@ -368,7 +368,7 @@ impl super::LspAdapter for CLspAdapter {
         Ok(original)
     }
 
-    fn retain_old_diagnostic(&self, previous_diagnostic: &Diagnostic, _: &App) -> bool {
+    fn retain_old_diagnostic(&self, previous_diagnostic: &Diagnostic) -> bool {
         clangd_ext::is_inactive_region(previous_diagnostic)
     }
 

crates/languages/src/cpp.rs

@@ -1,9 +1,7 @@
 use settings::SemanticTokenRules;
 
-use crate::LanguageDir;
-
 pub(crate) fn semantic_token_rules() -> SemanticTokenRules {
-    let content = LanguageDir::get("cpp/semantic_token_rules.json")
+    let content = grammars::get_file("cpp/semantic_token_rules.json")
         .expect("missing cpp/semantic_token_rules.json");
     let json = std::str::from_utf8(&content.data).expect("invalid utf-8 in semantic_token_rules");
     settings::parse_json_with_comments::<SemanticTokenRules>(json)

crates/languages/src/go.rs

@@ -31,10 +31,8 @@ use std::{
 use task::{TaskTemplate, TaskTemplates, TaskVariables, VariableName};
 use util::{ResultExt, fs::remove_matching, maybe, merge_json_value_into};
 
-use crate::LanguageDir;
-
 pub(crate) fn semantic_token_rules() -> SemanticTokenRules {
-    let content = LanguageDir::get("go/semantic_token_rules.json")
+    let content = grammars::get_file("go/semantic_token_rules.json")
         .expect("missing go/semantic_token_rules.json");
     let json = std::str::from_utf8(&content.data).expect("invalid utf-8 in semantic_token_rules");
     settings::parse_json_with_comments::<SemanticTokenRules>(json)

crates/languages/src/lib.rs

@@ -1,14 +1,12 @@
-use anyhow::Context as _;
 use gpui::{App, SharedString, UpdateGlobal};
 use node_runtime::NodeRuntime;
 use project::Fs;
 use python::PyprojectTomlManifestProvider;
 use rust::CargoManifestProvider;
-use rust_embed::RustEmbed;
 use settings::{SemanticTokenRules, SettingsStore};
 use smol::stream::StreamExt;
-use std::{str, sync::Arc};
-use util::{ResultExt, asset_str};
+use std::sync::Arc;
+use util::ResultExt;
 
 pub use language::*;
 
@@ -35,11 +33,6 @@ mod yaml;
 
 pub(crate) use package_json::{PackageJson, PackageJsonData};
 
-#[derive(RustEmbed)]
-#[folder = "src/"]
-#[exclude = "*.rs"]
-struct LanguageDir;
-
 /// A shared grammar for plain text, exposed for reuse by downstream crates.
 #[cfg(feature = "tree-sitter-gitcommit")]
 pub static LANGUAGE_GIT_COMMIT: std::sync::LazyLock<Arc<Language>> =
@@ -47,7 +40,7 @@ pub static LANGUAGE_GIT_COMMIT: std::sync::LazyLock<Arc<Language>> =
         Arc::new(Language::new(
             LanguageConfig {
                 name: "Git Commit".into(),
-                soft_wrap: Some(language::language_settings::SoftWrap::EditorWidth),
+                soft_wrap: Some(language::SoftWrap::EditorWidth),
                 matcher: LanguageMatcher {
                     path_suffixes: vec!["COMMIT_EDITMSG".to_owned()],
                     first_line_pattern: None,
@@ -62,28 +55,7 @@ pub static LANGUAGE_GIT_COMMIT: std::sync::LazyLock<Arc<Language>> =
 
 pub fn init(languages: Arc<LanguageRegistry>, fs: Arc<dyn Fs>, node: NodeRuntime, cx: &mut App) {
     #[cfg(feature = "load-grammars")]
-    languages.register_native_grammars([
-        ("bash", tree_sitter_bash::LANGUAGE),
-        ("c", tree_sitter_c::LANGUAGE),
-        ("cpp", tree_sitter_cpp::LANGUAGE),
-        ("css", tree_sitter_css::LANGUAGE),
-        ("diff", tree_sitter_diff::LANGUAGE),
-        ("go", tree_sitter_go::LANGUAGE),
-        ("gomod", tree_sitter_go_mod::LANGUAGE),
-        ("gowork", tree_sitter_gowork::LANGUAGE),
-        ("jsdoc", tree_sitter_jsdoc::LANGUAGE),
-        ("json", tree_sitter_json::LANGUAGE),
-        ("jsonc", tree_sitter_json::LANGUAGE),
-        ("markdown", tree_sitter_md::LANGUAGE),
-        ("markdown-inline", tree_sitter_md::INLINE_LANGUAGE),
-        ("python", tree_sitter_python::LANGUAGE),
-        ("regex", tree_sitter_regex::LANGUAGE),
-        ("rust", tree_sitter_rust::LANGUAGE),
-        ("tsx", tree_sitter_typescript::LANGUAGE_TSX),
-        ("typescript", tree_sitter_typescript::LANGUAGE_TYPESCRIPT),
-        ("yaml", tree_sitter_yaml::LANGUAGE),
-        ("gitcommit", tree_sitter_gitcommit::LANGUAGE),
-    ]);
+    languages.register_native_grammars(grammars::native_grammars());
 
     let c_lsp_adapter = Arc::new(c::CLspAdapter);
     let css_lsp_adapter = Arc::new(css::CssLspAdapter::new(node.clone()));
@@ -99,7 +71,7 @@ pub fn init(languages: Arc<LanguageRegistry>, fs: Arc<dyn Fs>, node: NodeRuntime
     let python_lsp_adapter = Arc::new(python::PyrightLspAdapter::new(node.clone()));
     let basedpyright_lsp_adapter = Arc::new(BasedPyrightLspAdapter::new(node.clone()));
     let ruff_lsp_adapter = Arc::new(RuffLspAdapter::new(fs.clone()));
-    let python_toolchain_provider = Arc::new(python::PythonToolchainProvider);
+    let python_toolchain_provider = Arc::new(python::PythonToolchainProvider::new(fs.clone()));
     let rust_context_provider = Arc::new(rust::RustContextProvider);
     let rust_lsp_adapter = Arc::new(rust::RustLspAdapter);
     let tailwind_adapter = Arc::new(tailwind::TailwindLspAdapter::new(node.clone()));
@@ -402,56 +374,17 @@ fn register_language(
 #[cfg(any(test, feature = "test-support"))]
 pub fn language(name: &str, grammar: tree_sitter::Language) -> Arc<Language> {
     Arc::new(
-        Language::new(load_config(name), Some(grammar))
-            .with_queries(load_queries(name))
+        Language::new(grammars::load_config(name), Some(grammar))
+            .with_queries(grammars::load_queries(name))
             .unwrap(),
     )
 }
 
 fn load_config(name: &str) -> LanguageConfig {
-    let config_toml = String::from_utf8(
-        LanguageDir::get(&format!("{}/config.toml", name))
-            .unwrap_or_else(|| panic!("missing config for language {:?}", name))
-            .data
-            .to_vec(),
-    )
-    .unwrap();
-
-    #[allow(unused_mut)]
-    let mut config: LanguageConfig = ::toml::from_str(&config_toml)
-        .with_context(|| format!("failed to load config.toml for language {name:?}"))
-        .unwrap();
-
-    #[cfg(not(any(feature = "load-grammars", test)))]
-    {
-        config = LanguageConfig {
-            name: config.name,
-            matcher: config.matcher,
-            jsx_tag_auto_close: config.jsx_tag_auto_close,
-            ..Default::default()
-        }
-    }
-
-    config
+    let grammars_loaded = cfg!(any(feature = "load-grammars", test));
+    grammars::load_config_for_feature(name, grammars_loaded)
 }
 
 fn load_queries(name: &str) -> LanguageQueries {
-    let mut result = LanguageQueries::default();
-    for path in LanguageDir::iter() {
-        if let Some(remainder) = path.strip_prefix(name).and_then(|p| p.strip_prefix('/')) {
-            if !remainder.ends_with(".scm") {
-                continue;
-            }
-            for (name, query) in QUERY_FILENAME_PREFIXES {
-                if remainder.starts_with(name) {
-                    let contents = asset_str::<LanguageDir>(path.as_ref());
-                    match query(&mut result) {
-                        None => *query(&mut result) = Some(contents),
-                        Some(r) => r.to_mut().push_str(contents.as_ref()),
-                    }
-                }
-            }
-        }
-    }
-    result
+    grammars::load_queries(name)
 }
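The query-loading loop removed above now lives behind `grammars::load_queries`. The behavior it preserves is that `.scm` files sharing a query-name prefix (e.g. `highlights.scm` plus an overlay) are concatenated into one query string. A rough standalone sketch of that merging, under assumptions: a hypothetical `merge_queries` helper over plain string maps, where the real code fills `LanguageQueries` fields keyed by `QUERY_FILENAME_PREFIXES`:

```rust
use std::collections::BTreeMap;

// Sketch: for each embedded query file, find the prefix it matches and
// append its contents to that prefix's accumulated query source.
fn merge_queries(files: &[(&str, &str)], prefixes: &[&str]) -> BTreeMap<String, String> {
    let mut result = BTreeMap::new();
    for (file_name, contents) in files {
        if !file_name.ends_with(".scm") {
            continue;
        }
        for prefix in prefixes {
            if file_name.starts_with(*prefix) {
                result
                    .entry(prefix.to_string())
                    .and_modify(|q: &mut String| q.push_str(contents))
                    .or_insert_with(|| contents.to_string());
            }
        }
    }
    result
}

fn main() {
    let files = [
        ("highlights.scm", "(a)"),
        ("highlights-ext.scm", "(b)"),
        ("indents.scm", "(c)"),
        ("README.md", "ignored"),
    ];
    let merged = merge_queries(&files, &["highlights", "indents"]);
    // Files sharing the "highlights" prefix are concatenated in order.
    assert_eq!(merged["highlights"], "(a)(b)");
    assert_eq!(merged["indents"], "(c)");
}
```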

crates/languages/src/python.rs

@@ -39,7 +39,6 @@ use util::fs::{make_file_executable, remove_matching};
 use util::paths::PathStyle;
 use util::rel_path::RelPath;
 
-use crate::LanguageDir;
 use http_client::github_download::{GithubBinaryMetadata, download_server_binary};
 use parking_lot::Mutex;
 use std::str::FromStr;
@@ -53,7 +52,7 @@ use task::{ShellKind, TaskTemplate, TaskTemplates, VariableName};
 use util::{ResultExt, maybe};
 
 pub(crate) fn semantic_token_rules() -> SemanticTokenRules {
-    let content = LanguageDir::get("python/semantic_token_rules.json")
+    let content = grammars::get_file("python/semantic_token_rules.json")
         .expect("missing python/semantic_token_rules.json");
     let json = std::str::from_utf8(&content.data).expect("invalid utf-8 in semantic_token_rules");
     settings::parse_json_with_comments::<SemanticTokenRules>(json)
@@ -1121,7 +1120,15 @@ fn python_env_kind_display(k: &PythonEnvironmentKind) -> &'static str {
     }
 }
 
-pub(crate) struct PythonToolchainProvider;
+pub(crate) struct PythonToolchainProvider {
+    fs: Arc<dyn Fs>,
+}
+
+impl PythonToolchainProvider {
+    pub fn new(fs: Arc<dyn Fs>) -> Self {
+        Self { fs }
+    }
+}
 
 static ENV_PRIORITY_LIST: &[PythonEnvironmentKind] = &[
     // Prioritize non-Conda environments.
@@ -1236,8 +1243,8 @@ impl ToolchainLister for PythonToolchainProvider {
         worktree_root: PathBuf,
         subroot_relative_path: Arc<RelPath>,
         project_env: Option<HashMap<String, String>>,
-        fs: &dyn Fs,
     ) -> ToolchainList {
+        let fs = &*self.fs;
         let env = project_env.unwrap_or_default();
         let environment = EnvironmentApi::from_env(&env);
         let locators = pet::locators::create_locators(
@@ -1368,8 +1375,8 @@ impl ToolchainLister for PythonToolchainProvider {
         &self,
         path: PathBuf,
         env: Option<HashMap<String, String>>,
-        fs: &dyn Fs,
     ) -> anyhow::Result<Toolchain> {
+        let fs = &*self.fs;
         let env = env.unwrap_or_default();
         let environment = EnvironmentApi::from_env(&env);
         let locators = pet::locators::create_locators(
@@ -2664,7 +2671,8 @@ mod tests {
             });
         });
 
-        let provider = PythonToolchainProvider;
+        let fs = project::FakeFs::new(cx.executor());
+        let provider = PythonToolchainProvider::new(fs);
         let malicious_name = "foo; rm -rf /";
 
         let manager_executable = std::env::current_exe().unwrap();
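The change above moves the `Fs` handle from a per-call parameter into the provider's constructor, so `ToolchainLister` call sites no longer thread it through. A minimal sketch of that dependency-injection shape (a hypothetical `Fs` trait standing in for the real `fs::Fs`; not the actual provider):

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real `fs::Fs` trait.
trait Fs {
    fn exists(&self, path: &str) -> bool;
}

struct RealFs;
impl Fs for RealFs {
    fn exists(&self, _path: &str) -> bool {
        true
    }
}

// The filesystem handle is captured at construction time...
struct PythonToolchainProvider {
    fs: Arc<dyn Fs>,
}

impl PythonToolchainProvider {
    fn new(fs: Arc<dyn Fs>) -> Self {
        Self { fs }
    }

    // ...so listing no longer takes `fs: &dyn Fs` as a parameter.
    fn list(&self, worktree_root: &str) -> bool {
        self.fs.exists(worktree_root)
    }
}

fn main() {
    let provider = PythonToolchainProvider::new(Arc::new(RealFs));
    assert!(provider.list("/repo"));
}
```

This keeps the trait's method signatures free of a cross-cutting dependency, at the cost of each provider owning its own `Arc<dyn Fs>`.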

crates/languages/src/rust.rs

@@ -31,11 +31,10 @@ use util::merge_json_value_into;
 use util::rel_path::RelPath;
 use util::{ResultExt, maybe};
 
-use crate::LanguageDir;
 use crate::language_settings::LanguageSettings;
 
 pub(crate) fn semantic_token_rules() -> SemanticTokenRules {
-    let content = LanguageDir::get("rust/semantic_token_rules.json")
+    let content = grammars::get_file("rust/semantic_token_rules.json")
         .expect("missing rust/semantic_token_rules.json");
     let json = std::str::from_utf8(&content.data).expect("invalid utf-8 in semantic_token_rules");
     settings::parse_json_with_comments::<SemanticTokenRules>(json)
@@ -263,12 +262,7 @@ impl LspAdapter for RustLspAdapter {
         Some("rust-analyzer/flycheck".into())
     }
 
-    fn process_diagnostics(
-        &self,
-        params: &mut lsp::PublishDiagnosticsParams,
-        _: LanguageServerId,
-        _: Option<&'_ Buffer>,
-    ) {
+    fn process_diagnostics(&self, params: &mut lsp::PublishDiagnosticsParams, _: LanguageServerId) {
         static REGEX: LazyLock<Regex> =
             LazyLock::new(|| Regex::new(r"(?m)`([^`]+)\n`$").expect("Failed to create REGEX"));
 
@@ -1358,7 +1352,7 @@ mod tests {
                 },
             ],
         };
-        RustLspAdapter.process_diagnostics(&mut params, LanguageServerId(0), None);
+        RustLspAdapter.process_diagnostics(&mut params, LanguageServerId(0));
 
         assert_eq!(params.diagnostics[0].message, "use of moved value `a`");
 

crates/markdown/src/markdown.rs

@@ -7,6 +7,7 @@ use gpui::EdgesRefinement;
 use gpui::HitboxBehavior;
 use gpui::UnderlineStyle;
 use language::LanguageName;
+
 use log::Level;
 pub use path_range::{LineCol, PathWithRange};
 use settings::Settings as _;
@@ -1904,9 +1905,10 @@ impl MarkdownElementBuilder {
                 }
 
                 let mut run_style = self.text_style();
-                if let Some(highlight) = highlight_id.style(&self.syntax_theme) {
+                if let Some(highlight) = self.syntax_theme.get(highlight_id).cloned() {
                     run_style = run_style.highlight(highlight);
                 }
+
                 self.pending_line.runs.push(run_style.to_run(range.len()));
                 offset = range.end;
             }

crates/markdown_preview/src/markdown_elements.rs

@@ -3,6 +3,7 @@ use gpui::{
     UnderlineStyle, px,
 };
 use language::HighlightId;
+
 use std::{fmt::Display, ops::Range, path::PathBuf};
 use urlencoding;
 
@@ -242,7 +243,7 @@ impl MarkdownHighlight {
                 Some(highlight)
             }
 
-            MarkdownHighlight::Code(id) => id.style(theme),
+            MarkdownHighlight::Code(id) => theme.get(*id).cloned(),
         }
     }
 }

crates/markdown_preview/src/markdown_renderer.rs

@@ -15,6 +15,7 @@ use gpui::{
     Keystroke, Modifiers, ParentElement, Render, RenderImage, Resource, SharedString, Styled,
     StyledText, Task, TextStyle, WeakEntity, Window, div, img, pulsating_between, rems,
 };
+
 use settings::Settings;
 use std::{
     ops::{Mul, Range},
@@ -750,8 +751,9 @@ fn render_markdown_code_block(
         StyledText::new(parsed.contents.clone()).with_default_highlights(
             &cx.buffer_text_style,
             highlights.iter().filter_map(|(range, highlight_id)| {
-                highlight_id
-                    .style(cx.syntax_theme.as_ref())
+                cx.syntax_theme
+                    .get(*highlight_id)
+                    .cloned()
                     .map(|style| (range.clone(), style))
             }),
         )

crates/outline_panel/src/outline_panel.rs

@@ -25,6 +25,7 @@ use gpui::{
 use itertools::Itertools;
 use language::language_settings::LanguageSettings;
 use language::{Anchor, BufferId, BufferSnapshot, OffsetRangeExt, OutlineItem};
+
 use menu::{Cancel, SelectFirst, SelectLast, SelectNext, SelectPrevious};
 use std::{
     cmp,
@@ -236,7 +237,8 @@ impl SearchState {
                         }
                         let style = chunk
                             .syntax_highlight_id
-                            .and_then(|highlight| highlight.style(&theme));
+                            .and_then(|highlight| theme.get(highlight).cloned());
+
                         if let Some(style) = style {
                             let start = context_text.len();
                             let end = start + chunk.text.len();

crates/project/src/lsp_store.rs

@@ -71,10 +71,10 @@ use http_client::HttpClient;
 use itertools::Itertools as _;
 use language::{
     Bias, BinaryStatus, Buffer, BufferRow, BufferSnapshot, CachedLspAdapter, Capability, CodeLabel,
-    Diagnostic, DiagnosticEntry, DiagnosticSet, DiagnosticSourceKind, Diff, File as _, Language,
-    LanguageName, LanguageRegistry, LocalFile, LspAdapter, LspAdapterDelegate, LspInstaller,
-    ManifestDelegate, ManifestName, ModelineSettings, Patch, PointUtf16, TextBufferSnapshot,
-    ToOffset, ToPointUtf16, Toolchain, Transaction, Unclipped,
+    CodeLabelExt, Diagnostic, DiagnosticEntry, DiagnosticSet, DiagnosticSourceKind, Diff,
+    File as _, Language, LanguageName, LanguageRegistry, LocalFile, LspAdapter, LspAdapterDelegate,
+    LspInstaller, ManifestDelegate, ManifestName, ModelineSettings, Patch, PointUtf16,
+    TextBufferSnapshot, ToOffset, ToPointUtf16, Toolchain, Transaction, Unclipped,
     language_settings::{
         AllLanguageSettings, FormatOnSave, Formatter, LanguageSettings, all_language_settings,
     },
@@ -822,15 +822,7 @@ impl LocalLspStore {
                     let adapter = adapter.clone();
                     if let Some(this) = this.upgrade() {
                         this.update(cx, |this, cx| {
-                            {
-                                let buffer = params
-                                    .uri
-                                    .to_file_path()
-                                    .map(|file_path| this.get_buffer(&file_path, cx))
-                                    .ok()
-                                    .flatten();
-                                adapter.process_diagnostics(&mut params, server_id, buffer);
-                            }
+                            adapter.process_diagnostics(&mut params, server_id);
 
                             this.merge_lsp_diagnostics(
                                 DiagnosticSourceKind::Pushed,
@@ -843,9 +835,9 @@ impl LocalLspStore {
                                     ),
                                     registration_id: None,
                                 }],
-                                |_, diagnostic, cx| match diagnostic.source_kind {
+                                |_, diagnostic, _cx| match diagnostic.source_kind {
                                     DiagnosticSourceKind::Other | DiagnosticSourceKind::Pushed => {
-                                        adapter.retain_old_diagnostic(diagnostic, cx)
+                                        adapter.retain_old_diagnostic(diagnostic)
                                     }
                                     DiagnosticSourceKind::Pulled => true,
                                 },
@@ -11206,23 +11198,6 @@ impl LspStore {
         cx.background_spawn(futures::future::join_all(tasks).map(|_| ()))
     }
 
-    fn get_buffer<'a>(&self, abs_path: &Path, cx: &'a App) -> Option<&'a Buffer> {
-        let (worktree, relative_path) =
-            self.worktree_store.read(cx).find_worktree(&abs_path, cx)?;
-
-        let project_path = ProjectPath {
-            worktree_id: worktree.read(cx).id(),
-            path: relative_path,
-        };
-
-        Some(
-            self.buffer_store()
-                .read(cx)
-                .get_by_path(&project_path)?
-                .read(cx),
-        )
-    }
-
     #[cfg(any(test, feature = "test-support"))]
     pub fn update_diagnostics(
         &mut self,

crates/project/src/project.rs

@@ -1186,7 +1186,6 @@ impl Project {
                     worktree_store.clone(),
                     environment.clone(),
                     manifest_tree.clone(),
-                    fs.clone(),
                     cx,
                 )
             });

crates/project/src/toolchain_store.rs

@@ -4,7 +4,7 @@ use anyhow::{Context as _, Result, bail};
 
 use async_trait::async_trait;
 use collections::{BTreeMap, IndexSet};
-use fs::Fs;
+
 use gpui::{
     App, AppContext as _, AsyncApp, Context, Entity, EventEmitter, Subscription, Task, WeakEntity,
 };
@@ -62,7 +62,6 @@ impl ToolchainStore {
         worktree_store: Entity<WorktreeStore>,
         project_environment: Entity<ProjectEnvironment>,
         manifest_tree: Entity<ManifestTree>,
-        fs: Arc<dyn Fs>,
         cx: &mut Context<Self>,
     ) -> Self {
         let entity = cx.new(|_| LocalToolchainStore {
@@ -71,7 +70,6 @@ impl ToolchainStore {
             project_environment,
             active_toolchains: Default::default(),
             manifest_tree,
-            fs,
         });
         let _sub = cx.subscribe(&entity, |_, _, e: &ToolchainStoreEvent, cx| {
             cx.emit(e.clone())
@@ -418,7 +416,6 @@ pub struct LocalToolchainStore {
     project_environment: Entity<ProjectEnvironment>,
     active_toolchains: BTreeMap<(WorktreeId, LanguageName), BTreeMap<Arc<RelPath>, Toolchain>>,
     manifest_tree: Entity<ManifestTree>,
-    fs: Arc<dyn Fs>,
 }
 
 #[async_trait(?Send)]
@@ -507,7 +504,6 @@ impl LocalToolchainStore {
         let registry = self.languages.clone();
 
         let manifest_tree = self.manifest_tree.downgrade();
-        let fs = self.fs.clone();
 
         let environment = self.project_environment.clone();
         cx.spawn(async move |this, cx| {
@@ -554,12 +550,7 @@ impl LocalToolchainStore {
             cx.background_spawn(async move {
                 Some((
                     toolchains
-                        .list(
-                            worktree_root,
-                            relative_path.path.clone(),
-                            project_env,
-                            fs.as_ref(),
-                        )
+                        .list(worktree_root, relative_path.path.clone(), project_env)
                         .await,
                     relative_path.path,
                 ))
@@ -593,7 +584,6 @@ impl LocalToolchainStore {
     ) -> Task<Result<Toolchain>> {
         let registry = self.languages.clone();
         let environment = self.project_environment.clone();
-        let fs = self.fs.clone();
         cx.spawn(async move |_, cx| {
             let language = cx
                 .background_spawn(registry.language_for_name(&language_name.0))
@@ -612,12 +602,8 @@ impl LocalToolchainStore {
                     )
                 })
                 .await;
-            cx.background_spawn(async move {
-                toolchain_lister
-                    .resolve(path, project_env, fs.as_ref())
-                    .await
-            })
-            .await
+            cx.background_spawn(async move { toolchain_lister.resolve(path, project_env).await })
+                .await
         })
     }
 }

crates/project/tests/integration/project_tests.rs

@@ -11931,7 +11931,6 @@ fn python_lang(fs: Arc<FakeFs>) -> Arc<Language> {
             worktree_root: PathBuf,
             subroot_relative_path: Arc<RelPath>,
             _: Option<HashMap<String, String>>,
-            _: &dyn Fs,
         ) -> ToolchainList {
             // This lister will always return a path .venv directories within ancestors
             let ancestors = subroot_relative_path.ancestors().collect::<Vec<_>>();
@@ -11956,7 +11955,6 @@ fn python_lang(fs: Arc<FakeFs>) -> Arc<Language> {
             &self,
             _: PathBuf,
             _: Option<HashMap<String, String>>,
-            _: &dyn Fs,
         ) -> anyhow::Result<Toolchain> {
             Err(anyhow::anyhow!("Not implemented"))
         }

crates/theme/src/styles/syntax.rs

@@ -57,8 +57,8 @@ impl SyntaxTheme {
         )
     }
 
-    pub fn get(&self, highlight_index: usize) -> Option<&HighlightStyle> {
-        self.highlights.get(highlight_index)
+    pub fn get(&self, highlight_index: impl Into<usize>) -> Option<&HighlightStyle> {
+        self.highlights.get(highlight_index.into())
     }
 
     pub fn style_for_name(&self, name: &str) -> Option<HighlightStyle> {
@@ -67,7 +67,8 @@ impl SyntaxTheme {
             .map(|highlight_idx| self.highlights[*highlight_idx])
     }
 
-    pub fn get_capture_name(&self, idx: usize) -> Option<&str> {
+    pub fn get_capture_name(&self, idx: impl Into<usize>) -> Option<&str> {
+        let idx = idx.into();
         self.capture_name_map
             .iter()
             .find(|(_, value)| **value == idx)
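Widening `get` and `get_capture_name` to take `impl Into<usize>` is what lets callers elsewhere in this diff pass a `HighlightId` directly (e.g. `syntax_theme.get(highlight_id)`) while plain `usize` indices keep working. A standalone sketch of the pattern (simplified `Theme` and `HighlightId`, not the real types):

```rust
// Sketch of the `impl Into<usize>` accessor pattern: a newtype index
// converts itself, so callers need no manual `.0 as usize`.
#[derive(Clone, Copy)]
struct HighlightId(u32);

impl From<HighlightId> for usize {
    fn from(id: HighlightId) -> usize {
        id.0 as usize
    }
}

struct Theme {
    highlights: Vec<&'static str>,
}

impl Theme {
    fn get(&self, index: impl Into<usize>) -> Option<&'static str> {
        self.highlights.get(index.into()).copied()
    }
}

fn main() {
    let theme = Theme { highlights: vec!["keyword", "string"] };
    // Newtype and bare usize both work through the same accessor.
    assert_eq!(theme.get(HighlightId(1)), Some("string"));
    assert_eq!(theme.get(0usize), Some("keyword"));
    assert_eq!(theme.get(HighlightId(9)), None);
}
```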

crates/vim/src/state.rs

@@ -18,6 +18,7 @@ use gpui::{
     EntityId, Global, HighlightStyle, StyledText, Subscription, Task, TextStyle, WeakEntity,
 };
 use language::{Buffer, BufferEvent, BufferId, Chunk, Point};
+
 use multi_buffer::MultiBufferRow;
 use picker::{Picker, PickerDelegate};
 use project::{Project, ProjectItem, ProjectPath};
@@ -1402,8 +1403,8 @@ impl MarksMatchInfo {
         let mut offset = 0;
         for chunk in chunks {
             line.push_str(chunk.text);
-            if let Some(highlight_style) = chunk.syntax_highlight_id
-                && let Some(highlight) = highlight_style.style(cx.theme().syntax())
+            if let Some(highlight_id) = chunk.syntax_highlight_id
+                && let Some(highlight) = cx.theme().syntax().get(highlight_id).cloned()
             {
                 highlights.push((offset..offset + chunk.text.len(), highlight))
             }