languages: Improve semantic token highlighting for parameters and Python (#52130)

Xin Zhao created

## Context

Zed's semantic token highlighting does not cover all token types
returned by language servers, so the highlighting looks fairly primitive
compared with tree-sitter highlighting, especially for Python language
servers. This PR adds some global and Python-specific rules for better
highlighting.

Admittedly, the built-in Python language servers currently have fairly weak
semantic highlighting implementations. Pylance, Microsoft's closed-source
Python language server, provides the best highlighting for now, but I expect
ty to do better over time, even though it still has a long way to go.

## How to Review

Basically, this is a rule-adding change. Some rules are made global, and
some are made Python-specific.

## Self-Review Checklist

<!-- Check before requesting review: -->
- [x] I've reviewed my own diff for quality, security, and reliability
- [x] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable

Release Notes:

- Improved semantic token highlighting for parameters and Python

Change summary

assets/settings/default_semantic_token_rules.json     | 15 +++++++++++++
crates/languages/src/lib.rs                           |  2 
crates/languages/src/python.rs                        | 11 ++++++++
crates/languages/src/python/semantic_token_rules.json | 15 +++++++++++++
4 files changed, 41 insertions(+), 2 deletions(-)

Detailed changes

assets/settings/default_semantic_token_rules.json 🔗

@@ -119,6 +119,16 @@
     "style": ["type"],
   },
   // References
+  {
+    "token_type": "parameter",
+    "token_modifiers": ["declaration"],
+    "style": ["variable.parameter"]
+  },
+  {
+    "token_type": "parameter",
+    "token_modifiers": ["definition"],
+    "style": ["variable.parameter"]
+  },
   {
     "token_type": "parameter",
     "token_modifiers": [],
@@ -201,6 +211,11 @@
     "token_modifiers": [],
     "style": ["comment"],
   },
+  {
+    "token_type": "string",
+    "token_modifiers": ["documentation"],
+    "style": ["string.doc"],
+  },
   {
     "token_type": "string",
     "token_modifiers": [],

crates/languages/src/lib.rs 🔗

@@ -191,7 +191,7 @@ pub fn init(languages: Arc<LanguageRegistry>, fs: Arc<dyn Fs>, node: NodeRuntime
             context: Some(python_context_provider),
             toolchain: Some(python_toolchain_provider),
             manifest_name: Some(SharedString::new_static("pyproject.toml").into()),
-            ..Default::default()
+            semantic_token_rules: Some(python::semantic_token_rules()),
         },
         LanguageInfo {
             name: "rust",

crates/languages/src/python.rs 🔗

@@ -24,7 +24,7 @@ use project::lsp_store::language_server_settings;
 use semver::Version;
 use serde::{Deserialize, Serialize};
 use serde_json::{Value, json};
-use settings::Settings;
+use settings::{SemanticTokenRules, Settings};
 use terminal::terminal_settings::TerminalSettings;
 
 use smol::lock::OnceCell;
@@ -37,6 +37,7 @@ use util::fs::{make_file_executable, remove_matching};
 use util::paths::PathStyle;
 use util::rel_path::RelPath;
 
+use crate::LanguageDir;
 use http_client::github_download::{GithubBinaryMetadata, download_server_binary};
 use parking_lot::Mutex;
 use std::str::FromStr;
@@ -49,6 +50,14 @@ use std::{
 use task::{ShellKind, TaskTemplate, TaskTemplates, VariableName};
 use util::{ResultExt, maybe};
 
+pub(crate) fn semantic_token_rules() -> SemanticTokenRules {
+    let content = LanguageDir::get("python/semantic_token_rules.json")
+        .expect("missing python/semantic_token_rules.json");
+    let json = std::str::from_utf8(&content.data).expect("invalid utf-8 in semantic_token_rules");
+    settings::parse_json_with_comments::<SemanticTokenRules>(json)
+        .expect("failed to parse python semantic_token_rules.json")
+}
+
 #[derive(Debug, Serialize, Deserialize)]
 pub(crate) struct PythonToolchainData {
     #[serde(flatten)]

crates/languages/src/python/semantic_token_rules.json 🔗

@@ -0,0 +1,15 @@
+[
+  {
+    "token_type": "selfParameter",
+    "style": ["variable.special"]
+  },
+  {
+    "token_type": "clsParameter",
+    "style": ["variable.special"]
+  },
+  // ty specific
+  {
+    "token_type": "builtinConstant",
+    "style": ["constant.builtin"]
+  }
+]