buffer: Fix panic when multi-byte character is used in languages like Swift (#25739)

smit created 1 year ago

Closes #25471

In languages like Swift, names can be concatinated in form like `class
Example: UI`, notice here `Example` and `:` are two different words.
Before, `name_ranges`translation of above text would look like:

```
"class" -> [0..5]
" Example" -> [5..13] (Spaces are intentional)
"e:" -> [12..14] (This is incorrect, and should be ":" -> [13..14])
" UI" -> [14..16]
```

Because this translation does not account for concatinated words, this
might affect queries, but most importantly this panics when multi-byte
character (`ф`) is used in place of `e`, as it then tries to access
index which lies inside that multi-byte. For example, it panics on
`class Examplф: UI`.

---

This PR fixes this by handing concatinated words when calculating
`name_ranges`.

Now, the corrected ranges will look like:

```
"class" -> [0..5]
" Example" -> [5..13]
":" -> [13..14] (Now it's correct)
" UI" -> [14..16]
```

and for multi-byte character

```
"class" -> [0..5]
" Examplф" -> [5..14] (Notice ф takes two bytes)
":" -> [14..15]
" UI" -> [15..17]
```

This way, it no longer tries to access a previous index, preventing a
panic when that index contains a multi-byte character.

Release Notes:

- Fixed a panic when Cyrillic characters are used in languages like
Swift.

Change summary

crates/language/src/buffer.rs | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)

Detailed changes

crates/language/src/buffer.rs 🔗

@@ -3507,24 +3507,13 @@ impl BufferSnapshot {
             true,
         );
         let mut last_buffer_range_end = 0;
+
         for (buffer_range, is_name) in buffer_ranges {
-            if !text.is_empty() && buffer_range.start > last_buffer_range_end {
+            let space_added = !text.is_empty() && buffer_range.start > last_buffer_range_end;
+            if space_added {
                 text.push(' ');
             }
-            last_buffer_range_end = buffer_range.end;
-            if is_name {
-                let mut start = text.len();
-                let end = start + buffer_range.len();
-
-                // When multiple names are captured, then the matchable text
-                // includes the whitespace in between the names.
-                if !name_ranges.is_empty() {
-                    start -= 1;
-                }
-
-                name_ranges.push(start..end);
-            }
-
+            let before_append_len = text.len();
             let mut offset = buffer_range.start;
             chunks.seek(buffer_range.clone());
             for mut chunk in chunks.by_ref() {
@@ -3548,6 +3537,16 @@ impl BufferSnapshot {
                     break;
                 }
             }
+            if is_name {
+                let after_append_len = text.len();
+                let start = if space_added && !name_ranges.is_empty() {
+                    before_append_len - 1
+                } else {
+                    before_append_len
+                };
+                name_ranges.push(start..after_append_len);
+            }
+            last_buffer_range_end = buffer_range.end;
         }
 
         Some(OutlineItem {