Rope benchmarks: Generate random strings measured in bytes, not chars (#39951)

Martin Pool created 1 week ago

Follows on from https://github.com/zed-industries/zed/pull/39949.

Again I'm not 100% sure of the intent but I think this is a fix:

`generate_random_text(rng, 4096)` would previously give you a string
of 4096 *chars*, which in UTF-8 could be anywhere between 4kB and 16kB.
This is probably not what was intended, because Ropes generally work
in bytes, not chars, including for the offsets used to index into them.
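
To illustrate the difference, here is a minimal standalone sketch. It is not the benchmark code: `take_up_to_bytes` is a hypothetical helper, and a fixed cycle of 1-, 2-, 3- and 4-byte chars stands in for `RandomCharIter`, so the numbers are deterministic.

```rust
/// Hypothetical helper (not part of this PR): collects chars until the next
/// one would push the UTF-8 length past `max_bytes`.
fn take_up_to_bytes(chars: impl Iterator<Item = char>, max_bytes: usize) -> String {
    let mut out = String::with_capacity(max_bytes);
    for ch in chars {
        // Stop at the first char that does not fit, like the loop in the diff.
        if out.len() + ch.len_utf8() > max_bytes {
            break;
        }
        out.push(ch);
    }
    out
}

fn main() {
    // Stand-in for a random char source: chars of 1, 2, 3 and 4 UTF-8 bytes.
    let sample = ['a', 'é', '✓', '🦀'];

    // Old behaviour: bound the number of chars; the byte length depends on the mix.
    let by_chars: String = sample.iter().copied().cycle().take(4096).collect();
    println!("by chars: {} chars, {} bytes", by_chars.chars().count(), by_chars.len());

    // New behaviour: bound the UTF-8 byte length instead.
    let by_bytes = take_up_to_bytes(sample.iter().copied().cycle(), 4096);
    println!("by bytes: {} chars, {} bytes", by_bytes.chars().count(), by_bytes.len());
}
```

With this particular mix, taking 4096 chars yields 10240 bytes, while the byte-bounded version never exceeds 4096 bytes.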

This possibly causes a _regression_ in benchmark performance, which is
surprising because the change should generally produce smaller test data.
But perhaps it does a better job of exercising different paths?

cc @mrnugget 

Release Notes:

- N/A

Change summary

crates/rope/benches/rope_benchmark.rs | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

Detailed changes

crates/rope/benches/rope_benchmark.rs

@@ -9,11 +9,21 @@ use rope::{Point, Rope};
 use sum_tree::Bias;
 use util::RandomCharIter;
 
-/// Generate a random text of the given length using the provided RNG.
+/// Returns a biased random string whose UTF-8 length is close to but no more than `len` bytes.
 ///
-/// *Note*: The length is in *characters*, not bytes.
-fn generate_random_text(rng: &mut StdRng, text_len: usize) -> String {
-    RandomCharIter::new(rng).take(text_len).collect()
+/// The string is biased towards characters expected to occur in text or likely to exercise edge
+/// cases.
+fn generate_random_text(rng: &mut StdRng, len: usize) -> String {
+    let mut str = String::with_capacity(len);
+    let mut chars = RandomCharIter::new(rng);
+    loop {
+        let ch = chars.next().unwrap();
+        if str.len() + ch.len_utf8() > len {
+            break;
+        }
+        str.push(ch);
+    }
+    str
 }
 
 fn generate_random_rope(rng: &mut StdRng, text_len: usize) -> Rope {