Improve content generation prompt to reduce over-generation (#16333)

Nathan Sobo created

I focused on cases where we're inserting doc comments or annotations
above symbols.

I added 5 new examples to the content generation prompt, covering
various scenarios:

1. Inserting documentation for a Rust struct
2. Writing docstrings for a Python class
3. Adding comments to a TypeScript method
4. Adding a derive attribute to a Rust struct
5. Adding a decorator to a Python class

These examples demonstrate how to handle different languages and common
tasks like adding documentation, attributes, and decorators.

To improve context integration, I've made the following changes:

1. Added a `transform_context_range` that extends the transform range by
up to 3 lines before and after, clamped to the bounds of the buffer
2. Introduced `rewrite_section_prefix` and `rewrite_section_suffix` to
provide more context around the section being rewritten
3. Updated the prompt template to include this additional context in a
separate code snippet
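
As a rough illustration of the context expansion described above, here is a minimal, line-based sketch. It is not the code from this change: the real implementation works on buffer points and offsets, while this version assumes a plain `&str` split into lines. The helper names `context_rows` and `split_context` are hypothetical.

```rust
use std::ops::Range;

/// Number of context lines kept on each side (3, per the description above).
const CONTEXT_LINES: usize = 3;

/// Expands a half-open range of line indices by `CONTEXT_LINES` on each side,
/// clamped to the document. Hypothetical helper, not part of the editor's API.
fn context_rows(rows: Range<usize>, line_count: usize) -> Range<usize> {
    let start = rows.start.saturating_sub(CONTEXT_LINES);
    let end = (rows.end + CONTEXT_LINES).min(line_count);
    start..end
}

/// Splits the text around `rows` into (prefix, section, suffix).
/// Assumes `rows` lies within the document; out-of-range input would panic.
fn split_context(text: &str, rows: Range<usize>) -> (String, String, String) {
    let lines: Vec<&str> = text.lines().collect();
    let ctx = context_rows(rows.clone(), lines.len());
    let prefix = lines[ctx.start..rows.start].join("\n");
    let section = lines[rows.clone()].join("\n");
    let suffix = lines[rows.end..ctx.end].join("\n");
    (prefix, section, suffix)
}

fn main() {
    let text = "l0\nl1\nl2\nl3\nl4\nl5\nl6\nl7\nl8\nl9";
    // Rewrite only line 5; the prefix and suffix carry the surrounding lines.
    let (prefix, section, suffix) = split_context(text, 5..6);
    println!("prefix:\n{prefix}\nsection:\n{section}\nsuffix:\n{suffix}");
}
```

The prefix and suffix correspond to what the template receives as `rewrite_section_prefix` and `rewrite_section_suffix`. The clamping matters when the rewritten section sits near the start or end of the file.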

Release Notes:

- Reduced instances of over-generation when inserting docs or
annotations above a symbol.

Change summary

assets/prompts/content_prompt.hbs        | 416 ++++++++++++++++++++++++-
crates/assistant/src/inline_assistant.rs |  11 
crates/assistant/src/prompts.rs          |  22 +
3 files changed, 425 insertions(+), 24 deletions(-)

Detailed changes

assets/prompts/content_prompt.hbs

@@ -1,52 +1,426 @@
-Here's a text file that I'm going to ask you to make an edit to.
+You are an expert developer assistant working in an AI-enabled text editor.
+Your task is to rewrite a specific section of the provided document based on a user-provided prompt.
 
-{{#if language_name}}
-The file is in {{language_name}}.
-{{/if}}
+<guidelines>
+1. Scope: Modify only content within <rewrite_this> tags. Do not alter anything outside these boundaries.
+2. Precision: Make changes strictly necessary to fulfill the given prompt. Preserve all other content as-is.
+3. Seamless integration: Ensure rewritten sections flow naturally with surrounding text and maintain document structure.
+4. Tag exclusion: Never include <rewrite_this>, </rewrite_this>, <edit_here>, or <insert_here> tags in the output.
+5. Indentation: Maintain the original indentation level of the file in rewritten sections.
+6. Completeness: Rewrite the entire tagged section, even if only partial changes are needed. Avoid omissions or elisions.
+7. Insertions: Replace <insert_here></insert_here> tags with appropriate content as specified by the prompt.
+8. Code integrity: Respect existing code structure and functionality when making changes.
+9. Consistency: Maintain a uniform style and tone throughout the rewritten text.
+</guidelines>
 
-You need to rewrite a portion of it.
+<examples>
+<example>
+<input>
+<document>
+use std::cell::Cell;
+use std::collections::HashMap;
+use std::cmp;
 
-The section you'll need to edit is marked with <rewrite_this></rewrite_this> tags.
+<rewrite_this>
+<insert_here></insert_here>
+</rewrite_this>
+pub struct LruCache<K, V> {
+    /// The maximum number of items the cache can hold.
+    capacity: usize,
+    /// The map storing the cached items.
+    items: HashMap<K, V>,
+}
 
+// The rest of the implementation...
+</document>
+<prompt>
+doc this
+</prompt>
+</input>
+
+<incorrect_output failure="Over-generation. The text starting with `pub struct AabbTree<T> {` is *after* the rewrite_this tag">
+/// Represents an Axis-Aligned Bounding Box (AABB) tree data structure.
+///
+/// This structure is used for efficient spatial queries and collision detection.
+/// It organizes objects in a hierarchical tree structure based on their bounding boxes.
+///
+/// # Type Parameters
+///
+/// * `T`: The type of data associated with each node in the tree.
+pub struct AabbTree<T> {
+  root: Option<usize>,
+</incorrect_output>
+<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
+/// Represents an Axis-Aligned Bounding Box (AABB) tree data structure.
+///
+/// This structure is used for efficient spatial queries and collision detection.
+/// It organizes objects in a hierarchical tree structure based on their bounding boxes.
+///
+/// # Type Parameters
+///
+/// * `T`: The type of data associated with each node in the tree.
+</corrected_output>
+</example>
+
+<example>
+<input>
 <document>
-{{{document_content}}}
+import math
+
+def calculate_circle_area(radius):
+    """Calculate the area of a circle given its radius."""
+    return math.pi * radius ** 2
+
+<rewrite_this>
+<insert_here></insert_here>
+</rewrite_this>
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius
+
+    def area(self):
+        return math.pi * self.radius ** 2
+
+    def circumference(self):
+        return 2 * math.pi * self.radius
+
+# Usage example
+circle = Circle(5)
+print(f"Area: {circle.area():.2f}")
+print(f"Circumference: {circle.circumference():.2f}")
 </document>
+<prompt>
+write docs
+</prompt>
+</input>
 
-{{#if is_truncated}}
-The context around the relevant section has been truncated (possibly in the middle of a line) for brevity.
-{{/if}}
+<incorrect_output failure="Over-generation. The text starting with `class Circle:` is *after* the rewrite_this tag">
+"""
+Represents a circle with methods to calculate its area and circumference.
+
+This class provides a simple way to work with circles in a geometric context.
+It allows for the creation of Circle objects with a specified radius and
+offers methods to compute the circle's area and circumference.
+
+Attributes:
+    radius (float): The radius of the circle.
 
-Rewrite the section of {{content_type}} in <rewrite_this></rewrite_this> tags based on the following prompt:
+Methods:
+    area(): Calculates and returns the area of the circle.
+    circumference(): Calculates and returns the circumference of the circle.
+"""
+class Circle:
+</incorrect_output>
+<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
+"""
+Represents a circle with methods to calculate its area and circumference.
 
+This class provides a simple way to work with circles in a geometric context.
+It allows for the creation of Circle objects with a specified radius and
+offers methods to compute the circle's area and circumference.
+
+Attributes:
+    radius (float): The radius of the circle.
+
+Methods:
+    area(): Calculates and returns the area of the circle.
+    circumference(): Calculates and returns the circumference of the circle.
+"""
+</corrected_output>
+</example>
+
+<example>
+<input>
+<document>
+class BankAccount {
+  private balance: number;
+
+  constructor(initialBalance: number) {
+    this.balance = initialBalance;
+  }
+
+  <rewrite_this>
+  <insert_here></insert_here>
+  </rewrite_this>
+  deposit(amount: number): void {
+    if (amount > 0) {
+      this.balance += amount;
+    }
+  }
+
+  withdraw(amount: number): boolean {
+    if (amount > 0 && this.balance >= amount) {
+      this.balance -= amount;
+      return true;
+    }
+    return false;
+  }
+
+  getBalance(): number {
+    return this.balance;
+  }
+}
+
+// Usage
+const account = new BankAccount(1000);
+account.deposit(500);
+console.log(account.getBalance()); // 1500
+account.withdraw(200);
+console.log(account.getBalance()); // 1300
+</document>
 <prompt>
-{{{user_prompt}}}
+//
 </prompt>
+</input>
 
-Here's the section to edit based on that prompt again for reference:
+<incorrect_output failure="Over-generation. The text starting with `deposit(amount: number): void {` is *after* the rewrite_this tag">
+  /**
+   * Deposits the specified amount into the bank account.
+   *
+   * @param amount The amount to deposit. Must be a positive number.
+   * @throws Error if the amount is not positive.
+   */
+  deposit(amount: number): void {
+    if (amount > 0) {
+      this.balance += amount;
+    } else {
+      throw new Error("Deposit amount must be positive");
+    }
+  }
+</incorrect_output>
+<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
+  /**
+   * Deposits the specified amount into the bank account.
+   *
+   * @param amount The amount to deposit. Must be a positive number.
+   * @throws Error if the amount is not positive.
+   */
+</corrected_output>
+</example>
+
+<example>
+<input>
+<document>
+use std::collections::VecDeque;
+
+pub struct BinaryTree<T> {
+    root: Option<Node<T>>,
+}
 
 <rewrite_this>
-{{{rewrite_section}}}
+<insert_here></insert_here>
 </rewrite_this>
+struct Node<T> {
+    value: T,
+    left: Option<Box<Node<T>>>,
+    right: Option<Box<Node<T>>>,
+}
+</document>
+<prompt>
+derive clone
+</prompt>
+</input>
+
+<incorrect_output failure="Over-generation below the rewrite_this tags. Extra space between derive annotation and struct definition.">
+#[derive(Clone)]
+
+struct Node<T> {
+    value: T,
+    left: Option<Box<Node<T>>>,
+    right: Option<Box<Node<T>>>,
+}
+</incorrect_output>
+
+<incorrect_output failure="Over-generation above the rewrite_this tags">
+pub struct BinaryTree<T> {
+    root: Option<Node<T>>,
+}
+
+#[derive(Clone)]
+</incorrect_output>
+
+<incorrect_output failure="Over-generation below the rewrite_this tags">
+#[derive(Clone)]
+struct Node<T> {
+    value: T,
+    left: Option<Box<Node<T>>>,
+    right: Option<Box<Node<T>>>,
+}
 
-You'll rewrite this entire section, but you will only make changes within certain subsections.
+impl<T> Node<T> {
+    fn new(value: T) -> Self {
+        Node {
+            value,
+            left: None,
+            right: None,
+        }
+    }
+}
+</incorrect_output>
+<corrected_output improvement="Only includes the new content within the rewrite_this tags">
+#[derive(Clone)]
+</corrected_output>
+</example>
 
+<example>
+<input>
+<document>
+import math
+
+def calculate_circle_area(radius):
+    """Calculate the area of a circle given its radius."""
+    return math.pi * radius ** 2
+
+<rewrite_this>
+<insert_here></insert_here>
+</rewrite_this>
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius
+
+    def area(self):
+        return math.pi * self.radius ** 2
+
+    def circumference(self):
+        return 2 * math.pi * self.radius
+
+# Usage example
+circle = Circle(5)
+print(f"Area: {circle.area():.2f}")
+print(f"Circumference: {circle.circumference():.2f}")
+</document>
+<prompt>
+add dataclass decorator
+</prompt>
+</input>
+
+<incorrect_output failure="Over-generation. The text starting with `class Circle:` is *after* the rewrite_this tag">
+@dataclass
+class Circle:
+    radius: float
+
+    def __init__(self, radius):
+        self.radius = radius
+
+    def area(self):
+        return math.pi * self.radius ** 2
+</incorrect_output>
+<corrected_output improvement="Generation stops before repeating content after the rewrite_this section">
+@dataclass
+</corrected_output>
+</example>
+
+<example>
+<input>
+<document>
+interface ShoppingCart {
+  items: string[];
+  total: number;
+}
+
+<rewrite_this>
+<insert_here></insert_here>class ShoppingCartManager {
+</rewrite_this>
+  private cart: ShoppingCart;
+
+  constructor() {
+    this.cart = { items: [], total: 0 };
+  }
+
+  addItem(item: string, price: number): void {
+    this.cart.items.push(item);
+    this.cart.total += price;
+  }
+
+  getTotal(): number {
+    return this.cart.total;
+  }
+}
+
+// Usage
+const manager = new ShoppingCartManager();
+manager.addItem("Book", 15.99);
+console.log(manager.getTotal()); // 15.99
+</document>
+<prompt>
+add readonly modifier
+</prompt>
+</input>
+
+<incorrect_output failure="Over-generation. The line starting with `    items: string[];` is *after* the rewrite_this tag">
+readonly interface ShoppingCart {
+  items: string[];
+  total: number;
+}
+
+class ShoppingCartManager {
+  private readonly cart: ShoppingCart;
+
+  constructor() {
+    this.cart = { items: [], total: 0 };
+  }
+</incorrect_output>
+<corrected_output improvement="Only includes the new content within the rewrite_this tags and integrates cleanly into surrounding code">
+readonly interface ShoppingCart {
+</corrected_output>
+</example>
+
+</examples>
+
+With these examples in mind, edit the following file:
+
+<document language="{{ language_name }}">
+{{{ document_content }}}
+</document>
+
+{{#if is_truncated}}
+The provided document has been truncated (potentially mid-line) for brevity.
+{{/if}}
+
+<instructions>
 {{#if has_insertion}}
-Insert text anywhere you see it marked with with <insert_here></insert_here> tags. Do not include <insert_here> tags in your output.
+Insert text anywhere you see marked with <insert_here></insert_here> tags. It's CRITICAL that you DO NOT include <insert_here> tags in your output.
 {{/if}}
 {{#if has_replacement}}
-Edit edit text that you see surrounded with <edit_here></edit_here> tags. Do not include <edit_here> tags in your output.
+Edit text that you see surrounded with <edit_here>...</edit_here> tags. It's CRITICAL that you DO NOT include <edit_here> tags in your output.
 {{/if}}
+Make no changes to the rewritten content outside these tags.
 
+<snippet language="Rust" annotated="true">
+{{{ rewrite_section_prefix }}}
 <rewrite_this>
-{{{rewrite_section_with_selections}}}
+{{{ rewrite_section_with_edits }}}
 </rewrite_this>
+{{{ rewrite_section_suffix }}}
+</snippet>
+
+Rewrite the lines enclosed within the <rewrite_this></rewrite_this> tags in accordance with the provided instructions and the prompt below.
+
+<prompt>
+{{{ user_prompt }}}
+</prompt>
+
+Do not include <insert_here> or <edit_here> annotations in your output. Here is a clean copy of the snippet without annotations for your reference.
 
-Only make changes that are necessary to fulfill the prompt, leave everything else as-is. All surrounding {{content_type}} will be preserved. Do not output the <rewrite_this></rewrite this> tags or anything outside of them.
+<snippet>
+{{{ rewrite_section_prefix }}}
+{{{ rewrite_section }}}
+{{{ rewrite_section_suffix }}}
+</snippet>
+</instructions>
 
-Start at the indentation level in the original file in the rewritten {{content_type}}. Don't stop until you've rewritten the entire section, even if you have no more changes to make. Always write out the whole section with no unnecessary elisions.
+<guidelines_reminder>
+1. Focus on necessary changes: Modify only what's required to fulfill the prompt.
+2. Preserve context: Maintain all surrounding content as-is, ensuring the rewritten section seamlessly integrates with the existing document structure and flow.
+3. Exclude annotation tags: Do not output <rewrite_this>, </rewrite_this>, <edit_here>, or <insert_here> tags.
+4. Maintain indentation: Begin at the original file's indentation level.
+5. Complete rewrite: Continue until the entire section is rewritten, even if no further changes are needed.
+6. Avoid elisions: Always write out the full section without unnecessary omissions. NEVER say `// ...` or `// ...existing code` in your output.
+7. Respect content boundaries: Preserve code integrity.
+</guidelines_reminder>
 
 Immediately start with the following format with no remarks:
 
 ```
-\{{REWRITTEN_CODE}}
+{{REWRITTEN_CODE}}
 ```

crates/assistant/src/inline_assistant.rs

@@ -45,6 +45,7 @@ use std::{
     task::{self, Poll},
     time::{Duration, Instant},
 };
+use text::OffsetRangeExt as _;
 use theme::ThemeSettings;
 use ui::{prelude::*, CheckboxWithLabel, IconButtonShape, Popover, Tooltip};
 use util::{RangeExt, ResultExt};
@@ -2354,6 +2355,15 @@ impl Codegen {
             return Err(anyhow::anyhow!("invalid transformation range"));
         };
 
+        let mut transform_context_range = transform_range.to_point(&transform_buffer);
+        transform_context_range.start.row = transform_context_range.start.row.saturating_sub(3);
+        transform_context_range.start.column = 0;
+        transform_context_range.end =
+            (transform_context_range.end + Point::new(3, 0)).min(transform_buffer.max_point());
+        transform_context_range.end.column =
+            transform_buffer.line_len(transform_context_range.end.row);
+        let transform_context_range = transform_context_range.to_offset(&transform_buffer);
+
         let selected_ranges = self
             .selected_ranges
             .iter()
@@ -2376,6 +2386,7 @@ impl Codegen {
                 transform_buffer,
                 transform_range,
                 selected_ranges,
+                transform_context_range,
             )
             .map_err(|e| anyhow::anyhow!("Failed to generate content prompt: {}", e))?;
 

crates/assistant/src/prompts.rs

@@ -16,7 +16,9 @@ pub struct ContentPromptContext {
     pub document_content: String,
     pub user_prompt: String,
     pub rewrite_section: String,
-    pub rewrite_section_with_selections: String,
+    pub rewrite_section_prefix: String,
+    pub rewrite_section_suffix: String,
+    pub rewrite_section_with_edits: String,
     pub has_insertion: bool,
     pub has_replacement: bool,
 }
@@ -173,6 +175,7 @@ impl PromptBuilder {
         buffer: BufferSnapshot,
         transform_range: Range<usize>,
         selected_ranges: Vec<Range<usize>>,
+        transform_context_range: Range<usize>,
     ) -> Result<String, RenderError> {
         let content_type = match language_name {
             None | Some("Markdown" | "Plain Text") => "text",
@@ -202,6 +205,7 @@ impl PromptBuilder {
         for chunk in buffer.text_for_range(truncated_before) {
             document_content.push_str(chunk);
         }
+
         document_content.push_str("<rewrite_this>\n");
         for chunk in buffer.text_for_range(transform_range.clone()) {
             document_content.push_str(chunk);
@@ -217,7 +221,17 @@ impl PromptBuilder {
             rewrite_section.push_str(chunk);
         }
 
-        let rewrite_section_with_selections = {
+        let mut rewrite_section_prefix = String::new();
+        for chunk in buffer.text_for_range(transform_context_range.start..transform_range.start) {
+            rewrite_section_prefix.push_str(chunk);
+        }
+
+        let mut rewrite_section_suffix = String::new();
+        for chunk in buffer.text_for_range(transform_range.end..transform_context_range.end) {
+            rewrite_section_suffix.push_str(chunk);
+        }
+
+        let rewrite_section_with_edits = {
             let mut section_with_selections = String::new();
             let mut last_end = 0;
             for selected_range in &selected_ranges {
@@ -254,7 +268,9 @@ impl PromptBuilder {
             document_content,
             user_prompt,
             rewrite_section,
-            rewrite_section_with_selections,
+            rewrite_section_prefix,
+            rewrite_section_suffix,
+            rewrite_section_with_edits,
             has_insertion,
             has_replacement,
         };