Code block evals (#29619)

Richard Feldman created

Add a targeted eval for code block formatting, and revise the system
prompt accordingly.

### Eval before, n=8

<img width="728" alt="eval before"
src="https://github.com/user-attachments/assets/552b6146-3d26-4eaa-86f9-9fc36c0cadf2"
/>

### Eval after prompt change, n=8 (excluding the new evals, so just testing the prompt change)

<img width="717" alt="eval after"
src="https://github.com/user-attachments/assets/c78c7a54-4c65-470c-b135-8691584cd73e"
/>

Release Notes:

- N/A

Change summary

Cargo.lock                                       |   1 
assets/prompts/assistant_system_prompt.hbs       |  76 ++++
crates/agent/src/active_thread.rs                | 265 +++++++++--------
crates/assistant_tools/src/edit_file_tool.rs     |  16 +
crates/eval/Cargo.toml                           |   1 
crates/eval/src/example.rs                       |  96 ++++++
crates/eval/src/examples/code_block_citations.rs | 191 ++++++++++++
crates/eval/src/examples/mod.rs                  |   2 
crates/markdown/src/markdown.rs                  |   2 
crates/markdown/src/path_range.rs                |  14 
10 files changed, 533 insertions(+), 131 deletions(-)

Detailed changes

Cargo.lock

@@ -4993,6 +4993,7 @@ dependencies = [
  "language_model",
  "language_models",
  "languages",
+ "markdown",
  "node_runtime",
  "pathdiff",
  "paths",

assets/prompts/assistant_system_prompt.hbs

@@ -39,18 +39,78 @@ If appropriate, use tool calls to explore the current project, which contains th
 
 ## Code Block Formatting
 
-Whenever you mention a code block, you MUST use ONLY use the following format when the code in the block comes from a file
-in the project:
-
+Whenever you mention a code block, you MUST use ONLY use the following format:
 ```path/to/Something.blah#L123-456
 (code goes here)
 ```
-
 The `#L123-456` means the line number range 123 through 456, and the path/to/Something.blah
-is a path in the project. (If this code block does not come from a file in the project, then you may instead use
-the normal markdown style of three backticks followed by language name. However, you MUST use this format if
-the code in the block comes from a file in the project.)
-
+is a path in the project. (If there is no valid path in the project, then you can use
+/dev/null/path.extension for its path.) This is the ONLY valid way to format code blocks, because the Markdown parser
+does not understand the more common ```language syntax, or bare ``` blocks. It only
+understands this path-based syntax, and if the path is missing, then it will error and you will have to do it over again.
+Just to be really clear about this, if you ever find yourself writing three backticks followed by a language name, STOP!
+You have made a mistake. You can only ever put paths after triple backticks!
+<example>
+Based on all the information I've gathered, here's a summary of how this system works:
+1. The README file is loaded into the system.
+2. The system finds the first two headers, including everything in between. In this case, that would be:
+```path/to/README.md#L8-12
+# First Header
+This is the info under the first header.
+## Sub-header
+```
+3. Then the system finds the last header in the README:
+```path/to/README.md#L27-29
+## Last Header
+This is the last header in the README.
+```
+4. Finally, it passes this information on to the next process.
+</example>
+<example>
+In Markdown, hash marks signify headings. For example:
+```/dev/null/example.md#L1-3
+# Level 1 heading
+## Level 2 heading
+### Level 3 heading
+```
+</example>
+Here are examples of ways you must never render code blocks:
+<bad_example_do_not_do_this>
+In Markdown, hash marks signify headings. For example:
+```
+# Level 1 heading
+## Level 2 heading
+### Level 3 heading
+```
+</bad_example_do_not_do_this>
+This example is unacceptable because it does not include the path.
+<bad_example_do_not_do_this>
+In Markdown, hash marks signify headings. For example:
+```markdown
+# Level 1 heading
+## Level 2 heading
+### Level 3 heading
+```
+</bad_example_do_not_do_this>
+This example is unacceptable because it has the language instead of the path.
+<bad_example_do_not_do_this>
+In Markdown, hash marks signify headings. For example:
+    # Level 1 heading
+    ## Level 2 heading
+    ### Level 3 heading
+</bad_example_do_not_do_this>
+This example is unacceptable because it uses indentation to mark the code block
+instead of backticks with a path.
+<bad_example_do_not_do_this>
+In Markdown, hash marks signify headings. For example:
+```markdown
+/dev/null/example.md#L1-3
+# Level 1 heading
+## Level 2 heading
+### Level 3 heading
+```
+</bad_example_do_not_do_this>
+This example is unacceptable because the path is in the wrong place. The path must be directly after the opening backticks.
 ## Fixing Diagnostics
 
 1. Make 1-2 attempts at fixing diagnostics, then defer to the user.
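
The citation format the revised prompt asks for (`path/to/Something.blah#L123-456`) can be illustrated with a small parser. This is a minimal sketch, not the `PathWithRange::new` implementation from `crates/markdown`: the `Citation` type and `parse_citation` function are hypothetical names, and the sketch assumes that a fence whose text contains no `/` is a plain language name (the same slash check the new eval uses).

```rust
use std::path::PathBuf;

/// Hypothetical type: a parsed fence citation such as `path/to/file.rs#L123-456`.
#[derive(Debug, PartialEq)]
struct Citation {
    path: PathBuf,
    /// 1-based inclusive line range, if one was given.
    lines: Option<(u32, u32)>,
}

/// Hypothetical helper: splits the text after the opening backticks into a
/// path and an optional `#Lstart-end` range. Returns `None` when the text
/// contains no `/`, which this sketch treats as "language name, not a path".
fn parse_citation(info: &str) -> Option<Citation> {
    let info = info.trim();
    if !info.contains('/') {
        return None;
    }
    let (path, range) = match info.split_once("#L") {
        Some((path, range)) => (path, Some(range)),
        None => (info, None),
    };
    // A range without a `-` is treated as a single line.
    let lines: Option<(u32, u32)> = range.and_then(|range| {
        let (start, end) = range.split_once('-').unwrap_or((range, range));
        Some((start.parse::<u32>().ok()?, end.parse::<u32>().ok()?))
    });
    Some(Citation {
        path: PathBuf::from(path),
        lines,
    })
}
```

Under these assumptions, `parse_citation("path/to/README.md#L8-12")` yields the path plus the range 8 through 12, while `parse_citation("markdown")` yields `None`, which corresponds to the "language instead of the path" bad example in the prompt.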

crates/agent/src/active_thread.rs

@@ -23,7 +23,7 @@ use gpui::{
     Task, TextStyle, TextStyleRefinement, Transformation, UnderlineStyle, WeakEntity, WindowHandle,
     linear_color_stop, linear_gradient, list, percentage, pulsating_between,
 };
-use language::{Buffer, LanguageRegistry};
+use language::{Buffer, Language, LanguageRegistry};
 use language_model::{
     LanguageModelRequestMessage, LanguageModelToolUseId, MessageContent, RequestUsage, Role,
     StopReason,
@@ -33,6 +33,7 @@ use markdown::{HeadingLevelStyles, Markdown, MarkdownElement, MarkdownStyle, Par
 use project::{ProjectEntryId, ProjectItem as _};
 use rope::Point;
 use settings::{Settings as _, update_settings_file};
+use std::ffi::OsStr;
 use std::path::Path;
 use std::rc::Rc;
 use std::sync::Arc;
@@ -346,130 +347,130 @@ fn render_markdown_code_block(
                 .child(Label::new("untitled").size(LabelSize::Small))
                 .into_any_element(),
         ),
-        CodeBlockKind::FencedLang(raw_language_name) => Some(
-            h_flex()
-                .gap_1()
-                .children(
+        CodeBlockKind::FencedLang(raw_language_name) => Some(render_code_language(
+            parsed_markdown.languages_by_name.get(raw_language_name),
+            raw_language_name.clone(),
+            cx,
+        )),
+        CodeBlockKind::FencedSrc(path_range) => path_range.path.file_name().map(|file_name| {
+            // We tell the model to use /dev/null for the path instead of using ```language
+            // because otherwise it consistently fails to use code citations.
+            if path_range.path.starts_with("/dev/null") {
+                let ext = path_range
+                    .path
+                    .extension()
+                    .and_then(OsStr::to_str)
+                    .map(|str| SharedString::new(str.to_string()))
+                    .unwrap_or_default();
+
+                render_code_language(
                     parsed_markdown
-                        .languages_by_name
-                        .get(raw_language_name)
-                        .and_then(|language| {
-                            language
-                                .config()
-                                .matcher
-                                .path_suffixes
-                                .iter()
-                                .find_map(|extension| {
-                                    file_icons::FileIcons::get_icon(Path::new(extension), cx)
-                                })
-                                .map(Icon::from_path)
-                                .map(|icon| icon.color(Color::Muted).size(IconSize::Small))
-                        }),
-                )
-                .child(
-                    Label::new(
-                        parsed_markdown
-                            .languages_by_name
-                            .get(raw_language_name)
-                            .map(|language| language.name().into())
-                            .clone()
-                            .unwrap_or_else(|| raw_language_name.clone()),
-                    )
-                    .size(LabelSize::Small),
+                        .languages_by_path
+                        .get(&path_range.path)
+                        .or_else(|| parsed_markdown.languages_by_name.get(&ext)),
+                    ext,
+                    cx,
                 )
-                .into_any_element(),
-        ),
-        CodeBlockKind::FencedSrc(path_range) => path_range.path.file_name().map(|file_name| {
-            let content = if let Some(parent) = path_range.path.parent() {
-                h_flex()
-                    .ml_1()
-                    .gap_1()
-                    .child(
-                        Label::new(file_name.to_string_lossy().to_string()).size(LabelSize::Small),
-                    )
-                    .child(
-                        Label::new(parent.to_string_lossy().to_string())
-                            .color(Color::Muted)
-                            .size(LabelSize::Small),
-                    )
-                    .into_any_element()
             } else {
-                Label::new(path_range.path.to_string_lossy().to_string())
-                    .size(LabelSize::Small)
-                    .ml_1()
-                    .into_any_element()
-            };
-
-            h_flex()
-                .id(("code-block-header-label", ix))
-                .w_full()
-                .max_w_full()
-                .px_1()
-                .gap_0p5()
-                .cursor_pointer()
-                .rounded_sm()
-                .hover(|item| item.bg(cx.theme().colors().element_hover.opacity(0.5)))
-                .tooltip(Tooltip::text("Jump to File"))
-                .child(
+                let content = if let Some(parent) = path_range.path.parent() {
                     h_flex()
-                        .gap_0p5()
-                        .children(
-                            file_icons::FileIcons::get_icon(&path_range.path, cx)
-                                .map(Icon::from_path)
-                                .map(|icon| icon.color(Color::Muted).size(IconSize::XSmall)),
+                        .ml_1()
+                        .gap_1()
+                        .child(
+                            Label::new(file_name.to_string_lossy().to_string())
+                                .size(LabelSize::Small),
                         )
-                        .child(content)
                         .child(
-                            Icon::new(IconName::ArrowUpRight)
-                                .size(IconSize::XSmall)
-                                .color(Color::Ignored),
-                        ),
-                )
-                .on_click({
-                    let path_range = path_range.clone();
-                    move |_, window, cx| {
-                        workspace
-                            .update(cx, {
-                                |workspace, cx| {
-                                    let Some(project_path) = workspace
-                                        .project()
-                                        .read(cx)
-                                        .find_project_path(&path_range.path, cx)
-                                    else {
-                                        return;
-                                    };
-                                    let Some(target) = path_range.range.as_ref().map(|range| {
-                                        Point::new(
-                                            // Line number is 1-based
-                                            range.start.line.saturating_sub(1),
-                                            range.start.col.unwrap_or(0),
-                                        )
-                                    }) else {
-                                        return;
-                                    };
-                                    let open_task =
-                                        workspace.open_path(project_path, None, true, window, cx);
-                                    window
-                                        .spawn(cx, async move |cx| {
-                                            let item = open_task.await?;
-                                            if let Some(active_editor) = item.downcast::<Editor>() {
-                                                active_editor
-                                                    .update_in(cx, |editor, window, cx| {
-                                                        editor.go_to_singleton_buffer_point(
-                                                            target, window, cx,
-                                                        );
-                                                    })
-                                                    .ok();
-                                            }
-                                            anyhow::Ok(())
-                                        })
-                                        .detach_and_log_err(cx);
-                                }
-                            })
-                            .ok();
-                    }
-                })
-                .into_any_element()
+                            Label::new(parent.to_string_lossy().to_string())
+                                .color(Color::Muted)
+                                .size(LabelSize::Small),
+                        )
+                        .into_any_element()
+                } else {
+                    Label::new(path_range.path.to_string_lossy().to_string())
+                        .size(LabelSize::Small)
+                        .ml_1()
+                        .into_any_element()
+                };
+
+                h_flex()
+                    .id(("code-block-header-label", ix))
+                    .w_full()
+                    .max_w_full()
+                    .px_1()
+                    .gap_0p5()
+                    .cursor_pointer()
+                    .rounded_sm()
+                    .hover(|item| item.bg(cx.theme().colors().element_hover.opacity(0.5)))
+                    .tooltip(Tooltip::text("Jump to File"))
+                    .child(
+                        h_flex()
+                            .gap_0p5()
+                            .children(
+                                file_icons::FileIcons::get_icon(&path_range.path, cx)
+                                    .map(Icon::from_path)
+                                    .map(|icon| icon.color(Color::Muted).size(IconSize::XSmall)),
+                            )
+                            .child(content)
+                            .child(
+                                Icon::new(IconName::ArrowUpRight)
+                                    .size(IconSize::XSmall)
+                                    .color(Color::Ignored),
+                            ),
+                    )
+                    .on_click({
+                        let path_range = path_range.clone();
+                        move |_, window, cx| {
+                            workspace
+                                .update(cx, {
+                                    |workspace, cx| {
+                                        let Some(project_path) = workspace
+                                            .project()
+                                            .read(cx)
+                                            .find_project_path(&path_range.path, cx)
+                                        else {
+                                            return;
+                                        };
+                                        let Some(target) = path_range.range.as_ref().map(|range| {
+                                            Point::new(
+                                                // Line number is 1-based
+                                                range.start.line.saturating_sub(1),
+                                                range.start.col.unwrap_or(0),
+                                            )
+                                        }) else {
+                                            return;
+                                        };
+                                        let open_task = workspace.open_path(
+                                            project_path,
+                                            None,
+                                            true,
+                                            window,
+                                            cx,
+                                        );
+                                        window
+                                            .spawn(cx, async move |cx| {
+                                                let item = open_task.await?;
+                                                if let Some(active_editor) =
+                                                    item.downcast::<Editor>()
+                                                {
+                                                    active_editor
+                                                        .update_in(cx, |editor, window, cx| {
+                                                            editor.go_to_singleton_buffer_point(
+                                                                target, window, cx,
+                                                            );
+                                                        })
+                                                        .ok();
+                                                }
+                                                anyhow::Ok(())
+                                            })
+                                            .detach_and_log_err(cx);
+                                    }
+                                })
+                                .ok();
+                        }
+                    })
+                    .into_any_element()
+            }
         }),
     };
 
@@ -604,6 +605,32 @@ fn render_markdown_code_block(
         )
 }
 
+fn render_code_language(
+    language: Option<&Arc<Language>>,
+    name_fallback: SharedString,
+    cx: &App,
+) -> AnyElement {
+    let icon_path = language.and_then(|language| {
+        language
+            .config()
+            .matcher
+            .path_suffixes
+            .iter()
+            .find_map(|extension| file_icons::FileIcons::get_icon(Path::new(extension), cx))
+            .map(Icon::from_path)
+    });
+
+    let language_label = language
+        .map(|language| language.name().into())
+        .unwrap_or(name_fallback);
+
+    h_flex()
+        .gap_1()
+        .children(icon_path.map(|icon| icon.color(Color::Muted).size(IconSize::Small)))
+        .child(Label::new(language_label).size(LabelSize::Small))
+        .into_any_element()
+}
+
 fn open_markdown_link(
     text: SharedString,
     workspace: WeakEntity<Workspace>,
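
The `/dev/null` branch above treats the path as a placeholder and keeps only its extension, which is then used to look up a language. A minimal standalone sketch of that extension step, using only `std::path`, might look like this. The function name `placeholder_extension` is hypothetical and does not appear in the diff.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper mirroring the idea in the diff: a `/dev/null/...` path
/// is a placeholder, so only its extension carries information.
fn placeholder_extension(path: &Path) -> Option<&str> {
    // `Path::starts_with` compares whole components, so `/dev/nullable/x.md`
    // does not count as being under `/dev/null`.
    if path.starts_with("/dev/null") {
        path.extension().and_then(OsStr::to_str)
    } else {
        None
    }
}
```

For `/dev/null/example.md` this returns `Some("md")`. Real project paths return `None` here and would take the "Jump to File" branch instead.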

crates/assistant_tools/src/edit_file_tool.rs

@@ -174,6 +174,7 @@ impl Tool for EditFileTool {
                     "The `old_string` and `new_string` are identical, so no changes would be made."
                 ));
             }
+            let old_string = input.old_string.clone();
 
             let result = cx
                 .background_spawn(async move {
@@ -213,6 +214,21 @@ impl Tool for EditFileTool {
                             input.path.display()
                         )
                     } else {
+                        let old_string_with_buffer = format!(
+                            "old_string:\n\n{}\n\n-------file-------\n\n{}",
+                            &old_string,
+                            buffer.text()
+                        );
+                        let path = {
+                            use std::collections::hash_map::DefaultHasher;
+                            use std::hash::{Hash, Hasher};
+
+                            let mut hasher = DefaultHasher::new();
+                            old_string_with_buffer.hash(&mut hasher);
+
+                            PathBuf::from(format!("failed_tool_{}.txt", hasher.finish()))
+                        };
+                        std::fs::write(path, old_string_with_buffer).unwrap();
                         anyhow!("Failed to match the provided `old_string`")
                     }
                 })?;
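
The hunk above dumps the failed `old_string` and the buffer text to a file whose name is derived from a hash of the contents. The naming step can be sketched in isolation as follows. The function name `failure_dump_path` is hypothetical, and the sketch only computes the path without writing anything to disk.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Hypothetical helper: derives a dump file name from the contents, so that
/// identical failures map to the same file instead of piling up.
fn failure_dump_path(contents: &str) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    PathBuf::from(format!("failed_tool_{}.txt", hasher.finish()))
}
```

One design consequence worth noting: the standard library documents `DefaultHasher`'s algorithm as unspecified and subject to change between releases, so these names are stable within one build but should not be relied on across Rust versions.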

crates/eval/Cargo.toml

@@ -44,6 +44,7 @@ language_extension.workspace = true
 language_model.workspace = true
 language_models.workspace = true
 languages = { workspace = true, features = ["load-grammars"] }
+markdown.workspace = true
 node_runtime.workspace = true
 pathdiff.workspace = true
 paths.workspace = true

crates/eval/src/example.rs

@@ -10,13 +10,13 @@ use crate::{
     ToolMetrics,
     assertions::{AssertionsReport, RanAssertion, RanAssertionResult},
 };
-use agent::{ContextLoadResult, ThreadEvent};
+use agent::{ContextLoadResult, Thread, ThreadEvent};
 use anyhow::{Result, anyhow};
 use async_trait::async_trait;
 use buffer_diff::DiffHunkStatus;
 use collections::HashMap;
 use futures::{FutureExt as _, StreamExt, channel::mpsc, select_biased};
-use gpui::{AppContext, AsyncApp, Entity};
+use gpui::{App, AppContext, AsyncApp, Entity};
 use language_model::{LanguageModel, Role, StopReason};
 
 pub const THREAD_EVENT_TIMEOUT: Duration = Duration::from_secs(60 * 2);
@@ -314,7 +314,7 @@ impl ExampleContext {
             for message in thread.messages().skip(message_count_before) {
                 messages.push(Message {
                     _role: message.role,
-                    _text: message.to_string(),
+                    text: message.to_string(),
                     tool_use: thread
                         .tool_uses_for_message(message.id, cx)
                         .into_iter()
@@ -362,6 +362,90 @@ impl ExampleContext {
             })
             .unwrap()
     }
+
+    pub fn agent_thread(&self) -> Entity<Thread> {
+        self.agent_thread.clone()
+    }
+}
+
+impl AppContext for ExampleContext {
+    type Result<T> = anyhow::Result<T>;
+
+    fn new<T: 'static>(
+        &mut self,
+        build_entity: impl FnOnce(&mut gpui::Context<T>) -> T,
+    ) -> Self::Result<Entity<T>> {
+        self.app.new(build_entity)
+    }
+
+    fn reserve_entity<T: 'static>(&mut self) -> Self::Result<gpui::Reservation<T>> {
+        self.app.reserve_entity()
+    }
+
+    fn insert_entity<T: 'static>(
+        &mut self,
+        reservation: gpui::Reservation<T>,
+        build_entity: impl FnOnce(&mut gpui::Context<T>) -> T,
+    ) -> Self::Result<Entity<T>> {
+        self.app.insert_entity(reservation, build_entity)
+    }
+
+    fn update_entity<T, R>(
+        &mut self,
+        handle: &Entity<T>,
+        update: impl FnOnce(&mut T, &mut gpui::Context<T>) -> R,
+    ) -> Self::Result<R>
+    where
+        T: 'static,
+    {
+        self.app.update_entity(handle, update)
+    }
+
+    fn read_entity<T, R>(
+        &self,
+        handle: &Entity<T>,
+        read: impl FnOnce(&T, &App) -> R,
+    ) -> Self::Result<R>
+    where
+        T: 'static,
+    {
+        self.app.read_entity(handle, read)
+    }
+
+    fn update_window<T, F>(&mut self, window: gpui::AnyWindowHandle, f: F) -> Result<T>
+    where
+        F: FnOnce(gpui::AnyView, &mut gpui::Window, &mut App) -> T,
+    {
+        self.app.update_window(window, f)
+    }
+
+    fn read_window<T, R>(
+        &self,
+        window: &gpui::WindowHandle<T>,
+        read: impl FnOnce(Entity<T>, &App) -> R,
+    ) -> Result<R>
+    where
+        T: 'static,
+    {
+        self.app.read_window(window, read)
+    }
+
+    fn background_spawn<R>(
+        &self,
+        future: impl std::future::Future<Output = R> + Send + 'static,
+    ) -> gpui::Task<R>
+    where
+        R: Send + 'static,
+    {
+        self.app.background_spawn(future)
+    }
+
+    fn read_global<G, R>(&self, callback: impl FnOnce(&G, &App) -> R) -> Self::Result<R>
+    where
+        G: gpui::Global,
+    {
+        self.app.read_global(callback)
+    }
 }
 
 #[derive(Debug)]
@@ -391,12 +475,16 @@ impl Response {
     pub fn tool_uses(&self) -> impl Iterator<Item = &ToolUse> {
         self.messages.iter().flat_map(|msg| &msg.tool_use)
     }
+
+    pub fn texts(&self) -> impl Iterator<Item = String> {
+        self.messages.iter().map(|message| message.text.clone())
+    }
 }
 
 #[derive(Debug)]
 pub struct Message {
     _role: Role,
-    _text: String,
+    text: String,
     tool_use: Vec<ToolUse>,
 }
 

crates/eval/src/examples/code_block_citations.rs

@@ -0,0 +1,191 @@
+use anyhow::Result;
+use async_trait::async_trait;
+use markdown::PathWithRange;
+
+use crate::example::{Example, ExampleContext, ExampleMetadata, JudgeAssertion, LanguageServer};
+
+pub struct CodeBlockCitations;
+
+const FENCE: &str = "```";
+
+#[async_trait(?Send)]
+impl Example for CodeBlockCitations {
+    fn meta(&self) -> ExampleMetadata {
+        ExampleMetadata {
+            name: "code_block_citations".to_string(),
+            url: "https://github.com/zed-industries/zed.git".to_string(),
+            revision: "f69aeb6311dde3c0b8979c293d019d66498d54f2".to_string(),
+            language_server: Some(LanguageServer {
+                file_extension: "rs".to_string(),
+                allow_preexisting_diagnostics: false,
+            }),
+            max_assertions: None,
+        }
+    }
+
+    async fn conversation(&self, cx: &mut ExampleContext) -> Result<()> {
+        const FILENAME: &str = "assistant_tool.rs";
+        cx.push_user_message(format!(
+            r#"
+            Show me the method bodies of all the methods of the `Tool` trait in {FILENAME}.
+
+            Please show each method in a separate code snippet.
+            "#
+        ));
+
+        // Verify that the messages all have the correct formatting.
+        let texts: Vec<String> = cx.run_to_end().await?.texts().collect();
+        let closing_fence = format!("\n{FENCE}");
+
+        for text in texts.iter() {
+            let mut text = text.as_str();
+
+            while let Some(index) = text.find(FENCE) {
+                // Advance text past the opening backticks.
+                text = &text[index + FENCE.len()..];
+
+                // Find the closing backticks.
+                let content_len = text.find(&closing_fence);
+
+                // Verify the citation format - e.g. ```path/to/foo.txt#L123-456
+                if let Some(citation_len) = text.find('\n') {
+                    let citation = &text[..citation_len];
+
+                    if let Ok(()) =
+                        cx.assert(citation.contains("/"), format!("Slash in {citation:?}",))
+                    {
+                        let path_range = PathWithRange::new(citation);
+                        let path = cx
+                            .agent_thread()
+                            .update(cx, |thread, cx| {
+                                thread
+                                    .project()
+                                    .read(cx)
+                                    .find_project_path(path_range.path, cx)
+                            })
+                            .ok()
+                            .flatten();
+
+                        if let Ok(path) = cx.assert_some(path, format!("Valid path: {citation:?}"))
+                        {
+                            let buffer_text = {
+                                let buffer = match cx.agent_thread().update(cx, |thread, cx| {
+                                    thread
+                                        .project()
+                                        .update(cx, |project, cx| project.open_buffer(path, cx))
+                                }) {
+                                    Ok(buffer_task) => buffer_task.await.ok(),
+                                    Err(err) => {
+                                        cx.assert(
+                                            false,
+                                            format!("Expected Ok(buffer), not {err:?}"),
+                                        )
+                                        .ok();
+                                        break;
+                                    }
+                                };
+
+                                let Ok(buffer_text) = cx.assert_some(
+                                    buffer.and_then(|buffer| {
+                                        buffer.read_with(cx, |buffer, _| buffer.text()).ok()
+                                    }),
+                                    "Reading buffer text succeeded",
+                                ) else {
+                                    continue;
+                                };
+                                buffer_text
+                            };
+
+                            if let Some(content_len) = content_len {
+                                // + 1 because there's a newline character after the citation.
+                                let content =
+                                    &text[(citation.len() + 1)..content_len - (citation.len() + 1)];
+
+                                cx.assert(
+                                    buffer_text.contains(&content),
+                                    "Code block content was found in file",
+                                )
+                                .ok();
+
+                                if let Some(range) = path_range.range {
+                                    let start_line_index = range.start.line.saturating_sub(1);
+                                    let line_count =
+                                        range.end.line.saturating_sub(start_line_index);
+                                    let mut snippet = buffer_text
+                                        .lines()
+                                        .skip(start_line_index as usize)
+                                        .take(line_count as usize)
+                                        .collect::<Vec<&str>>()
+                                        .join("\n");
+
+                                    if let Some(start_col) = range.start.col {
+                                        snippet = snippet[start_col as usize..].to_string();
+                                    }
+
+                                    if let Some(end_col) = range.end.col {
+                                        let last_line = snippet.lines().last().unwrap();
+                                        snippet = snippet
+                                            [..snippet.len() - last_line.len() + end_col as usize]
+                                            .to_string();
+                                    }
+
+                                    cx.assert_eq(
+                                        snippet.as_str(),
+                                        content,
+                                        "Code block snippet was at specified line/col",
+                                    )
+                                    .ok();
+                                }
+                            }
+                        }
+                    }
+                } else {
+                    cx.assert(
+                        false,
+                        format!("Opening {FENCE} did not have a newline anywhere after it."),
+                    )
+                    .ok();
+                }
+
+                if let Some(content_len) = content_len {
+                    // Advance past the closing backticks
+                    text = &text[content_len + FENCE.len()..];
+                } else {
+                    // There were no closing backticks associated with these opening backticks.
+                    cx.assert(
+                        false,
+                        "Code block opening had matching closing backticks.".to_string(),
+                    )
+                    .ok();
+
+                    // There are no more code blocks to parse, so we're done.
+                    break;
+                }
+            }
+        }
+
+        Ok(())
+    }
+
+    fn thread_assertions(&self) -> Vec<JudgeAssertion> {
+        vec![
+            JudgeAssertion {
+                id: "trait method bodies are shown".to_string(),
+                description:
+                    "All method bodies of the Tool trait are shown."
+                        .to_string(),
+            },
+            JudgeAssertion {
+                id: "code blocks used".to_string(),
+                description:
+                   "All code snippets are rendered inside markdown code blocks (as opposed to any other formatting besides code blocks)."
+                        .to_string(),
+            },
+            JudgeAssertion {
+              id: "code blocks use backticks".to_string(),
+              description:
+                  format!("All markdown code blocks use backtick fences ({FENCE}) rather than indentation.")
+            }
+        ]
+    }
+}
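
The core of this eval is a scan over each message: find an opening fence, read the citation on the rest of that line, find the closing fence, then compare the block body against the cited line range of the file. A simplified, dependency-free sketch of those two steps is shown below. The names `Block`, `fenced_blocks`, and `lines_in_range` are hypothetical, the slice bounds are this sketch's own choice rather than a copy of the eval's, and column offsets are ignored.

```rust
/// Hypothetical type: one fenced block, split into the text after the
/// opening backticks and the body between the fences.
#[derive(Debug, PartialEq)]
struct Block<'a> {
    citation: &'a str,
    content: &'a str,
}

/// Hypothetical scanner: collects every complete fenced block in `text`.
/// Scanning stops at the first block that has no closing fence.
fn fenced_blocks(text: &str) -> Vec<Block<'_>> {
    // Built at runtime so this example contains no literal triple backticks.
    let fence = "`".repeat(3);
    let closing = format!("\n{fence}");
    let mut rest = text;
    let mut blocks = Vec::new();

    while let Some(open) = rest.find(&fence) {
        // Step past the opening backticks.
        rest = &rest[open + fence.len()..];
        // The citation runs to the end of the opening line.
        let Some(citation_len) = rest.find('\n') else { break };
        // Search from the citation's newline so an empty block is also found.
        let Some(close) = rest[citation_len..]
            .find(&closing)
            .map(|offset| offset + citation_len)
        else {
            break;
        };
        let content = if close > citation_len {
            &rest[citation_len + 1..close]
        } else {
            ""
        };
        blocks.push(Block { citation: &rest[..citation_len], content });
        // Step past the newline and the closing backticks.
        rest = &rest[close + closing.len()..];
    }
    blocks
}

/// Returns lines `start..=end` (1-based, inclusive) of `file_text`.
fn lines_in_range(file_text: &str, start: usize, end: usize) -> String {
    let skip = start.saturating_sub(1);
    file_text
        .lines()
        .skip(skip)
        .take(end.saturating_sub(skip))
        .collect::<Vec<_>>()
        .join("\n")
}
```

With these helpers, a citation check reduces to comparing `lines_in_range(file_text, start, end)` against `block.content` for each block whose citation parses to a path and range.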

crates/eval/src/examples/mod.rs

@@ -12,12 +12,14 @@ use util::serde::default_true;
 use crate::example::{Example, ExampleContext, ExampleMetadata, JudgeAssertion};
 
 mod add_arg_to_trait_method;
+mod code_block_citations;
 mod file_search;
 
 pub fn all(examples_dir: &Path) -> Vec<Rc<dyn Example>> {
     let mut threads: Vec<Rc<dyn Example>> = vec![
         Rc::new(file_search::FileSearchExample),
         Rc::new(add_arg_to_trait_method::AddArgToTraitMethod),
+        Rc::new(code_block_citations::CodeBlockCitations),
     ];
 
     for example_path in list_declarative_examples(examples_dir).unwrap() {

crates/markdown/src/markdown.rs

@@ -1,6 +1,8 @@
 pub mod parser;
 mod path_range;
 
+pub use path_range::{LineCol, PathWithRange};
+
 use std::borrow::Cow;
 use std::collections::HashSet;
 use std::iter;

crates/markdown/src/path_range.rs

@@ -32,6 +32,20 @@ impl LineCol {
 }
 
 impl PathWithRange {
+    // Note: We could try out this as an alternative, and see how it does on evals.
+    //
+    // The closest to a standard way of including a filename is this:
+    // ```rust filename="path/to/file.rs#42:43"
+    // ```
+    //
+    // or, alternatively,
+    // ```rust filename="path/to/file.rs" lines="42:43"
+    // ```
+    //
+    // Examples where it's used this way:
+    // - https://mdxjs.com/guides/syntax-highlighting/#syntax-highlighting-with-the-meta-field
+    // - https://docusaurus.io/docs/markdown-features/code-blocks
+    // - https://spec.commonmark.org/0.31.2/#example-143
     pub fn new(str: impl AsRef<str>) -> Self {
         let str = str.as_ref();
         // Sometimes the model will include a language at the start,