diff --git a/crates/edit_prediction_cli/src/prompts/repair.md b/crates/edit_prediction_cli/src/prompts/repair.md
index 4fc0b6b66201ac9dec5147655c073b3defcf7455..8ff3390af29cef83f4b23c141b6a772fb326cfc2 100644
--- a/crates/edit_prediction_cli/src/prompts/repair.md
+++ b/crates/edit_prediction_cli/src/prompts/repair.md
@@ -1,103 +1,42 @@
-# Instructions
-
-You are an edit prediction assistant in a code editor. Your task is to generate an improved prediction based on feedback from a quality assessment.
-
-A previous model generated a prediction that was judged to have issues. Your job is to generate a better prediction that addresses the feedback.
-
-## Focus on
-
-- Completing any partially-applied changes made
-- Ensuring consistency with the programming style and patterns already established
-- Making edits that maintain or improve code quality
-- NOT reverting or undoing changes the user intentionally made
-
-## Rules
-
-- **NEVER undo or revert the user's recent edits.** Examine the diff in the edit history carefully:
-  - If a line was removed (starts with `-`), do NOT restore that content—even if the code now appears incomplete or broken without it
-  - If a line was added (starts with `+`), do NOT delete or significantly modify it
-  - If code appears broken or incomplete after the user's edit, output `NO_EDITS` rather than "fixing" it by reverting
-  - Only add NEW content that extends the user's work forward; never restore what they removed
-  - **Key test**: if your prediction would make the code more similar to what it was BEFORE the user's edit, output `NO_EDITS` instead
-  - **Never assume a deletion was accidental.** Even if removing content breaks the code, breaks a pattern, or leaves text looking "incomplete", respect it. The user may be mid-rewrite. Do NOT "complete" partial text by restoring what was deleted.
-- Do not just mechanically apply patterns - reason about what changes make sense given the context and the programmer's apparent goals.
-- Do not just fix syntax errors - look for the broader refactoring pattern and apply it systematically throughout the code.
-- Keep existing formatting unless it's absolutely necessary
-- When edit history and surrounding code suggest different edits, prioritize the most recent edits in the history as they best reflect current intent.
-- When uncertain, predict only the minimal, high-confidence portion of the edit. Prefer a small, correct prediction over a large, speculative one.
-- Don't write a lot of code if you're not sure what to do
-- Do not delete or remove text that was just added in the edit history. If a recent edit introduces incomplete or incorrect code, finish or fix it in place, or simply output `NO_EDITS` rather than removing it. Only remove a recent edit if the history explicitly shows the user undoing it themselves.
-- Treat partial text at or near the cursor as the beginning of something the user is actively typing. Complete the code the user appears to be creating based on context.
-
-# Input Format
-
-You will be provided with:
-1. The user's *edit history*, in chronological order. Use this to infer the user's trajectory and predict the next most logical edit.
-2. A set of *related excerpts* from the user's codebase. Some of these may be needed for correctly predicting the next edit.
-   - `…` may appear within a related file to indicate that some code has been skipped.
-3. An excerpt from the user's *current file*.
-   - Within the user's current file, there is an *editable region* delimited by the `<|editable_region_start|>` and `<|editable_region_end|>` tags. You can only predict edits in this region.
-   - The `<|user_cursor|>` tag marks the user's current cursor position, as it stands after the last edit in the history.
-4. The *previous prediction* that was generated and needs improvement.
-5. *Quality feedback* explaining why the previous prediction was problematic.
-
-# Output Format
-
-- Briefly explain what was wrong with the previous prediction and how you'll improve it.
-- Output the entire editable region, applying the edits that you predict the user will make next.
-- If you're unsure about some portion of the next edit, you may still predict the surrounding code (such as a function definition, `for` loop, etc) and place the `<|user_cursor|>` within it for the user to fill in.
-- Wrap the edited code in a codeblock with exactly five backticks.
-- There are two special outputs for when you don't want to generate a new prediction. **These have different meanings — use the correct one:**
-
-  1. **`NO_EDITS`** — The code is already complete and correct as-is. No edits should be made at all. The editable region should remain unchanged. Use this when:
-     - The code needs no modifications whatsoever
-     - Any prediction would revert or undo the user's intentional changes
-     - You are unsure what edit to make and prefer to do nothing
-
-     `````
-     NO_EDITS
-     `````
-
-  2. **`KEEP_PREVIOUS`** — The previous prediction was actually correct and should be used as-is. Use this when:
-     - After reviewing the quality feedback, you determine the previous prediction is good
-     - You cannot find a meaningful improvement over the previous prediction
-     - The quality feedback was too cautious and the previous prediction correctly addresses the user's intent
-
-     `````
-     KEEP_PREVIOUS
-     `````
-
-  **Important:** `NO_EDITS` and `KEEP_PREVIOUS` are NOT interchangeable.
-  - `NO_EDITS` means "make zero changes to the code" (empty prediction).
-  - `KEEP_PREVIOUS` means "the previous prediction is correct, use it" (reuse the previous prediction).
-  - If you believe the previous prediction was correct, you MUST use `KEEP_PREVIOUS`, not `NO_EDITS`. Using `NO_EDITS` would discard the previous prediction entirely.
-
-# 1. User Edits History
+# Repair Request
+
+Your previous prediction has quality issues that need to be addressed. Please generate an improved prediction.
+
+## Quality Feedback
+
+{quality_feedback}
+
+## Your Previous Prediction (word-diff format)
 
 `````
-{edit_history}
+{actual_patch_word_diff}
 `````
 
-# 2. Related excerpts
+## Instructions
 
-{context}
+Generate an improved prediction following the same rules and output format from the original instructions. The key rules remain:
 
-# 3. Current File
+- **NEVER undo or revert the user's recent edits** — if a line was removed in the edit history, do NOT restore it
+- If your prediction would make the code more similar to what it was BEFORE the user's edit, output `NO_EDITS` instead
+- When uncertain, predict only the minimal, high-confidence portion of the edit
 
-{cursor_excerpt}
+## Output Format
 
-# 4. Previous Prediction (needs improvement)
+Follow the same output format as before, with one addition:
 
-The previous model generated the following edit (in word-diff format):
+- If the code is complete as-is and no edits should be made, output `NO_EDITS`
+- **NEW: If your previous prediction was actually correct** (the quality feedback was overly cautious), output `KEEP_PREVIOUS`:
 
-`````
-{actual_patch_word_diff}
-`````
+  `````
+  KEEP_PREVIOUS
+  `````
 
-# 5. Quality Feedback
+  Use `KEEP_PREVIOUS` when you determine the original prediction correctly addresses the user's intent despite the feedback.
 
-{quality_feedback}
+**Important:** `NO_EDITS` and `KEEP_PREVIOUS` are NOT interchangeable:
+- `NO_EDITS` = make zero changes to the code (discard the previous prediction)
+- `KEEP_PREVIOUS` = the previous prediction is correct, use it as-is
 
-# Your Improved Prediction
+## Your Improved Prediction
 
-Based on the feedback above, generate an improved prediction. Address the issues identified in the quality feedback. If the previous prediction was actually correct, output `KEEP_PREVIOUS`. If no edits should be made at all, output `NO_EDITS`.
\ No newline at end of file
+Briefly explain what was wrong with your previous prediction (or why it was actually correct), then provide the improved output.
\ No newline at end of file
diff --git a/crates/edit_prediction_cli/src/repair.rs b/crates/edit_prediction_cli/src/repair.rs
index 910f2449f7589d9165a881e7758df09c1b256200..e1e588b0174ed9db5fdf52470cead38eea28019d 100644
--- a/crates/edit_prediction_cli/src/repair.rs
+++ b/crates/edit_prediction_cli/src/repair.rs
@@ -10,7 +10,7 @@ use crate::{
     BatchProvider, PredictionProvider,
     anthropic_client::AnthropicClient,
     example::{ActualCursor, Example, ExamplePrediction},
-    format_prompt::{TeacherPrompt, extract_cursor_excerpt_from_example, extract_last_codeblock},
+    format_prompt::{TeacherPrompt, extract_last_codeblock},
     openai_client::OpenAiClient,
     parse_output::run_parse_output,
    paths::LLM_CACHE_DB,
@@ -148,16 +148,15 @@ fn build_score_feedback(example: &Example) -> Option<String> {
     Some(feedback)
 }
 
-/// Build the repair prompt for an example that needs improvement.
-pub fn build_repair_prompt(example: &Example) -> Result<String> {
+/// Build the repair message (Turn 3) for a multi-turn conversation.
+///
+/// This message is sent after the original teacher prompt (Turn 1) and
+/// teacher response (Turn 2) to request an improved prediction.
+pub fn build_repair_message(example: &Example) -> Result<String> {
     let prediction = example
         .predictions
         .first()
         .context("no predictions available")?;
-    let prompt_inputs = example
-        .prompt_inputs
-        .as_ref()
-        .context("prompt_inputs missing (run context retrieval first)")?;
     let actual_patch = prediction
         .actual_patch
         .as_ref()
@@ -169,35 +168,8 @@ pub fn build_repair_prompt(example: &Example) -> Result<String> {
 
     let actual_patch_word_diff = unified_to_word_diff(actual_patch);
 
-    let mut edit_history = String::new();
-    for event in &prompt_inputs.edit_history {
-        match event.as_ref() {
-            zeta_prompt::Event::BufferChange {
-                path,
-                old_path,
-                diff,
-                predicted: _,
-                in_open_source_repo: _,
-            } => {
-                edit_history.push_str(&format!("--- a{}\n", old_path.display()));
-                edit_history.push_str(&format!("+++ b{}\n", path.display()));
-                let diff_word_diff = unified_to_word_diff(diff);
-                edit_history.push_str(&diff_word_diff);
-                edit_history.push_str("\n\n");
-            }
-        }
-    }
-
-    let context = TeacherPrompt::format_context(example);
-
-    let cursor_excerpt =
-        extract_cursor_excerpt_from_example(example).context("failed to extract cursor excerpt")?;
-
     let prompt_template = crate::prompt_assets::get_prompt("repair.md");
     Ok(prompt_template
-        .replace("{edit_history}", &edit_history)
-        .replace("{context}", &context)
-        .replace("{cursor_excerpt}", &cursor_excerpt)
        .replace("{actual_patch_word_diff}", &actual_patch_word_diff)
         .replace("{quality_feedback}", &quality_feedback))
 }
@@ -266,6 +238,12 @@ static OPENAI_CLIENT_BATCH: OnceLock<OpenAiClient> = OnceLock::new();
 static OPENAI_CLIENT_PLAIN: OnceLock<OpenAiClient> = OnceLock::new();
 
 /// Run repair for a single example.
+///
+/// This sends a multi-turn conversation to the LLM:
+/// - Turn 1 (User): Original teacher prompt
+/// - Turn 2 (Assistant): Original teacher response
+/// - Turn 3 (User): Repair critique and instructions
+/// - Turn 4 (Assistant): Improved prediction (the response we parse)
 pub async fn run_repair(
     example: &mut Example,
     args: &RepairArgs,
@@ -289,10 +267,20 @@ pub async fn run_repair(
         anyhow::bail!("no predictions available (run predict first)");
     }
 
+    let teacher_prompt = example
+        .prompt
+        .as_ref()
+        .context("prompt missing (run format_prompt first)")?;
+
+    let teacher_response = &example.predictions[0].actual_output;
+    if teacher_response.is_empty() {
+        anyhow::bail!("teacher response is empty (run predict first)");
+    }
+
     let step_progress = example_progress.start(Step::Repair);
     let model = model_for_backend(args.backend);
 
-    let prompt = build_repair_prompt(example).context("Failed to build repair prompt")?;
+    let repair_message = build_repair_message(example).context("Failed to build repair message")?;
 
     step_progress.set_substatus("generating");
 
@@ -309,13 +297,32 @@ pub async fn run_repair(
         })
     };
 
-    let messages = vec![anthropic::Message {
-        role: anthropic::Role::User,
-        content: vec![anthropic::RequestContent::Text {
-            text: prompt,
-            cache_control: None,
-        }],
-    }];
+    let messages = vec![
+        // Turn 1: Original teacher prompt
+        anthropic::Message {
+            role: anthropic::Role::User,
+            content: vec![anthropic::RequestContent::Text {
+                text: teacher_prompt.input.clone(),
+                cache_control: None,
+            }],
+        },
+        // Turn 2: Original teacher response
+        anthropic::Message {
+            role: anthropic::Role::Assistant,
+            content: vec![anthropic::RequestContent::Text {
+                text: teacher_response.clone(),
+                cache_control: None,
+            }],
+        },
+        // Turn 3: Repair critique and instructions
+        anthropic::Message {
+            role: anthropic::Role::User,
+            content: vec![anthropic::RequestContent::Text {
+                text: repair_message,
+                cache_control: None,
+            }],
+        },
+    ];
 
     let Some(response) =
         client.generate(model, 16384, messages, None, false).await?
     else {
         return Ok(());
     };
@@ -341,9 +348,21 @@ pub async fn run_repair(
         })
     };
 
-    let messages = vec![open_ai::RequestMessage::User {
-        content: open_ai::MessageContent::Plain(prompt),
-    }];
+    let messages = vec![
+        // Turn 1: Original teacher prompt
+        open_ai::RequestMessage::User {
+            content: open_ai::MessageContent::Plain(teacher_prompt.input.clone()),
+        },
+        // Turn 2: Original teacher response
+        open_ai::RequestMessage::Assistant {
+            content: Some(open_ai::MessageContent::Plain(teacher_response.clone())),
+            tool_calls: vec![],
+        },
+        // Turn 3: Repair critique and instructions
+        open_ai::RequestMessage::User {
+            content: open_ai::MessageContent::Plain(repair_message),
+        },
+    ];
 
     let Some(response) =
         client.generate(model, 16384, messages, None, false).await?
     else {
        return Ok(());
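
The multi-turn structure this patch introduces can be sketched in isolation. This is a minimal illustrative sketch, not the patch's actual code: `Role` and `Message` are hypothetical stand-ins for the provider-specific types (`anthropic::Message`, `open_ai::RequestMessage`), but the User/Assistant/User sequence mirrors what `run_repair` now builds.

```rust
// Minimal sketch of the multi-turn repair request. `Role` and `Message`
// are simplified, hypothetical stand-ins for the provider message types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    text: String,
}

/// Assemble Turns 1-3 of the repair conversation. The model's reply to
/// this sequence is Turn 4: the improved prediction that gets parsed.
fn build_repair_conversation(
    teacher_prompt: &str,
    teacher_response: &str,
    repair_message: &str,
) -> Vec<Message> {
    vec![
        // Turn 1: the original teacher prompt
        Message { role: Role::User, text: teacher_prompt.to_string() },
        // Turn 2: the model's original (flawed) prediction
        Message { role: Role::Assistant, text: teacher_response.to_string() },
        // Turn 3: quality feedback and repair instructions
        Message { role: Role::User, text: repair_message.to_string() },
    ]
}
```

Replaying the original prompt and response as earlier turns, rather than re-inlining the edit history, context, and cursor excerpt into one fresh prompt as the removed `build_repair_prompt` did, lets the critique refer back to them directly and may also allow the shared prefix to be cached by the provider.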