@@ -1,103 +1,42 @@
-# Instructions
-
-You are an edit prediction assistant in a code editor. Your task is to generate an improved prediction based on feedback from a quality assessment.
-
-A previous model generated a prediction that was judged to have issues. Your job is to generate a better prediction that addresses the feedback.
-
-## Focus on
-
-- Completing any partially-applied changes made
-- Ensuring consistency with the programming style and patterns already established
-- Making edits that maintain or improve code quality
-- NOT reverting or undoing changes the user intentionally made
-
-## Rules
-
-- **NEVER undo or revert the user's recent edits.** Examine the diff in the edit history carefully:
- - If a line was removed (starts with `-`), do NOT restore that content—even if the code now appears incomplete or broken without it
- - If a line was added (starts with `+`), do NOT delete or significantly modify it
- - If code appears broken or incomplete after the user's edit, output `NO_EDITS` rather than "fixing" it by reverting
- - Only add NEW content that extends the user's work forward; never restore what they removed
- - **Key test**: if your prediction would make the code more similar to what it was BEFORE the user's edit, output `NO_EDITS` instead
- - **Never assume a deletion was accidental.** Even if removing content breaks the code, breaks a pattern, or leaves text looking "incomplete", respect it. The user may be mid-rewrite. Do NOT "complete" partial text by restoring what was deleted.
-- Do not just mechanically apply patterns - reason about what changes make sense given the context and the programmer's apparent goals.
-- Do not just fix syntax errors - look for the broader refactoring pattern and apply it systematically throughout the code.
-- Keep existing formatting unless it's absolutely necessary
-- When edit history and surrounding code suggest different edits, prioritize the most recent edits in the history as they best reflect current intent.
-- When uncertain, predict only the minimal, high-confidence portion of the edit. Prefer a small, correct prediction over a large, speculative one.
-- Don't write a lot of code if you're not sure what to do
-- Do not delete or remove text that was just added in the edit history. If a recent edit introduces incomplete or incorrect code, finish or fix it in place, or simply output `NO_EDITS` rather than removing it. Only remove a recent edit if the history explicitly shows the user undoing it themselves.
-- Treat partial text at or near the cursor as the beginning of something the user is actively typing. Complete the code the user appears to be creating based on context.
-
-# Input Format
-
-You will be provided with:
-1. The user's *edit history*, in chronological order. Use this to infer the user's trajectory and predict the next most logical edit.
-2. A set of *related excerpts* from the user's codebase. Some of these may be needed for correctly predicting the next edit.
- - `…` may appear within a related file to indicate that some code has been skipped.
-3. An excerpt from the user's *current file*.
- - Within the user's current file, there is an *editable region* delimited by the `<|editable_region_start|>` and `<|editable_region_end|>` tags. You can only predict edits in this region.
- - The `<|user_cursor|>` tag marks the user's current cursor position, as it stands after the last edit in the history.
-4. The *previous prediction* that was generated and needs improvement.
-5. *Quality feedback* explaining why the previous prediction was problematic.
-
-# Output Format
-
-- Briefly explain what was wrong with the previous prediction and how you'll improve it.
-- Output the entire editable region, applying the edits that you predict the user will make next.
-- If you're unsure about some portion of the next edit, you may still predict the surrounding code (such as a function definition, `for` loop, etc) and place the `<|user_cursor|>` within it for the user to fill in.
-- Wrap the edited code in a codeblock with exactly five backticks.
-- There are two special outputs for when you don't want to generate a new prediction. **These have different meanings — use the correct one:**
-
- 1. **`NO_EDITS`** — The code is already complete and correct as-is. No edits should be made at all. The editable region should remain unchanged. Use this when:
- - The code needs no modifications whatsoever
- - Any prediction would revert or undo the user's intentional changes
- - You are unsure what edit to make and prefer to do nothing
-
- `````
- NO_EDITS
- `````
-
- 2. **`KEEP_PREVIOUS`** — The previous prediction was actually correct and should be used as-is. Use this when:
- - After reviewing the quality feedback, you determine the previous prediction is good
- - You cannot find a meaningful improvement over the previous prediction
- - The quality feedback was too cautious and the previous prediction correctly addresses the user's intent
-
- `````
- KEEP_PREVIOUS
- `````
-
- **Important:** `NO_EDITS` and `KEEP_PREVIOUS` are NOT interchangeable.
- - `NO_EDITS` means "make zero changes to the code" (empty prediction).
- - `KEEP_PREVIOUS` means "the previous prediction is correct, use it" (reuse the previous prediction).
- - If you believe the previous prediction was correct, you MUST use `KEEP_PREVIOUS`, not `NO_EDITS`. Using `NO_EDITS` would discard the previous prediction entirely.
-
-# 1. User Edits History
+# Repair Request
+
+Your previous prediction has quality issues that need to be addressed. Please generate an improved prediction.
+
+## Quality Feedback
+
+{quality_feedback}
+
+## Your Previous Prediction (word-diff format)
`````
-{edit_history}
+{actual_patch_word_diff}
`````
-# 2. Related excerpts
+## Instructions
-{context}
+Generate an improved prediction that follows the same rules and output format as the original instructions. The key rules remain:
-# 3. Current File
+- **NEVER undo or revert the user's recent edits** — if a line was removed in the edit history, do NOT restore it
+- If your prediction would make the code more similar to what it was BEFORE the user's edit, output `NO_EDITS` instead
+- When uncertain, predict only the minimal, high-confidence portion of the edit
-{cursor_excerpt}
+## Output Format
-# 4. Previous Prediction (needs improvement)
+Follow the same output format as before, with one addition:
-The previous model generated the following edit (in word-diff format):
+- If the code is complete as-is and no edits should be made, output `NO_EDITS`
+- **NEW: If your previous prediction was actually correct** (the quality feedback was overly cautious), output `KEEP_PREVIOUS`:
-`````
-{actual_patch_word_diff}
-`````
+ `````
+ KEEP_PREVIOUS
+ `````
-# 5. Quality Feedback
+ Use `KEEP_PREVIOUS` when you determine the original prediction correctly addresses the user's intent despite the feedback.
-{quality_feedback}
+**Important:** `NO_EDITS` and `KEEP_PREVIOUS` are NOT interchangeable:
+- `NO_EDITS` = make zero changes to the code (discard the previous prediction)
+- `KEEP_PREVIOUS` = the previous prediction is correct, use it as-is
-# Your Improved Prediction
+## Your Improved Prediction
-Based on the feedback above, generate an improved prediction. Address the issues identified in the quality feedback. If the previous prediction was actually correct, output `KEEP_PREVIOUS`. If no edits should be made at all, output `NO_EDITS`.
+Briefly explain what was wrong with your previous prediction (or why it was actually correct), then provide the improved output.
@@ -10,7 +10,7 @@ use crate::{
BatchProvider, PredictionProvider,
anthropic_client::AnthropicClient,
example::{ActualCursor, Example, ExamplePrediction},
- format_prompt::{TeacherPrompt, extract_cursor_excerpt_from_example, extract_last_codeblock},
+ format_prompt::{TeacherPrompt, extract_last_codeblock},
openai_client::OpenAiClient,
parse_output::run_parse_output,
paths::LLM_CACHE_DB,
@@ -148,16 +148,15 @@ fn build_score_feedback(example: &Example) -> Option<String> {
Some(feedback)
}
-/// Build the repair prompt for an example that needs improvement.
-pub fn build_repair_prompt(example: &Example) -> Result<String> {
+/// Build the repair message (Turn 3) for a multi-turn conversation.
+///
+/// This message is sent after the original teacher prompt (Turn 1) and
+/// teacher response (Turn 2) to request an improved prediction.
+pub fn build_repair_message(example: &Example) -> Result<String> {
let prediction = example
.predictions
.first()
.context("no predictions available")?;
- let prompt_inputs = example
- .prompt_inputs
- .as_ref()
- .context("prompt_inputs missing (run context retrieval first)")?;
let actual_patch = prediction
.actual_patch
.as_ref()
@@ -169,35 +168,8 @@ pub fn build_repair_prompt(example: &Example) -> Result<String> {
let actual_patch_word_diff = unified_to_word_diff(actual_patch);
- let mut edit_history = String::new();
- for event in &prompt_inputs.edit_history {
- match event.as_ref() {
- zeta_prompt::Event::BufferChange {
- path,
- old_path,
- diff,
- predicted: _,
- in_open_source_repo: _,
- } => {
- edit_history.push_str(&format!("--- a{}\n", old_path.display()));
- edit_history.push_str(&format!("+++ b{}\n", path.display()));
- let diff_word_diff = unified_to_word_diff(diff);
- edit_history.push_str(&diff_word_diff);
- edit_history.push_str("\n\n");
- }
- }
- }
-
- let context = TeacherPrompt::format_context(example);
-
- let cursor_excerpt =
- extract_cursor_excerpt_from_example(example).context("failed to extract cursor excerpt")?;
-
let prompt_template = crate::prompt_assets::get_prompt("repair.md");
Ok(prompt_template
- .replace("{edit_history}", &edit_history)
- .replace("{context}", &context)
- .replace("{cursor_excerpt}", &cursor_excerpt)
.replace("{actual_patch_word_diff}", &actual_patch_word_diff)
.replace("{quality_feedback}", &quality_feedback))
}
@@ -266,6 +238,12 @@ static OPENAI_CLIENT_BATCH: OnceLock<OpenAiClient> = OnceLock::new();
static OPENAI_CLIENT_PLAIN: OnceLock<OpenAiClient> = OnceLock::new();
/// Run repair for a single example.
+///
+/// This sends a multi-turn conversation to the LLM:
+/// - Turn 1 (User): Original teacher prompt
+/// - Turn 2 (Assistant): Original teacher response
+/// - Turn 3 (User): Repair critique and instructions
+/// - Turn 4 (Assistant): Improved prediction (the response we parse)
pub async fn run_repair(
example: &mut Example,
args: &RepairArgs,
@@ -289,10 +267,20 @@ pub async fn run_repair(
anyhow::bail!("no predictions available (run predict first)");
}
+ let teacher_prompt = example
+ .prompt
+ .as_ref()
+ .context("prompt missing (run format_prompt first)")?;
+
+ let teacher_response = &example.predictions[0].actual_output;
+ if teacher_response.is_empty() {
+ anyhow::bail!("teacher response is empty (run predict first)");
+ }
+
let step_progress = example_progress.start(Step::Repair);
let model = model_for_backend(args.backend);
- let prompt = build_repair_prompt(example).context("Failed to build repair prompt")?;
+ let repair_message = build_repair_message(example).context("Failed to build repair message")?;
step_progress.set_substatus("generating");
@@ -309,13 +297,32 @@ pub async fn run_repair(
})
};
- let messages = vec![anthropic::Message {
- role: anthropic::Role::User,
- content: vec![anthropic::RequestContent::Text {
- text: prompt,
- cache_control: None,
- }],
- }];
+ let messages = vec![
+ // Turn 1: Original teacher prompt
+ anthropic::Message {
+ role: anthropic::Role::User,
+ content: vec![anthropic::RequestContent::Text {
+ text: teacher_prompt.input.clone(),
+ cache_control: None,
+ }],
+ },
+ // Turn 2: Original teacher response
+ anthropic::Message {
+ role: anthropic::Role::Assistant,
+ content: vec![anthropic::RequestContent::Text {
+ text: teacher_response.clone(),
+ cache_control: None,
+ }],
+ },
+ // Turn 3: Repair critique and instructions
+ anthropic::Message {
+ role: anthropic::Role::User,
+ content: vec![anthropic::RequestContent::Text {
+ text: repair_message,
+ cache_control: None,
+ }],
+ },
+ ];
let Some(response) = client.generate(model, 16384, messages, None, false).await? else {
return Ok(());
@@ -341,9 +348,21 @@ pub async fn run_repair(
})
};
- let messages = vec![open_ai::RequestMessage::User {
- content: open_ai::MessageContent::Plain(prompt),
- }];
+ let messages = vec![
+ // Turn 1: Original teacher prompt
+ open_ai::RequestMessage::User {
+ content: open_ai::MessageContent::Plain(teacher_prompt.input.clone()),
+ },
+ // Turn 2: Original teacher response
+ open_ai::RequestMessage::Assistant {
+ content: Some(open_ai::MessageContent::Plain(teacher_response.clone())),
+ tool_calls: vec![],
+ },
+ // Turn 3: Repair critique and instructions
+ open_ai::RequestMessage::User {
+ content: open_ai::MessageContent::Plain(repair_message),
+ },
+ ];
let Some(response) = client.generate(model, 16384, messages, None, false).await? else {
return Ok(());