Re-add code block formatting instructions (#29574)
Richard Feldman
created 7 months ago
Re-enabled instructions about code block formatting.
In practice, the model doesn't seem to use these very often, but there's
no negative effect on evals. In a future PR, I'll experiment with adding
more evals around the model actually using the code blocks.
2 runs before this change (`--repetitions=8`):
```
=================================================================
AGGREGATE
=================================================================
4 examples failed to run!
Average programmatic score: 37%
Average diff score: 66%
Average thread score: 93%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│ Tool                         │ Uses     │ Failures │ Rate     │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│edit_file                     │ 398      │ 53       │ 13%      │
│terminal                      │ 11       │ 1        │ 9%       │
│create_file                   │ 40       │ 2        │ 5%       │
│read_file                     │ 245      │ 8        │ 3%       │
│find_path                     │ 48       │ 0        │ 0%       │
│list_directory                │ 13       │ 0        │ 0%       │
│grep                          │ 133      │ 0        │ 0%       │
│thinking                      │ 18       │ 0        │ 0%       │
│diagnostics                   │ 130      │ 0        │ 0%       │
```
```
=================================================================
AGGREGATE
=================================================================
1 examples failed to run!
Average programmatic score: 41%
Average diff score: 68%
Average thread score: 96%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│ Tool                         │ Uses     │ Failures │ Rate     │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│fetch                         │ 1        │ 1        │ 100%     │
│edit_file                     │ 553      │ 63       │ 11%      │
│read_file                     │ 349      │ 3        │ 1%       │
│diagnostics                   │ 158      │ 0        │ 0%       │
│find_path                     │ 70       │ 0        │ 0%       │
│list_directory                │ 10       │ 0        │ 0%       │
│thinking                      │ 45       │ 0        │ 0%       │
│grep                          │ 213      │ 0        │ 0%       │
│create_file                   │ 24       │ 0        │ 0%       │
│terminal                      │ 17       │ 0        │ 0%       │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```
1 run after this change:
```
=================================================================
AGGREGATE
=================================================================
Average programmatic score: 42%
Average diff score: 74%
Average thread score: 100%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│ Tool                         │ Uses     │ Failures │ Rate     │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│edit_file                     │ 534      │ 92       │ 17%      │
│read_file                     │ 325      │ 6        │ 2%       │
│list_directory                │ 6        │ 0        │ 0%       │
│thinking                      │ 12       │ 0        │ 0%       │
│create_file                   │ 16       │ 0        │ 0%       │
│diagnostics                   │ 49       │ 0        │ 0%       │
│grep                          │ 234      │ 0        │ 0%       │
│find_path                     │ 65       │ 0        │ 0%       │
│terminal                      │ 38       │ 0        │ 0%       │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```
Release Notes:
- N/A
Change summary
assets/prompts/assistant_system_prompt.hbs | 14 ++++++++++++++
1 file changed, 14 insertions(+)
Detailed changes
@@ -36,6 +36,20 @@ If appropriate, use tool calls to explore the current project, which contains th
- The user might specify a partial file path. If you don't know the full path, use `find_path` (not `grep`) before you read the file.
{{/if}}
+## Code Block Formatting
+
+Whenever you mention a code block, you MUST use ONLY the following format when the code in the block comes from a file
+in the project:
+
+```path/to/Something.blah#L123-456
+(code goes here)
+```
+
+The `#L123-456` means the line number range 123 through 456, and the path/to/Something.blah
+is a path in the project. (If this code block does not come from a file in the project, then you may instead use
+the normal markdown style of three backticks followed by language name. However, you MUST use this format if
+the code in the block comes from a file in the project.)
+
## Fixing Diagnostics
1. Make 1-2 attempts at fixing diagnostics, then defer to the user.
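
To make the format added in this diff concrete, here is a minimal Python sketch of how a consumer might parse an info string such as `path/to/Something.blah#L123-456`. It is an illustration only and is not taken from the repository: the names `parse_fence_info` and `FileFence` are hypothetical, and the sketch assumes that paths contain no whitespace or `#` and that a reversed line range should be rejected.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the info string described in the prompt:
# <path>#L<start>-<end>. Assumes the path has no whitespace or '#'.
_FENCE_INFO = re.compile(r"^(?P<path>[^#\s]+)#L(?P<start>\d+)-(?P<end>\d+)$")


@dataclass(frozen=True)
class FileFence:
    """A code block that claims to come from a file in the project."""
    path: str
    start_line: int
    end_line: int


def parse_fence_info(info: str) -> Optional[FileFence]:
    """Return a FileFence for `path#Lstart-end`, or None for anything else
    (for example a plain language name such as `rust`)."""
    match = _FENCE_INFO.match(info.strip())
    if match is None:
        return None
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        return None  # assumption: a reversed range is treated as invalid
    return FileFence(match["path"], start, end)


if __name__ == "__main__":
    print(parse_fence_info("path/to/Something.blah#L123-456"))
    print(parse_fence_info("rust"))
```

Returning `None` for anything that does not match lets a caller fall back to treating the info string as an ordinary language name, which mirrors the two cases the prompt text distinguishes.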