/// A named evaluation prompt template, stored as a static string.
#[derive(Clone, Debug)]
pub struct Template {
    pub name: &'static str,
    pub content: &'static str,
}

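// Usage sketch, not part of the original file: the template bodies in
// `all_templates` below embed placeholder comments such as
// `<!-- ```git diff goes here``` -->`. How callers substitute real inputs is
// not specified here, so `render` (a hypothetical method name) shows one
// plausible approach under that assumption: each placeholder comment is
// swapped for a fenced block containing the supplied value.
//
// Example (hypothetical caller):
//     let prompt = tmpl.render(&[("git diff goes here", diff_text)]);
impl Template {
    /// Returns the template body with each `<!-- ```NAME``` -->` marker that
    /// has a matching entry in `inputs` replaced by a fenced code block
    /// containing the paired value. Unmatched placeholders are left as-is.
    pub fn render(&self, inputs: &[(&str, &str)]) -> String {
        let mut rendered = self.content.to_string();
        for &(placeholder, value) in inputs {
            let marker = format!("<!-- ```{placeholder}``` -->");
            let block = format!("```\n{value}\n```");
            rendered = rendered.replace(&marker, &block);
        }
        rendered
    }
}
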
/// Returns the built-in evaluation templates, one per scenario type.
pub fn all_templates() -> Vec<Template> {
    vec![
        Template {
            name: "ProjectCreation",
            content: r#"
# Project Creation Evaluation Template

## Instructions

Evaluate how well the AI assistant created a new implementation from scratch. Score it between 0.0 and 1.0 based on quality and fulfillment of requirements.
- 1.0 = Perfect implementation that creates all necessary files with correct functionality.
- 0.0 = Completely fails to create working files or meet requirements.

Note: A git diff output is required. If no code changes are provided (i.e., no git diff output), the score must be 0.0.

## Evaluation Criteria

Please consider the following aspects in order of importance:

1. **File Creation (25%)**
   - Did the assistant create all necessary files?
   - Are the files appropriately named and organized?
   - Did the assistant create a complete solution without missing components?

2. **Functional Correctness (40%)**
   - Does the implementation fulfill all specified requirements?
   - Does it handle edge cases properly?
   - Is it free of logical errors and bugs?
   - Do all components work together as expected?

3. **Code Quality (20%)**
   - Is the code well-structured, readable, and well-documented?
   - Does it follow language-specific best practices?
   - Is there proper error handling?
   - Are naming conventions clear and consistent?

4. **Architecture Design (15%)**
   - Is the code modular and extensible?
   - Is there proper separation of concerns?
   - Are appropriate design patterns used?
   - Is the overall architecture appropriate for the requirements?

## Input

Requirements:
<!-- ```requirements go here``` -->

Reference Implementation:
<!-- ```reference code goes here``` -->

AI-Generated Implementation (git diff output):
<!-- ```git diff goes here``` -->

## Output Format

THE ONLY OUTPUT SHOULD BE A SCORE BETWEEN 0.0 AND 1.0.

EXAMPLE ONE:

0.92

EXAMPLE TWO:

0.85

EXAMPLE THREE:

0.78
"#,
        },
        Template {
            name: "CodeModification",
            content: r#"
# Code Modification Evaluation Template

## Instructions

Evaluate how well the AI assistant modified existing code to meet requirements. Score it between 0.0 and 1.0 based on the quality and appropriateness of the changes.
- 1.0 = Perfect modifications that correctly implement all requirements.
- 0.0 = Failed to make appropriate changes or introduced serious errors.

## Evaluation Criteria

Please consider the following aspects in order of importance:

1. **Functional Correctness (50%)**
   - Do the modifications correctly implement the requirements?
   - Did the assistant modify the right files and code sections?
   - Are the changes free of bugs and logical errors?
   - Do the modifications maintain compatibility with existing code?

2. **Modification Approach (25%)**
   - Are the changes minimal and focused on what needs to be changed?
   - Did the assistant avoid unnecessary modifications?
   - Are the changes integrated seamlessly with the existing codebase?
   - Did the assistant preserve the original code style and patterns?

3. **Code Quality (15%)**
   - Are the modifications well-structured and documented?
   - Do they follow the same conventions as the original code?
   - Is there proper error handling in the modified code?
   - Are the changes readable and maintainable?

4. **Solution Completeness (10%)**
   - Do the modifications completely address all requirements?
   - Are there any missing changes or overlooked requirements?
   - Did the assistant consider all necessary edge cases?

## Input

Original:
<!-- ```reference code goes here``` -->

New (git diff output):
<!-- ```git diff goes here``` -->

## Output Format

THE ONLY OUTPUT SHOULD BE A SCORE BETWEEN 0.0 AND 1.0.

EXAMPLE ONE:

0.92

EXAMPLE TWO:

0.85

EXAMPLE THREE:

0.78
"#,
        },
        Template {
            name: "ConversationalGuidance",
            content: r#"
# Conversational Guidance Evaluation Template

## Instructions

Evaluate the quality of the AI assistant's conversational guidance and score it between 0.0 and 1.0.
- 1.0 = Perfect guidance with ideal information gathering, clarification, and advice without writing code.
- 0.0 = Completely unhelpful, inappropriate guidance, or wrote code when it should not have.

## Evaluation Criteria

ABSOLUTE REQUIREMENT:
- The assistant should NOT generate complete code solutions in conversation mode.
- If the git diff shows the assistant wrote complete code, the score should be significantly reduced.

1. **Information Gathering Effectiveness (30%)**
   - Did the assistant ask relevant and precise questions?
   - Did it efficiently narrow down the problem scope?
   - Did it avoid unnecessary or redundant questions?
   - Was questioning appropriately paced and contextual?

2. **Conceptual Guidance (30%)**
   - Did the assistant provide high-level approaches and strategies?
   - Did it explain relevant concepts and algorithms?
   - Did it offer planning advice without implementing the solution?
   - Did it suggest a structured approach to solving the problem?

3. **Educational Value (20%)**
   - Did the assistant help the user understand the problem better?
   - Did it provide explanations that would help the user learn?
   - Did it guide without simply giving away answers?
   - Did it encourage the user to think through parts of the problem?

4. **Conversation Quality (20%)**
   - Was the conversation logically structured and easy to follow?
   - Did the assistant maintain appropriate context throughout?
   - Was the interaction helpful without being condescending?
   - Did the conversation reach a satisfactory conclusion with clear next steps?

## Input

Initial Query:
<!-- ```query goes here``` -->

Conversation Transcript:
<!-- ```transcript goes here``` -->

Git Diff:
<!-- ```git diff goes here``` -->

## Output Format

THE ONLY OUTPUT SHOULD BE A SCORE BETWEEN 0.0 AND 1.0.

EXAMPLE ONE:

0.92

EXAMPLE TWO:

0.85

EXAMPLE THREE:

0.78
"#,
        },
    ]
}
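
// Illustrative helpers, added as a sketch rather than as part of the original
// API. `template_by_name` is a hypothetical convenience lookup; the tests only
// assert properties that hold for the templates defined above.
pub fn template_by_name(name: &str) -> Option<Template> {
    all_templates().into_iter().find(|t| t.name == name)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn templates_are_well_formed() {
        let templates = all_templates();
        // Three scenario templates are defined above.
        assert_eq!(templates.len(), 3);

        // Names are unique and every template body is non-empty.
        let mut names: Vec<&str> = templates.iter().map(|t| t.name).collect();
        names.sort_unstable();
        names.dedup();
        assert_eq!(names.len(), 3);
        assert!(templates.iter().all(|t| !t.content.trim().is_empty()));
    }

    #[test]
    fn lookup_by_name() {
        assert!(template_by_name("CodeModification").is_some());
        assert!(template_by_name("NoSuchTemplate").is_none());
    }

    #[test]
    fn render_fills_git_diff_placeholder() {
        // Exercises the hypothetical `render` sketch defined above.
        let tmpl = template_by_name("CodeModification").unwrap();
        let rendered = tmpl.render(&[("git diff goes here", "fn main() {}")]);
        assert!(rendered.contains("fn main() {}"));
        assert!(!rendered.contains("<!-- ```git diff goes here``` -->"));
    }
}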