The CURE Pipeline
1. Generate Rollouts
The policy generates n code solutions and m unit tests for each task. The coder and the unit tester share the same policy model.
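Execution Matrix B*

| | UT 1 | UT 2 | UT 3 | UT 4 | GT 1 | GT 2 |
|---|---|---|---|---|---|---|
| Code 1 ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
| Code 2 ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
| Code 3 | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ |
| Code 4 | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |

Legend: ✓ = pass, ✗ = fail. UT columns are generated unit tests; GT columns are ground-truth tests. The ✓ beside a row label appears to mark code that passes every ground-truth test.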
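A matrix like the one above can be filled in by running every code solution against every test. The sketch below is only an illustration under simplifying assumptions: solutions are plain Python functions, each test is an (input, expected output) pair, and the names `execution_matrix`, `solutions`, and `tests` are hypothetical rather than taken from the CURE code.

```python
from typing import Callable, List, Tuple

Solution = Callable[[int], int]
UnitTest = Tuple[int, int]  # (input, expected output); a simplifying assumption


def execution_matrix(solutions: List[Solution], tests: List[UnitTest]) -> List[List[bool]]:
    """Return matrix[l][k] = True if solution l passes test k."""
    matrix = []
    for solve in solutions:
        row = []
        for arg, expected in tests:
            try:
                row.append(solve(arg) == expected)
            except Exception:
                # A crash counts as a failed test.
                row.append(False)
        matrix.append(row)
    return matrix


# Toy task: "return the square of x".
solutions = [
    lambda x: x * x,  # correct solution
    lambda x: x + x,  # buggy solution that happens to agree at x = 2
]
tests = [(2, 4), (3, 9), (0, 1)]  # the last test is itself wrong

matrix = execution_matrix(solutions, tests)
```

In this toy run the first test cannot tell the two solutions apart, the second one can, and the third rejects even the correct solution, which mirrors the kinds of columns seen in the matrix above.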
UT 1: +1
UT 2: +2
UT 3: -2
UT 4: 0
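The four values above can be reproduced from the execution matrix with a simple rule, sketched here. This is an assumption, since the page does not spell out the scoring rule at this point: a unit test loses one point for every incorrect code it lets pass, and gains one point per incorrect code in the batch only if it passes every correct code. A code counts as correct when it passes all ground-truth tests. The matrix is read with the ✓ beside Code 1 and Code 2 treated as a marker on the row label rather than a cell; the function name `unit_test_scores` is hypothetical.

```python
# Rows are Code 1..4; columns are UT 1..4 (generated unit tests).
ut_results = [
    [True, True, False, True],
    [True, True, False, True],
    [True, False, True, True],
    [False, False, True, True],
]
# Rows are Code 1..4; columns are GT 1..2 (ground-truth tests).
gt_results = [
    [True, True],
    [True, True],
    [True, False],
    [False, True],
]


def unit_test_scores(ut_results, gt_results):
    """Score each generated unit test under the assumed rule described above."""
    # A code is correct if it passes every ground-truth test.
    correct = [all(row) for row in gt_results]
    n_incorrect = correct.count(False)
    rows = range(len(correct))
    scores = []
    for k in range(len(ut_results[0])):
        passes_all_correct = all(ut_results[l][k] for l in rows if correct[l])
        wrong_passed = sum(ut_results[l][k] for l in rows if not correct[l])
        bonus = n_incorrect if passes_all_correct else 0
        scores.append(bonus - wrong_passed)
    return scores


scores = unit_test_scores(ut_results, gt_results)
print(scores)  # [1, 2, -2, 0]
```

Under this reading, UT 2 scores highest because it accepts both correct codes and rejects both incorrect ones, while UT 3 scores lowest because it rejects the correct codes and accepts the incorrect ones.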