The CURE Pipeline
Step 1: Generate Rollouts
The policy generates n code solutions and m unit tests for each task. The coder and the unit tester share the same policy model.
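The rollout step can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate_rollouts`, `Rollouts`, the prompt wording, and `stub_policy` are all hypothetical names introduced here, and the policy is assumed to be any callable that maps a prompt string to a completion string. The only point it shows is that one shared policy is sampled n times for code and m times for unit tests.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollouts:
    codes: List[str]        # n sampled code solutions
    unit_tests: List[str]   # m sampled unit tests


def generate_rollouts(policy: Callable[[str], str], task: str, n: int, m: int) -> Rollouts:
    """Sample n code solutions and m unit tests for one task from a single shared policy."""
    # Hypothetical prompt templates; the real prompts are not given in the text.
    code_prompt = f"Write a solution for the following task:\n{task}"
    test_prompt = f"Write a unit test for the following task:\n{task}"
    codes = [policy(code_prompt) for _ in range(n)]
    unit_tests = [policy(test_prompt) for _ in range(m)]
    return Rollouts(codes, unit_tests)


def stub_policy(prompt: str) -> str:
    # Stand-in for a real language model call (assumption for this sketch).
    return "# code" if "solution" in prompt else "# test"


rollouts = generate_rollouts(stub_policy, "add two integers", n=4, m=4)
print(len(rollouts.codes), len(rollouts.unit_tests))
```

In practice the stub would be replaced by a sampling call to the policy model, with a temperature high enough that the n and m samples differ.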
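Figure: Execution Matrix B*. The matrix pairs the generated code solutions (Code1 to Code4) with the generated unit tests (UT1 to UT4) and the ground-truth tests (GT1, GT2). Each cell is marked pass or fail, with separate markers for generated unit tests and ground-truth tests.
Unit test rewards shown in the figure: UT1: +1, UT2: +2, UT3: -2, UT4: 0.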
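A pass/fail matrix of this kind can be built by running every code solution against every test. The sketch below is a toy under stated assumptions: solutions and tests are plain Python callables rather than sandboxed programs, rows are taken to be solutions and columns to be tests (generated tests first, then ground truth), and the names `run_test` and `execution_matrix` are hypothetical. It only constructs the matrix; it does not compute the rewards.

```python
from typing import Callable, List, Sequence

Solution = Callable[[int, int], int]
Test = Callable[[Solution], bool]


def run_test(solution: Solution, test: Test) -> int:
    """Return 1 if the solution passes the test, 0 if it fails or raises."""
    try:
        return 1 if test(solution) else 0
    except Exception:
        return 0


def execution_matrix(
    solutions: Sequence[Solution],
    generated_tests: Sequence[Test],
    ground_truth_tests: Sequence[Test],
) -> List[List[int]]:
    """Entry [i][j] is 1 when solution i passes test j, else 0."""
    tests = list(generated_tests) + list(ground_truth_tests)
    return [[run_test(s, t) for t in tests] for s in solutions]


# Toy task: add two integers.
solutions = [
    lambda a, b: a + b,  # correct solution
    lambda a, b: a - b,  # incorrect solution
]
generated_tests = [
    lambda f: f(1, 2) == 3,  # valid test that separates the two solutions
    lambda f: f(0, 0) == 0,  # valid but weak: the incorrect solution also passes
    lambda f: f(1, 1) == 3,  # incorrect test: rejects the correct solution
]
ground_truth_tests = [lambda f: f(2, 3) == 5]

B = execution_matrix(solutions, generated_tests, ground_truth_tests)
for row in B:
    print(row)
```

The ground-truth column identifies which solutions are correct, which is what makes it possible to judge each generated test: a useful test passes correct solutions and fails incorrect ones.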