ADARFT: Adaptive Curriculum Learning
🎯 T = target difficulty (ADAPTS): starts easy at T = 0 and increases as the model improves.
β = target reward (FIXED): always 0.5, since a ~50% success rate maximizes learning.
The training loop cycles through four steps:

1. 📚 Sample problems at difficulty ≈ T.
2. 🤖 The model attempts them and receives an average reward R_avg.
3. ⚖️ Compare R_avg against the target β = 0.5.
4. 🎚️ Adjust T, raising or lowering the difficulty.
Comparing R_avg to β gives three cases (see the sketch after this list):

R_avg > 0.5 (too easy): the model succeeds too often → increase T → harder problems
R_avg ≈ 0.5 (just right): balanced successes and failures → T stays the same → optimal learning
R_avg < 0.5 (too hard): the model fails too often → decrease T → easier problems
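A minimal Python sketch of this loop, under stated assumptions: `sample_batch` and `average_reward` are hypothetical stand-ins for the real problem sampler and reward evaluation, the RL policy update on each batch is omitted, and a fixed step replaces the smoother tanh rule given further below.

```python
import random

# Hypothetical stand-ins for the real sampler and reward evaluation.
def sample_batch(difficulty: float) -> list:
    """Return a batch of problems with difficulty near the given target."""
    return [difficulty] * 8  # placeholder problems

def average_reward(model, batch: list) -> float:
    """Fraction of the batch the model solves (random here for illustration)."""
    return random.random()

BETA = 0.5   # fixed target reward: aim for ~50% success
STEP = 5.0   # illustrative fixed step; the tanh rule below is smoother

def curriculum_loop(model, num_steps: int) -> float:
    t = 0.0  # target difficulty starts easy
    for _ in range(num_steps):
        batch = sample_batch(difficulty=t)    # 1. sample problems near T
        r_avg = average_reward(model, batch)  # 2. model solves, gets R_avg
        # (the RL policy update on `batch` is omitted in this sketch)
        if r_avg > BETA:                      # 3. compare, 4. adjust
            t = min(t + STEP, 100.0)          # too easy -> harder problems
        elif r_avg < BETA:
            t = max(t - STEP, 0.0)            # too hard -> easier problems
        # r_avg ≈ BETA: difficulty stays put
    return t
```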
[Interactive demo: a speed control with a live readout of Step (starting at 0), Difficulty T (starting at 0.0), the constant Target β = 0.5, and the Last Reward.]
T' = clip(T + η · tanh(α · (R_avg − β)), 0, 100)

where η = 50 (step size), α = 2 (sensitivity), and β = 0.5 (target reward).
[Simulation log columns: Step | Reward vs β | ΔT | New T]
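As a concrete check of the update rule, here is a small Python sketch that applies it and prints a log in the same Step / Reward vs β / ΔT / New T format; the reward sequence is made up purely for illustration, not taken from any training run.

```python
import math

ETA, ALPHA, BETA = 50.0, 2.0, 0.5  # step size, sensitivity, target reward

def update_difficulty(t: float, r_avg: float) -> float:
    """One update: T' = clip(T + η · tanh(α · (R_avg − β)), 0, 100)."""
    delta = ETA * math.tanh(ALPHA * (r_avg - BETA))
    return min(max(t + delta, 0.0), 100.0)

# Tiny simulation log with made-up rewards for illustration.
t = 0.0
print(f"{'Step':>4} {'Reward':>7} {'ΔT':>8} {'New T':>8}")
for step, r in enumerate([0.9, 0.7, 0.5, 0.3], start=1):
    new_t = update_difficulty(t, r)
    print(f"{step:>4} {r:>7.2f} {new_t - t:>8.2f} {new_t:>8.2f}")
    t = new_t
```

Since |tanh| < 1, each update is bounded by ±η, so a single extreme batch reward shifts the curriculum smoothly rather than in wild jumps.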