ADARFT: Adaptive Curriculum Learning
🎯 T = Target Difficulty (ADAPTS): starts easy at T = 0 and rises as the model improves.
⚡ β = Target Reward (FIXED): always 0.5, since a 50% success rate maximizes learning.
📚 Sample problems at difficulty ≈ T → 🤖 Model solves them, earning average reward R_avg → ⚖️ Compare R_avg vs β = 0.5 → 🎚️ Adjust T (raise or lower difficulty).
- R_avg > 0.5 (too easy): the model succeeds too often → increase T → harder problems
- R_avg ≈ 0.5 (just right): perfect balance → T stays the same → optimal learning
- R_avg < 0.5 (too hard): the model fails too often → decrease T → easier problems
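A minimal Python sketch makes these three cases concrete. It uses the update rule and the η, α, β values given further below; the function name and the standalone framing are illustrative, not part of ADARFT's reference code:

```python
import math

# ADARFT difficulty update: T' = clip(T + eta * tanh(alpha * (R_avg - beta)), 0, 100)
ETA, ALPHA, BETA = 50.0, 2.0, 0.5   # step size, sensitivity, target reward

def adjust_difficulty(T: float, r_avg: float) -> float:
    """Move target difficulty toward the point where average reward equals beta."""
    delta = ETA * math.tanh(ALPHA * (r_avg - BETA))
    return min(max(T + delta, 0.0), 100.0)   # clip to [0, 100]

# The three regimes from the list above, starting at T = 50:
for r in (0.9, 0.5, 0.1):
    print(f"R_avg={r}: T 50 -> {adjust_difficulty(50.0, r):.1f}")
# R_avg=0.9: T rises (too easy); 0.5: unchanged; 0.1: T falls (too hard)
```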
[Interactive demo: Run / Step / Reset controls with a speed slider and live readouts of Step, Difficulty (T), Target (β = 0.5, constant), and Last Reward.]
T′ = clip(T + η · tanh(α · (R_avg − β)), 0, 100)

where η = 50 is the step size, α = 2 the sensitivity, and β = 0.5 the target reward.
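As a quick check of the magnitudes: a perfect score (R_avg = 1.0) gives ΔT = 50 · tanh(2 · 0.5) = 50 · tanh(1) ≈ 38.1, a large jump in difficulty, while R_avg = 0.6 gives ΔT = 50 · tanh(0.2) ≈ 9.9, a gentle nudge. Because tanh saturates at ±1, no single update can move T by more than η = 50.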
[Per-step log table: Step | Reward | vs β | ΔT | New T]
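The demo fills this table in live; the short simulation below reproduces rows in the same format. The logistic toy model of the policy (its SKILL level, curve steepness, and sampling noise) is an assumption standing in for real training dynamics, not part of ADARFT itself:

```python
import math
import random

ETA, ALPHA, BETA = 50.0, 2.0, 0.5
SKILL = 60.0          # hypothetical: the toy model solves ~50% at difficulty 60
random.seed(0)

def avg_reward(T: float, n: int = 64) -> float:
    """Fraction of n sampled problems (difficulty ~ T) the toy model solves."""
    solved = 0
    for _ in range(n):
        d = T + random.gauss(0.0, 5.0)                  # sample difficulty near T
        p = 1.0 / (1.0 + math.exp((d - SKILL) / 30.0))  # assumed success curve
        solved += random.random() < p
    return solved / n

T = 0.0
print(f"{'Step':>4} {'Reward':>7} {'vs β':>5} {'ΔT':>7} {'New T':>7}")
for step in range(1, 9):
    r = avg_reward(T)
    delta = ETA * math.tanh(ALPHA * (r - BETA))
    T = min(max(T + delta, 0.0), 100.0)
    rel = ">" if r > BETA else ("<" if r < BETA else "=")
    print(f"{step:>4} {r:>7.2f} {rel:>5} {delta:>+7.1f} {T:>7.1f}")
# T climbs from 0 toward ~60, where average reward hovers near β = 0.5
```

Under these assumptions T rises quickly while problems are too easy, then settles near the difficulty where the toy model succeeds about half the time, which is exactly the equilibrium the β = 0.5 target is designed to find.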