Lightweight guide to understanding GRPO and RL principles
A beginner-friendly guide to the Group Relative Policy Optimization (GRPO) training workflow that assumes no prior RL knowledge.
A practical framework for spotting and fixing evaluation blind spots in agentic LLM pipelines, based on Shankar et al.’s Three Gulfs model.
Interactive evaluations: lightweight, automated tests that use agents to measure multi-turn chatbot quality at scale.
Update traditional CUDA matrix multiplication kernel for constrained decoding
Optimizing CUDA matrix multiplication using tiling and shared memory, with detailed explanations of memory access patterns and performance improvements.