Musings of Murali

An Overview of RL Environments

20 minute read

Everything that happens in an RL environment between the policy update and the next rollout - verification, reward shaping, tool calling, curriculum design, ...

Lightweight Guide to understanding GRPO and RL principles

8 minute read

A beginner-friendly guide to the Group Relative Policy Optimization (GRPO) training workflow that assumes no prior RL knowledge.

Bridging the Three Gulfs of Agentic Development (and how they shape evals)

3 minute read

A practical framework for spotting and fixing evaluation blind spots in agentic LLM pipelines, based on Shankar et al.’s Three Gulfs model.

Let Agents do the talking: A Scalable Way to Evaluate Multi-Turn Chatbots

5 minute read

Interactive evaluations: lightweight, automated tests that use agents to measure multi-turn chatbot quality at scale.

CUDA Study Log 4: Optimizing Constrained Decoding with Triton Kernel

8 minute read

Updating a traditional CUDA matrix multiplication kernel for constrained decoding.

Murali Manohar
