Bridging the Three Gulfs of Agentic Development (and how they shape evals)
A practical framework for spotting and fixing evaluation blind spots in agentic LLM pipelines, based on Shankar et al.’s Three Gulfs model.
Interactive evaluations: lightweight, automated tests that use agents to measure multi-turn chatbot quality at scale.
Update traditional CUDA matrix multiplication kernel for constrained decoding
Optimizing CUDA matrix multiplication using tiling and shared memory, with detailed explanations of memory access patterns and performance improvements.
Deep dive into implementing efficient matrix multiplication using CUDA, with a focus on memory optimization techniques.