Let Agents Do the Talking: A Scalable Way to Evaluate Multi-Turn Chatbots
Interactive evaluations: lightweight, automated tests that use agents to measure multi-turn chatbot quality at scale.
Updating a Traditional CUDA Matrix Multiplication Kernel for Constrained Decoding
Optimizing CUDA matrix multiplication using tiling and shared memory, with detailed explanations of memory access patterns and performance improvements
A deep dive into implementing efficient matrix multiplication in CUDA, with a focus on memory optimization techniques.
An Introductory Guide for ML Engineers. Learn the fundamentals and practical implementation techniques needed to get started with CUDA kernels.