Talk to Lenny - Explore Lenny's Podcast

Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

Sep 25, 2025 · 1h 46m

Guests: Hamel Husain & Shreya Shankar, instructors of the top course on AI evals on Maven. They have taught over 2,000 PMs and engineers across 500 companies, including teams at OpenAI and Anthropic.

Key Takeaways: Evals are a way to systematically measure and improve AI applications, akin to unit tests but for AI behavior. The process begins with error analysis: you manually review data logs (traces) and write down the errors you find, a step known as open coding. You then use AI to synthesize those notes into axial codes (categories), which shows which issues are most prevalent and helps prioritize fixes. Evals can be automated through code-based tests or by using an LLM as a judge; the latter is particularly useful for complex, subjective errors. The goal of evals is not perfection but actionable improvements to your product, which makes them a high-ROI activity.

Topics Covered: AI evals, error analysis, open coding, LLM as judge, product improvement, misconceptions about evals, A/B testing vs. evals, systematic measurement of AI quality.
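The error-analysis workflow in the takeaways can be sketched in a few lines of Python. This is a minimal illustration, not the guests' own tooling: the trace records, the category names, and the helper functions `rank_failure_modes` and `passes_no_placeholder_check` are all hypothetical, made up for this example. It assumes each reviewed trace already carries a free-form open-coding note and an axial code assigned afterwards.

```python
from collections import Counter

# Hypothetical reviewed traces (illustrative data, not from the episode).
# "note" is the free-form open-coding comment; "axial_code" is the
# category assigned to that note during synthesis.
annotated_traces = [
    {"trace_id": "t1", "note": "did not hand off to a human", "axial_code": "handoff_failure"},
    {"trace_id": "t2", "note": "reply shows raw template text", "axial_code": "formatting_error"},
    {"trace_id": "t3", "note": "promised a callback but no handoff", "axial_code": "handoff_failure"},
    {"trace_id": "t4", "note": "", "axial_code": None},  # no error found
]


def rank_failure_modes(traces):
    """Count traces per axial code, most frequent first, to prioritize fixes."""
    counts = Counter(t["axial_code"] for t in traces if t["axial_code"])
    return counts.most_common()


def passes_no_placeholder_check(response: str) -> bool:
    """Code-based eval: fail any response that leaks template placeholders."""
    return "{{" not in response and "}}" not in response


if __name__ == "__main__":
    for code, count in rank_failure_modes(annotated_traces):
        print(f"{code}: {count}")
    print(passes_no_placeholder_check("Hello {{first_name}}, thanks for writing in."))
```

A deterministic check like the placeholder test suits errors that code can detect exactly. More subjective categories, such as whether a handoff should have happened, are the cases where the summary suggests an LLM as judge instead.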

Notable Quotes