
Hamel Husain & Shreya Shankar

1 episode

Episodes

Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

Sep 25, 2025 · 1h 46m

Guests: Hamel Husain & Shreya Shankar, instructors of the top course on AI evals on Maven. They have taught over 2,000 PMs and engineers across 500 companies, including teams at OpenAI and Anthropic, making them leading experts in the field of AI evaluations.

Key Takeaways:
- Evals systematically measure and improve AI applications; they are akin to unit tests, but for AI behavior.
- The process begins with error analysis: manually review data logs (traces) and write a short note on each error you find, a step known as open coding (see the sketch after this list).
- Use AI to synthesize those notes into axial codes (categories), which reveals the most prevalent issues and helps prioritize fixes.
- Evals can then be automated through code-based tests or LLMs as judges; the latter are particularly useful for complex, subjective errors (see the judge sketch below).
- The goal of evals is not perfection but actionable improvement of your product, which makes them a high-ROI activity.

Topics Covered: AI evals, error analysis, open coding, LLM as judge, product improvement, misconceptions about evals, A/B testing vs. evals, systematic measurement of AI quality.
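To make the open-coding and axial-coding steps concrete, here is a minimal Python sketch of what they might look like in practice. Everything in it is an illustrative assumption rather than the instructors' exact tooling: the traces.jsonl file, the input/output fields on each trace, and the keyword heuristics standing in for AI-assisted categorization.

```python
# Minimal sketch of open coding: read raw traces, attach a free-form
# error note to each failing one, then roll notes up into categories.
# The trace schema and file name are hypothetical stand-ins for your logs.
import json
from collections import Counter

def open_code(path: str) -> list[dict]:
    """Manually review traces and record a short note per observed error."""
    annotated = []
    with open(path) as f:
        for line in f:
            trace = json.loads(line)
            print(f"USER: {trace['input']}\nBOT:  {trace['output']}\n")
            note = input("Error note (blank if fine): ").strip()
            if note:
                annotated.append({**trace, "error_note": note})
    return annotated

def axial_codes(annotated: list[dict]) -> Counter:
    """Group free-form notes into categories. Naive keyword matching here;
    the episode suggests using an LLM to propose the categories instead."""
    categories = Counter()
    for t in annotated:
        note = t["error_note"].lower()
        if "hallucinat" in note:
            categories["hallucination"] += 1
        elif "format" in note:
            categories["formatting"] += 1
        else:
            categories["other"] += 1
    return categories

if __name__ == "__main__":
    counts = axial_codes(open_code("traces.jsonl"))
    print(counts.most_common())  # most prevalent error categories first
```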
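For the automated side, here is a hedged sketch of an LLM-as-judge check for a single error category. It assumes the OpenAI Python SDK and a gpt-4o-mini judge model; the rubric and PASS/FAIL protocol are illustrative, not the instructors' exact method.

```python
# Sketch of an LLM-as-judge eval for one subjective error category.
# Assumptions: OpenAI Python SDK installed, OPENAI_API_KEY set, and a
# gpt-4o-mini judge; any capable model and client would work.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's reply.
Question: {question}
Reply: {reply}
Does the reply answer the question without inventing facts?
Answer with exactly PASS or FAIL."""

def judge(question: str, reply: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumption: swap in your preferred judge
        temperature=0,         # deterministic grading
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, reply=reply),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("PASS")
```

A binary PASS/FAIL output keeps the judge easy to validate: spot-check its verdicts against a handful of human-labeled traces before trusting it at scale.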

Notable Quotes

“The goal is not to do evals perfectly, it's to actionably improve your product.”

Eval process

“Evals is a way to systematically measure and improve an AI application.”

Definition of evals

“I find myself sometimes just encountering arguments on the internet, like this race to eval debates and really think, 'Okay, put myself in their shoes.'”

Perspective on debates