Talk to Lenny

Built with data from Lenny's Podcast


Built with 🧡 by Aeonix Tech

โ† Back to Guests

Sander Schulhoff

1 episode

Episodes

Why securing AI is harder than anyone expected and guardrails are failing | HackAPrompt CEO

Dec 21, 2025 (1h 32m)

Guest: Sander Schulhoff - CEO of HackAPrompt. Sander is a leading researcher in adversarial robustness and AI security, known for organizing the first and largest AI red teaming competition and collaborating with top AI labs on model defenses.

Key Takeaways:

Guardrails Ineffectiveness: Current AI guardrails are largely ineffective against determined attackers. They often give a false sense of security and can be easily bypassed.

Automated Red Teaming Limitations: Automated red teaming systems always find vulnerabilities in AI models, but they don't offer novel insights since all transformer-based models are inherently vulnerable.

Focus on Classical Security: For AI systems, especially those that are agentic, focus on classical cybersecurity measures like proper data and action permissioning rather than relying solely on AI-specific defenses.

Education and Awareness: Increasing awareness and understanding of AI security issues is crucial. Having AI security researchers on your team can help navigate these challenges.

Potential Solutions: Implementing frameworks like CAMEL can help manage permissions and reduce risks in agentic systems, though they are not foolproof.

Topics Covered: AI security challenges, guardrails inefficacy, automated red teaming, adversarial robustness, classical cybersecurity integration, CAMEL framework, AI education and awareness.
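The "Focus on Classical Security" takeaway above can be illustrated with a minimal sketch of action permissioning for an agent. This is not the CAMEL framework and not anything described in the episode; `ToolCall`, `Policy`, and `execute` are hypothetical names invented for this example. The assumption is that every tool call proposed by the model passes through a plain allow-list check before it runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the model (hypothetical structure)."""
    tool: str
    resource: str


@dataclass
class Policy:
    """Allow-list of (tool, resource) pairs granted for one task."""
    allowed: set[tuple[str, str]] = field(default_factory=set)

    def permits(self, call: ToolCall) -> bool:
        return (call.tool, call.resource) in self.allowed


def execute(call: ToolCall, policy: Policy) -> str:
    # The check is ordinary deterministic code that runs outside the model,
    # so text in the prompt cannot change what the policy allows.
    if not policy.permits(call):
        raise PermissionError(f"{call.tool} on {call.resource} is not permitted")
    # Placeholder for the real tool dispatch (assumed, not shown here).
    return f"ran {call.tool} on {call.resource}"
```

With a policy such as `Policy(allowed={("read_email", "inbox")})`, a call to read the inbox goes through, while a call to send mail raises `PermissionError` no matter what the model was persuaded to request. The design choice is to keep the permission decision in conventional code, which limits the damage a successful prompt injection can do but does not prevent the injection itself.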

Notable Quotes

“Guardrails do not work. If someone is determined enough to trick GPT-5, they're going to deal with that guardrail.”

AI security

“You can patch a bug, but you can't patch a brain.”

AI vulnerabilities

“Guardrails don't work, they just don't work. They really don't work.”

AI guardrails