Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards
Key Points:
- Anthropic has released Claude Fable 5, its most capable AI model, as a public product with embedded safety classifiers, while its twin model, Claude Mythos 5, which has cyber capabilities unrestricted, remains limited to vetted cybersecurity professionals and critical infrastructure operators.
- Fable 5 uses classifiers to detect and redirect potentially harmful cyber, biology, chemistry, and distillation requests to a less capable model, Claude Opus 4.8, to prevent misuse, whereas Mythos 5 retains full cyber capabilities for trusted users.
- The safety classifiers effectively block offensive cyber tasks such as exploit development and reconnaissance, with fallback occurring in less than 5% of sessions, though some false positives remain as Anthropic plans to refine the system post-launch.
- Anthropic's red team demonstrated that Mythos Preview, a predecessor to Mythos 5, autonomously identified and exploited zero-day vulnerabilities across major operating systems and browsers, highlighting the dual-use risk of advanced AI models in cybersecurity.
- The rapid discovery of thousands of high-severity vulnerabilities by Anthropic and partners underscores a shift where patching and verification lag behind bug discovery, increasing the urgency for automated updates and robust defense measures, while Anthropic introduces a 30-day data retention policy to monitor for novel attacks.