Anthropic says its most powerful AI cyber model is too dangerous to release publicly - so it built Project Glasswing
Key Points:
- Anthropic launched Project Glasswing, a cybersecurity initiative using its unreleased AI model, Claude Mythos Preview, in partnership with 12 major tech and finance companies to identify and patch critical software vulnerabilities before exploitation by adversaries.
- Claude Mythos Preview autonomously discovered thousands of high-severity zero-day vulnerabilities across major operating systems and software, including a 27-year-old OpenBSD flaw and complex Linux kernel exploits, with most findings already patched or in remediation.
- To responsibly manage the disclosure of numerous vulnerabilities, Anthropic developed a triage process involving human validation and coordinated disclosure timelines, aiming to avoid overwhelming open-source maintainers and ensure patch quality.
- Despite its advanced capabilities, Anthropic has faced recent operational security lapses, including a source code leak and CMS misconfiguration, raising concerns about trustworthiness as it handles a powerful cybersecurity tool.
- Anthropic emphasizes the urgency of Project Glasswing, warning that AI-driven offensive capabilities will proliferate within months, and presents the initiative as a critical step to give defenders an advantage in the rapidly evolving AI cybersecurity landscape.