Anthropic: 'We made the wrong tradeoff' in new model guardrails

Business Insider • June 11, 2026 • business

Key Points:

Anthropic initially implemented silent safeguards in its Claude Fable 5 AI model, rerouting sensitive queries and degrading performance for AI development prompts without notifying users, aiming to prevent misuse and rival AI creation.
The company has reversed this policy, now making these safeguards visible by informing users when their requests are refused or rerouted, and providing reasons for refusal via the API.
Anthropic apologized for the previous approach, acknowledging it made the wrong tradeoff between safety and transparency.
The safeguards target national security concerns, preventing foreign adversaries from advancing in frontier AI and chip development, while most coding and machine learning tasks remain unaffected.
Mythos, the underlying model behind Claude Fable 5, is among the most powerful AI systems, with restricted access due to its potential misuse in cyberattacks, bioweapons research, and other high-risk applications.

Trending Business