Anthropic: 'We made the wrong tradeoff' in new model guardrails
Key Points:
- Anthropic initially implemented silent safeguards in its Claude Fable 5 AI model, rerouting sensitive queries and degrading performance for AI development prompts without notifying users, aiming to prevent misuse and rival AI creation.
- The company has reversed this policy, now making these safeguards visible by informing users when their requests are refused or rerouted, and providing reasons for refusal via the API.
- Anthropic apologized for the previous approach, acknowledging it made the wrong tradeoff between safety and transparency.
- The safeguards target national security concerns, preventing foreign adversaries from advancing in frontier AI and chip development, while most coding and machine learning tasks remain unaffected.
- Mythos, the underlying model behind Claude Fable 5, is among the most powerful AI systems, with restricted access due to its potential misuse in cyberattacks, bioweapons research, and other high-risk applications.