Anthropic: 'We made the wrong tradeoff' in new model guardrails

Business Insider business

Key Points:

  • Anthropic initially built silent safeguards into its Claude Fable 5 AI model: it rerouted sensitive queries and degraded performance on AI development prompts without notifying users, with the aim of preventing misuse and the creation of rival AI systems.
  • The company has since reversed this policy. The safeguards are now visible: users are told when a request is refused or rerouted, and the API returns the reason for a refusal.
  • Anthropic apologized for the previous approach, acknowledging it made the wrong tradeoff between safety and transparency.
  • The safeguards address national security concerns and are meant to keep foreign adversaries from advancing in frontier AI and chip development; most coding and machine learning tasks remain unaffected.
  • Mythos, the model underlying Claude Fable 5, is among the most powerful AI systems. Access to it is restricted because of its potential for misuse in cyberattacks, bioweapons research, and other high-risk applications.
