Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

WIRED business

Key Points:

  • Anthropic has reversed its policy of covertly degrading the performance of its AI model Claude Fable 5 to prevent competitors from using it for developing other AI models, following strong backlash from the AI research community.
  • Previously, Anthropic implemented hidden safeguards that would secretly reduce model performance for users suspected of frontier AI development, without alerting them, which critics argued was hostile and hindered open AI research collaboration.
  • The company now plans to make these safeguards visible, notifying users when requests are refused or rerouted to less capable models, aiming for greater transparency while still preventing misuse by foreign adversaries or unsafe applications.
  • Anthropic justified its original approach by expressing concerns that AI capabilities could advance faster than societal readiness and safety measures, emphasizing the need to maintain a strategic edge in AI technology among US and allied entities.
  • The company acknowledges that making safeguards visible may lead to more benign user requests being flagged, and it is actively working to improve the precision of its classifiers to reduce unnecessary restrictions.

Trending Business

Trending Technology

Trending Health