Humanitarian policies roll back a policy that could have “sabotaged” AI researchers using Claude

💥 Explore this must-read post from WIRED 📖

📂 **Category**: Business,Business / Artificial Intelligence,Course Correct

✅ **What You’ll Learn**:

Anthropy is declining On a policy that would secretly prevent competitors from using its new AI model, Claude Fable 5, to develop other AI models. The company changed course after the move received significant backlash from the AI research community.

“We are changing Fable 5’s safeguards for borderline LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong trade-off and apologize for not striking the right balance.”

Anthropic released Claude Fable 5, a version of its latest AI model with additional safety barriers designed to prevent misuse, earlier this week. Some of the safeguards Anthropic decided on weren’t surprising: The company said it would redirect users who asked questions about cybersecurity, biology, or chemistry to a less capable AI model to reduce the chances of someone using advanced AI to carry out a cyberattack or build a bioweapon.

But for researchers trying to use Claude Fable 5 to develop frontier AI, Anthropic has outlined a different approach. The company will intentionally degrade the model’s performance in ways that are not visible to the user. The move would effectively sabotage researchers trying to use Cloud to train competing AI models, which Anthropic explicitly prohibits in its terms of service.

Anthropic now says it’s changing course, and that Claude Fable 5’s AI development guarantees will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI, it will alert them and either deny the request or redirect the user to a less capable model.

Anthropic reversed this policy after receiving backlash from the AI research community. Anthropic has already taken steps to prevent competitors from using Cloud to build closed and open source AI models, but critics say the quietly deteriorating model performance for some users has gone too far. Cloud’s programming agent has become a favorite tool among developers, including those working on open source AI research projects, and researchers told WIRED that the company’s latest policy could have led to a troubling future in which only a few leading AI labs can conduct advanced AI research.

“Performing in machine learning research *without telling the user* is shockingly offensive and a terrible sight,” Dean Paul, a senior fellow at the Foundation for American Innovation and a former White House advisor on artificial intelligence, wrote in a post on X. He continued in another post that the “covert sabotage” policy undermines Anthropic’s overall position, because it limits AI researchers’ collaboration in the field of AI safety.

“It was as if Anthropic was saying to the public: ‘We don’t trust anyone else to do AI research,’” says Will Brown, research lead at open-source AI startup Prime Intellect. “We’re the only ones who have to do AI research.” “It’s like they’re starting to drag the ladder behind them.”

This policy would have left developers in the dark about whether they were violating Anthropic’s rules, Brown said, because the company wouldn’t alert them when its protections were triggered. He added that the restrictions could have had wide-ranging consequences. For example, he pointed to the growing ecosystem of third-party evaluation companies that test leading models for safety, performance and reliability, work that would have been hampered if Anthropic secretly degraded its model.

💬 **What’s your take?**
Share your thoughts in the comments below!

#️⃣ **#Humanitarian #policies #roll #policy #sabotaged #researchers #Claude**

🕒 **Posted on**: 1781170326

🌟 **Want more?** Click here for more info! 🌟

Humanitarian policies roll back a policy that could have “sabotaged” AI researchers using Claude

By

Leave a Reply Cancel reply