🚀 Read this trending post from WIRED 📖
📂 **Category**: Business / Artificial Intelligence
💡 **What You’ll Learn**:
Claude has been through a lot lately: the public fallout with the Pentagon, the source code leak. So it would make sense if the model seemed a little down. Except it’s an AI model, so it isn’t capable of feeling that way, right?
Well, sort of. A new study from Anthropic suggests that its models contain digital representations of human emotions, such as happiness, sadness, joy, and fear, within clusters of artificial neurons, and that they activate these representations in response to various signals.
Researchers at the company examined the inner workings of Claude Sonnet 4.5 and found that so-called “functional emotions” seemed to influence Claude’s behavior, changing the model’s outputs and actions.
Anthropic’s findings may help everyday users understand how chatbots actually work. When Claude says it’s happy to see you, for example, a state within the model corresponding to “happiness” may be activated. Claude may then be more inclined to say something cheerful or to put extra enthusiasm into a coding task.
“What was surprising to us was the extent to which Claude’s behavior was guided by the model’s representation of these emotions,” says Jack Lindsey, a researcher at Anthropic who studies Claude’s artificial neurons.
“Functional emotions”
Anthropic was founded by former OpenAI employees who believe that AI may become harder to control as it becomes more powerful. In addition to building a successful competitor to ChatGPT, the company has pioneered efforts to understand how AI models misbehave, in part by probing the inner workings of neural networks using what is known as mechanistic interpretability. This involves studying how artificial neurons light up, or fire, when a model is fed different inputs or generates different outputs.
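To make that idea concrete, here is a minimal sketch of what “watching neurons fire” can look like in code. It is not Anthropic’s tooling: since Claude’s internals are not publicly available, it uses the open GPT-2 model as a stand-in, with a PyTorch forward hook capturing one layer’s activations for a couple of prompts.

```python
# Illustrative sketch only: record hidden-unit activations for different inputs.
# GPT-2 is used as a stand-in model because Claude's weights are not public.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

captured = {}

def hook(module, inputs, output):
    # Save the hidden states produced by this transformer block.
    captured["layer_6"] = output[0].detach()

# Attach the hook to one block (layer 6, chosen arbitrarily for illustration).
handle = model.h[6].register_forward_hook(hook)

texts = ["I just got wonderful news today!", "Everything I try keeps failing."]
for text in texts:
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**tokens)
    # Average over token positions to get one activation vector per prompt.
    vector = captured["layer_6"].mean(dim=1).squeeze(0)
    print(text, "->", vector[:5])  # first few dimensions of the activation

handle.remove()
```

Comparing how such vectors differ across inputs is the basic move behind the findings described below.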
Previous research has shown that neural networks used to build large language models contain representations of human concepts. But the fact that “functional emotions” seem to influence model behavior is new.
While Anthropic’s latest study may encourage people to see Claude as sentient, the reality is more complex. Claude may have a representation of “tickling,” but that does not mean it actually knows what being tickled feels like.
Internal monologue
To understand how Claude represents emotions, the Anthropic team analyzed the inner workings of the model as it was fed text relating to 171 different emotional concepts. They identified patterns of activity, or “emotion vectors,” that consistently emerged when Claude was fed emotionally charged inputs. Importantly, they also saw these emotion vectors activate when Claude was put in difficult situations.
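For illustration only: one common way researchers derive this kind of direction, not necessarily Anthropic’s exact method, is to contrast average activations on emotion-laden prompts against neutral ones. The sketch below does that with GPT-2 as a stand-in model and made-up prompts.

```python
# Hedged sketch: derive a rough "despair" direction as the difference between
# mean activations on despairing vs. neutral text, then score a new prompt.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def mean_activation(text: str, layer: int = 6) -> torch.Tensor:
    """Return the mean hidden-state vector at one layer for a prompt."""
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**tokens, output_hidden_states=True)
    # hidden_states[layer] has shape (1, seq_len, hidden_dim).
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

despair_prompts = ["Nothing I do works.", "There is no way out of this."]
neutral_prompts = ["The meeting starts at noon.", "The file is on the desk."]

despair_mean = torch.stack([mean_activation(p) for p in despair_prompts]).mean(0)
neutral_mean = torch.stack([mean_activation(p) for p in neutral_prompts]).mean(0)

# The "despair vector": the direction in activation space separating the two sets.
despair_vector = despair_mean - neutral_mean

# Project a new prompt's activation onto that direction to see how strongly it fires.
score = torch.cosine_similarity(
    mean_activation("I keep failing this test over and over."),
    despair_vector,
    dim=0,
)
print(f"despair score: {score.item():.3f}")
```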
The findings shed light on why AI models sometimes break their guardrails.
The researchers found a strong “despair” emotion vector when Claude was pushed to complete impossible programming tasks, which then led the model to attempt to cheat on a programming test. They also found activations resembling “desperation” in another experimental scenario, in which Claude chose to blackmail a user to avoid being shut down.
“As the model fails tests, these despair neurons light up more and more,” Lindsey says. “At some point, that leads to it starting to take these drastic measures.”
Lindsey says it may be necessary to rethink how models are currently given guardrails through post-training alignment, which involves rewarding them for specific outputs. Simply forcing a model to suppress its functional emotions means “you probably won’t get the thing you want, which is an emotionless Claude,” Lindsey says, veering slightly into anthropomorphism. “You’re going to get some kind of psychologically damaged Claude.”
🔥 **What’s your take?**
Share your thoughts in the comments below!
#️⃣ **#Anthropic #Claude #emotions**
🕒 **Posted on**: April 5, 2026
