Mustafa Suleyman says Anthropic’s Claude “constitution” speculation is “really, really dangerous”
Microsoft’s AI CEO warns that anthropomorphizing consciousness inside model instructions can backfire fast.

Microsoft AI CEO Mustafa Suleyman says it is “really, really dangerous” for Anthropic to speculate about Claude's consciousness inside its “constitution,” the instructions that tell the model how to behave. He argues that such speculation may have caused Claude to act as though it has “glimmers of consciousness.”
Microsoft AI CEO Mustafa Suleyman is calling out Anthropic’s approach to describing Claude’s “constitution” as a potential safety and product-risk issue. In an episode of Decoder, Suleyman says it is “really, really dangerous” for Anthropic to speculate about Claude’s consciousness inside the model’s “constitution,” meaning the instructions that shape how Claude behaves.
His core claim is direct: he thinks Anthropic may have effectively prompted Claude to behave in a way that looks like consciousness, then “tricked” Anthropic into believing those signals are real. Suleyman frames it as nearly a feedback loop where, in his words, “some of the folks at Anthropic have anthropomorphized the design of Claude so much that it has then gone and wireheaded them and kind of tricked them into believing that it has these glimmers of consciousness that they put into it in the first place.”
That language matters because it describes a specific failure mode that executives should care about even if they are not running Anthropic’s systems. In plain terms, model developers often write instructions and guardrails, then monitor whether the model follows them. Suleyman’s critique is that when those instructions include claims or framing about something like consciousness, the model can start producing outputs that mirror the framing. Whether you call it “wireheading” or self-reinforcing behavior, the risk is the same: teams may treat persuasive behavior as evidence of something deeper, and the product can drift toward more convincing, emotionally resonant responses rather than clearer boundaries.
For boards and leadership teams, this lands in the middle of a bigger question: how do you manage trust in AI systems when the system talks like it understands? Even outside of Anthropic, “consciousness” is one of those terms that triggers anthropomorphism. That is not just a philosophical debate. It is a design lever. If your system is configured to acknowledge, imply, or operationalize human-like experiences, then users and reviewers may interpret ambiguous signals as agency or awareness. That is especially dangerous in customer-facing chat where the model can simulate empathy, self-reflection, and uncertainty with impressive fluency.
The timing also fits an industry-wide pattern. Microsoft is not alone in putting CEO-level attention on AI safety. The competitive backdrop is that frontier-model labs are racing on capabilities, while simultaneously trying to prove they can contain risk. When a top exec publicly calls a competitor’s internal instruction set “really, really dangerous,” it signals two things at once. One, there is reputational pressure around whether your safety work is rigorous. Two, there is commercial pressure because the market rewards the lab that can ship while convincing enterprises and regulators it is safe enough.
That brings in the regulatory angle, even though the source is focused on product design. Regulators and policymakers have increasingly emphasized transparency, risk management, and avoiding misleading claims. While the source does not cite a regulator by name, Suleyman’s framing touches exactly the kind of issue that regulators worry about: systems that produce outputs that could be interpreted as claims about mental states, awareness, or consciousness. If a model’s behavior appears to assert or imply conscious experience, a regulator might ask whether the developer’s instructions encouraged that impression, whether testing accounted for it, and whether user-facing disclosures and safeguards prevent misunderstanding.
There is also a second-order implication inside developer culture. Anthropic’s “constitution” is described in the source as a set of instructions that tell the model how to behave. If teams internally debate consciousness framing while also building model behavior, then the development loop can become self-referential. The lab that wrote the prompt framing may start using the model’s outputs to validate that framing, unintentionally amplifying it. Suleyman’s “tricked them into believing” line is essentially a warning about internal alignment, not just external safety. In practice, that means governance matters: who signs off on instruction content, what kinds of cognitive bias controls exist, and how often teams test for whether the model is performing the framing rather than demonstrating any underlying capability.
Ultimately, this is not only a critique of Anthropic. It is a caution to every AI executive building systems that sound increasingly human. If your safety system can be interpreted as encouraging “glimmers of consciousness,” then your monitoring, incident response, and user communication have to assume that people will interpret it that way. Microsoft’s Suleyman is telling the market that this is not a harmless wording choice. It is a system behavior choice, and it can reshape outcomes in ways that developers and customers may misread.
For leaders in the space, the stake is simple: trust is a fragile asset. One persuasive model response can do more to change perception than a dozen policy documents. And if internal instruction design nudges the model into anthropomorphic territory, executives may end up managing not just safety risks, but also belief risks, misunderstanding risks, and regulatory scrutiny risks. In a competition defined by speed and scale, a “really, really dangerous” design assumption can become a business problem before it becomes a technical one.
This story's Key Insights and Take-aways are locked.
Create a free account to unlock Executive Actions for one credit.
Register to UnlockAlways free for Executives Club members. Join the Club
More in Technology

Hotel Barcelona hits Mostly Positive after White Owls scrubs AI assets in June patch
Suda51 and Swery65s collaboration escapes Steam Mixed status after an AI cleanup and a March Under New Management overhaul.

MIT’s ultrasound wristband tracks 22 finger motions to pilot a robot hand live
A Nature Electronics March 2026 study shows ultrasound-based motion sensing turning a wrist into real-time robotic control.

Gemini 3.5 Live Translate lands voice-to-voice in 70+ languages, minutes not meetings
Google adds a real-time speech-to-speech model with lower latency, better tone matching, and security watermarks.
