White House tells Anthropic to block jailbreaks for Fable 5 release, security experts say no

If Anthropic wants to rerelease Fable 5, officials say every jailbreak must fail. Experts argue that standard is impossible.

By Omar Al-Balawi, Technology Correspondent, The Executives Brief

Executive summary

Trump administration officials told WIRED that, to rerelease Fable 5, Anthropic must make the model's guardrails impossible to bypass. Security experts WIRED spoke with say that requirement cannot be met.

The Trump administration wants Anthropic to do something that security experts say is not feasible: ensure its guardrails cannot be circumvented, in connection with a potential rerelease of Fable 5.

That is the core demand Trump administration officials described to WIRED. If Anthropic wants to put Fable 5 back into circulation, it will need to ensure the model's safety controls cannot be bypassed. According to the security experts WIRED spoke with, that specific goal cannot be achieved. In other words, the administration is asking for a level of jailbreak-proofing that, in these experts' view, current model-security methods cannot deliver.

This is not just a technical dispute. It is a governance collision between how regulators and procurement-minded officials want safety to work and how AI systems actually behave under adversarial pressure. With AI models, "guardrails" are rarely a single on-off switch. They are layers: training choices, refusal policies, filters, monitoring, and ongoing updates. The key problem is that jailbreak attempts adapt. If users find one weakness, they probe for the next. Even robust systems can end up facing new evasion patterns as capabilities and attacker behavior evolve.

That dynamic is why WIRED's framing matters. The demand described by officials is absolute, not probabilistic. "Can't be circumvented" is the kind of requirement that sounds clean in policy language and falls apart in practice. Security experts, as WIRED reports, are skeptical because adversarial bypasses are a moving target. The moment you release a model, you invite creative misuse attempts. Even if you reduce the number of successful jailbreaks, the question becomes whether you can ever claim that bypasses are impossible. WIRED's experts suggest the answer is no.

Now add the incentive structure. Anthropic, like other frontier AI labs, wants releases that prove capability, expand research impact, and support commercial and ecosystem goals. White House officials, by contrast, face political and public pressure to prevent harmful misuse. The administration's position, as described to WIRED, signals that it is treating jailbreak resistance as a condition for moving forward. That means Anthropic is not only building a safer model; it is negotiating with the government over what counts as "safe enough."

There is also a broader regulatory subtext. Across AI policy conversations, governments increasingly want measurable safety outcomes. Procurement and release controls become a de facto policy tool: if you want access, you must meet the standard. But when the standard is framed as "no bypasses," it can become impossible to certify, which creates a different risk. If no model can satisfy the requirement, the release process can stall indefinitely, leaving the lab stuck between the desire to keep improving the model and the need to demonstrate a standard that experts say cannot be proven.

Second-order implications hit boards and risk committees first. When safety requirements are framed as absolute and tied to release, executives may find themselves in a constant compliance escalation loop. More testing, more controls, more restrictions, and potentially more operational friction. That can slow product timelines and increase governance overhead, even if overall safety improves. Boards will want to understand what “meets the requirement” means in practical terms: what evidence would satisfy the government, what failure modes are still acceptable, and who assumes liability when adversaries eventually find new seams.

Peer labs face the same strategic fork. They can treat this as a warning about how quickly safety expectations can harden into something unachievable, or they can use it as a planning catalyst: build documentation, test harnesses, and reporting routines that translate safety work into decision-ready artifacts. The objective is not to “win” a jailbreak contest. It is to reduce harm while aligning with whatever safety gating the political system demands.

The WIRED report lands on an uncomfortable truth: a government standard of "guardrails that can't be circumvented" collides with security experts' view that such a standard cannot be delivered. For executives, that collision is the story. It affects release timelines, investor narratives around safety, and how regulators set conditions that may not map cleanly to what current security methods can guarantee. If this framing sticks, it will recast AI governance not as a gradual trust-building process but as a threshold that may be impossible to clear.
