The Full Story Behind the Fable 5 Suspension: Inside the Jailbreak

In three days, Anthropic's most guarded AI model was launched, declared jailbroken, accused of quietly downgrading its own users, and then switched off by the US government. This is how Claude Fable 5 went from flagship to suspended, and what it revealed about how AI safety really works and who controls a frontier model once it ships.

It is the deeper companion to our shorter take on why the Fable 5 suspension matters.

Day one: a model built to be hard to break

Anthropic released Fable 5 on 9 June as the most capable model it had ever put in public hands, and leaned hard on safety. The design matters, because it sits under everything that followed. Each request runs through classifiers, small models trained to spot sensitive topics like cybersecurity, biology, and chemistry. When one fires, the request is not refused. It is quietly handed to the older, weaker Claude Opus 4.8, which answers instead. Anthropic says this happens in under 5% of sessions.

So the safety layer was never a locked door. It was a router. Most of the time you got the full Fable 5; some of the time, on a guess, a weaker model, and you were not always told which one replied.

Forty-eight hours in: the Fable 5 jailbreak claim

A jailbreak is a trick that pushes a model past its safety rules. Within two days of launch, a researcher known as Pliny claimed he had broken Fable 5. By his account it was a stack of techniques: splitting a banned request across multiple steps and agents so no single one looked harmful, wrapping the pieces in academic or fictional framing, and using character tricks to slip past filters. He also claimed to have leaked the model's full system prompt. Screenshots spread fast.

Anthropic pushed back. It said this was not a real jailbreak, just coaxing the model past its own refusals, a weakness in almost every large language model, and that nothing genuinely dangerous had been unlocked. It drew a line between a narrow jailbreak, which pries loose specific information in a specific setup, and a universal one that breaks the safeguards across the board. By its account, no one had found a universal jailbreak for Fable 5.

Both can be true. The trick probably handed no one a real weapon, but it exposed the core weakness of this kind of safety: break a harmful request into small, innocent-looking steps and you can slip past filters that would catch it asked directly. Each step reads as harmless. Only the whole is dangerous, and the classifiers judge the steps. That is the state of AI alignment today, the work of keeping a model inside its limits. Better than last year. Not a wall.

12 June: the government pulls the plug

Then it stopped being about Anthropic and its users. On 12 June, the US government issued an export-control order citing a narrow jailbreak, the kind that asks the model to read a codebase and find software flaws. It targeted foreign-national access, and the only way to comply was to disable Fable 5 and its sibling Mythos 5 for every customer worldwide. The full account of that order sits in our companion piece on the suspension.

Anthropic's official graphic for its public statement on the US government directive to suspend access to Fable 5 and Mythos 5 Anthropic's public statement on the order to suspend access to Fable 5 and Mythos 5. Source: Anthropic

Three days after launch, the most powerful model Anthropic had ever shipped was gone.

The trigger was a rival, and what it signals

The order came wrapped in national-security language, but the reality reported since is stranger. According to the Wall Street Journal, the jailbreak that alarmed the government was found by researchers at Amazon, Anthropic's major investor, who reportedly took it to the Commerce Department rather than to Anthropic first. Early speculation had blamed OpenAI. The administration had also already asked Anthropic to delay the launch and been refused, so the report became the lever to stop the models anyway.

It is hard not to read this as a competitor using the government's power to take a rival's flagship offline, though no one has said so on the record. Anthropic argued the flaw was narrow, but once a model is framed as a national-security risk, the call leaves the engineers. That makes a frontier model less an ordinary product than a strategic asset, one that can be switched off overnight for reasons that have nothing to do with its code. It is one of the first times an export-control order has pulled a commercial model worldwide, and the mechanism will be used again. We will keep following it.