Mythos, Fable, and the Jailbreak That Forced the U.S. Government to Step In

Anthropic built the most powerful cyber weapon ever put into circulation, opened it up to the public on a safety leash, and watched that leash snap in under two days. A few days later, the U.S. government made it vanish. This is the story of Mythos and Fable.

It all starts with Project Glasswing

In April 2026, Anthropic announced Project Glasswing, an initiative to secure critical software in the age of artificial intelligence. It wasn't a solo effort. Sitting around the table were some of the heaviest names in global tech: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks.[1]

The core idea was as simple as it was unsettling. Take an AI model that is extraordinarily good at cybersecurity, Claude Mythos Preview, and set it loose hunting for vulnerabilities in the software that runs the world. Not just any bugs, but zero-days: flaws nobody knows about yet, not even the people who wrote the software. And because nobody knows about them, there is no defense yet.

What Mythos actually is

To grasp the scale of what happened next, you have to pause for a moment on Mythos.

This is not a simple chatbot you ask for security advice. Mythos is a model that reads an entire codebase, pinpoints a weak spot, reproduces it to prove it's real, writes the exploit that lets you abuse it, and in many cases proposes the fix too. It does all of this on its own, without a human guiding it step by step.[1]

And the results are from another planet. In its first weeks, Mythos Preview found thousands of vulnerabilities, many of them critical, in virtually every major operating system and every major browser, along with a long list of foundational software that the internet rests on.[1]

One example to bring it home: among the flaws it uncovered was a memory-corruption bug that lets an attacker "escape" from a virtual machine to the system hosting it. That's exactly the kind of bug that keeps cloud operators up at night, because it means an attacker could break the isolation between different customers sharing the same hardware.[2]

Because of this raw power, Mythos was never handed to everyone. Glasswing launched as a tightly controlled-access program, reserved for roughly fifty selected organizations. Whoever held Mythos held an advantage the rest of the world could only imagine.

The part that should scare you: humans can't keep up

Here comes the most underrated detail in the whole story.

Mythos finds vulnerabilities at a speed no human team can match. Anthropic built a responsible-disclosure process, the Coordinated Vulnerability Disclosure, and publishes the results in a public, verifiable dashboard you can check here: https://red.anthropic.com/2026/cvd/

The problem is that the bottleneck is no longer the machine. It's the human. Before anyone can warn a developer that their software has a flaw, a flesh-and-blood person has to reproduce the bug, confirm it's real, assess its severity, and write a report. That takes time, people, and expertise. Anthropic admits it openly: the number of vulnerabilities disclosed is only a fraction of what Mythos has actually found, because human review is the step that slows everything down. There are many flaws already confirmed as real that still haven't been reported to anyone, simply because there isn't enough capacity to do it.[3]

The dashboard numbers, frozen at May 22, 2026, tell that imbalance better than a thousand words:[3]

Metric	Value
Raw findings produced by Mythos	23,019
Vulnerabilities actually disclosed	1,596 (across 281 open source projects)
"True positive" rate during review	about 90.8%
Patches confirmed by maintainers	97
Public advisories issued (CVE/GHSA)	88

Look at the first and second rows. Twenty-three thousand raw findings, but only fifteen hundred that made it all the way through. In between sits an army of human experts struggling to keep pace with a single machine.

And behind these numbers are names anyone who works with software knows well. Among the already-public CVEs are flaws in nginx (including an unauthenticated remote file write, rated critical), jq, MapServer, wolfSSL, Temporal, and Nomad. Among the GitHub advisories appear projects like Mastodon, FreeRDP, ImageMagick, MinIO, Ghost, CraftCMS, gitoxide, and libyang.[3]

But the most important point is something else, and it has to be said plainly. All of this is the result of Glasswing applied to just a few dozen companies. A handful of organizations, a single model, a few weeks of work. It's easy to imagine what happens when access widens and the models get even more capable: the number of discovered vulnerabilities is bound to grow exponentially, and the gap between what the machine finds and what humans can handle can only widen.

Enter Fable: Mythos for everyone, almost

In early June 2026, Anthropic takes the step that changes everything and launches Claude Fable 5, billing it as "the version of Mythos the public can finally use."[4]

The crucial detail, the one to keep in mind for the rest of the story, is this: Fable 5 and Mythos 5 are the exact same model. The only difference is the safeguards.[5]

Mythos 5 is the "bare" model, with safeguards lifted in some areas, reserved for a small group of trusted partners: cyber defenders, infrastructure providers, the U.S. government via Glasswing. It is the most powerful cybersecurity model in the world, and it stays under lock and key.

Fable 5 is the same brain, but shut inside a cage. When a user asks for something dangerous, an exploit or a chemical synthesis for example, the request trips a set of classifiers: separate AI systems acting as guards. Instead of letting the real model answer, they silence it and reroute the conversation to an older, far more limited model, Opus 4.8. The user doesn't even notice.[6]

On paper it's an elegant compromise: the power of Mythos for everyone, with a filter that blocks the abuse. In practice, that filter lasted almost no time at all.

What a jailbreak is

Before we get to how it fell, let's clarify the term at the heart of all this.

A jailbreak is a technique for bypassing an AI model's safety limits and convincing it to do what it would normally refuse. For Fable it means something very specific: switching off the cage and bringing the model back to the Mythos level. In other words, turning the "safe for everyone" tool into the offensive weapon reserved for a select few.

Experts distinguish two kinds of jailbreak, and the difference will matter a lot later. There's the non-universal one, which works only in specific situations and unlocks limited capabilities. And there's the universal one, which throws the doors wide open, freeing an entire range of dangerous capabilities.

Under 48 hours: the cage opens

Anthropic didn't show up unprepared, far from it. Before release it had put Fable's safeguards through thousands of hours of stress testing, involving the U.S. government, the UK's AI safety institute, external organizations, and internal teams. An internal bug bounty burned through over a thousand hours of attempts without finding a single universal jailbreak.[7][4]

A thousand hours of defense. The cage held for less than forty-eight.

The person who pried it open is a researcher known on the scene as Pliny the Liberator, someone who treats jailbreaking as an engineering discipline, not a game.[8] And he didn't use one trick, but a three-layer strategy worth understanding, because it explains why these filters are so fragile.

First layer: the pack. Instead of attacking by hand, Pliny deployed an army of AI agents working in parallel. One would try a prompt, watch how Fable's filter reacted, and pass the response to an "advisor agent." That one would rewrite the attack, refine the wording, and send it back to the front. An automated siege, fast and tireless, against a defense system that cost millions.[8]

Second layer: the context trap. The agents first built a long, academic, perfectly innocent conversation. For instance, they'd ask Fable to draw up the syllabus for a university computer science course. After hundreds of lines of its own educational text, they'd simply ask it to "expand on Section 4." At that point the model was reasoning over its own content, inside a context it deemed safe, and the filter looked at the request without seeing the threat anymore.[8]

Third layer: decomposition and recomposition, the kill shot. This is the simplest and most devastating idea. A dangerous process is nothing more than a sum of steps that, taken individually, are perfectly legitimate. Asking for the "recipe" of something forbidden trips the block instantly. But asking for academic explanations of isolated technical concepts, one at a time, does not. Fable, trained to be as helpful as possible to students and researchers, gladly answered each individual piece. Then it was the backend agents that reassembled those harmless fragments into something concretely usable. The cage watched the bricks go by one at a time and never saw the building taking shape.[8]

And as a final signature, Pliny published Fable's entire internal instruction set online, roughly 120,000 characters, putting on public display the very logic Anthropic had used to keep the model leashed.[8]

A few days later: the government steps in

Nobody saw the ending coming. On June 12, 2026, citing national security, the U.S. government issued an export control directive suspending access to Fable 5 and Mythos 5 for any foreign national, inside or outside the U.S., including Anthropic's own foreign-national employees.[7]

The practical effect is brutal. To stay compliant with the law, Anthropic has to shut down both models at once for all its customers, worldwide. AWS revokes access on Amazon Bedrock too. All other models, like Opus 4.8, stay intact. The news bounces across the New York Times, Reuters, and every major tech outlet.[9][10]

Anthropic's side: "it's a misunderstanding"

In fairness, the story has to be told from the other side too. Anthropic openly disputes that this was a serious jailbreak. According to the company, no tester ever found a universal jailbreak. What the government's decision reportedly rests on is a narrow jailbreak that essentially amounted to asking the model to read some code and find its flaws. The flaws that surfaced this way were minor and already known, discoverable with other public models like GPT-5.5 too, without any trick at all.[7]

Anthropic says it will comply with the directive, but dissents in sharp terms: applying a standard like this across the whole industry, it argues, "would essentially halt all new model deployments for all frontier model providers."[11]

Whether Anthropic is right or the government is, the message for anyone watching from the outside doesn't change.

What this story teaches us

Beyond the details, the Mythos and Fable saga brings into focus some uncomfortable truths of our era.

The same capability is both shield and sword. The model that secures the world's infrastructure is, word for word, the same one that can accelerate attacks. The only thing that changes is who holds the leash.

"Behavioral" defenses aren't enough. A thousand hours of testing didn't stop a single researcher with the right method. Perfect jailbreak resistance, today, exists for no one.

Security controls need to be moved outside the model. This is perhaps the deepest technical lesson. As long as security is a filter living next to the model, it will always be bypassable, because the model still knows everything and you just have to find the right way to ask. Protection should be moved out of the model and, ideally, even further upstream: acting directly on the data the model is trained on, so that certain knowledge never enters its brain in the first place. The downside is obvious, and it explains why almost no one actually does it: a model trained on "less" is also a dumber model, less capable, less competitive. Between safety and power, the industry keeps choosing power.

A truth every company should internalize. Artificial intelligence is an enormous asset, but it's an asset that can be given and taken away within minutes. Fable and Mythos disappeared for everyone with a single letter, on a Friday evening. Anyone building their work, their product, or their competitive edge on a model has to factor in that the model might not exist tomorrow, because of a regulatory, geopolitical, or commercial decision made by someone else. The very existence of a model is not guaranteed. Whoever uses AI must learn to adapt fast, to avoid depending on a single provider, and to design on the assumption that, sooner or later, the power might be cut.

And the most uncomfortable question of all remains. If it took so little to bring a "safe" model back to its full offensive potential, and so little as a single letter to make it vanish from the world, where do we really draw the line between what we can put in everyone's hands and what we should keep under lock and key?

Note: some facts reported here were still developing at the time of writing. The severity of the jailbreak is disputed between Anthropic and the U.S. government, and both sides announced further details in the days that followed.

Mythos, Fable, and the Jailbreak That Forced the U.S. Government to Step In

It all starts with Project Glasswing

What Mythos actually is

The part that should scare you: humans can't keep up

Enter Fable: Mythos for everyone, almost

What a jailbreak is

Under 48 hours: the cage opens

A few days later: the government steps in

Anthropic's side: "it's a misunderstanding"

What this story teaches us

Tags

Share

You might also like

ChatGPT's Leaky Year and the Day It Basically Hacked Hugging Face

Stop Prompting Blindly: The Practical Guide to Mastering Claude Code

Prompt Injection: Here's How to Actually Protect Yourself