Why Anthropic's new model has cybersecurity experts rattled
This is a column about Anthropic and AI. My fiancé works at Anthropic. See my full ethics disclosure here.

Two weeks ago, Anthropic accidentally leaked the existence of what the company said was its most powerful artificial intelligence to date: a new model, known as Claude Mythos Preview, that represented "a step change" in AI performance. In particular, according to a blog post that leaked due to human error and a misconfigured content management system, Mythos posed serious new risks to cybersecurity. "It presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders," the blog post stated.

On Tuesday, the wave crashed onto the shore. Anthropic announced Mythos alongside Project Glasswing, an initiative with more than 40 of the world's biggest tech companies that will see Anthropic grant early access to the model to find and patch vulnerabilities across many of the world's most important systems. Launch partners in the coalition include Apple, Google, Microsoft, Cisco, and Broadcom. They'll be tasked with scanning and patching their own systems along with the critical open-source systems that modern digital infrastructure depends on. Anthropic is giving participants $100 million in usage credits for Mythos, and donating another $4 million to open-source security efforts.

Still, today marks a striking and mostly unsettling moment in the development of AI systems. One of the world's three frontier labs has now created a model it says is too dangerous to release to the general public. These dangers emerged not from any specialized cyber training but from the same general improvements that every other lab is currently pursuing. As a result, models with similar capabilities may soon be accessible to criminals, hackers, and nation-states — or even more broadly via open-weight models.
Already, Anthropic said, the model has found thousands of high-severity vulnerabilities in every major operating system and web browser, and in many cases developed related exploits. Among them: a vulnerability in OpenBSD, a security-focused open-source operating system, that had escaped detection for 27 years; another flaw in the video encoder FFmpeg that had escaped detection in 5 million previous automated tests; and "several" vulnerabilities in the Linux kernel, which could be exploited to take complete control of a user's machine.

"Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely," the company wrote. "The fallout — for economies, public safety, and national security — could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes."

In a video that Anthropic made to accompany the announcement, researchers say that Mythos is more dangerous largely due to its advanced reasoning capabilities. While current models are capable of identifying high-severity vulnerabilities, Mythos might identify five separate vulnerabilities in a single piece of software and then chain them together into a uniquely dangerous new attack. Coupled with models' growing ability to work without supervision for extended periods of time, Anthropic said, we have reached an inflection point in cybersecurity risk.

Of course, AI labs have often been criticized for making ominous pronouncements about the dangers posed by their own work, which can come across as a strange new form of marketing hype. For that reason, along with the fact that my fiancé works at Anthropic, I wanted to see what other cybersecurity experts made of the Mythos announcement.
Alex Stamos, chief product officer at the cybersecurity firm Corridor, told me that Glasswing is "a big deal, and really necessary."

"We only have something like six months before the open-weight models catch up to the foundation models in bug finding," said Stamos, who previously led security at Facebook and Yahoo. "At which point every ransomware actor will be able to find and weaponize bugs without leaving traces for law enforcement to find (and with minimal cost)."

Stamos' sentiments were broadly echoed by Glasswing participants. "AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure from cyber threats, and there is no going back," Anthony Grieco, chief security and trust officer at Cisco, said in a statement accompanying the announcement.

If critical infrastructure really is at risk, as Grieco suggests, you would hope the US government is paying attention. (And right on cue, here's a story from today about Iran successfully hacking US water and energy utilities.) Awkwardly, though, the US government attempted to declare Anthropic a supply-chain risk after the company refused to modify its contract with the Pentagon to permit mass domestic surveillance and fully autonomous weapons. A judge has blocked that designation from taking effect while the case is litigated.

Anthropic told me that before launching Project Glasswing, it briefed senior US government officials on Mythos' capabilities, both offensive and defensive. That includes the Cybersecurity and Infrastructure Security Agency and the Center for AI Standards and Innovation, which works with industry to test new models and evaluate them for security risks. The company says it has signaled that it is available to help the government evaluate Mythos. But it's not clear the government is taking Anthropic up on the offer.
A functioning government would take a strong interest in what Anthropic is up to here, if only out of self-preservation. We simply don't know whether Project Glasswing will be enough to protect critical systems from being breached — or for how long.

"The optimistic timeline is that we are one step past human capabilities, and that means that there is a huge but finite pool of flaws that can be found and fixed," Stamos told me. "The pessimistic timeline is that with every release there will be new classes of flaws we never even imagined. It's hard to predict, because we are trying to model superhuman thinking."

For the moment, there's a case to be made that Project Glasswing represents Anthropic's founding thesis in action. The whole reason the company set out to build frontier AI models was so that a safety-focused lab would be the first to encounter the most dangerous capabilities — and could lead the way in mitigating them. With Mythos, that appears to be exactly what's happening.

At the same time, Glasswing is built on a deeply uncomfortable premise — that the only way to protect us from dangerous AI models is to build them first. And Anthropic is doing so in an environment that is barely regulated at all, at the near-insistence of the Trump administration.

One effect of this is to centralize power. ("An underrated feature of this situation," observed Kelsey Piper today about Mythos: "a private company now has incredibly powerful zero-day exploits of almost every software project you've heard of.") Another effect is to centralize risk: among other things, the incentives to steal Anthropic's model weights just went up significantly.

None of which is likely to make AI more popular in a country that appears to be turning against it. Surveys show people are clamoring for more control over how AI is used and stronger safeguards around it. As the story of Project Glasswing plays out, we may regret not beginning that work much sooner.
Elsewhere in Mythos: A striking new benchmark result noted by VentureBeat: "Mythos Preview achieves 93.9% on SWE-bench Verified, versus 80.8% for Opus 4.6." That's a jump of more than 13 percentage points over the previous state of the art, set in February.

What happened: Meta has an internal leaderboard called "Claudeonomics," The Information reports, ranking over 85,000 employees on their AI usage. Users who burn the most tokens can earn titles including "Session Immorta…
Send this story to anyone — or drop the embed into a blog post, Substack, or Notion page. Every play sends rev-share back to Platformer (Casey Newton).