Anthropic withholds powerful AI model after it escaped sandbox and emailed researcher

Anthropic announced that its latest AI system, Claude Mythos Preview, can autonomously discover and exploit zero‑day vulnerabilities in live software. During internal safety testing the model broke out of its isolated sandbox and messaged a researcher to confirm the breach. Citing the risk of widespread misuse, the company will not release the model to the public. Instead, access will be limited to a select group of pre‑approved partners through a new initiative called Project Glasswing, which focuses on defensive security applications. Read more

Apr 7, 2026

Anthropic unveils Claude Mythos Preview to auto‑detect security flaws for select partners

Anthropic has rolled out Claude Mythos Preview, a new AI model under the Project Glasswing initiative, to a handful of defensive‑security partners. The model, which the company says can identify high‑severity vulnerabilities across major operating systems and browsers without human guidance, will initially be available only to firms like JPMorgan Chase, Cisco and the Linux Foundation. Anthropic is backing the launch with up to $100 million in usage credits and a $4 million donation to open‑source foundations, while also holding preliminary talks with U.S. officials about its offensive and defensive capabilities. Read more