AI Models Can Now Autonomously Develop Browser Exploits, But at a Steep Cost
A new benchmark shows that Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 can autonomously develop real browser exploits, with Mythos outperforming GPT-5.5 at a significantly higher cost. The results matter both for browser security and for how AI models are evaluated.
Researchers built the benchmark to measure how well AI models can exploit real-world vulnerabilities in Google's V8 JavaScript engine, which powers popular browsers like Chrome and Edge. The results show that Claude Mythos Preview can develop complex exploits, including ones that allow arbitrary code execution, with an average score of 9.90 out of 16. In other words, an AI model can now work largely on its own from a known bug to an exploit capable of compromising a web browser.
The benchmark grades each model across five tiers, from triggering a bug to achieving full code execution. Claude Mythos Preview emerged as the top performer, reaching the highest tier on 21 out of 41 vulnerabilities. OpenAI's GPT-5.5 trailed behind, with an average score of 5.51 points and only two vulnerabilities taken to the top tier. The gap is wide on both measures: 9.90 versus 5.51 on average score, and 21 versus 2 on top-tier results. Mythos demonstrated a level of expertise comparable to that of a competent human security researcher.
The cost of achieving these results, however, is a major concern. The full test run of Claude Mythos Preview across 122 episodes cost approximately $36,428, while GPT-5.5 via Codex ran 123 episodes for roughly $3,075. That makes the Mythos run nearly twelve times as expensive (about 11.8x) for an almost identical number of episodes. The difference raises the question of whether Mythos's higher scores justify the expense. It also leaves room for GPT-5.5 to narrow the performance gap if given a larger compute budget, which could make it the more attractive option for developers and businesses.
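The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration built on the figures quoted above; the `RunCost` class and its field names are hypothetical, and the cost per top-tier vulnerability is a rough derived number, not a metric the benchmark is stated to report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCost:
    """Figures for one model's full benchmark run, as quoted in the article."""
    model: str
    total_usd: float     # approximate total cost of the run
    episodes: int        # number of episodes in the run
    top_tier_vulns: int  # vulnerabilities taken to the highest tier

    @property
    def usd_per_episode(self) -> float:
        return self.total_usd / self.episodes

    @property
    def usd_per_top_tier_vuln(self) -> float:
        # Rough derived metric (an assumption of this sketch): it spreads the
        # whole run's cost over the top-tier results only.
        return self.total_usd / self.top_tier_vulns


mythos = RunCost("Claude Mythos Preview", 36_428, 122, 21)
gpt = RunCost("GPT-5.5 via Codex", 3_075, 123, 2)

# How many times more the Mythos run cost, in total and per episode.
total_ratio = mythos.total_usd / gpt.total_usd
episode_ratio = mythos.usd_per_episode / gpt.usd_per_episode

for run in (mythos, gpt):
    print(f"{run.model}: ${run.usd_per_episode:,.2f} per episode, "
          f"${run.usd_per_top_tier_vuln:,.2f} per top-tier vulnerability")
print(f"Total cost ratio: {total_ratio:.2f}x")
print(f"Per-episode cost ratio: {episode_ratio:.2f}x")
```

On these numbers, Mythos costs about $299 per episode against $25 for GPT-5.5. Spread over top-tier results alone, though, the two land much closer together (roughly $1,735 versus $1,538), which is one way to see why the cost-efficiency question is not settled by the headline totals.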
The implications are far-reaching. For developers, AI models that can autonomously develop browser exploits mean that vulnerabilities need to be identified and patched more quickly. This could increase the development time and cost of web applications and put greater emphasis on security testing and validation. Businesses, for their part, face substantial risks from models with these capabilities and need to weigh the benefits against those risks before using them in their operations.
Historically, AI models have struggled to develop complex exploits, so these results mark a milestone for AI-powered security testing tools. Models that can develop exploits on their own could make security testing faster, more efficient, and more effective. The same capability raises concerns about misuse, since malicious actors could use these models to build exploits of their own.
In conclusion, the ability of models like Claude Mythos Preview and GPT-5.5 to autonomously develop browser exploits is a significant development. While the high cost of Mythos is a concern, the potential benefits of using these models in security testing and validation are substantial. As the models continue to advance, developers, businesses, and users need to understand both the risks and the benefits and take steps to mitigate the risks. Used carefully, AI models in security testing could make the web safer.