Benchmark | May 10, 2026 | 1 min read

AI Model Capabilities Surpass Human Evaluation Limits, Exposing New Cybersecurity Risks

The Claude Mythos AI model has outpaced the evaluation methods of the assessment organization METR, while cybersecurity firm Palo Alto Networks warns of a new era of autonomous AI-powered attacks. Together, these developments carry significant implications for cybersecurity and the future of AI model development.

METR can barely measure Claude Mythos Preview with its current test suite: only five out of 228 tasks (about 2%) cover the relevant capability range. Meanwhile, Palo Alto Networks reports that frontier models autonomously chain vulnerabilities, shrinking the time from initial access to data exfiltration to just 25 minutes. Evaluation methods are advancing more slowly than the models themselves, and that may be the bigger problem. The article "METR says it can barely measure Claude Mythos, Palo Alto Networks warns of autonomous AI attackers" appeared first on The Decoder.
