Benchmark · May 17, 2026 · 1 min read

AI Models Exposed: New Benchmark Reveals Shocking Lack of Math Skills and Critical Thinking

A new math benchmark has found that even the most advanced AI models struggle with research-level math and recognizing unsolvable tasks, with top models scoring as low as 10% on certain challenges. This raises concerns about the limitations of current AI technology and its potential impact on real-world applications.

A consortium of 64 mathematicians built SOOHAK, a new AI benchmark with 439 handwritten tasks, including 99 that are deliberately unsolvable. Google's Gemini 3 Pro leads on research-level problems at 30 percent, but no model cracks 50 percent on spotting broken tasks. More compute makes models better at solving problems. It does not make them better at admitting that a problem has no answer. SOOHAK tries to pin down the gap between a few flashy results and the broad research skills AI systems still lack. The article "New math benchmark reveals AI models confidently solve problems that have no solution" appeared first on The Decoder.

Models Mentioned

Gemini 3 Pro Preview (high)
