Benchmark | July 5, 2026 | 1 min read

AI Search Agents Flunk Ambiguity Test, Scoring Below 50% on New Benchmark

A new benchmark has revealed that leading AI search agents struggle to ask clarifying questions when faced with ambiguous queries, with top models scoring below 50%. This shortcoming can lead to a cascade of errors and incorrect results, highlighting the need for improved uncertainty recognition and user dialogue in future AI systems.

AI search agents rarely fail at multi-step research because of the search itself. Their real problem is that they do not ask the user for clarification when a query is ambiguous. A new benchmark called DiscoBench shows that models that search repeatedly instead of asking follow-up questions actually perform worse, at 51.9 percent, than those that just guess. Even the best model reaches only 43 percent overall accuracy. When ambiguity is removed from the queries, accuracy jumps by up to 40 percentage points. The article "AI search agents don't fail at searching, they fail at asking the right questions when queries get ambiguous" appeared first on The Decoder.
