Gemini 3.1 Pro and GPT-5.4 are tied at the top. Here's what that actually means for you.
For the first time, two models from different companies share the #1 spot. Google's Gemini 3.1 Pro and OpenAI's GPT-5.4 both score 57.2 on the Artificial Analysis Intelligence Index, a composite that measures reasoning, knowledge, coding, and math. Behind them, the gap to third place (GPT-5.3 Codex at 54.0) is the largest we've seen, suggesting the frontier has pulled away from the pack.
The Artificial Analysis Intelligence Index is a weighted composite of multiple benchmarks: MMLU-Pro (broad knowledge), GPQA (graduate-level science), LiveCodeBench (competitive coding), MATH-500 (mathematical reasoning), and several others. It's designed to capture general-purpose capability rather than narrow performance on any single task.
This matters because a model that scores 90% on MMLU but fails at math is less useful than one that scores 80% across the board. The Intelligence Index rewards consistent, well-rounded performance.
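To make the aggregation concrete, here is a minimal sketch of how a weighted composite like this can be computed. The benchmark names come from the index description above; the weights and example scores are hypothetical placeholders, not Artificial Analysis's actual methodology.

```python
# Minimal sketch of a weighted composite score in the spirit of the
# Intelligence Index. Weights and per-benchmark scores are hypothetical
# placeholders, not Artificial Analysis's actual values.

WEIGHTS = {
    "MMLU-Pro": 0.25,       # broad knowledge
    "GPQA": 0.25,           # graduate-level science
    "LiveCodeBench": 0.25,  # competitive coding
    "MATH-500": 0.25,       # mathematical reasoning
}

def composite(scores: dict[str, float]) -> float:
    """Weighted average of per-benchmark scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# With equal weights, a balanced 80-across-the-board model beats a model
# that is brilliant everywhere except math.
balanced = composite({"MMLU-Pro": 80, "GPQA": 80, "LiveCodeBench": 80, "MATH-500": 80})
spiky = composite({"MMLU-Pro": 90, "GPQA": 90, "LiveCodeBench": 90, "MATH-500": 40})
print(balanced, spiky)  # 80.0 77.5
```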
Both score 57.2, but they get there differently. GPT-5.4 edges ahead on coding (57.3 vs. 55.5), while Gemini 3.1 Pro has shown stronger performance on ARC-AGI-2 reasoning tasks (77.1%, more than double its predecessor's score). Gemini is also roughly 47% faster (113 vs. 77 output tokens per second) and 20% cheaper ($2/$12 vs. $2.50/$15 per million input/output tokens).
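Those percentages follow directly from the quoted figures; a quick sanity check:

```python
# Sanity-checking the speed and price deltas quoted above.
gemini_tps, gpt_tps = 113, 77
print(f"{(gemini_tps - gpt_tps) / gpt_tps:.0%} faster")  # 47% faster

# Prices in $ per million tokens (input, output).
gemini_in, gemini_out = 2.00, 12.00
gpt_in, gpt_out = 2.50, 15.00
print(f"{1 - gemini_in / gpt_in:.0%} cheaper on input")    # 20% cheaper on input
print(f"{1 - gemini_out / gpt_out:.0%} cheaper on output") # 20% cheaper on output
```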
In practice, both models produce excellent results across a wide range of tasks. The choice comes down to ecosystem: if you're already on Google Cloud, Gemini integrates seamlessly; if you're built around OpenAI's API, function calling, and assistants tooling, GPT-5.4 has the richer toolchain.
For raw capability with no ecosystem preference, Gemini 3.1 Pro offers better value — same intelligence, lower price, higher speed.
Claude Opus 4.6 scores 53.0 on the Intelligence Index — fourth overall. This surprises many people who consider it the best model available. The explanation lies in what the index measures versus what Opus excels at.
Opus 4.6 was designed for sustained agentic performance. Its METR-estimated task-completion horizon of 14.5 hours is the longest of any model — it can work autonomously on complex multi-step tasks without degrading. On SWE-bench Verified, it scores 80.8%, the highest of any model. These strengths don't fully show up in the Intelligence Index's benchmark mix.
If you're building agents, running long coding sessions, or need a model that maintains quality over thousands of interactions, Opus 4.6 is arguably still the best. For quick Q&A, one-shot tasks, and standard chat, GPT-5.4 and Gemini 3.1 Pro score higher.
For the first time, Chinese AI labs have placed models in the top 10: GLM-5 from Zhipu AI (49.8) and MiMo-V2-Pro from Xiaomi (49.2) both make the cut, ahead of established names like the Qwen and Llama variants.
GLM-5 is particularly notable for its pricing: $1/$3.20 per million tokens makes it one of the cheapest frontier-adjacent models available. MiMo-V2-Pro gained attention after running anonymously as 'Hunter Alpha' on OpenRouter, processing over a trillion tokens before Xiaomi revealed its identity.
These models represent a shift in the competitive landscape. A year ago, only OpenAI, Anthropic, and Google had models above 45 on the Intelligence Index. Now, five different providers do.
Among the top 10 models by intelligence, the price spread (input/output, per million tokens) is significant:

- Gemini 3.1 Pro: $2 / $12 (Index 57.2). The cheapest frontier model.
- GPT-5.4: $2.50 / $15 (57.2). Nearly as cheap.
- Claude Sonnet 4.6: $3 / $15 (51.7). Strong intelligence at a mid-range price.
- GLM-5: $1 / $3.20 (49.8). The budget frontier option.
- Claude Opus 4.6: $5 / $25 (53.0). The most expensive.
For most use cases, the $3-4 per million input tokens separating the cheapest and most expensive frontier models matters less than getting the right model for your task. But at scale, with millions of API calls per month, Gemini's pricing advantage compounds significantly.
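To put numbers on that, here is a rough monthly-cost comparison. The prices are the ones quoted above; the traffic volume (500M input and 100M output tokens per month) is a hypothetical workload, so treat the totals as illustrative.

```python
# Rough monthly API cost under a hypothetical workload.
# Prices are $ per 1M tokens (input, output), as quoted above.
PRICES = {
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GLM-5": (1.00, 3.20),
    "Claude Opus 4.6": (5.00, 25.00),
}

INPUT_M, OUTPUT_M = 500, 100  # millions of tokens per month (assumed)

for model, (p_in, p_out) in PRICES.items():
    cost = INPUT_M * p_in + OUTPUT_M * p_out
    print(f"{model:<18} ${cost:>7,.0f}/month")
```

At this assumed volume, Gemini runs about $550 per month cheaper than GPT-5.4, and GLM-5 comes in at well under half the cost of either.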
Rankings use the Artificial Analysis Intelligence Index, a composite of MMLU-Pro, GPQA, LiveCodeBench, MATH-500, and related benchmarks. Pricing reflects the standard API tier at each provider.
Gemini 3.1 Pro is our overall pick: tied at #1 on intelligence, 20% cheaper than GPT-5.4, and 47% faster. GPT-5.4 is the better choice for coding-heavy work. Claude Opus 4.6 is the best for agentic workflows and sustained performance. GLM-5 is the value dark horse at $1 per million input tokens.
Published March 28, 2026. Data updated daily from independent benchmarks and API providers.