AIToolRank

AI model specs, pricing & comparisons.

© 2026 AIToolRank


Fastest AI Models -- Speed Rankings

When response time matters, speed comes first: real-time applications, chatbots, and interactive tools all depend on fast output. Models here are ranked by output speed in tokens per second.

Our Top Pick
Mercury 2 (Intelligence: 32.8)

Mercury 2 delivers the fastest output at 894 tok/s, ideal for real-time and interactive applications.

Methodology

Ranked by output speed in tokens per second, measured across standardized prompts and evaluations.
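As a rough illustration of how the Speed and Latency columns relate, the sketch below orders a few models by tokens per second and estimates how long a reply of a given length might take. It assumes the Latency figure is the wait before output begins and that tokens then arrive at a constant rate; the page does not state this, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpeed:
    name: str
    tokens_per_second: float  # "Speed" column
    latency_seconds: float    # "Latency" column


def rank_by_speed(models):
    """Order models fastest-first by output speed, mirroring the ranking criterion."""
    return sorted(models, key=lambda m: m.tokens_per_second, reverse=True)


def estimated_response_seconds(model, output_tokens):
    """Assumption: total time = latency before output starts + streaming time."""
    return model.latency_seconds + output_tokens / model.tokens_per_second


# Figures copied from the rankings table.
models = [
    ModelSpeed("Ministral 3 3B", 293, 0.26),
    ModelSpeed("Mercury 2", 894, 3.81),
    ModelSpeed("Granite 3.3 8B (Non-reasoning)", 402, 9.53),
]

for m in rank_by_speed(models):
    total = estimated_response_seconds(m, 500)
    print(f"{m.name}: {m.tokens_per_second:.0f} tok/s, ~{total:.2f}s for 500 tokens")
```

Under these assumptions, a model with lower throughput but much lower latency can finish a short reply sooner, so both columns are worth checking for interactive use.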

| # | Model | Provider | Speed | Latency | Intelligence | Price per 1M input tokens |
|---|---|---|---|---|---|---|
| 1 | Mercury 2 | Inception | 894 tok/s | 3.81s | 32.8 | $0.25 |
| 2 | Granite 3.3 8B (Non-reasoning) | IBM | 402 tok/s | 9.53s | 7.0 | $0.03 |
| 3 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | 389 tok/s | 3.77s | 21.6 | $0.10 |
| 4 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 365 tok/s | 0.54s | 36.0 | $0.30 |
| 5 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 329 tok/s | 11.34s | 17.6 | $0.10 |
| 6 | Granite 4.0 H Small | IBM | 320 tok/s | 8.66s | 10.8 | $0.06 |
| 7 | Ministral 3 3B | Mistral | 293 tok/s | 0.26s | 11.2 | $0.10 |
| 8 | Nova Micro | Amazon | 293 tok/s | 0.36s | 10.3 | $0.04 |
| 9 | gpt-oss-20B (high) | OpenAI | 281 tok/s | 0.48s | 24.5 | $0.06 |
| 10 | gpt-oss-120B (high) | OpenAI | 254 tok/s | 0.50s | 33.3 | $0.15 |
| 11 | Grok 4.20 Beta 0309 (Reasoning) | xAI | 246 tok/s | 11.75s | 48.5 | $2.00 |
| 12 | Nova 2.0 Lite (medium) | Amazon | 235 tok/s | 11.81s | 29.7 | $0.30 |
| 13 | LFM2 24B A2B | Liquid AI | 223 tok/s | 0.23s | 10.5 | $0.03 |
| 14 | GPT-5.4 mini (xhigh) | OpenAI | 218 tok/s | 7.45s | 48.1 | $0.75 |
| 15 | Qwen3 0.6B (Reasoning) | Alibaba | 217 tok/s | 0.90s | 6.5 | $0.11 |
| 16 | GPT-5.4 nano (xhigh) | OpenAI | 216 tok/s | 2.31s | 44.4 | $0.20 |
| 17 | Gemini 2.5 Flash (Reasoning) | Google | 213 tok/s | 13.50s | 27.0 | $0.30 |
| 18 | Gemini 3.1 Flash-Lite Preview | Google | 208 tok/s | 8.03s | 33.5 | $0.25 |
| 19 | Sarvam 30B (high) | Sarvam | 206 tok/s | 1.35s | 12.3 | Free |
| 20 | Grok 3 mini Reasoning (high) | xAI | 198 tok/s | 0.37s | 32.1 | $0.30 |
| 21 | Nova Lite | Amazon | 198 tok/s | 0.41s | 12.7 | $0.06 |
| 22 | Grok Code Fast 1 | xAI | 195 tok/s | 3.65s | 28.7 | $0.20 |
| 23 | Devstral Small 2 | Mistral | 193 tok/s | 0.34s | 19.5 | Free |
| 24 | Gemini 3 Flash Preview (Reasoning) | Google | 192 tok/s | 6.11s | 46.4 | $0.50 |
| 25 | Llama 3.1 Instruct 8B | Meta | 191 tok/s | 0.47s | 11.8 | $0.10 |
| 26 | Mistral Small 3.2 | Mistral | 184 tok/s | 0.30s | 15.1 | $0.10 |
| 27 | Ministral 3 8B | Mistral | 184 tok/s | 0.27s | 14.8 | $0.15 |
| 28 | Jamba 1.6 Mini | AI21 Labs | 183 tok/s | 0.60s | 7.9 | $0.20 |
| 29 | GPT-5 Codex (high) | OpenAI | 180 tok/s | 9.17s | 44.6 | $1.25 |
| 30 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 176 tok/s | 0.72s | 24.3 | $0.06 |
| 31 | GPT-5.1 Codex mini (high) | OpenAI | 176 tok/s | 5.11s | 38.6 | $0.25 |
| 32 | Mistral 7B Instruct | Mistral | 173 tok/s | 0.27s | 7.4 | $0.25 |
| 33 | Mistral Small (Sep '24) | Mistral | 168 tok/s | 0.43s | 10.2 | $0.20 |
| 34 | Qwen3 Coder Next | Alibaba | 161 tok/s | 0.81s | 28.3 | $0.35 |
| 35 | Grok 4.1 Fast (Reasoning) | xAI | 160 tok/s | 9.47s | 38.6 | $0.20 |
| 36 | Mistral Small 3.1 | Mistral | 160 tok/s | 0.41s | 14.5 | $0.10 |
| 37 | Mistral Small 3 | Mistral | 157 tok/s | 0.41s | 12.7 | $0.10 |
| 38 | Nova 2.0 Pro Preview (medium) | Amazon | 152 tok/s | 11.82s | 35.7 | $1.25 |
| 39 | Claude 4.5 Haiku (Reasoning) | Anthropic | 144 tok/s | 11.72s | 37.1 | $1.00 |
| 40 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 142 tok/s | 1.03s | 26.7 | $0.50 |
| 41 | Apertus 8B Instruct | Swiss AI Initiative | 141 tok/s | 2.14s | 5.9 | $0.10 |
| 42 | GPT-5 (ChatGPT) | OpenAI | 141 tok/s | 0.56s | 21.8 | $1.25 |
| 43 | Qwen3 Next 80B A3B Instruct | Alibaba | 141 tok/s | 0.95s | 20.1 | $0.50 |
| 44 | Apriel-v1.5-15B-Thinker | ServiceNow | 141 tok/s | 0.20s | 28.3 | Free |
| 45 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 140 tok/s | 0.97s | 22.4 | $0.20 |
| 46 | o3-mini | OpenAI | 139 tok/s | 7.40s | 25.9 | $1.10 |
| 47 | Qwen3 1.7B (Reasoning) | Alibaba | 139 tok/s | 0.90s | 8.0 | $0.11 |
| 48 | Molmo2-8B | Allen Institute for AI | 138 tok/s | 0.41s | 7.3 | Free |
| 49 | Qwen3 VL 8B Instruct | Alibaba | 137 tok/s | 1.01s | 14.3 | $0.18 |
| 50 | Apriel-v1.6-15B-Thinker | ServiceNow | 135 tok/s | 0.24s | 27.6 | Free |

Frequently Asked Questions

What is the best AI for real-time applications in 2026?

Based on our benchmark rankings, Mercury 2 is currently the top-ranked model for real-time applications. See the full rankings above for alternatives.

How much does Mercury 2 cost for real-time applications?

Mercury 2 costs $0.25 per 1M input tokens. At 100 requests per day and 2,000 tokens per request, that is about 6M tokens over a 30-day month, or approximately $1.50/month.

Is there a free AI model good for real-time applications?

No free model currently ranks in the top 10 for speed. The fastest free model in the rankings above is Sarvam 30B (high) at 206 tok/s (rank 19), followed by Devstral Small 2 at 193 tok/s (rank 23). Check our free models page for the best zero-cost options.
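
The FAQ's monthly cost estimate for Mercury 2 can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes a 30-day month and that every token in a request is billed at the listed input-token price; the function name and parameters are illustrative, not part of any provider's API.

```python
def monthly_cost_usd(price_per_million, requests_per_day, tokens_per_request,
                     days_per_month=30):
    """Estimate monthly spend from a per-1M-token price (assumes a flat rate)."""
    tokens_per_month = requests_per_day * tokens_per_request * days_per_month
    return tokens_per_month / 1_000_000 * price_per_million


# Mercury 2 at $0.25 per 1M input tokens, 100 requests/day, 2,000 tokens each.
cost = monthly_cost_usd(0.25, 100, 2_000)
print(f"${cost:.2f}/month")  # prints "$1.50/month"
```

Swapping in another row's price gives a like-for-like comparison, though real bills also depend on output-token pricing, which this page does not list.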