When response time matters: real-time applications, chatbots, and interactive tools need speed above all.
Mercury 2 delivers the fastest output at 894 tok/s, ideal for real-time and interactive applications.
Models are ranked by output speed in tokens per second, measured on standardized prompts.
Based on our benchmark rankings, Mercury 2 is currently the top-ranked model for real-time applications; see the full rankings below for alternatives. Mercury 2 costs $0.25/1M input tokens, so 100 requests per day at 2,000 tokens each comes to approximately $1.50/month. Currently, no free models rank highly for real-time applications; check our free models page for the best zero-cost options.

| # | Model | Provider | Speed | Latency | Intelligence | Price/1M |
|---|---|---|---|---|---|---|
| 1 | Mercury 2 | Inception | 894 tok/s | 3.81s | 32.8 | $0.25 |
| 2 | Granite 3.3 8B (Non-reasoning) | IBM | 402 tok/s | 9.53s | 7.0 | $0.03 |
| 3 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | 389 tok/s | 3.77s | 21.6 | $0.10 |
| 4 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 365 tok/s | 0.54s | 36.0 | $0.30 |
| 5 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 329 tok/s | 11.34s | 17.6 | $0.10 |
| 6 | Granite 4.0 H Small | IBM | 320 tok/s | 8.66s | 10.8 | $0.06 |
| 7 | Ministral 3 3B | Mistral | 293 tok/s | 0.26s | 11.2 | $0.10 |
| 8 | Nova Micro | Amazon | 293 tok/s | 0.36s | 10.3 | $0.04 |
| 9 | gpt-oss-20B (high) | OpenAI | 281 tok/s | 0.48s | 24.5 | $0.06 |
| 10 | gpt-oss-120B (high) | OpenAI | 254 tok/s | 0.50s | 33.3 | $0.15 |
| 11 | Grok 4.20 Beta 0309 (Reasoning) | xAI | 246 tok/s | 11.75s | 48.5 | $2.00 |
| 12 | Nova 2.0 Lite (medium) | Amazon | 235 tok/s | 11.81s | 29.7 | $0.30 |
| 13 | LFM2 24B A2B | Liquid AI | 223 tok/s | 0.23s | 10.5 | $0.03 |
| 14 | GPT-5.4 mini (xhigh) | OpenAI | 218 tok/s | 7.45s | 48.1 | $0.75 |
| 15 | Qwen3 0.6B (Reasoning) | Alibaba | 217 tok/s | 0.90s | 6.5 | $0.11 |
| 16 | GPT-5.4 nano (xhigh) | OpenAI | 216 tok/s | 2.31s | 44.4 | $0.20 |
| 17 | Gemini 2.5 Flash (Reasoning) | Google | 213 tok/s | 13.50s | 27.0 | $0.30 |
| 18 | Gemini 3.1 Flash-Lite Preview | Google | 208 tok/s | 8.03s | 33.5 | $0.25 |
| 19 | Sarvam 30B (high) | Sarvam | 206 tok/s | 1.35s | 12.3 | Free |
| 20 | Grok 3 mini Reasoning (high) | xAI | 198 tok/s | 0.37s | 32.1 | $0.30 |
| 21 | Nova Lite | Amazon | 198 tok/s | 0.41s | 12.7 | $0.06 |
| 22 | Grok Code Fast 1 | xAI | 195 tok/s | 3.65s | 28.7 | $0.20 |
| 23 | Devstral Small 2 | Mistral | 193 tok/s | 0.34s | 19.5 | Free |
| 24 | Gemini 3 Flash Preview (Reasoning) | Google | 192 tok/s | 6.11s | 46.4 | $0.50 |
| 25 | Llama 3.1 Instruct 8B | Meta | 191 tok/s | 0.47s | 11.8 | $0.10 |
| 26 | Mistral Small 3.2 | Mistral | 184 tok/s | 0.30s | 15.1 | $0.10 |
| 27 | Ministral 3 8B | Mistral | 184 tok/s | 0.27s | 14.8 | $0.15 |
| 28 | Jamba 1.6 Mini | AI21 Labs | 183 tok/s | 0.60s | 7.9 | $0.20 |
| 29 | GPT-5 Codex (high) | OpenAI | 180 tok/s | 9.17s | 44.6 | $1.25 |
| 30 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 176 tok/s | 0.72s | 24.3 | $0.06 |
| 31 | GPT-5.1 Codex mini (high) | OpenAI | 176 tok/s | 5.11s | 38.6 | $0.25 |
| 32 | Mistral 7B Instruct | Mistral | 173 tok/s | 0.27s | 7.4 | $0.25 |
| 33 | Mistral Small (Sep '24) | Mistral | 168 tok/s | 0.43s | 10.2 | $0.20 |
| 34 | Qwen3 Coder Next | Alibaba | 161 tok/s | 0.81s | 28.3 | $0.35 |
| 35 | Grok 4.1 Fast (Reasoning) | xAI | 160 tok/s | 9.47s | 38.6 | $0.20 |
| 36 | Mistral Small 3.1 | Mistral | 160 tok/s | 0.41s | 14.5 | $0.10 |
| 37 | Mistral Small 3 | Mistral | 157 tok/s | 0.41s | 12.7 | $0.10 |
| 38 | Nova 2.0 Pro Preview (medium) | Amazon | 152 tok/s | 11.82s | 35.7 | $1.25 |
| 39 | Claude 4.5 Haiku (Reasoning) | Anthropic | 144 tok/s | 11.72s | 37.1 | $1.00 |
| 40 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 142 tok/s | 1.03s | 26.7 | $0.50 |
| 41 | Apertus 8B Instruct | Swiss AI Initiative | 141 tok/s | 2.14s | 5.9 | $0.10 |
| 42 | GPT-5 (ChatGPT) | OpenAI | 141 tok/s | 0.56s | 21.8 | $1.25 |
| 43 | Qwen3 Next 80B A3B Instruct | Alibaba | 141 tok/s | 0.95s | 20.1 | $0.50 |
| 44 | Apriel-v1.5-15B-Thinker | ServiceNow | 141 tok/s | 0.20s | 28.3 | Free |
| 45 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 140 tok/s | 0.97s | 22.4 | $0.20 |
| 46 | o3-mini | OpenAI | 139 tok/s | 7.40s | 25.9 | $1.10 |
| 47 | Qwen3 1.7B (Reasoning) | Alibaba | 139 tok/s | 0.90s | 8.0 | $0.11 |
| 48 | Molmo2-8B | Allen Institute for AI | 138 tok/s | 0.41s | 7.3 | Free |
| 49 | Qwen3 VL 8B Instruct | Alibaba | 137 tok/s | 1.01s | 14.3 | $0.18 |
| 50 | Apriel-v1.6-15B-Thinker | ServiceNow | 135 tok/s | 0.24s | 27.6 | Free |
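To put the table's speed, latency, and price columns together, here is a minimal Python sketch of the arithmetic. The request profile (100 requests per day at 2,000 tokens each) is the example from the text above; the helper names and the 500-token reply size are illustrative assumptions, not part of any benchmark.

```python
# Illustrative sketch: turn the table's per-model figures into a rough
# monthly cost and an end-to-end response-time estimate.

def monthly_cost(price_per_1m: float, requests_per_day: int,
                 tokens_per_request: int, days: int = 30) -> float:
    """Approximate monthly spend in dollars at a given $/1M-token rate."""
    tokens_per_month = requests_per_day * tokens_per_request * days
    return price_per_1m * tokens_per_month / 1_000_000

def response_time(latency_s: float, speed_tok_s: float,
                  output_tokens: int) -> float:
    """Time to first token plus streaming time for the full reply."""
    return latency_s + output_tokens / speed_tok_s

# Mercury 2 at $0.25/1M, 100 requests/day x 2,000 tokens:
cost = monthly_cost(price_per_1m=0.25, requests_per_day=100,
                    tokens_per_request=2_000)
print(f"${cost:.2f}/month")  # → $1.50/month

# Perceived speed for a hypothetical 500-token reply from Mercury 2
# (3.81s latency, 894 tok/s from the table):
print(f"{response_time(3.81, 894, 500):.2f}s")
```

Note that a low latency (time to first token) can matter more than raw tok/s for short replies: at 500 output tokens, Mercury 2's 3.81s latency dominates the ~0.56s of streaming time.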