When response time matters: real-time applications, chatbots, and interactive tools need speed above all.
Mercury 2 delivers the fastest output at 894 tok/s, ideal for real-time and interactive applications.
Models are ranked by output speed in tokens per second, measured on standardized prompts.
Based on our benchmark rankings, Mercury 2 is currently the top-ranked model for real-time applications; see the full rankings below for alternatives. Mercury 2 costs $0.25/1M input tokens, so 100 requests per day at 2,000 tokens each comes to approximately $1.50/month. Currently, no free models rank highly for real-time applications; check our free models page for the best zero-cost options.

| # | Model | Provider | Speed | Latency | Intelligence | Price/1M |
|---|---|---|---|---|---|---|
| 1 | Mercury 2 | Inception | 894 tok/s | 3.81s | 32.8 | $0.25 |
| 2 | Granite 3.3 8B (Non-reasoning) | IBM | 402 tok/s | 9.53s | 7.0 | $0.03 |
| 3 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | 389 tok/s | 3.77s | 21.6 | $0.10 |
| 4 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 365 tok/s | 0.54s | 36.0 | $0.30 |
| 5 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 329 tok/s | 11.34s | 17.6 | $0.10 |
| 6 | Granite 4.0 H Small | IBM | 320 tok/s | 8.66s | 10.8 | $0.06 |
| 7 | Ministral 3 3B | Mistral | 293 tok/s | 0.26s | 11.2 | $0.10 |
| 8 | Nova Micro | Amazon | 293 tok/s | 0.36s | 10.3 | $0.04 |
| 9 | gpt-oss-20B (high) | OpenAI | 281 tok/s | 0.48s | 24.5 | $0.06 |
| 10 | gpt-oss-120B (high) | OpenAI | 254 tok/s | 0.50s | 33.3 | $0.15 |
| 11 | Grok 4.20 Beta 0309 (Reasoning) | xAI | 246 tok/s | 11.75s | 48.5 | $2.00 |
| 12 | Nova 2.0 Lite (medium) | Amazon | 235 tok/s | 11.81s | 29.7 | $0.30 |
| 13 | LFM2 24B A2B | Liquid AI | 223 tok/s | 0.23s | 10.5 | $0.03 |
| 14 | GPT-5.4 mini (xhigh) | OpenAI | 218 tok/s | 7.45s | 48.1 | $0.75 |
| 15 | Qwen3 0.6B (Reasoning) | Alibaba | 217 tok/s | 0.90s | 6.5 | $0.11 |
| 16 | GPT-5.4 nano (xhigh) | OpenAI | 216 tok/s | 2.31s | 44.4 | $0.20 |
| 17 | Gemini 2.5 Flash (Reasoning) | Google | 213 tok/s | 13.50s | 27.0 | $0.30 |
| 18 | Gemini 3.1 Flash-Lite Preview | Google | 208 tok/s | 8.03s | 33.5 | $0.25 |
| 19 | Sarvam 30B (high) | Sarvam | 206 tok/s | 1.35s | 12.3 | Free |
| 20 | Grok 3 mini Reasoning (high) | xAI | 198 tok/s | 0.37s | 32.1 | $0.30 |
| 21 | Nova Lite | Amazon | 198 tok/s | 0.41s | 12.7 | $0.06 |
| 22 | Grok Code Fast 1 | xAI | 195 tok/s | 3.65s | 28.7 | $0.20 |
| 23 | Devstral Small 2 | Mistral | 193 tok/s | 0.34s | 19.5 | Free |
| 24 | Gemini 3 Flash Preview (Reasoning) | Google | 192 tok/s | 6.11s | 46.4 | $0.50 |
| 25 | Llama 3.1 Instruct 8B | Meta | 191 tok/s | 0.47s | 11.8 | $0.10 |
| 26 | Mistral Small 3.2 | Mistral | 184 tok/s | 0.30s | 15.1 | $0.10 |
| 27 | Ministral 3 8B | Mistral | 184 tok/s | 0.27s | 14.8 | $0.15 |
| 28 | Jamba 1.6 Mini | AI21 Labs | 183 tok/s | 0.60s | 7.9 | $0.20 |
| 29 | GPT-5 Codex (high) | OpenAI | 180 tok/s | 9.17s | 44.6 | $1.25 |
| 30 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 176 tok/s | 0.72s | 24.3 | $0.06 |
| 31 | GPT-5.1 Codex mini (high) | OpenAI | 176 tok/s | 5.11s | 38.6 | $0.25 |
| 32 | Mistral 7B Instruct | Mistral | 173 tok/s | 0.27s | 7.4 | $0.25 |
| 33 | Mistral Small (Sep '24) | Mistral | 168 tok/s | 0.43s | 10.2 | $0.20 |
| 34 | Qwen3 Coder Next | Alibaba | 161 tok/s | 0.81s | 28.3 | $0.35 |
| 35 | Grok 4.1 Fast (Reasoning) | xAI | 160 tok/s | 9.47s | 38.6 | $0.20 |
| 36 | Mistral Small 3.1 | Mistral | 160 tok/s | 0.41s | 14.5 | $0.10 |
| 37 | Mistral Small 3 | Mistral | 157 tok/s | 0.41s | 12.7 | $0.10 |
| 38 | Nova 2.0 Pro Preview (medium) | Amazon | 152 tok/s | 11.82s | 35.7 | $1.25 |
| 39 | Claude 4.5 Haiku (Reasoning) | Anthropic | 144 tok/s | 11.72s | 37.1 | $1.00 |
| 40 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 142 tok/s | 1.03s | 26.7 | $0.50 |
| 41 | Apertus 8B Instruct | Swiss AI Initiative | 141 tok/s | 2.14s | 5.9 | $0.10 |
| 42 | GPT-5 (ChatGPT) | OpenAI | 141 tok/s | 0.56s | 21.8 | $1.25 |
| 43 | Qwen3 Next 80B A3B Instruct | Alibaba | 141 tok/s | 0.95s | 20.1 | $0.50 |
| 44 | Apriel-v1.5-15B-Thinker | ServiceNow | 141 tok/s | 0.20s | 28.3 | Free |
| 45 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 140 tok/s | 0.97s | 22.4 | $0.20 |
| 46 | o3-mini | OpenAI | 139 tok/s | 7.40s | 25.9 | $1.10 |
| 47 | Qwen3 1.7B (Reasoning) | Alibaba | 139 tok/s | 0.90s | 8.0 | $0.11 |
| 48 | Molmo2-8B | Allen Institute for AI | 138 tok/s | 0.41s | 7.3 | Free |
| 49 | Qwen3 VL 8B Instruct | Alibaba | 137 tok/s | 1.01s | 14.3 | $0.18 |
| 50 | Apriel-v1.6-15B-Thinker | ServiceNow | 135 tok/s | 0.24s | 27.6 | Free |
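To put the table's speed, latency, and price columns together, here is a minimal Python sketch of the arithmetic. The request profile (100 requests per day at 2,000 tokens each) is the example from the text above; the helper names and the 500-token reply size are illustrative assumptions, not part of any benchmark.

```python
# Illustrative sketch: turn the table's per-model figures into a rough
# monthly cost and an end-to-end response-time estimate.

def monthly_cost(price_per_1m: float, requests_per_day: int,
                 tokens_per_request: int, days: int = 30) -> float:
    """Approximate monthly spend in dollars at a given $/1M-token rate."""
    tokens_per_month = requests_per_day * tokens_per_request * days
    return price_per_1m * tokens_per_month / 1_000_000

def response_time(latency_s: float, speed_tok_s: float,
                  output_tokens: int) -> float:
    """Time to first token plus streaming time for the full reply."""
    return latency_s + output_tokens / speed_tok_s

# Mercury 2 at $0.25/1M, 100 requests/day x 2,000 tokens:
cost = monthly_cost(price_per_1m=0.25, requests_per_day=100,
                    tokens_per_request=2_000)
print(f"${cost:.2f}/month")  # → $1.50/month

# Perceived speed for a hypothetical 500-token reply from Mercury 2
# (3.81s latency, 894 tok/s from the table):
print(f"{response_time(3.81, 894, 500):.2f}s")
```

Note that a low latency (time to first token) can matter more than raw tok/s for short replies: at 500 output tokens, Mercury 2's 3.81s latency dominates the ~0.56s of streaming time.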