Frontier quality isn't cheap, but these budget models handle 80% of tasks for pennies.
You don't always need a $5/million-token model. For many tasks — classification, extraction, summarization, basic coding, and routine Q&A — models priced at $1 or less per million input tokens deliver perfectly adequate results. We ranked every model in this price tier to find the best values.
A year ago, models under $1/1M tokens were barely useful for serious work. Today, the budget tier includes models with intelligence scores above 45 — a level that handles most practical tasks competently.
The improvement comes from two directions: frontier models from 12-18 months ago getting cheaper as newer models replace them, and new efficiency-focused architectures (like MoE) that deliver high capability at low cost.
For startups, hobby projects, and high-volume production workloads, the budget tier is now the smart default. You should only step up to frontier pricing when the task specifically requires it.
Zhipu's GLM-5 leads the budget tier with a 49.8 intelligence score — higher than Claude Opus 4.5 and just 3 points below Claude Sonnet 4.6. At $1 per million input tokens and $3.20 per million output tokens, it costs 60-80% less than the frontier models.
The coding score of 44.2 is strong for this price point. GLM-5 handles software development tasks, analytical reasoning, and creative writing at a quality level that was frontier-only six months ago.
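To make the savings concrete, here is a minimal sketch of per-request cost using GLM-5's published rates from this article ($1 input / $3.20 output per million tokens). The frontier comparison prices and the token counts are illustrative assumptions, not figures from the rankings.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Cost in USD for one request; prices are USD per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Assumed workload: 2,000 input tokens, 500 output tokens per request.
glm5 = request_cost(2_000, 500, 1.00, 3.20)       # GLM-5 rates from the article
frontier = request_cost(2_000, 500, 5.00, 15.00)  # hypothetical frontier rates

# At 1M requests/month, the gap compounds: $3,600 vs $17,500.
print(f"GLM-5: ${glm5 * 1_000_000:,.0f}/M requests")
print(f"Frontier: ${frontier * 1_000_000:,.0f}/M requests")
```

Under these assumed rates, the budget model runs the same monthly volume at roughly a fifth of the frontier bill, consistent with the 60-80% savings cited above.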
The main limitation is that GLM-5 comes from a Chinese provider, which means API latency from Western locations can be higher and documentation is less comprehensive than that of US providers.
GPT-5.4 Mini's 51.5 coding score makes it the best coding model in the budget tier — and it actually outscores several models that cost 5-10x more. At 218 tok/s, it's also blazingly fast.
For code completion, test generation, documentation, debugging, and routine refactoring, Mini handles the work that GPT-5.4 handles but at 30% of the cost and 3x the speed. The tradeoff is a lower intelligence score (48.1) that shows on complex architectural decisions.
This is the model to use for your IDE coding assistant, automated test generation, and code review at scale.
Google's Gemini 3 Flash delivers 192 tokens per second with a 46.4 intelligence score. That combination of speed and quality at $0.50 per million input tokens makes it the best choice for real-time applications where responsiveness matters.
Chatbots, interactive assistants, real-time search, and live coding suggestions all benefit from Flash's speed. The quality is sufficient for these interactive use cases, where users value a fast, good-enough response over a slow, perfect one.
For the highest-volume, lowest-complexity tasks:
GPT-5.4 Nano ($0.20/$1.25): 44.4 intelligence. Handles classification, extraction, formatting, and simple Q&A. The cheapest major-provider model with meaningful capability.
GPT-5 Nano ($0.05/$0.25): 26.8 intelligence — minimal capability at minimal cost. For basic routing and classification only.
Gemma 3n ($0.02/$0.07): The cheapest benchmarked model. Intelligence of 6.4 means it's only useful for the simplest tasks, but at $0.02/1M tokens, the cost is essentially zero.
For production systems processing millions of requests daily, these micro-models run the simple routing and classification layers while expensive models handle the hard tasks.
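That tiered pattern can be sketched as a simple router: a cheap micro-model classifies each request, and only requests flagged as hard escalate to the stronger (pricier) model. The model names come from this article's rankings, but the `classify` heuristic is a stand-in — in a real system that call would itself go to the micro-model.

```python
# Assumed tier assignments, per the article's rankings:
CHEAP_MODEL = "gpt-5.4-nano"  # $0.20/1M input; classification and routing
STRONG_MODEL = "glm-5"        # $1.00/1M input; hard tasks

def classify(prompt: str) -> str:
    """Toy complexity heuristic (illustrative only). In production,
    this decision would come from the micro-model itself."""
    hard_markers = ("refactor", "architecture", "prove", "multi-step")
    return "hard" if any(m in prompt.lower() for m in hard_markers) else "easy"

def pick_model(prompt: str) -> str:
    """Route hard requests to the strong model, everything else cheap."""
    return STRONG_MODEL if classify(prompt) == "hard" else CHEAP_MODEL

print(pick_model("Refactor this module for testability"))  # glm-5
print(pick_model("What time zone is UTC+8?"))              # gpt-5.4-nano
```

The design choice worth noting: the routing layer must be cheap enough that running it on every request costs less than the escalations it avoids, which is exactly the niche the sub-$0.25 micro-models fill.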
All models priced at $1 or less per million input tokens. Rankings are from the Artificial Analysis Intelligence and Coding indices. Speed measurements are Artificial Analysis median (P50) output speeds.
GLM-5 for best quality under $1. GPT-5.4 Mini for coding on a budget. Gemini 3 Flash for speed-critical applications. GPT-5.4 Nano for rock-bottom pricing. The budget tier now delivers what was frontier-only a year ago.
Published June 5, 2026. Data updated daily from independent benchmarks and API providers.