The 25 cheapest AI model APIs ranked by per-token price. Updated May 2026 from live pricing data across 461 models.
The cheapest AI APIs cluster around $0.01โ$0.10 per million tokens for input. Chinese providers (Zhipu, Tencent, Alibaba, ByteDance) dominate the budget tier alongside OpenRouter community-hosted models. Most cheap models are open-weight (Llama, Qwen, Mistral) โ you trade frontier reasoning for affordability.
All prices below are live. Click any model name to jump to the AI Cost Calculator for a full monthly cost breakdown with your usage volume.
| # | Model | Provider | Type | Input $/1M | Output $/1M | Combined | Context |
|---|---|---|---|---|---|---|---|
| 1 | GLM-4 Air | Zhipu AI | Direct | $0.01 | $0.01 | $0.03 | 128K |
| 2 | GLM-4 Long | Zhipu AI | Direct | $0.01 | $0.01 | $0.03 | 1M |
| 3 | Hunyuan Standard | Tencent | Direct | $0.01 | $0.01 | $0.03 | 32K |
| 4 | Mistral Nemo | Mistral | OpenRouter | $0.02 | $0.03 | $0.05 | 131K |
| 5 | Llama 3.1 8B | Meta | OpenRouter | $0.02 | $0.05 | $0.07 | 16K |
| 6 | Llama 3 8B | Meta | OpenRouter | $0.03 | $0.04 | $0.07 | 8K |
| 7 | Qwen Turbo | Alibaba Cloud | Direct | $0.02 | $0.06 | $0.08 | 131K |
| 8 | MiniMax M2.5 | MiniMax | Direct | $0.04 | $0.04 | $0.08 | 131K |
| 9 | ABAB 6.5 | MiniMax | Direct | $0.04 | $0.04 | $0.08 | 245K |
| 10 | Llama 3 8B Lunaris | Sao10K | OpenRouter | $0.04 | $0.05 | $0.09 | 8K |
| 11 | Doubao Lite | ByteDance | Direct | $0.04 | $0.08 | $0.12 | 131K |
| 12 | Yi Medium | 01.AI | Direct | $0.04 | $0.08 | $0.12 | 32K |
| 13 | Mixtral 8x7B (Groq) | Groq | Direct | $0.06 | $0.06 | $0.12 | 32K |
| 14 | Gemma 3 4B | OpenRouter | $0.04 | $0.08 | $0.12 | 131K | |
| 15 | MythoMax 13B | Gryphe | OpenRouter | $0.06 | $0.06 | $0.12 | 4K |
| 16 | Granite 4.0 Micro | IBM | OpenRouter | $0.02 | $0.11 | $0.13 | 131K |
| 17 | Mistral Small 3 | Mistral | OpenRouter | $0.05 | $0.08 | $0.13 | 32K |
| 18 | Step-1V (Vision) | StepFun | Direct | $0.07 | $0.07 | $0.14 | 32K |
| 19 | Qwen 2.5 7B | Qwen | OpenRouter | $0.04 | $0.10 | $0.14 | 32K |
| 20 | LFM2-24B-A2B | Liquid AI | OpenRouter | $0.03 | $0.12 | $0.15 | 32K |
| 21 | MiniMax M2.7 | MiniMax | Direct | $0.08 | $0.08 | $0.16 | 131K |
| 22 | Qwen-Turbo | Qwen | OpenRouter | $0.03 | $0.13 | $0.16 | 131K |
| 23 | Gemma 3 12B | OpenRouter | $0.04 | $0.13 | $0.17 | 131K | |
| 24 | GPT-OSS-20B | OpenAI | OpenRouter | $0.03 | $0.14 | $0.17 | 131K |
| 25 | Qwen3 235B A22B | Qwen | OpenRouter | $0.07 | $0.10 | $0.17 | 262K |
Several providers offer completely free tiers:
Use prompt caching. OpenAI and Anthropic offer ~50% discount on cached input tokens. If your system prompt or context is repetitive, enable caching in the calculator and set 50-80% cache hit rate.
Choose shorter context models. Models with massive context windows (1M+) charge for every token in the window, even if you only use a fraction. Match the context to your task.
Batch processing. Anthropic and OpenAI offer 50% discount on batch requests with 24-hour turnaround. Perfect for eval runs, data labeling, and bulk summarization.
Fine-tune with the cheapest model. Use a free/cheap model for experimentation and prototyping, then switch to a frontier model only for final production runs.
Enter your actual usage volume and get a personalized monthly cost comparison across all 461 models.
Launch AI Cost Calculator โ