How LLM per-token pricing works in 2026, and how to estimate your costs before you build.
AI models don't process English words; they process tokens. A token is roughly 4 characters or 0.75 English words. When you send a prompt, the provider counts the tokens in your input (prompt plus conversation history) and in the model's response (output). You're billed for both.
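If you want exact counts rather than the 4-characters rule of thumb, a tokenizer library will give them to you. A minimal sketch using OpenAI's tiktoken package (an assumption for illustration; other providers use their own tokenizers, so counts vary by model):

```python
# Count tokens the way OpenAI models do (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
text = "A token is roughly 4 characters or 0.75 English words."
print(len(enc.encode(text)))  # exact count for this model's tokenizer
```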
Every AI API charges different rates for input and output. Output is almost always more expensive, typically 3–5× the input rate, because generating text requires more compute than reading it.
| Model | Input $/1M | Output $/1M | Output multiplier |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 4.0× |
| Claude Sonnet 4 | $3.00 | $15.00 | 5.0× |
| Claude Opus 4 | $15.00 | $75.00 | 5.0× |
| Gemini 2.5 Flash | $0.15 | $0.60 | 4.0× |
| DeepSeek V3 | $0.27 | $1.10 | 4.1× |
| GLM-4 Air (cheapest) | $0.01 | $0.01 | 1.0× |
This matters because most applications are output-heavy: you send a short prompt and get back a long response. A coding assistant that generates 1,500 tokens of code per prompt will spend more on output than on input, even though the per-token input rate is lower.
The formula is straightforward:
monthly cost = requests/day × 30 × (input tokens × input price + output tokens × output price) ÷ 1,000,000, with prices in dollars per 1M tokens.
Example: 1,000 requests/day to GPT-4o with 2,000 input + 500 output tokens each:
= 1,000 × 30 × (2,000 × $2.50 + 500 × $10.00) ÷ 1,000,000
= 30,000 × ($5,000 + $5,000) ÷ 1,000,000
= $300/month
Same volume on the cheapest model (GLM-4 Air at $0.01/$0.01):
= 1,000 × 30 × (2,000 × $0.01 + 500 × $0.01) ÷ 1,000,000
= 30,000 × ($20 + $5) ÷ 1,000,000
= $0.75/month, roughly 400× cheaper.
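The same arithmetic as a small Python helper; the prices are the table values above, passed in as $/1M rates:

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_price, output_price, days=30):
    """Monthly spend in dollars; prices are $ per 1M tokens."""
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return requests_per_day * days * per_request

print(monthly_cost(1_000, 2_000, 500, 2.50, 10.00))  # GPT-4o   -> 300.0
print(monthly_cost(1_000, 2_000, 500, 0.01, 0.01))   # GLM-4 Air -> 0.75
```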
Across 402 per-token models, pricing spans four orders of magnitude.
OpenAI and Anthropic support prompt caching: when the first part of your input is identical to a recent request, the provider reuses the cached computation and charges a discounted rate for those tokens. OpenAI discounts cached input tokens by 50%; Anthropic prices cache reads at roughly a tenth of the base input rate, with a small premium to write the cache.
Caching works best for requests that share a long, stable prefix: system prompts, few-shot examples, and multi-turn conversations that resend the whole history each turn.
At an 80% cache hit rate with a 50% discount, your effective input cost drops by 40%; the sketch below shows the arithmetic. Enable caching in the AI Cost Calculator to see the difference.
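A minimal sketch of that blended rate, assuming a flat 50% discount on cached tokens:

```python
def effective_input_price(base_price, hit_rate, discount=0.50):
    """Blended $/1M input price when `hit_rate` of tokens are served
    from cache at `discount` off the base rate."""
    return base_price * (1 - hit_rate * discount)

print(effective_input_price(2.50, 0.80))  # GPT-4o input: $2.50 -> $1.50 (-40%)
```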
Anthropic and OpenAI also offer batch APIs at a 50% discount: you submit a file of requests and get results back within 24 hours. Ideal for offline jobs that don't need real-time answers, such as bulk classification, dataset labeling, and scheduled report generation.
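Applied to the GPT-4o example above, batching every request would cut the $300/month bill to $150/month.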
Not all models charge per token. Ollama Cloud uses flat monthly subscriptions:
| Plan | Price | Concurrency | Use case |
|---|---|---|---|
| Free | $0/mo | 1 model | Development, personal projects |
| Pro | $20/mo | 3 models | Freelance, side projects, prototyping |
| Max | $100/mo | 10 models | Production teams, parallel workloads |
If you're running high-volume agent workloads (1M+ tokens/day), subscription pricing is usually cheaper than per-token. Run both sides in the calculator to compare.
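A quick way to find that break-even point, assuming you can summarize your spend as one blended $/1M rate across your input/output mix (the plan price and rate below are placeholders, not quotes):

```python
def breakeven_tokens_per_day(subscription_per_month, blended_price, days=30):
    """Daily token volume above which a flat subscription wins.
    `blended_price` is an assumed average $/1M across input and output."""
    return subscription_per_month / days / blended_price * 1_000_000

# Hypothetical: $100/mo plan vs. a blended $1.00/1M per-token rate
print(breakeven_tokens_per_day(100, 1.00))  # ~3.3M tokens/day
```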
OpenRouter aggregates 361 community-hosted models, many of them open-weight (Llama, Qwen, Gemma) served at near-cost pricing. The platform adds a small margin but gives you a single API key for access to hundreds of models. OpenRouter prices tend to be 30–60% cheaper than the direct APIs for equivalent models because hosts compete on price.
Don't guess. Use real numbers: every API response includes a usage field with exact input and output token counts, as in the sketch below.
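For example, with the OpenAI Python SDK (Anthropic's SDK exposes a similar usage object on its responses):

```python
# Pull exact token counts from a live response (assumes OPENAI_API_KEY is set).
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain tokens in one sentence."}],
)
print(resp.usage.prompt_tokens, resp.usage.completion_tokens)
```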
Enter your usage volume in the calculator and compare costs across all 461 models, with daily, monthly, and annual breakdowns.
Launch AI Cost Calculator →