How LLM per-token pricing works in 2026, and how to estimate your costs before you build.
AI models don't process English words; they process tokens. A token is roughly 4 characters or 0.75 English words. When you send a prompt, the provider counts the tokens in your input (prompt plus conversation history) and in the model's response (output). You're billed for both.
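If you want exact counts rather than the 4-characters rule of thumb, a tokenizer library will give them to you. A minimal sketch using OpenAI's tiktoken package (an assumption for illustration; other providers use their own tokenizers, so counts vary by model):

```python
# Count tokens the way OpenAI models do (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
text = "A token is roughly 4 characters or 0.75 English words."
print(len(enc.encode(text)))  # exact count for this model's tokenizer
```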
Every AI API charges different rates for input and output. Output is almost always more expensive, typically 3–5× the input rate, because generating text requires more compute than reading it.
| Model | Input $/1M | Output $/1M | Output multiplier |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 4.0× |
| Claude Sonnet 4 | $3.00 | $15.00 | 5.0× |
| Claude Opus 4 | $15.00 | $75.00 | 5.0× |
| Gemini 2.5 Flash | $0.15 | $0.60 | 4.0× |
| DeepSeek V3 | $0.27 | $1.10 | 4.1× |
| GLM-4 Air (cheapest) | $0.01 | $0.01 | 1.0× |
This matters because most applications are output-heavy: you send a short prompt and get back a long response. A coding assistant that generates 1,500 tokens of code per prompt will spend more on output than on input, even though the per-token input rate is lower.
The formula is straightforward:
monthly cost = requests/day × 30 × (input tokens × input price + output tokens × output price) ÷ 1,000,000, with prices in dollars per 1M tokens.
Example: 1,000 requests/day to GPT-4o with 2,000 input + 500 output tokens each:
= 1,000 × 30 × (2,000 × $2.50 + 500 × $10.00) ÷ 1,000,000
= 30,000 × ($5,000 + $5,000) ÷ 1,000,000
= $300/month
Same volume on the cheapest model (GLM-4 Air at $0.01/$0.01):
= 1,000 × 30 × (2,000 × $0.01 + 500 × $0.01) ÷ 1,000,000
= 30,000 × ($20 + $5) ÷ 1,000,000
= $0.75/month, roughly 400× cheaper.
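The same arithmetic as a small Python helper; the prices are the table values above, passed in as $/1M rates:

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_price, output_price, days=30):
    """Monthly spend in dollars; prices are $ per 1M tokens."""
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return requests_per_day * days * per_request

print(monthly_cost(1_000, 2_000, 500, 2.50, 10.00))  # GPT-4o   -> 300.0
print(monthly_cost(1_000, 2_000, 500, 0.01, 0.01))   # GLM-4 Air -> 0.75
```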
Across 402 per-token models, pricing spans four orders of magnitude.
OpenAI and Anthropic support prompt caching: when the first part of your input is identical to a recent request, the provider reuses the cached computation and charges a discounted rate for those tokens. OpenAI discounts cached input tokens by 50%; Anthropic prices cache reads at roughly a tenth of the base input rate, with a small premium to write the cache.
Caching works best for requests that share a long, stable prefix: system prompts, few-shot examples, and multi-turn conversations that resend the whole history each turn.
At an 80% cache hit rate with a 50% discount, your effective input cost drops by 40%; the sketch below shows the arithmetic. Enable caching in the AI Cost Calculator to see the difference.
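A minimal sketch of that blended rate, assuming a flat 50% discount on cached tokens:

```python
def effective_input_price(base_price, hit_rate, discount=0.50):
    """Blended $/1M input price when `hit_rate` of tokens are served
    from cache at `discount` off the base rate."""
    return base_price * (1 - hit_rate * discount)

print(effective_input_price(2.50, 0.80))  # GPT-4o input: $2.50 -> $1.50 (-40%)
```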
Anthropic and OpenAI also offer batch APIs at a 50% discount: you submit a file of requests and get results back within 24 hours. Ideal for offline jobs that don't need real-time answers, such as bulk classification, dataset labeling, and scheduled report generation.
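Applied to the GPT-4o example above, batching every request would cut the $300/month bill to $150/month.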
Not all models charge per token. Ollama Cloud uses flat monthly subscriptions:
| Plan | Price | Concurrency | Use case |
|---|---|---|---|
| Free | $0/mo | 1 model | Development, personal projects |
| Pro | $20/mo | 3 models | Freelance, side projects, prototyping |
| Max | $100/mo | 10 models | Production teams, parallel workloads |
If you're running high-volume agent workloads (1M+ tokens/day), subscription pricing is usually cheaper than per-token. Run both sides in the calculator to compare.
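A quick way to find that break-even point, assuming you can summarize your spend as one blended $/1M rate across your input/output mix (the plan price and rate below are placeholders, not quotes):

```python
def breakeven_tokens_per_day(subscription_per_month, blended_price, days=30):
    """Daily token volume above which a flat subscription wins.
    `blended_price` is an assumed average $/1M across input and output."""
    return subscription_per_month / days / blended_price * 1_000_000

# Hypothetical: $100/mo plan vs. a blended $1.00/1M per-token rate
print(breakeven_tokens_per_day(100, 1.00))  # ~3.3M tokens/day
```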
OpenRouter aggregates 361 community-hosted models, many of them open-weight (Llama, Qwen, Gemma) served at near-cost pricing. The platform adds a small margin but gives you a single API key for access to hundreds of models. OpenRouter prices tend to be 30–60% cheaper than the direct APIs for equivalent models because hosts compete on price.
Don't guess. Use real numbers: every API response includes a usage field with exact input and output token counts, as in the sketch below.
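For example, with the OpenAI Python SDK (Anthropic's SDK exposes a similar usage object on its responses):

```python
# Pull exact token counts from a live response (assumes OPENAI_API_KEY is set).
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain tokens in one sentence."}],
)
print(resp.usage.prompt_tokens, resp.usage.completion_tokens)
```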
Enter your usage volume in the calculator and compare costs across all 461 models, with daily, monthly, and annual breakdowns.
Launch AI Cost Calculator →