Provider Surcharges (March 2026)
Both Anthropic and Google apply explicit price surcharges for long contexts. When a request exceeds 200K tokens, the per-token input price doubles. This isn't hidden (it's documented on their pricing pages), but many developers miss it when estimating costs.
// Context surcharge examples (input pricing)
Claude Opus 4.6    ≤200K tokens: $5.00/M  |  >200K tokens: $10.00/M (2x)
Gemini 2.5 Pro     ≤200K tokens: $1.25/M  |  >200K tokens: $2.50/M  (2x)
GPT-5              400K context window, no surcharge
The Practical Impact
A RAG system that retrieves 50 documents averaging 5,000 tokens each loads 250K tokens per request. On Claude Opus, the first 200K tokens cost $1.00 and the remaining 50K cost $0.50 at the 2x rate, for $1.50 total. Without the surcharge the same request would cost $1.25, so doubling the price of the last 50K tokens adds 20% to the request's input cost. Keeping context under 200K becomes a concrete cost-optimization target.
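The arithmetic above can be sketched as a small helper. This is a sketch assuming the base rate applies to the first 200K tokens and the 2x rate only to tokens beyond the threshold, as in the worked example; `tiered_input_cost` is an illustrative name, not any provider's API:

```python
# Tiered input cost under a 2x long-context surcharge.
# Assumption: the surcharge applies only to tokens beyond the
# 200K threshold, matching the worked example in the text.

THRESHOLD = 200_000  # tokens

def tiered_input_cost(tokens: int, base_per_m: float, multiplier: float = 2.0) -> float:
    """Input cost in dollars for a single request of `tokens` input tokens."""
    base_tokens = min(tokens, THRESHOLD)
    surcharged = max(tokens - THRESHOLD, 0)
    return (base_tokens * base_per_m + surcharged * base_per_m * multiplier) / 1_000_000

# Worked example: 50 docs x 5,000 tokens = 250K tokens on Claude Opus ($5.00/M input)
print(tiered_input_cost(250_000, 5.00))  # 1.5  ($1.00 base + $0.50 surcharged)
```

Swapping in `1.25` for the base rate gives the Gemini 2.5 Pro figures from the table above.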
Key insight: Context surcharges create a pricing cliff. Every token past 200K costs twice as much, so trimming a request from just over the threshold to just under it halves the marginal cost of those tokens. This is why context compression and selective retrieval aren't just nice-to-haves: they have direct, measurable cost impact.
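One way to act on this at retrieval time is a simple token-budget guard that drops the lowest-ranked documents before the request crosses the cliff. A minimal sketch; the `trim_to_budget` helper, the `(text, token_count)` document shape, and the 10K reserve for the prompt are all illustrative assumptions:

```python
# Keep top-ranked retrieved documents until the request would
# exceed the 200K surcharge threshold. Documents are assumed to
# be (text, token_count) pairs, ranked best-first.

BUDGET = 200_000  # stay at or under the surcharge cliff

def trim_to_budget(ranked_docs: list[tuple[str, int]], reserved: int = 10_000) -> list[str]:
    """Keep top-ranked docs whose combined tokens fit in BUDGET minus
    `reserved` (tokens set aside for the system prompt and question)."""
    kept, used = [], 0
    for text, tokens in ranked_docs:
        if used + tokens > BUDGET - reserved:
            break  # ranked best-first, so stop at the first overflow
        kept.append(text)
        used += tokens
    return kept

# The RAG example: 50 docs x 5,000 tokens = 250K if all are kept.
docs = [(f"doc{i}", 5_000) for i in range(50)]
print(len(trim_to_budget(docs)))  # 38 docs -> 190K tokens, under the cliff
```

Dropping the 12 lowest-ranked documents here avoids the 2x rate entirely, at the cost of less retrieved context per request.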