Cost Lab.
Everything money-shaped about running on LLMs in one place: per-request cost, prompt caching savings, multi-month forecasting, and embedding cost. Combines what used to be the Token Cost Estimator and Prompt Caching Calculator with two new tabs for forecasting and embedding spend.
Pricing verified 2026-05-06 · Sourced from vendor pricing pages
Sources & methodology
Pricing: Sourced from vendor pricing pages. Anthropic, OpenAI, Google, Together AI, DeepSeek.
Caching math: Anthropic prompt caching writes at 1.25x base input rate, reads at 0.10x base. Cache TTL is 5 minutes (default). At hit rates below ~20%, caching costs more than it saves.
Forecast model: Compounding monthly volume growth at constant per-request token shape. Does not model price changes, model deprecations, or capacity tier discounts.
Embedding pricing: Public list pricing for Voyage AI, OpenAI, and Cohere embedding endpoints. Re-embed factor is your assumption about corpus churn or model upgrade frequency.
Caveats: Volume discounts, enterprise tiers, and regional pricing variance not modeled. Latency, capability, and quality differences not factored. Use these numbers as a planning baseline, not a final quote.