Tool 02 · Live

Cost Lab.

Everything money-shaped about running on LLMs in one place: per-request cost, prompt caching savings, multi-month forecasting, and embedding cost. Combines what used to be the Token Cost Estimator and Prompt Caching Calculator with two new tabs for forecasting and embedding spend.

Pricing verified 2026-05-06 · Sourced from vendor pricing pages

Input tokens / req · Output tokens / req · Requests / day · Days / month

Use prompt caching

Cache hit rate (%)

Cost to embed a corpus once, plus an optional yearly re-embed cost to cover corpus churn or embedding model upgrades. Embedding pricing is separate from LLM pricing because embeddings are billed through dedicated embedding APIs.

Documents to embed · Avg tokens / document · Yearly re-embed factor

Re-embed factor: the fraction of the corpus you re-embed per year. 0 = one-time cost only. 0.5 = about half the corpus per year. 1.0 = full re-embed every year.
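The embedding arithmetic can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the function name `embedding_cost`, its parameters, and the price used in the example are all hypothetical, and it assumes the yearly re-embed cost is simply the one-time cost scaled by the re-embed factor.

```python
def embedding_cost(documents, avg_tokens_per_doc, price_per_million_tokens,
                   reembed_factor=0.0):
    """Return (one_time_cost, yearly_reembed_cost) in the price's currency.

    Assumption: re-embedding a fraction of the corpus costs that same
    fraction of the one-time cost.
    """
    if documents < 0 or avg_tokens_per_doc < 0 or reembed_factor < 0:
        raise ValueError("inputs must be non-negative")
    total_tokens = documents * avg_tokens_per_doc
    one_time = total_tokens / 1_000_000 * price_per_million_tokens
    yearly_reembed = one_time * reembed_factor
    return one_time, yearly_reembed


# Placeholder price of 0.10 per million tokens, not a real vendor quote.
one_time, yearly = embedding_cost(
    documents=1_000_000,
    avg_tokens_per_doc=500,
    price_per_million_tokens=0.10,
    reembed_factor=0.5,
)
print(f"one-time: {one_time:.2f}, yearly re-embed: {yearly:.2f}")
```

Swap in the list price of whichever embedding endpoint you are evaluating; a factor of 0 reduces the result to the one-time cost alone.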

Sources & methodology

Pricing: Sourced from the public pricing pages of Anthropic, OpenAI, Google, Together AI, and DeepSeek.

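Caching math: Anthropic prompt caching writes at 1.25x the base input rate and reads at 0.10x the base rate. Cache TTL is 5 minutes (default). If every cache miss pays the write rate, the break-even hit rate is 0.25 / 1.15, or about 22%, so at hit rates below ~20% caching costs more than it saves.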
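The break-even behind the ~20% figure can be checked directly. The sketch below uses the 1.25x write and 0.10x read multipliers stated above; the function names are hypothetical, and it makes two simplifying assumptions: the whole input is cacheable, and every miss is billed as a cache write.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # from the methodology note above
CACHE_READ_MULTIPLIER = 0.10   # from the methodology note above


def effective_input_multiplier(hit_rate):
    """Average input-cost multiplier relative to uncached (1.0).

    Assumption: hits pay the read rate, misses pay the write rate.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (hit_rate * CACHE_READ_MULTIPLIER
            + (1.0 - hit_rate) * CACHE_WRITE_MULTIPLIER)


def break_even_hit_rate():
    """Hit rate at which cached and uncached input cost are equal."""
    return ((CACHE_WRITE_MULTIPLIER - 1.0)
            / (CACHE_WRITE_MULTIPLIER - CACHE_READ_MULTIPLIER))


for rate in (0.10, 0.20, 0.50, 0.90):
    print(f"hit rate {rate:.0%}: {effective_input_multiplier(rate):.3f}x base")
print(f"break-even hit rate: {break_even_hit_rate():.1%}")
```

A multiplier above 1.0 means caching is a net cost at that hit rate; below 1.0 it is a net saving. Output tokens are unaffected in this model.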

Forecast model: Compounding monthly volume growth at constant per-request token shape. Does not model price changes, model deprecations, or capacity tier discounts.
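A compounding forecast of this kind might look like the following sketch. The function names, parameters, and example prices are hypothetical placeholders rather than the tool's code or any vendor's rates; the per-request token shape and prices are held constant, as the model description states, and only request volume grows.

```python
def monthly_cost(input_tokens, output_tokens, requests_per_day, days_per_month,
                 input_price_per_million, output_price_per_million):
    """Cost of one month at a fixed per-request token shape."""
    per_request = (input_tokens * input_price_per_million
                   + output_tokens * output_price_per_million) / 1_000_000
    return per_request * requests_per_day * days_per_month


def forecast(first_month_cost, monthly_growth_rate, months):
    """Monthly costs with volume compounding at monthly_growth_rate.

    Month 0 is the baseline; month m costs baseline * (1 + rate) ** m.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    return [first_month_cost * (1.0 + monthly_growth_rate) ** m
            for m in range(months)]


# Placeholder prices (per million tokens), chosen only for illustration.
base = monthly_cost(
    input_tokens=2_000, output_tokens=500,
    requests_per_day=1_000, days_per_month=30,
    input_price_per_million=3.0, output_price_per_million=15.0,
)
costs = forecast(base, monthly_growth_rate=0.10, months=6)
for month, cost in enumerate(costs, start=1):
    print(f"month {month}: {cost:,.2f}")
print(f"6-month total: {sum(costs):,.2f}")
```

Because growth compounds, the later months dominate the total, which is why a small change in the assumed growth rate moves a long forecast much more than a short one.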

Embedding pricing: Public list pricing for Voyage AI, OpenAI, and Cohere embedding endpoints. Re-embed factor is your assumption about corpus churn or model upgrade frequency.

Caveats: Volume discounts, enterprise tiers, and regional pricing variance are not modeled. Differences in latency, capability, and quality are not factored in. Use these numbers as a planning baseline, not a final quote.

Related tools

Performance Lab · Will it fit and how fast

Data Sensitivity · Classify before you send

AI ROI Calculator · Hours saved to dollars