Tool 07 · Live

Performance Lab.

Will my doc fit and how fast will it return? Context-window fit and API latency across 10 models, in one tool. Combines what used to be the Context Window Visualizer with new latency estimates.

Pricing verified 2026-05-06 · Latency numbers are illustrative; run your own evals

Stated context windows are technical ceilings. In practice, reliability tends to degrade past roughly 50% of the window (Chroma "Context Rot" research). The reliable working zone is shaded in green.

Sources & methodology

Context windows: Sourced from vendor documentation (Anthropic, OpenAI, Google AI for Developers, Together AI, DeepSeek API).

Reliable working zone: 50% of the stated window, based on Chroma Research's "Context Rot" findings (2025) and the broader long-context-degradation literature. Some models stay reliable further into the window; we err on the conservative side.
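The 50% rule can be sketched as a simple three-way check. This is a minimal illustration, not the tool's actual code: the function name `fit_status`, the status labels, and the 200,000-token window in the usage lines are all hypothetical.

```python
def fit_status(doc_tokens: int, window_tokens: int,
               reliable_fraction: float = 0.5) -> str:
    """Classify a document against a model's stated context window.

    reliable_fraction=0.5 mirrors the conservative 50% working zone
    described above; the status labels are illustrative names only.
    """
    if doc_tokens < 0 or window_tokens <= 0:
        raise ValueError("token counts must be non-negative, window positive")
    if doc_tokens > window_tokens:
        return "exceeds_window"      # does not fit at all
    if doc_tokens > window_tokens * reliable_fraction:
        return "fits_but_degraded"   # fits, but past the reliable zone
    return "reliable"                # inside the green zone


# Hypothetical model with a 200,000-token stated window.
WINDOW = 200_000
print(fit_status(90_000, WINDOW))
print(fit_status(150_000, WINDOW))
print(fit_status(250_000, WINDOW))
```

The check ignores tokens reserved for the model's response; if the output shares the same window, subtract that budget from `window_tokens` before calling it.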

Token conversion: ~4 chars/token, ~0.75 words/token, ~500 tokens per page of typical English prose, ~250 tokens per KB. These are averages; technical writing and code can run 30-40% denser, meaning more tokens for the same amount of text.
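As a rough sketch, the conversion averages above translate into a few one-line estimators. The function names are hypothetical, and the density uplift is passed as an integer percentage (for example 30 to 40 for code) purely to keep the arithmetic exact; a real tokenizer will give different counts.

```python
import math

# Averages quoted above; real tokenizers vary by model and content.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75
TOKENS_PER_PAGE = 500
TOKENS_PER_KB = 250


def tokens_from_chars(n_chars: int) -> int:
    return math.ceil(n_chars / CHARS_PER_TOKEN)


def tokens_from_words(n_words: int) -> int:
    return math.ceil(n_words / WORDS_PER_TOKEN)


def tokens_from_pages(n_pages: int) -> int:
    return n_pages * TOKENS_PER_PAGE


def tokens_from_kb(n_kb: int) -> int:
    return n_kb * TOKENS_PER_KB


def with_density_uplift(tokens: int, uplift_pct: int = 0) -> int:
    """Scale an estimate up for dense content (e.g. uplift_pct=40 for code).

    Integer math with ceiling division avoids float rounding surprises.
    """
    return -(-tokens * (100 + uplift_pct) // 100)


# A 750-word memo, then the same estimate treated as dense technical text.
base = tokens_from_words(750)
print(base, with_density_uplift(base, 40))
```

For anything close to a limit, counting with the target model's own tokenizer is the safer choice; these estimators are only meant for quick sizing.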

Latency assumptions: TTFT (time to first token) and tokens-per-second figures are illustrative defaults derived from public benchmarks and our own measurements. Real-world latency depends on region, traffic, and prompt size. Always validate in your own region under your own load.

Cache hit reduction: Warm-cache TTFT is approximated as ~70% of cold TTFT (a ~30% reduction). Anthropic and OpenAI both deliver faster TTFT on cache hits, with the exact ratio depending on prompt structure.
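One way to combine these assumptions into a single response-time estimate is sketched below. It assumes total latency is TTFT plus output tokens divided by tokens-per-second, which is a common simplification rather than a formula stated by this tool; the function name and the sample numbers (1.0 s cold TTFT, 100 tokens/s, 500 output tokens) are hypothetical.

```python
WARM_CACHE_TTFT_RATIO = 0.7  # warm TTFT approximated as ~70% of cold TTFT


def estimate_latency_seconds(cold_ttft_s: float,
                             tokens_per_second: float,
                             output_tokens: int,
                             cache_hit: bool = False) -> float:
    """Estimate end-to-end latency as TTFT + generation time.

    Assumption: generation streams at a constant tokens_per_second,
    and a cache hit only shortens TTFT, not generation speed.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    ttft = cold_ttft_s * (WARM_CACHE_TTFT_RATIO if cache_hit else 1.0)
    return ttft + output_tokens / tokens_per_second


# Hypothetical model: 1.0 s cold TTFT, 100 tokens/s, 500-token answer.
cold = estimate_latency_seconds(1.0, 100.0, 500)
warm = estimate_latency_seconds(1.0, 100.0, 500, cache_hit=True)
print(f"cold: {cold:.2f} s, warm: {warm:.2f} s")
```

In this model the cache saving matters most for short answers, where TTFT dominates; for long outputs the generation term swamps it.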

Related tools