The five terms every buyer mishears about LLMs.
Token, context window, fine-tune, agent, RAG. Five words that mean different things to your engineer than they do to your vendor. Hover any underlined term inline for the actual definition.
Why this matters
Most failed AI projects are misalignments, not technical defeats. The buyer wanted one thing, the vendor heard another, both used the same five words. Six months later there is a system in production that nobody asked for, and the disagreement is too embarrassing to surface.
Below: the five terms most often weaponized in pitch decks, with the plain-English meaning under each one. Hover the underlined word to see the short definition; the full one is in the tooltip too.
Token
Vendors quote pricing in tokens, which sounds like a stable unit of measurement. It is not. Different models tokenize the same English text into different numbers of tokens, and tokenizers also change between model versions, so the same prompt can cost more under a newer model even at the same listed per-token rate.
What to ask the vendor. "Show me the actual cost for one of my real prompts under your tokenizer, not your sticker price." Anyone evasive about that question is not the right vendor.
Context window
The marketing line is "1 million token context window" and the implication is that the model can actually use all of it. The reality, established by Chroma's "Context Rot" research and Anthropic's own internal evals, is that quality degrades sharply past 30 to 50% of the stated window depending on the model. By the time you are at 80% capacity, the model is missing details from the middle of the document.
What to ask the vendor. "Past what fraction of the stated window do your evals show degradation?" If they have not measured this, they have not stress-tested their claim.
Fine-tune
Buyers hear "fine-tune" and assume the vendor is going to teach the model their domain. Vendors hear "fine-tune" and think about a multi-week training run that costs five figures and locks the model behavior into a snapshot they will pay to update every quarter.
For 90% of buyer use cases, prompt engineering plus retrieval beats fine-tuning on flexibility, cost, and time-to-deploy. Fine-tuning earns its keep when you need consistent style or output format at scale, or when latency matters more than accuracy. Most buyers asking for it actually want either a custom system prompt or a RAG pipeline.
What to ask the vendor. "What would have to be true for prompt engineering plus retrieval to be insufficient here?" If they cannot answer that, they are selling fine-tuning as a default, not a choice.
Agent
Three definitions are running at the same time. To buyers, an agent is "a chatbot that does the work for me." To vendors selling SaaS, an agent is "an LLM with a single tool, like a function call to our API." To engineers, an agent is "an LLM in a loop, deciding what to do next, with multiple tools and a stop condition."
These are not interchangeable. The first is a fantasy unless the workflow is well-bounded. The second is a feature, not a product. The third is real and the most expensive to build well: it requires careful tool design, guardrails, max-turn caps, and an evaluation harness.
What to ask the vendor. "Walk me through one decision the agent makes that a deterministic script could not." If the answer is "summarize" or "rewrite", that is a function call, not an agent.
RAG
RAG sounds like a product. It is a pattern: retrieve relevant chunks of your data first, then prompt the model with them. Buyers hear "Retrieval-Augmented Generation" and picture a finished thing they can buy. Vendors hear it and think about embeddings, vector stores, chunk size, retrieval depth, and re-ranking. None of those are off-the-shelf decisions; all of them affect quality.
What to ask the vendor. "What chunk size and retrieval depth did you settle on, and what did you measure to land there?" The answer should be a number with a reason, not "we use the standard settings."
How to use this
Three rules.
- Demand falsifiability. Every claim a vendor makes about an LLM should come with the number that, if it were higher or lower, would change the answer. "Big context window" is not falsifiable. "Recall stays above 90% out to 200K tokens on our task" is.
- Replace the buzzword. Ask the vendor to walk through their solution without using the term. If they cannot describe it without saying "agent" or "RAG", neither can their engineers.
- Keep the glossary open. If you are sitting in a vendor meeting and one of these terms gets used, ask which definition they are using before you let it move on. The 30 seconds of awkwardness saves quarters.
The terms above are the ones that come up most often in our intake calls.