Token
Definition
The basic unit of text processed by AI language models. A tokenizer splits text into tokens, each typically a whole word, a word part, or a punctuation mark, and the model reads and produces text as sequences of these tokens.
Use Cases
- OpenAI: Metering and limiting usage for ChatGPT and API-based text generation by counting tokens in prompts and model outputs. Input and output text is tokenized for model processing, and the resulting token counts drive context-window limits, usage accounting for billing, and rate limiting. This enables predictable capacity planning, prevents overly long requests that exceed model context limits, and supports usage-based pricing tied to actual model workload.
- Microsoft: Enterprise copilots and Azure OpenAI workloads that must control prompt size, latency, and cost. Applications estimate and cap prompt plus completion tokens before sending requests, summarize or chunk documents to fit within model context limits, and monitor token consumption for governance and budgeting. This improves reliability through fewer context-length errors, reduces inference cost by avoiding oversized prompts, and helps teams enforce consistent performance targets.
- Google: Document Q&A and summarization systems that process long content while staying within model context limits. These systems split large documents into smaller chunks, embed or summarize them, and send only the most relevant chunks to the model so that total tokens remain within the context window. This maintains answer quality on large corpora while controlling latency and cost by minimizing unnecessary tokens sent to the model.
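The chunking approach described in the use cases above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `rough_tokens` is a hypothetical regex-based stand-in for a real tokenizer, so its counts will differ from those of an actual model tokenizer, and `chunk_by_token_budget` is an assumed helper name.

```python
import re

# Toy stand-in for a real tokenizer (an assumption for illustration only):
# one token per run of word characters, one per punctuation mark.
# Real model tokenizers split text differently and report different counts.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def rough_tokens(text: str) -> list[str]:
    """Split text into approximate tokens using the toy pattern above."""
    return TOKEN_PATTERN.findall(text)


def chunk_by_token_budget(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens is kept whole as its own
    oversized chunk rather than being cut mid-sentence.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    current_count = 0
    for sentence in sentences:
        n = len(rough_tokens(sentence))
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_count + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_count = [], 0
        current.append(sentence)
        current_count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For instance, calling `chunk_by_token_budget("Tokens are units. Models read tokens! Do words matter? Yes.", 8)` yields two chunks under this toy tokenizer. In a real system the count would come from the tokenizer that matches the target model, and a retrieval step would then select which chunks to send.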
Frequently Asked Questions
- What's the difference between a token and a word in AI?
- A word is a human language unit, but a token is what the model actually reads. Tokens can be whole words, parts of words (like prefixes/suffixes), punctuation, or even whitespace depending on the tokenizer. For example, “unhappiness” might be split into multiple tokens, and “hello!” might be two tokens (“hello” and “!”).
- When do I need to think about tokens when using an LLM?
- Think about tokens whenever you design prompts, process long documents, or manage cost and latency. Tokens matter for (1) context window limits (prompt + output must fit), (2) performance (more tokens usually means slower responses), and (3) pricing/quotas (many APIs charge per input and output token).
- How much do tokens cost?
- Tokens have no inherent cost; the price depends on the specific model and API you use. Many LLM providers price input tokens (your prompt) and output tokens (the model’s response) separately. Total cost is influenced by model choice, total tokens processed, and features like larger context windows. Always check the pricing page for your chosen model and estimate usage by measuring average prompt and response token counts.
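The context-window check and the cost estimate described in the answers above reduce to simple arithmetic. The sketch below assumes per-million-token pricing and uses placeholder numbers; the function names, prices, and context size are all hypothetical, so substitute the values from your provider's pricing page and model documentation.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits the window."""
    return prompt_tokens + max_output_tokens <= context_window


def estimate_request_cost(input_tokens: int, output_tokens: int,
                          input_price_per_million: float,
                          output_price_per_million: float) -> float:
    """Estimate the cost of one request, pricing input and output separately.

    Prices are assumed to be quoted per one million tokens; adjust the
    divisor if your provider quotes a different unit.
    """
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Placeholder values for illustration only, not real prices or limits.
CONTEXT_WINDOW = 4096
INPUT_PRICE = 2.0    # hypothetical price per million input tokens
OUTPUT_PRICE = 8.0   # hypothetical price per million output tokens

if fits_in_context(prompt_tokens=1200, max_output_tokens=300,
                   context_window=CONTEXT_WINDOW):
    cost = estimate_request_cost(1200, 300, INPUT_PRICE, OUTPUT_PRICE)
    print(f"Estimated cost per request: {cost:.4f}")
```

Multiplying the per-request estimate by expected request volume gives a rough budget, which is only as accurate as the measured average token counts that feed it.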
Category: ai-ml
Difficulty: intermediate
Related Terms
See Also