Context Window

Definition

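The maximum amount of text, measured in tokens, that an AI model can process in a single request, counting the prompt, any conversation history, and the generated reply. Content that does not fit in the window is not visible to the model, which can reduce response quality.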

Frequently Asked Questions

What’s the difference between a context window and a token limit (max output tokens)?
The context window is the total tokens the model can consider at once (your prompt + conversation history + retrieved text + the model’s reply). Max output tokens is only how long the model is allowed to generate in its response. A model can have a large context window but still have a smaller maximum response length.
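
The relationship between the two limits can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `remaining_output_budget` and the numbers (a 128,000-token window and a 16,000-token output cap) are assumptions chosen for the example, not the limits of any particular model.

```python
def remaining_output_budget(context_window: int,
                            input_tokens: int,
                            max_output_tokens: int) -> int:
    """Return how many tokens the model could still generate for this request.

    The reply has to fit in whatever room the input leaves inside the
    context window, and it can never exceed the separate output cap.
    """
    if input_tokens >= context_window:
        return 0  # the input alone already fills (or overflows) the window
    room_left = context_window - input_tokens
    return min(room_left, max_output_tokens)


# Hypothetical limits, for illustration only.
WINDOW = 128_000
OUTPUT_CAP = 16_000

# Nearly full window: the room left (8,000 tokens) is the binding limit.
print(remaining_output_budget(WINDOW, 120_000, OUTPUT_CAP))

# Short prompt: the output cap (16,000 tokens) is the binding limit.
print(remaining_output_budget(WINDOW, 10_000, OUTPUT_CAP))
```

In the second call the window has plenty of room left, yet the reply is still limited by the output cap, which is the case described above.
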
When should I use a larger context window?
Use a larger context window when you need the model to consider lots of material at once, such as summarizing long documents, analyzing large code files, comparing multiple contracts, or maintaining long multi-turn conversations. If you only need short Q&A, a smaller context window is usually cheaper and faster.
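
When a long multi-turn conversation outgrows the window, one common approach is to drop the oldest turns. The sketch below shows that idea under stated assumptions: `estimate_tokens` is a crude stand-in (about four characters per token) rather than a real tokenizer, and `trim_history` is a hypothetical helper, not part of any library.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: assume about 4 characters per token.
    # Actual counts depend on the model's tokenizer and will differ.
    return max(1, len(text) // 4)


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits in `budget`."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 tokens each
print(len(trim_history(history, budget=25)))  # only the newest turns that fit
```

Dropping old turns is only one option. Summarizing earlier turns or retrieving just the relevant passages are alternatives when the older material still matters.
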
How much does a larger context window cost?
You typically pay per token processed. Using more of a large context window raises cost because more input tokens are sent to the model (and sometimes more output tokens are generated). Even if the model supports a large window, your cost depends on how many tokens you actually include in each request, plus the model’s per-token pricing and any added costs for retrieval, storage, or orchestration services.
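
As a rough illustration of per-token pricing, the sketch below estimates the cost of one request. The function `estimate_request_cost` and the prices used (3.00 per million input tokens, 15.00 per million output tokens) are hypothetical values chosen for the example. Real pricing varies by provider and model, and this ignores any retrieval, storage, or orchestration charges.

```python
def estimate_request_cost(input_tokens: int,
                          output_tokens: int,
                          input_price_per_million: float,
                          output_price_per_million: float) -> float:
    """Estimate the cost of a single request from its token counts."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices, in currency units per million tokens.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

# Same 1,000-token reply, very different prompt sizes.
short_prompt = estimate_request_cost(2_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
long_prompt = estimate_request_cost(100_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"short prompt: {short_prompt:.3f}")
print(f"long prompt:  {long_prompt:.3f}")
```

Both requests run against the same model and window. Only the number of tokens actually sent changes the estimate.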

Category: ai-ml

Difficulty: intermediate
