Context Window
Definition
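The maximum amount of text, measured in tokens, that an AI model can process in a single request, counting both the input it receives and the output it generates. Anything that does not fit in the window is invisible to the model, so the window size limits how much conversation history or reference material can inform a response.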
Use Cases
- Khan Academy: AI tutoring that references a student’s ongoing conversation and lesson materials to provide coherent, step-by-step help. Khanmigo is built on GPT-4-class models and relies on the context window to keep recent dialogue, instructions, and relevant lesson snippets in a single prompt. Result: more coherent multi-turn tutoring and better continuity within a session than short-context prompts allow.
- Morgan Stanley: Internal assistant that answers questions using a large corpus of wealth management documents and procedures. It uses GPT-4 to summarize and answer questions over internal content. Because the corpus is far larger than any context window, retrieval selects the most relevant passages, and those passages are placed in the context window together with the user’s question. Result: faster access to institutional knowledge for advisors, more consistent answers, and less time spent searching across documents.
- Duolingo: Conversational language practice where the assistant must stay consistent with the learner’s level, the scenario, and the recent dialogue. It uses GPT-4 for features like roleplay-style conversation, with the context window holding the conversation history, scenario instructions, and learner constraints so that responses stay on-topic and level-appropriate. Result: more natural multi-turn conversations than simpler scripted interactions provide.
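The pattern shared by these use cases (instructions, retrieved passages, and recent dialogue packed into one prompt) can be sketched as a simple token-budgeting routine. This is a minimal illustration, not how any of the products above are implemented. The names build_prompt, Turn, and estimate_tokens are hypothetical, and the 4-characters-per-token estimate is a rough assumption standing in for the model’s real tokenizer.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assume about 4 characters per token."""
    return max(1, (len(text) + 3) // 4)


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def build_prompt(instructions, passages, history, question,
                 context_window, reserved_for_reply):
    """Pack instructions, passages, and recent turns into the available budget.

    `passages` is assumed to be sorted most relevant first; `history` is
    assumed to be in chronological order (oldest first).
    """
    # Leave room for the reply, since it shares the same window as the input.
    budget = context_window - reserved_for_reply
    budget -= estimate_tokens(instructions) + estimate_tokens(question)
    if budget < 0:
        raise ValueError("instructions and question alone exceed the window")

    # Add retrieved passages in relevance order until one no longer fits.
    kept_passages = []
    for passage in passages:
        cost = estimate_tokens(passage)
        if cost > budget:
            break
        kept_passages.append(passage)
        budget -= cost

    # Walk the history from newest to oldest so the oldest turns are dropped first.
    kept_turns = []
    for turn in reversed(history):
        cost = estimate_tokens(turn.text)
        if cost > budget:
            break
        kept_turns.append(turn)
        budget -= cost
    kept_turns.reverse()  # restore chronological order

    parts = [instructions, *kept_passages]
    parts += [f"{t.role}: {t.text}" for t in kept_turns]
    parts.append(f"user: {question}")
    return "\n\n".join(parts)
```

Dropping the oldest turns first is only one possible policy; summarizing older turns instead of discarding them is a common alternative when early context matters.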
Frequently Asked Questions
- What’s the difference between a context window and a max output token limit?
- The context window is the total tokens the model can consider at once (your prompt + conversation history + retrieved text + the model’s reply). Max output tokens is only how long the model is allowed to generate in its response. A model can have a large context window but still have a smaller maximum response length.
- When should I use a larger context window?
- Use a larger context window when you need the model to consider lots of material at once, such as summarizing long documents, analyzing large code files, comparing multiple contracts, or maintaining long multi-turn conversations. If you only need short Q&A, a smaller context window is usually cheaper and faster.
- How much does a larger context window cost?
- You typically pay per token processed. A larger context window can increase cost because it lets you send more input tokens in each request. Even if the model supports a large window, your cost depends on how many tokens you actually include in each request and how many the model generates in response, at the model’s per-token prices, plus any added costs for retrieval, storage, or orchestration services.
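The arithmetic behind the answers above can be sketched in a few lines: one check that the input plus the reserved output fits in the window, and one estimate of per-request cost. The function names are hypothetical, and the window size and prices in the usage note are placeholders chosen for illustration, not any provider’s actual figures.

```python
def fits_in_window(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the input plus the reserved reply length fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def estimate_request_cost(input_tokens: int, output_tokens: int,
                          input_price_per_million: float,
                          output_price_per_million: float) -> float:
    """Cost of one request, assuming separate per-million-token prices for input and output."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000
```

For example, with an assumed 128,000-token window, fits_in_window(120_000, 8_000, 128_000) is true while fits_in_window(125_000, 8_000, 128_000) is false. At placeholder prices of 3.00 per million input tokens and 15.00 per million output tokens, a request with 100,000 input tokens and 1,000 output tokens costs about 0.315, almost all of it from the input.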
Category: ai-ml
Difficulty: intermediate