Context window

A context window is the maximum number of tokens a model can consider at once — the combined size of your prompt and the model's response — ranging from a few thousand to over a million tokens depending on the model.

The context window is the model’s working memory for a single request. Everything the model can “see” — your instructions, any documents you include, the conversation so far, and the answer it is generating — must fit inside it, measured in tokens.
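Because the instructions, documents, history and the reply all share one budget, a pre-flight check is simple arithmetic: input tokens plus the tokens reserved for the answer must not exceed the window. The sketch below assumes a rough heuristic of about 4 characters per token for English text. That heuristic, the function names `estimate_tokens` and `fits_in_context`, and the 8,000-token window are all hypothetical and chosen for illustration. Real counts come from the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate, ASSUMING ~4 characters per token.

    Hypothetical heuristic for English prose only; the real count depends
    on the model's tokenizer and can differ substantially.
    """
    return (len(text) + 3) // 4  # round up so a partial token still counts


def fits_in_context(parts: list[str], max_output_tokens: int, context_window: int) -> bool:
    """Return True if all input parts plus the reserved reply fit in the window."""
    # Instructions, documents and conversation history are all input tokens.
    input_tokens = sum(estimate_tokens(p) for p in parts)
    # Tokens reserved for the model's answer come out of the same window.
    return input_tokens + max_output_tokens <= context_window


# Hypothetical request: an 8,000-token window with 1,000 tokens reserved for the reply.
instructions = "Summarise the attached report in three bullet points."
report = "word " * 6000  # 30,000 characters, about 7,500 tokens under the heuristic
print(fits_in_context([instructions, report], max_output_tokens=1000, context_window=8000))
# False: about 7,514 input tokens + 1,000 reserved tokens exceeds 8,000
```

Reserving output tokens up front matters because a long input can leave too little room for the answer even when the input alone fits.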

Larger context windows let you feed more material (long documents, large codebases) into a single call. If a request would exceed the window, it cannot be processed as-is: older content, such as earlier turns of a conversation, must be dropped or summarised to make room.
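One common way to drop older content is a sliding window over the conversation: walk backwards from the newest message and keep messages until the token budget runs out. This is a minimal sketch; `trim_history` is a hypothetical helper, the default token counter is the same rough 4-characters-per-token assumption rather than a real tokenizer, and it only drops messages (summarising them instead would need a separate model call, not shown).

```python
def trim_history(messages: list[str], budget_tokens: int,
                 count_tokens=lambda s: (len(s) + 3) // 4) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget.

    count_tokens defaults to a hypothetical ~4 chars/token estimate;
    pass the model's real tokenizer count in practice.
    """
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent context survives.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget_tokens:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Many applications also pin the system instructions so they are never trimmed, and only apply this budget to the conversation turns.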

How much of the context window you use also affects what you can build on a free tier: larger inputs consume more tokens per request, so you reach per-minute token limits sooner.
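The effect on rate limits is plain division: a per-minute token limit divided by the tokens each request uses gives the number of requests you can make per minute. The 40,000 tokens-per-minute limit below is a hypothetical figure, and the sketch assumes both input and output tokens count toward the limit; providers differ on this, so check their documentation.

```python
def max_requests_per_minute(input_tokens: int, output_tokens: int, tpm_limit: int) -> int:
    """Requests per minute that fit under a tokens-per-minute limit.

    ASSUMES input and output tokens both count toward the limit.
    """
    per_request = input_tokens + output_tokens
    return tpm_limit // per_request


# Hypothetical 40,000 tokens-per-minute limit.
print(max_requests_per_minute(1_500, 500, 40_000))   # 20 (2,000 tokens each)
print(max_requests_per_minute(19_500, 500, 40_000))  # 2 (20,000 tokens each)
```

Filling ten times more of the window per request cuts the request rate by the same factor, even though each individual request still fits.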
