Rate limit

A rate limit is the cap a provider places on how often you can call an API — commonly expressed as requests per minute (RPM), requests per day (RPD), or tokens per minute (TPM) — and free tiers usually have the tightest limits.

A rate limit controls throughput so that no single user can overwhelm a service. For LLM APIs, limits are usually counted per minute (RPM, TPM) or per day (RPD). Once you reach a limit, further requests are rejected with an error (often HTTP 429 Too Many Requests) until the window resets.
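As a rough illustration of how a per-window cap behaves, here is a minimal sliding-window counter in Python. It is a sketch only: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and real providers may use a different scheme (fixed windows, token buckets, or something else).

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `limit` requests in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        # Over the cap: a real API would reject here, often with HTTP 429.
        return False


# Example: a 2 RPM cap. The third call inside the same minute is refused.
limiter = SlidingWindowLimiter(limit=2, window=60.0)
results = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 60.0)]
```

The timestamp is passed in explicitly so the behavior is easy to follow; in practice you would pass something like `time.monotonic()`.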

Free tiers are defined largely by their rate limits: for example, a fixed number of free-model requests per day on OpenRouter, or per-model RPM/RPD caps on the Gemini and Groq free tiers. Paid tiers raise these limits.

To stay within free limits, batch work into fewer requests, retry rejected calls with backoff, and spread requests evenly over time rather than sending them in bursts.
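Retrying with backoff could look something like the Python sketch below. It assumes your client raises an exception when the API returns HTTP 429; `RateLimitError` and `call_with_backoff` are hypothetical names used for illustration, not part of any provider's SDK. The wait doubles after each failed attempt, and a random jitter keeps many clients from retrying at the same instant.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Upper bound doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
```

To use it, wrap the request in a zero-argument callable, for example `call_with_backoff(lambda: send_request(prompt))`, where `send_request` is whatever function your code uses to call the API. If the provider's response includes a `Retry-After` header, honoring that value is usually a better choice than a computed delay.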
