Inference

Inference is the act of running a trained model to produce an output, such as turning your prompt into a response. It is the usage that API free tiers and trial credits cover, and that per-token pricing charges for.

Inference is using a model after it has been trained: you send input, the model computes, and you get an output. Every API call you make to an LLM is an inference request, typically billed by the number of input and output tokens it consumes.
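To make token-based billing concrete, here is a minimal sketch of how the cost of one inference request could be estimated. The `Pricing` class, the `request_cost` function, and the rates used are all hypothetical and chosen for illustration; they are not any provider's actual prices, and real providers may bill differently (for example, with separate rates for cached tokens).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated cost in USD of a single inference request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Assumed example rates: $0.50 per million input tokens,
# $1.50 per million output tokens.
example_pricing = Pricing(input_per_million=0.50, output_per_million=1.50)

# One request with a 1,200-token prompt and a 300-token response.
cost = request_cost(1_200, 300, example_pricing)
print(f"${cost:.6f}")  # prints "$0.001050"
```

Input and output tokens are priced separately in this sketch because many providers charge more for generated tokens than for prompt tokens; if a provider uses a single rate, set both fields to the same value.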

Providers and routers that specialise in fast, low-cost inference — such as Groq, Together AI, OpenRouter, and Hugging Face Inference Providers — compete on speed and price. Their free tiers and included credits let you run inference at no cost within limits.

Inference is distinct from training or fine-tuning, which create or adapt a model. Most free LLM offers cover inference only.
