Model Inference

Definition

Using a trained AI model to make predictions or decisions on new, previously unseen data; in effect, applying learned knowledge to solve new problems.

Frequently Asked Questions

What's the difference between model inference and model training?
Training is when you feed lots of labeled or historical data into an algorithm to learn model parameters (it’s compute-heavy and happens periodically). Inference is when you use the already-trained model to make a prediction on new data (it’s usually latency-sensitive and happens continuously in production).
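The split between the two phases can be sketched with a toy model. This is a minimal illustration, assuming a one-feature linear model fit by ordinary least squares; the function names `train` and `infer` are hypothetical and not part of any library. Training scans the whole historical dataset once to produce parameters, while inference is a cheap function of those stored parameters and a single new input.

```python
from typing import Sequence


def train(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    # Training: learn slope and intercept from historical (x, y) pairs
    # using the closed-form ordinary least-squares estimates.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def infer(params: tuple[float, float], x: float) -> float:
    # Inference: apply the already-learned parameters to a new input.
    slope, intercept = params
    return slope * x + intercept


# Training runs once (or periodically) over historical data...
params = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
# ...and inference reuses those parameters for each new input.
print(infer(params, 10.0))
```

Real models have millions or billions of parameters instead of two, which is why training is compute-heavy, but the shape is the same: inference never revisits the training data.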
When should I use real-time vs batch model inference?
Use real-time (online) inference when you need an immediate response, such as fraud checks during checkout, chatbot replies, or image recognition in an app. Use batch inference when you can process many records at once on a schedule, such as scoring all customers nightly for churn risk or generating weekly demand forecasts.
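Both modes can call the same prediction function; what differs is how requests arrive. The sketch below assumes a hypothetical churn model represented by fixed logistic-regression weights (`WEIGHTS` and `BIAS` are illustrative values, not from any real model). `handle_request` stands in for an online endpoint that scores one record per call, and `nightly_batch` stands in for a scheduled job that scores every customer at once.

```python
import math
from typing import Iterable

# Hypothetical learned parameters for illustration only.
WEIGHTS = {"months_inactive": 0.8, "support_tickets": 0.5}
BIAS = -3.0


def predict_churn(features: dict[str, float]) -> float:
    # Logistic function of a weighted sum; missing features count as 0.
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(features: dict[str, float]) -> float:
    # Real-time path: one record per call, answer returned immediately.
    return predict_churn(features)


def nightly_batch(
    customers: Iterable[tuple[str, dict[str, float]]],
) -> dict[str, float]:
    # Batch path: score every customer in a single scheduled run.
    return {customer_id: predict_churn(feats) for customer_id, feats in customers}


print(handle_request({"months_inactive": 2.5, "support_tickets": 2}))
print(nightly_batch([("c1", {"months_inactive": 6}), ("c2", {})]))
```

Keeping one shared scoring function for both paths helps ensure the online and batch scores agree for the same input.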
How much does model inference cost?
Cost depends on (1) compute type and size (CPU vs GPU/accelerators), (2) how long endpoints run (always-on vs scale-to-zero options where available), (3) request volume and payload size, (4) latency/throughput targets that drive overprovisioning, and (5) extras like load balancing, monitoring, and data transfer. Batch inference is often cheaper for non-urgent workloads because you pay for job runtime rather than keeping an endpoint running.
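The batch-versus-endpoint point is easiest to see with back-of-envelope arithmetic. In this sketch every price and duration is a hypothetical placeholder, not a real provider rate, and it ignores the extras listed above (load balancing, monitoring, data transfer). It assumes the endpoint keeps two instances running all month for redundancy, while the batch job uses one instance for two hours per night.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def always_on_cost(hourly_rate: float, instances: int) -> float:
    # An always-on endpoint is billed for every hour, busy or idle.
    return hourly_rate * instances * HOURS_PER_MONTH


def batch_cost(
    hourly_rate: float, instances: int, hours_per_run: float, runs_per_month: int
) -> float:
    # A batch job is billed only for the hours it actually runs.
    return hourly_rate * instances * hours_per_run * runs_per_month


# Hypothetical $1.00/hour instance price for both options.
endpoint = always_on_cost(1.00, 2)
nightly = batch_cost(1.00, 1, 2.0, 30)
print(f"endpoint: ${endpoint:,.2f}/month, batch: ${nightly:,.2f}/month")
```

Under these assumptions the endpoint costs about 24 times as much, mostly for idle hours; scale-to-zero endpoints narrow that gap when traffic is sparse.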

Category: ai-ml

Difficulty: intermediate