MLOps
Definition
Machine Learning Operations (MLOps): practices and tools for deploying, monitoring, and maintaining machine learning models in production. It applies DevOps principles to ML systems, where data and models change alongside code.
Use Cases
- Netflix: Personalized recommendation and ranking models that must be updated and validated frequently as user behavior changes. Netflix has publicly described an internal ML platform with automated pipelines for training and deploying models, plus experimentation and monitoring to validate model changes before broad rollout. Result: faster model iteration and safer deployments through controlled rollouts and measurement, supporting personalization at scale.
- Uber: Productionizing many ML models for forecasting, matching, and fraud/risk signals across different teams. Uber has publicly described Michelangelo, an internal ML platform that standardizes training workflows, feature management, deployment, and monitoring so teams can ship models consistently. Result: shorter time to deploy models and better operational consistency from shared tooling across the full ML lifecycle.
- Airbnb: Search ranking and pricing-related ML models that require reliable retraining and monitoring to prevent performance regressions. Airbnb has publicly discussed building internal ML infrastructure to manage data/feature pipelines, training, and deployment, with safeguards and monitoring to track model quality over time. Result: more reliable model updates and a better ability to detect and address model drift or data issues in production.
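The drift monitoring mentioned in the use cases above can be illustrated with a population stability index (PSI) check on a single numeric feature. This is a minimal sketch, not a description of how any of these companies implement monitoring. The function names, the 10-bucket setting, and the 0.2 alert threshold (a commonly cited rule of thumb) are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample (e.g. training data) and a live sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread to bucket")
    width = (hi - lo) / bins

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge buckets.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty buckets from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_detected(
    expected: Sequence[float], actual: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag drift when PSI exceeds the (assumed) alert threshold."""
    return population_stability_index(expected, actual) > threshold
```

In a monitoring job, a check like this would typically run on a schedule per feature, with a positive result raising an alert or triggering a retraining pipeline.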
Provider Equivalents
- AWS: Amazon SageMaker
- Azure: Azure Machine Learning
- GCP: Vertex AI
- OCI: OCI Data Science
Frequently Asked Questions
- What's the difference between MLOps and DevOps?
- DevOps focuses on reliably building, testing, and deploying software code. MLOps includes those practices but adds ML-specific needs: managing training data and features, tracking experiments, versioning models, monitoring model accuracy and drift, and retraining models when data changes.
- When should I use MLOps?
- Use MLOps when a model is running in production and business outcomes depend on it. Common triggers include multiple models or teams, frequent model updates, regulatory or audit requirements, the need for monitoring and alerting, and model performance that can degrade over time as data changes (drift). For a one-off prototype or a model used only in a notebook, full MLOps is usually unnecessary.
- How much does MLOps cost?
- Costs vary with compute (training and inference), storage (datasets, artifacts, logs), and operational tooling (pipelines, monitoring, CI/CD). The major cost drivers are retraining frequency, model size, traffic to inference endpoints, and retention of logs and metrics. Managed platforms (e.g., SageMaker, Azure ML, Vertex AI, OCI Data Science) charge for the underlying resources you use; self-managed MLOps can reduce platform fees but increases engineering and maintenance costs.
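The cost drivers in the last answer can be combined into a rough monthly estimate. The sketch below is illustrative only: the `MonthlyUsage` fields and `estimate_monthly_cost` are hypothetical names, every rate is an input you would take from your provider's current price list, and it assumes always-on inference endpoints billed per instance-hour. Tooling, data transfer, and engineering time are left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MonthlyUsage:
    training_runs: int             # how often you retrain per month
    hours_per_training_run: float  # grows with model and dataset size
    training_rate_per_hour: float  # price of the training instance
    endpoint_instances: int        # scaled to inference traffic
    endpoint_rate_per_hour: float  # price of one serving instance
    stored_gb: float               # datasets, artifacts, logs, metrics
    storage_rate_per_gb: float     # monthly price per GB stored
    hours_in_month: float = 730.0  # assumed average month length


def estimate_monthly_cost(u: MonthlyUsage) -> Dict[str, float]:
    """Break a month's spend into training, inference, and storage."""
    training = u.training_runs * u.hours_per_training_run * u.training_rate_per_hour
    inference = u.endpoint_instances * u.endpoint_rate_per_hour * u.hours_in_month
    storage = u.stored_gb * u.storage_rate_per_gb
    return {
        "training": training,
        "inference": inference,
        "storage": storage,
        "total": training + inference + storage,
    }
```

Splitting the total by component makes it easier to see which driver dominates. For steadily served models this is often the always-on inference term rather than training.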
Category: ai-ml
Difficulty: advanced
Related Terms
See Also