Data Drift
Definition
A change over time in the statistical properties of a model's input data relative to the data it was trained on, which can degrade model performance.
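One common way to quantify such a change for a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of a baseline (training) sample with that of a current sample. The following is a minimal sketch, not a reference implementation: the function name, the choice of 10 quantile bins, the `eps` floor, and the synthetic data are all assumptions made for illustration.

```python
import bisect
import math
import random


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """Population Stability Index (PSI) for one numeric feature.

    Bin edges are taken from baseline quantiles, so each bin holds
    roughly an equal share of the baseline sample.
    """
    ordered = sorted(baseline)
    # Interior cut points at the 1/n_bins, 2/n_bins, ... baseline quantiles.
    edges = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # bisect_right gives the number of edges <= v, i.e. the bin index.
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor each share at eps so empty bins do not cause log(0).
        return [max(c / len(values), eps) for c in counts]

    p = bin_shares(baseline)
    q = bin_shares(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Synthetic data, for illustration only.
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print("PSI, same distribution:", population_stability_index(training, stable))
print("PSI, mean shifted by 1:", population_stability_index(training, shifted))
```

A widely used rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. These cutoffs are conventions rather than guarantees, and suitable values depend on the feature and the application.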
Use Cases
- Netflix: Personalized content recommendations when viewing patterns change due to new releases, regional trends, or seasonal events. Netflix operates large-scale recommendation systems and continuously evaluates and updates its models using fresh behavioral data. In practice, this includes monitoring shifts in key input signals (e.g., watch history patterns, content popularity) and refreshing models as user behavior changes. Outcome: more relevant recommendations and improved user engagement, because models stay aligned with current viewing behavior rather than outdated historical patterns.
- Uber: Demand prediction and dynamic pricing when rider and driver behavior shifts during holidays, weather events, or major local events. Uber uses continuous data pipelines and frequent model updates to adapt to changing marketplace conditions. Monitoring for distribution shifts in inputs (time, location, event signals, weather, supply/demand indicators) helps identify when models trained on past conditions no longer match current reality. Outcome: better marketplace efficiency in matching riders and drivers, and more stable pricing and ETAs during rapidly changing conditions.
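The monitoring pattern described in both use cases, comparing recent production inputs against a training baseline one feature at a time, can be sketched with a two-sample Kolmogorov-Smirnov (KS) test. This is an illustrative sketch only and does not describe how Netflix or Uber implement monitoring. The feature names, the window sizes, and the `alpha` level are hypothetical, and the threshold uses the standard large-sample approximation.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_critical_value(n, m, alpha=0.01):
    """Large-sample approximation of the rejection threshold at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def drifted_features(baseline, window, alpha=0.01):
    """Return the names of features whose window sample differs from baseline."""
    flagged = []
    for name, base_values in baseline.items():
        new_values = window[name]
        threshold = ks_critical_value(len(base_values), len(new_values), alpha)
        if ks_statistic(base_values, new_values) > threshold:
            flagged.append(name)
    return flagged


# Synthetic data with hypothetical feature names, for illustration only.
rng = random.Random(7)
baseline = {
    "requests_per_minute": [rng.gauss(100.0, 15.0) for _ in range(1000)],
    "trip_distance_km": [rng.gauss(8.0, 2.0) for _ in range(1000)],
}
holiday_window = {
    # Demand rises sharply during the event; trip distance is drawn as before.
    "requests_per_minute": [rng.gauss(160.0, 25.0) for _ in range(1000)],
    "trip_distance_km": [rng.gauss(8.0, 2.0) for _ in range(1000)],
}

print("Drifted features:", drifted_features(baseline, holiday_window))
```

Because a check like this runs on many features and many windows, some false alarms are expected at any fixed `alpha`. A production setup would typically combine it with an effect-size measure or require the alert to persist across several windows before triggering retraining.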
Provider Equivalents
- AWS: Amazon SageMaker Model Monitor
- Azure: Azure Machine Learning data drift monitoring
- GCP: Vertex AI Model Monitoring
- OCI: OCI Data Science Model Deployment Monitoring
Frequently Asked Questions
- What's the difference between data drift and concept drift?
- Data drift means the distribution of the input data (features) changes over time; for example, customers start browsing different categories. Concept drift means the relationship between inputs and the target changes; for example, the same browsing behavior no longer predicts purchases because a competitor changed prices or a new policy changed buying decisions. You can have data drift without concept drift, and vice versa.
- When should I monitor for data drift?
- Monitor for data drift whenever your model runs in production and the real-world environment can change. This is common in retail, ads, fraud, finance, logistics, and any system influenced by seasonality, campaigns, product changes, or user behavior. It's especially important when model errors are costly (fraud losses, compliance risk, customer churn) or when you can't label outcomes quickly, because without timely labels a drop in performance is hard to detect directly.
- How much does data drift monitoring cost?
- Costs depend on (1) how much data you monitor (volume and frequency), (2) where you store baselines and logs, (3) how often you run drift calculations, and (4) alerting/visualization and any retraining you trigger. In managed services, you typically pay for underlying compute (monitoring jobs), storage (logs/metrics), and sometimes per-feature or per-model monitoring. The biggest cost driver is often the operational overhead and retraining pipeline runs, not the drift metric itself.
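The distinction drawn in the first question above, data drift versus concept drift, can be made concrete with a toy example. The sketch below assumes a frozen one-feature model and synthetic data. The decision rule, thresholds, and sample sizes are invented for illustration. In the first scenario only the inputs move; in the second only the labeling rule changes.

```python
import random
import statistics


def model(x):
    """Frozen model learned at training time: predicts 1 when x > 0.5."""
    return 1 if x > 0.5 else 0


def accuracy(xs, ys):
    """Share of examples where the frozen model matches the true label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
train_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Data drift only: inputs shift upward, the true labeling rule is unchanged.
drift_x = [rng.gauss(2.0, 1.0) for _ in range(2000)]
drift_y = [1 if x > 0.5 else 0 for x in drift_x]

# Concept drift only: inputs look the same, but the true rule is now x > 1.5.
concept_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
concept_y = [1 if x > 1.5 else 0 for x in concept_x]

print("Training input mean:    ", statistics.mean(train_x))
print("Data drift: input mean  ", statistics.mean(drift_x),
      "accuracy", accuracy(drift_x, drift_y))
print("Concept drift: input mean", statistics.mean(concept_x),
      "accuracy", accuracy(concept_x, concept_y))
```

In the data drift scenario the input mean moves but accuracy is unaffected, because the toy model matches the true rule exactly; real models are rarely this exact, which is why input shifts often do hurt performance in practice. In the concept drift scenario an input-only monitor would see nothing unusual even though accuracy falls, so detecting it requires labels or a proxy for outcomes.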
Category: ai-ml
Difficulty: advanced