Feature Store
Definition
A centralized repository for storing, managing, and serving machine learning features, so that training and production use the same feature definitions and values.
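As a rough illustration of the definition above, the following minimal sketch keeps one registered feature definition and writes its output to both an offline history (for training) and an online lookup table (for serving). Everything here is hypothetical: `ToyFeatureStore`, `register`, `ingest`, `get_online`, and the `avg_order_value` feature are illustrative names, not the API of any real product, and the sketch assumes records arrive in timestamp order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A feature definition maps one raw record to one numeric value (assumption).
FeatureFn = Callable[[dict], float]


@dataclass
class ToyFeatureStore:
    """Hypothetical in-memory feature store; not any vendor's API."""

    # feature name -> the single shared definition used everywhere
    definitions: Dict[str, FeatureFn] = field(default_factory=dict)
    # offline store: append-only history of (entity_id, timestamp, values)
    offline: List[Tuple[str, int, Dict[str, float]]] = field(default_factory=list)
    # online store: latest values per entity, for low-latency reads
    online: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def register(self, name: str, fn: FeatureFn) -> None:
        self.definitions[name] = fn

    def ingest(self, entity_id: str, timestamp: int, raw: dict) -> None:
        # Compute every feature once, with the same logic for both stores.
        values = {name: fn(raw) for name, fn in self.definitions.items()}
        self.offline.append((entity_id, timestamp, values))
        # Assumes in-order arrival, so the newest record overwrites the old one.
        self.online[entity_id] = values

    def get_online(self, entity_id: str) -> Dict[str, float]:
        return self.online[entity_id]


store = ToyFeatureStore()
store.register("avg_order_value", lambda r: r["total_spend"] / max(r["order_count"], 1))
store.ingest("user_1", 100, {"total_spend": 90.0, "order_count": 3})
store.ingest("user_1", 200, {"total_spend": 200.0, "order_count": 5})
```

After the two `ingest` calls, the offline history holds both rows for `user_1`, while `store.get_online("user_1")` returns only the most recent values. A production system would replace the in-memory structures with durable storage and add versioning and access control.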
Use Cases
- Uber: Real-time and batch features for marketplace predictions (e.g., ETA, demand forecasting, fraud/risk signals), shared across many ML models. Uber built Michelangelo, which includes a feature store that computes, stores, and serves features consistently for both offline training and online inference, with definitions shared and reused across teams. Outcome: better feature reuse and consistency between training and production, less duplicated feature engineering, and models that ship and operate more reliably at scale.
- Airbnb: Ranking and search/personalization models that use consistent user, listing, and session features across training and serving. Airbnb developed a centralized ML platform (Bighead) with shared feature pipelines and standardized feature definitions to reduce training/serving skew and enable reuse across multiple models. Outcome: faster model iteration and more consistent production behavior, because features are defined and computed the same way across teams.
- Netflix: Personalization and recommendations that use shared behavioral and content features across many experiments and models. Netflix takes a centralized data and ML platform approach, with standardized feature pipelines and shared datasets, so that features are computed consistently for training and production. Outcome: more repeatable experimentation and better operational reliability, because feature definitions no longer diverge between model development and deployment.
Provider Equivalents
- AWS: Amazon SageMaker Feature Store
- GCP: Vertex AI Feature Store
Frequently Asked Questions
- What's the difference between a Feature Store and a data warehouse?
- A data warehouse stores raw and curated datasets for analytics and reporting. A feature store specifically manages machine learning features (the model inputs), including feature definitions, versioning, and serving patterns for training (offline) and real-time predictions (online). In practice, a feature store often reads from a warehouse/lake, then publishes model-ready features with consistent logic.
- When should I use a Feature Store?
- Use a feature store when any of the following applies: multiple models or teams reuse the same features; you need feature calculations to be consistent between training and production (to avoid training/serving skew); you require low-latency online feature lookups for real-time inference; or you want governance (lineage, access control, versioning) for features. If you have a single model with simple batch scoring, you may not need one yet.
- How much does a Feature Store cost?
- Cost depends on (1) storage for offline feature history, (2) online store capacity and read/write throughput for low-latency serving, (3) compute for feature pipelines (batch/stream processing), and (4) data transfer and orchestration. Managed services typically charge for storage and for request volume or throughput, plus the underlying compute used to generate features. The biggest cost drivers are high-cardinality features, frequent updates, and high-QPS online reads.
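The offline (training) and online (serving) access patterns mentioned in the answers above differ mainly in which point in time they read. The sketch below shows one common way to think about it: training looks up the feature value as it was at each label's timestamp (a point-in-time lookup), while serving reads the latest value. The function name `point_in_time_lookup` and the sample history are hypothetical, and the sketch assumes the history is sorted by timestamp.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def point_in_time_lookup(
    history: List[Tuple[int, float]], as_of: int
) -> Optional[float]:
    """Return the latest value with timestamp <= as_of, or None if none exists.

    `history` is a list of (timestamp, value) pairs sorted by timestamp
    ascending (an assumption of this sketch).
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly newer than as_of.
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no feature value was known yet at that time
    return history[idx - 1][1]


# Hypothetical history of one feature for one entity.
history = [(100, 30.0), (200, 40.0), (300, 55.0)]

# Offline/training read: the value as it was when the label was recorded.
training_value = point_in_time_lookup(history, as_of=250)

# Online/serving read: simply the most recent value.
serving_value = history[-1][1]
```

Reading the value as of the label's timestamp keeps future information out of the training set, which is one source of the training/serving skew discussed above.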
Category: ai-ml
Difficulty: advanced
See Also