Feature Engineering
Definition
The process of selecting, creating, and transforming raw data into input variables (features) that machine learning models can learn from effectively, typically with the goal of improving model accuracy.
Use Cases
- Uber: Improving predictions for marketplace and logistics (e.g., ETA, demand forecasting, matching). Built a feature store (Palette) within its internal Michelangelo ML platform to define, compute, and reuse features across teams, with consistent offline (training) and online (serving) feature computation. (Faster iteration and more consistent model behavior by reusing standardized features across many models and reducing training/serving skew.)
- Airbnb: Search ranking and listing recommendations. Uses large-scale data processing to create features from user behavior and listing attributes (e.g., historical engagement signals, location and availability-derived features) and feeds them into ML ranking models. (More relevant search results and improved user engagement by incorporating richer behavioral and contextual signals into models.)
- Netflix: Personalized recommendations and content ranking. Engineers features from viewing history, time-of-day patterns, device context, and content metadata; these features are used in recommendation and ranking pipelines. (Better personalization and retention by capturing nuanced user preferences and context in model inputs.)
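The consistent offline (training) and online (serving) feature computation mentioned in the Uber example can be sketched as follows. This is a minimal illustration only, not Uber's implementation: the function `trip_features`, its field names, and the sample records are hypothetical. The point is that both paths call the same feature definition, so the values a model sees at serving time match those it was trained on.

```python
from datetime import datetime


def trip_features(pickup_time: datetime, distance_km: float) -> dict:
    """One shared feature definition (hypothetical) used by both paths."""
    return {
        "hour_of_day": pickup_time.hour,
        "is_weekend": pickup_time.weekday() >= 5,  # Saturday=5, Sunday=6
        "distance_km": distance_km,
    }


# Offline path: build training rows from historical records (made-up data).
history = [
    {"pickup_time": datetime(2024, 3, 2, 18, 30), "distance_km": 4.25},
    {"pickup_time": datetime(2024, 3, 4, 8, 5), "distance_km": 12.0},
]
training_rows = [
    trip_features(r["pickup_time"], r["distance_km"]) for r in history
]

# Online path: the same function is applied to a live request.
online_row = trip_features(datetime(2024, 3, 2, 18, 30), 4.25)

# Identical inputs yield identical features, so there is no skew
# between what the model was trained on and what it is served.
print(online_row == training_rows[0])
```

If the two paths used separately written logic (for example, SQL offline and application code online), small differences such as a different weekend rule could silently degrade predictions.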
Provider Equivalents
- AWS: Amazon SageMaker Data Wrangler
- Azure: Azure Machine Learning (managed feature store)
- GCP: Vertex AI Feature Store
- OCI: OCI Data Science (Feature Store)
Frequently Asked Questions
- What's the difference between feature engineering and feature selection?
- Feature engineering creates or transforms inputs (e.g., turning timestamps into day-of-week, or combining fields into a new metric). Feature selection chooses which existing features to keep (e.g., dropping redundant or noisy columns) to improve accuracy, speed, or interpretability.
- When should I use feature engineering?
- Use it when raw data doesn’t represent the signal your model needs. Common triggers are: many categorical/text fields, time-series patterns, domain rules (e.g., ratios like price per square foot), or when baseline models underperform. It’s especially valuable for tabular business data (fraud, churn, pricing, forecasting).
- How much does feature engineering cost?
- Costs usually come from compute, storage, and data movement, not a per-feature fee. The cost of a batch feature pipeline depends on dataset size, transformation complexity, and how often you recompute features. Online feature serving adds cost for low-latency databases/caches and read/write throughput. Managed tools (e.g., Data Wrangler, feature stores) add service charges on top of the underlying compute and storage.
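The distinction drawn in the answers above between feature engineering (deriving day-of-week from a timestamp, or a ratio such as price per square foot) and feature selection (dropping a redundant column) can be sketched in plain Python. The listing records, column names, and the helpers `engineer` and `select` are assumptions made up for illustration; a real pipeline would more likely use a dataframe library or a feature store.

```python
from datetime import datetime

# Made-up raw records; "price_cents" duplicates "price" and is redundant.
listings = [
    {"listed_at": "2024-03-02T18:30:00", "price": 300000.0,
     "sqft": 1500.0, "price_cents": 30000000},
    {"listed_at": "2024-03-04T08:05:00", "price": 450000.0,
     "sqft": 1800.0, "price_cents": 45000000},
]


def engineer(row: dict) -> dict:
    """Feature engineering: create new inputs from the raw fields."""
    out = dict(row)
    ts = datetime.fromisoformat(row["listed_at"])
    out["day_of_week"] = ts.weekday()  # 0 = Monday ... 6 = Sunday
    out["price_per_sqft"] = row["price"] / row["sqft"]
    return out


def select(row: dict, keep: list) -> dict:
    """Feature selection: keep only the chosen columns."""
    return {name: row[name] for name in keep}


# Keep the engineered features plus sqft; drop the raw timestamp,
# the raw price, and the redundant price_cents column.
KEEP = ["day_of_week", "price_per_sqft", "sqft"]
features = [select(engineer(row), KEEP) for row in listings]

for row in features:
    print(row)
```

Here the choice of which columns to keep is hard-coded; in practice it would usually be guided by correlation analysis, model-based importance scores, or validation performance.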
Category: ai-ml
Difficulty: advanced
Related Terms
See Also