
Real-Time Recommendation Pipeline

AI Infrastructure

Real-time recommendations combine pre-computed collaborative filtering scores with live user behavior signals to suggest relevant content, products, or connections. This pipeline merges offline model outputs (computed in batch) with online features (recent clicks, cart items, time of day) through a feature assembly layer, then ranks candidates using a lightweight scoring model that responds in under 50ms. Built for product teams powering homepage feeds, product suggestions, or content rankings that adapt to user behavior in real time.
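The blending step described above can be sketched in a few lines. This is an illustrative sketch, not the actual scoring model: the blend weight, feature-field names (`recent_click_affinity`, `cart_affinity`), and candidate schema are all assumptions.

```python
def score(candidate, user_features, w_offline=0.7):
    """Blend a pre-computed collaborative-filtering score with live
    behavior signals (assumed feature-store schema)."""
    online = (
        0.5 * user_features.get("recent_click_affinity", {}).get(candidate["category"], 0.0)
        + 0.3 * user_features.get("cart_affinity", {}).get(candidate["category"], 0.0)
    )
    return w_offline * candidate["cf_score"] + (1 - w_offline) * online

def rank(candidates, user_features, k=10):
    """Rank cached candidates by blended score; top-k is returned."""
    return sorted(candidates, key=lambda c: score(c, user_features), reverse=True)[:k]
```

Keeping the online model this lightweight (a weighted blend rather than a full re-inference) is what makes the sub-50ms budget plausible.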

Data Flow

The diagram shows the following nodes:
  • Recommendations API
  • Event Stream
  • Scoring Service
  • Feature Assembly
  • Feature Updater
  • Candidate Cache
  • User Features
  • Content Index
  • Model Artifacts


Service Breakdown (9 services)

Compute (3 services)
Scoring Service
  • Runs containerized microservices at scale
  • Auto-scales based on CPU and memory utilization
  • Supports rolling deployments and health checks
Feature Assembly
  • Runs containerized microservices at scale
  • Auto-scales based on CPU and memory utilization
  • Supports rolling deployments and health checks
Feature Updater
  • Executes serverless functions on demand
  • Scales automatically with zero idle cost
  • Ideal for event-driven and async workflows
Storage (1 service)
Model Artifacts
  • Stores objects with eleven 9s of durability
  • Supports lifecycle policies for cost optimization
  • Serves as a data lake foundation
Networking (1 service)
Recommendations API
  • Routes and throttles incoming API requests
  • Enforces authentication and rate limiting
  • Provides a unified entry point for microservices
Messaging (1 service)
Event Stream
  • Ingests real-time streaming data at scale
  • Supports multiple consumers per stream
  • Buffers events for downstream processing
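A record published to the event stream might look like the sketch below. The field names and stream name are assumptions; partitioning by `user_id` is the usual way to keep one user's events ordered within a shard.

```python
import json
import time

def make_click_event(user_id, item_id, event_type="click"):
    """Build a stream record (assumed payload schema); partitioning by
    user_id keeps a single user's events ordered within a shard."""
    return {
        "PartitionKey": user_id,
        "Data": json.dumps({
            "user_id": user_id,
            "item_id": item_id,
            "event_type": event_type,
            "ts": int(time.time()),
        }),
    }

# With boto3 (not executed here), the record would be published roughly as:
# boto3.client("kinesis").put_record(StreamName="user-events",
#                                    **make_click_event("u1", "i9"))
```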
Data (3 services)
Candidate Cache
  • Caches hot data in-memory for sub-ms latency
  • Supports Redis and Memcached engines
  • Reduces database load with intelligent caching
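The cache-with-fallback read pattern this service enables can be sketched as follows. A plain dict stands in for the Redis client here so the sketch is self-contained; the key scheme (`cand:<user_id>`) and TTL are assumptions.

```python
def get_candidates(cache, user_id, fallback):
    """Read pre-computed candidates from the cache; on a miss, compute
    them via the fallback (e.g. a content-index query) and backfill."""
    key = f"cand:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return hit
    candidates = fallback(user_id)
    # With redis-py this backfill would be roughly:
    # cache.setex(key, 3600, json.dumps(candidates))
    cache[key] = candidates
    return candidates
```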
User Features
  • Provides single-digit millisecond reads and writes
  • Scales throughput automatically with demand
  • Supports global tables for multi-region access
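The kind of per-user item the Feature Updater might write here can be sketched as a decay-and-bump counter update. The schema (a per-category affinity map) and the decay factor are assumptions, not the actual table design.

```python
def update_affinity(features, category, decay=0.9):
    """Apply one behavior event to a user's feature vector: decay all
    existing per-category affinities, then bump the observed category."""
    aff = features.setdefault("recent_click_affinity", {})
    for cat in aff:
        aff[cat] *= decay
    aff[category] = aff.get(category, 0.0) + 1.0
    return features
```

In the real pipeline this update would land as a `put_item`/`update_item` against the feature table, keyed by user ID.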
Content Index
  • Powers full-text search and log analytics
  • Scales horizontally for large datasets
  • Supports real-time indexing and querying
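A content-based fallback for cold-start users might use a query like the one below. The field name (`category`) and sizing are assumptions about the index schema, shown as a plain query-DSL dict.

```python
def cold_start_query(category_hints, size=50):
    """Build a bool/should query matching any of the hint categories,
    for users with no interaction history (assumed index fields)."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [{"match": {"category": c}} for c in category_hints],
                "minimum_should_match": 1,
            }
        },
    }
```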

Scaling Strategy

Pre-computed recommendation candidates are stored in ElastiCache for instant retrieval. Kinesis captures real-time user events (clicks, views, purchases) that update user feature vectors in DynamoDB. The scoring service on ECS scales horizontally with request-level autoscaling. OpenSearch provides content-based filtering as a fallback for cold-start users. Batch model retraining runs daily and publishes new scores to the cache.
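The "request-level autoscaling" mentioned above would typically be a target-tracking policy keyed on requests per target rather than CPU. The sketch below shows the shape such a policy takes in the Application Auto Scaling API; the target value and cooldowns are illustrative assumptions.

```python
# Target-tracking scaling policy for the scoring service on ECS,
# scaling on load-balancer request count per target. Values are
# illustrative; ResourceLabel must point at the actual ALB/target group.
scaling_policy = {
    "PolicyName": "scoring-requests-per-target",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 1000.0,  # requests per target per minute (assumed)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # "ResourceLabel": "<alb-arn-suffix>/<target-group-arn-suffix>",
        },
        "ScaleOutCooldown": 60,   # scale out quickly under load
        "ScaleInCooldown": 120,   # scale in more conservatively
    },
}
```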
