YouTube Video Streaming System

YouTube / Google

YouTube handles over 500 hours of video uploads every minute and serves billions of views daily. This GCP-native architecture separates the upload pipeline (chunked upload → transcoding on GKE → multiple resolutions in Cloud Storage) from the viewing pipeline (Cloud CDN → adaptive bitrate streaming). It is aimed at engineers building video platforms that need adaptive bitrate streaming, automated moderation, and personalized recommendations.

Data Flow

Cloud CDN
Upload API
Transcode Queue
Upload Service
Transcoding Workers
Recommendation Engine
Video Storage
Video Metadata DB
Memorystore Cache

Service Breakdown (9 services)

Cloud CDN
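  • Caches video segments and manifests at edge locations close to viewers
  • Sits in front of the origin through an external Application Load Balancer, which health-checks backends and routes around failures
  • Absorbs traffic spikes by serving cache hits without reaching the origin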
Upload API
  • Routes API traffic and enforces policies
  • Manages authentication and rate limiting
  • Provides a unified API endpoint
Upload Service
  • Validates and ingests video uploads from creators
  • Extracts metadata and generates processing tasks
  • Enforces content size and format restrictions
Transcoding Workers
  • Converts videos to multiple resolutions and codecs
  • Parallelizes encoding for fast turnaround
  • Produces adaptive bitrate streaming variants
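
As a rough illustration of what "adaptive bitrate streaming variants" can mean in practice, the sketch below builds an HLS master playlist that lists one media playlist per rendition. The source does not say whether the platform uses HLS or DASH, so HLS is an assumption here, and the `Rendition` type, the `LADDER` values, and the `<name>/index.m3u8` path layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One output variant of a transcoded video (values are illustrative)."""
    name: str           # e.g. "720p"; also used as the sub-directory name
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player


# Hypothetical ladder; a real platform would tune these per title.
LADDER = [
    Rendition("360p", 640, 360, 800_000),
    Rendition("720p", 1280, 720, 2_800_000),
    Rendition("1080p", 1920, 1080, 5_000_000),
]


def master_playlist(renditions: list[Rendition]) -> str:
    """Build an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # List variants from lowest to highest bitrate.
    for r in sorted(renditions, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(LADDER))
```

The player picks a variant based on measured throughput and switches between them at segment boundaries, which is why the workers must produce every rendition with aligned segments.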
Recommendation Engine
  • Ranks and personalizes video suggestions per user
  • Combines collaborative filtering with watch history
  • Optimizes for engagement and content diversity
Video Storage
  • Stores transcoded video segments and thumbnails
  • Serves content globally via CDN integration
  • Manages storage tiers for hot and archive content
Transcode Queue
  • Buffers transcoding jobs for parallel processing
  • Prioritizes jobs by creator tier and video length
  • Retries failed jobs with exponential backoff
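
The retry behavior of the Transcode Queue could look roughly like the sketch below: the wait ceiling doubles on each attempt up to a cap, and the actual wait is randomized ("full jitter") so that failed jobs do not all retry at once. The function names, the base and cap values, and the use of jitter are assumptions for illustration; the source only states that failed jobs are retried with exponential backoff.

```python
import random
import time
from typing import Callable, Optional


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Upper bound in seconds of the wait before retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def run_with_retries(job: Callable[[], str], max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Optional[random.Random] = None) -> str:
    """Run `job`, retrying on any exception with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return job()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; the caller can dead-letter the job
            # Full jitter: wait a random time up to the exponential ceiling.
            sleep(rng.uniform(0, backoff_delay(attempt)))
    raise AssertionError("loop always returns or raises")
```

Passing `sleep` and `rng` as parameters keeps the function easy to test without real waiting.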
Video Metadata DB
  • Stores video titles, tags, and channel information
  • Supports full-text search for video discovery
  • Handles high-throughput read queries from the API
Memorystore Cache
  • Caches video metadata and user session data
  • Reduces database load for frequently accessed items
  • Supports sub-millisecond reads for API responses
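
The way a cache such as Memorystore reduces database load is commonly the cache-aside pattern, sketched below under stated assumptions. `TTLCache` is a hypothetical in-memory stand-in for a Redis client so the example runs on its own, and the `video:<id>` key format, the 300-second TTL, and the `load_from_db` callback are illustrative rather than taken from the source.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for a Memorystore (Redis) client; illustration only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: dict, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def get_video_metadata(video_id: str, cache: TTLCache,
                       load_from_db: Callable[[str], Optional[dict]],
                       ttl_seconds: float = 300.0) -> Optional[dict]:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"video:{video_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = load_from_db(video_id)  # e.g. a query against a read replica
    if row is not None:
        cache.set(key, row, ttl_seconds)
    return row
```

A short TTL bounds how stale a title or tag edit can appear; explicit invalidation on write is an alternative if fresher reads are needed.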

Scaling Strategy

Video uploads are chunked and stored in Cloud Storage before transcoding. Pub/Sub distributes transcoding jobs across GKE workers that auto-scale based on subscription backlog. Each video produces multiple resolution variants stored in Cloud Storage. Viewing traffic scales through Cloud CDN edge caching with adaptive bitrate manifests. Metadata queries hit Memorystore first, falling back to Cloud SQL read replicas.
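
To make "auto-scale based on subscription backlog" concrete, here is a minimal sketch of the arithmetic an autoscaler might apply: divide the number of undelivered messages by a per-worker target and clamp the result. The function name, the target of messages per worker, and the minimum and maximum bounds are assumptions; in a real deployment this calculation would typically be done by an autoscaler reading the backlog metric, not by application code.

```python
import math


def desired_workers(backlog: int, target_per_worker: int,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Workers needed so each handles about `target_per_worker` queued jobs."""
    if target_per_worker <= 0:
        raise ValueError("target_per_worker must be positive")
    wanted = math.ceil(backlog / target_per_worker)
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_workers, min(max_workers, wanted))
```

For example, with a target of 10 jobs per worker, a backlog of 95 jobs asks for 10 workers, while an empty queue falls back to the configured minimum.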
