
Search Autocomplete System


Search autocomplete suggests query completions as users type, requiring sub-100ms responses and frequency-based ranking. This GCP-native design stores a trie (prefix tree) in Memorystore (Redis) for fast prefix lookups, while Firestore tracks query frequencies. It suits search teams that need low-latency typeahead with popularity-based ranking and room for user-specific personalization.
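A minimal in-memory sketch of the core idea, a trie whose completions are ranked by query frequency. The `Trie` class and method names here are illustrative, not part of the design; in production the structure would live in Memorystore rather than process memory:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # > 0 marks the end of a complete query

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, freq=1):
        """Add a query (or bump its count) along its character path."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def suggest(self, prefix, k=5):
        """Return up to k completions of prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []

        def dfs(n, path):
            if n.freq > 0:
                results.append((path, n.freq))
            for ch, child in n.children.items():
                dfs(child, path + ch)

        dfs(node, prefix)
        results.sort(key=lambda t: -t[1])
        return [q for q, _ in results[:k]]

trie = Trie()
for q, f in [("google", 50), ("golang", 30), ("good", 10)]:
    trie.insert(q, f)
print(trie.suggest("go"))  # → ['google', 'golang', 'good']
```

In Redis, the same ranking is often expressed with one sorted set per prefix, which is what the frequency-sync step described later would populate.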

Data Flow

Autocomplete API → Suggestion Lookup → Memorystore (Redis)
Search Log Stream → Frequency Aggregator → Query Frequencies
Query Frequencies → Memorystore (Redis) (periodic trie sync)
Autocomplete API → Full-Text Fallback (long queries)


Service Breakdown (7 services)

Autocomplete API
  • Routes API traffic and enforces policies
  • Manages authentication and rate limiting
  • Provides a unified API endpoint
Suggestion Lookup
  • Cloud Function triggered on each keystroke
  • Fetches ranked suggestions from the owning Memorystore shard
  • Scales instantly from zero to peak load without servers to manage
Memorystore (Redis)
  • Caches data in-memory with sub-millisecond latency
  • Supports Redis protocol for broad compatibility
  • Scales vertically without downtime
Query Frequencies
  • Tracks how often each query prefix is searched
  • Updates frequency counts from aggregated logs
  • Drives popularity-based suggestion ranking
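One common way to drive this ranking is to fold each aggregated query count into a per-prefix score set, the role Redis sorted sets (`ZINCRBY`/`ZREVRANGE`) would play in Memorystore. A hedged in-memory sketch of that aggregation step, with illustrative names:

```python
from collections import defaultdict

# prefix -> {query: count}; in the real design each prefix's scores
# would live in a sorted set on the shard that owns the prefix range
prefix_counts = defaultdict(lambda: defaultdict(int))

def record(query, count=1, max_prefix_len=5):
    """Fold an aggregated log count into every prefix of the query."""
    for i in range(1, min(len(query), max_prefix_len) + 1):
        prefix_counts[query[:i]][query] += count

def top_k(prefix, k=5):
    """Return up to k queries for a prefix, highest count first."""
    scores = prefix_counts.get(prefix, {})
    return sorted(scores, key=scores.get, reverse=True)[:k]

for q, c in [("maps", 40), ("mail", 25), ("marketplace", 8)]:
    record(q, c)
print(top_k("ma"))  # → ['maps', 'mail', 'marketplace']
```

Capping the indexed prefix length (here at 5 characters) bounds write amplification; longer prefixes fall through to the full-text path.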
Search Log Stream
  • Delivers messages between decoupled services reliably
  • Supports millions of messages per second
  • Guarantees at-least-once delivery to all subscribers
Frequency Aggregator
  • Cloud Function subscribed to the search log stream
  • Aggregates query counts from Pub/Sub into Firestore
  • Cost-effective for bursty, event-driven workloads
Full-Text Fallback
  • Cloud Run service for long queries the trie cannot serve
  • Runs stateless containers behind managed HTTPS
  • Scales to zero when idle and to thousands of instances under load

Scaling Strategy

The trie is partitioned by prefix ranges across Memorystore shards for distributed lookups. Each keystroke triggers a Cloud Function that retrieves suggestions from the appropriate shard. Query frequency updates flow through Pub/Sub and are aggregated by Cloud Functions into Firestore, then periodically synced back to the trie. A Cloud Run service handles full-text fallback for long queries.
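The prefix-to-shard routing described above can be sketched as a simple range map. The shard boundaries below are purely illustrative; a real deployment would pick splits from observed prefix distribution to balance load:

```python
import bisect

# Illustrative range boundaries: shard 0 owns ["", "g"),
# shard 1 owns ["g", "n"), shard 2 ["n", "t"), shard 3 ["t", ...)
BOUNDARIES = ["g", "n", "t"]

def shard_for(prefix: str) -> int:
    """Route a query prefix to the Memorystore shard owning its range."""
    return bisect.bisect_right(BOUNDARIES, prefix[:1].lower())

print(shard_for("goo"), shard_for("apple"), shard_for("zebra"))  # → 1 0 3
```

Range partitioning (rather than hashing) keeps all completions for a given prefix on one shard, so each keystroke needs only a single shard lookup.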
