OCI · AI Infrastructure · Intermediate · Production ML model management

Model Serving Platform

A model serving platform hosts multiple ML models behind a unified API, supporting canary deployments (routing, say, 5% of traffic to the new model), A/B testing for side-by-side model comparison, and automatic rollback when error rates spike. This OCI-native design runs each model version as its own OKE deployment. A feature store backed by OCI Cache keeps feature computation consistent between training and serving, eliminating the common train/serve skew problem.
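The canary split described above can be sketched as deterministic, hash-based traffic routing. This is a minimal illustration, not the platform's actual router: the model names and the `canary_pct` parameter are assumptions.

```python
import zlib

def route(request_id: str, canary_pct: int = 5) -> str:
    """Deterministically map a request to a model variant.

    Hashing the request (or user) ID keeps routing sticky: the same
    caller always lands on the same variant, which keeps A/B metrics clean.
    """
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return "model-b-canary" if bucket < canary_pct else "model-a-stable"

# Roughly 5% of distinct request IDs land on the canary.
assignments = [route(f"req-{i}") for i in range(1000)]
canary_share = assignments.count("model-b-canary") / len(assignments)
```

Sticky hashing (rather than per-request random sampling) also makes widening the canary simple: raising `canary_pct` only moves callers from stable to canary, never the reverse.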

Data Flow

  • Model API
  • Traffic Router
  • Model Metrics
  • Model A (Stable)
  • Model B (Canary)
  • Feature Store
  • Model Registry
  • Prediction Log


Service Breakdown (8 services)

Model API · OCI API Gateway
  • Routes API traffic and enforces policies
  • Manages authentication and rate limiting
  • Provides a unified API endpoint
Traffic Router · OCI Functions
  • Runs event-driven code without servers
  • Scales instantly from zero to peak load
  • Cost-effective for sporadic workloads
Model A (Stable) · OKE
  • Orchestrates containerized workloads at scale
  • Auto-scales pods and underlying nodes
  • Supports rolling updates and rollbacks
Model B (Canary) · OKE
  • Orchestrates containerized workloads at scale
  • Auto-scales pods and underlying nodes
  • Supports rolling updates and rollbacks
Feature Store · OCI Cache
  • Caches frequently accessed data in-memory
  • Reduces database round-trips and latency
  • Supports TTL-based expiration policies
Model Registry · OCI Object Storage
  • Stores unstructured data with high durability
  • Supports lifecycle rules for cost management
  • Serves as a data lake foundation
Prediction Log · OCI NoSQL Database
  • Handles flexible schema data at scale
  • Provides low-latency reads and writes
  • Scales horizontally with partitioning
Model Metrics · OCI Functions
  • Runs event-driven code without servers
  • Scales instantly from zero to peak load
  • Cost-effective for sporadic workloads
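The Feature Store's read-through, TTL-based caching can be sketched as follows. A plain in-memory dict stands in for OCI Cache here, and the `load_features` loader and TTL value are illustrative assumptions:

```python
import time

class FeatureCache:
    """Read-through feature cache with TTL expiry.

    A dict stands in for OCI Cache in this sketch; in production the
    get/set calls would go to the Redis-compatible cache endpoint.
    """

    def __init__(self, loader, ttl_seconds: float = 300.0):
        self._loader = loader     # falls back to the offline store on a miss
        self._ttl = ttl_seconds
        self._store = {}          # entity_id -> (expires_at, features)

    def get(self, entity_id: str):
        entry = self._store.get(entity_id)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]       # fresh cache hit: no database round-trip
        features = self._loader(entity_id)
        self._store[entity_id] = (time.monotonic() + self._ttl, features)
        return features

# Illustrative loader: in reality this would query the offline feature store.
calls = []
def load_features(entity_id):
    calls.append(entity_id)
    return {"entity": entity_id, "clicks_7d": 12}

cache = FeatureCache(load_features, ttl_seconds=300.0)
cache.get("user-1")   # miss: hits the loader
cache.get("user-1")   # hit: served from memory
```

Because training pipelines and the serving path read features through the same keys, feature computation stays consistent, which is the train/serve-skew guarantee the architecture claims.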

Scaling Strategy

Each model version runs in its own OKE deployment, so it scales independently based on model-specific traffic. OCI Functions implements the traffic-splitting logic for canary releases and A/B tests. OCI Cache backs the feature store, providing low-latency feature lookups during inference. Object Storage holds the model artifacts that Functions loads into OKE pods, and OCI Monitoring triggers an automatic rollback when error-rate thresholds are exceeded.
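The automatic-rollback decision can be sketched as a sliding-window error-rate check. This is a minimal illustration of the logic only; in the architecture itself it is an OCI Monitoring alarm, and the window size and threshold below are assumed values:

```python
from collections import deque

class RollbackMonitor:
    """Track recent request outcomes and flag when the canary should be rolled back."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.10):
        self._outcomes = deque(maxlen=window)   # True = success, False = error
        self._max_error_rate = max_error_rate

    def record(self, success: bool) -> None:
        self._outcomes.append(success)

    def should_roll_back(self) -> bool:
        # Wait for a full window so a single early error can't trigger rollback.
        if len(self._outcomes) < self._outcomes.maxlen:
            return False
        error_rate = self._outcomes.count(False) / len(self._outcomes)
        return error_rate > self._max_error_rate

monitor = RollbackMonitor(window=100, max_error_rate=0.10)
for _ in range(100):
    monitor.record(True)        # healthy canary: no rollback
healthy = monitor.should_roll_back()
for _ in range(15):
    monitor.record(False)       # error spike pushes the window past 10%
spiking = monitor.should_roll_back()
```

A bounded `deque` gives the sliding window for free: each new outcome evicts the oldest one, so the error rate always reflects only the most recent requests.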
