
Batch Inference Pipeline

AI Infrastructure

Batch inference processes large datasets through ML models — scoring millions of customer records, generating embeddings for a document corpus, or running image classification on a media library. This OCI-native pipeline uses OCI Data Science for model management, OKE for distributed inference workers with GPU shapes, and OCI Queue Service for job orchestration with checkpointing and failure recovery. Ideal for data science teams running nightly scoring jobs, bulk classification, or periodic embedding generation across large datasets.

Data Flow

Input Dataset
Job Splitter
Partition Queue
Inference Workers
Checkpoint Store
Output Results
Completion Events


Service Breakdown (7 services)

Input Dataset (OCI Object Storage)
  • Stores unstructured data with high durability
  • Supports lifecycle rules for cost management
  • Serves as a data lake foundation
Job Splitter (OCI Functions)
  • Runs event-driven code without servers
  • Scales instantly from zero to peak load
  • Cost-effective for sporadic workloads
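The splitter's core logic is simple partitioning. A minimal sketch, assuming a JSON message shape and an illustrative partition size (the real function would list object keys from Object Storage and enqueue each message via the Queue Service API):

```python
import json

def split_job(record_keys, partition_size):
    """Split a list of input object keys into one queue message
    per partition, each carrying its partition id and key slice."""
    messages = []
    for pid, start in enumerate(range(0, len(record_keys), partition_size)):
        chunk = record_keys[start:start + partition_size]
        messages.append(json.dumps({"partition_id": pid, "keys": chunk}))
    return messages

# 8 input objects at 3 per partition -> 3 messages
msgs = split_job([f"input/rec-{i}.json" for i in range(8)], partition_size=3)
```

Partition size is the main tuning knob: smaller partitions mean less rework after a worker failure, larger ones amortize per-message and model-warmup overhead.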
Partition Queue (OCI Queue Service)
  • Buffers messages for reliable async processing
  • Supports visibility timeouts and retry policies
  • Decouples producers from consumers effectively
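The visibility-timeout semantics are what make the queue safe for at-least-once processing: a delivered message stays hidden from other consumers until it is deleted (acknowledged) or its timeout lapses, at which point it is redelivered. A toy in-memory model of this behavior (not the OCI Queue API; timestamps passed explicitly for clarity):

```python
import time
from collections import deque

class VisibilityQueue:
    """In-memory sketch of visibility-timeout delivery semantics."""

    def __init__(self, visibility_timeout):
        self.visible = deque()
        self.in_flight = {}   # receipt -> (message, redelivery deadline)
        self.timeout = visibility_timeout
        self._receipt = 0

    def put(self, msg):
        self.visible.append(msg)

    def get(self, now=None):
        now = time.monotonic() if now is None else now
        # Return expired in-flight messages to the visible pool (retry).
        for receipt, (msg, deadline) in list(self.in_flight.items()):
            if deadline <= now:
                del self.in_flight[receipt]
                self.visible.append(msg)
        if not self.visible:
            return None, None
        msg = self.visible.popleft()
        self._receipt += 1
        self.in_flight[self._receipt] = (msg, now + self.timeout)
        return self._receipt, msg

    def delete(self, receipt):
        # Ack: message will not be redelivered.
        self.in_flight.pop(receipt, None)
```

If a worker pulls a partition message and is reclaimed before deleting it, the message simply reappears after the timeout and another worker picks it up.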
Inference Workers (OKE)
  • Orchestrates containerized workloads at scale
  • Auto-scales pods and underlying nodes
  • Supports rolling updates and rollbacks
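Inside each pod, a worker typically scores its partition in micro-batches to keep the GPU fed. A sketch with a stand-in `predict` callable (in practice this would be the loaded model or a Data Science model deployment call):

```python
def run_partition(keys, predict, batch_size):
    """Score a partition's records in micro-batches.

    `predict` takes a list of keys and returns one score per key;
    here it is a placeholder for real model inference.
    """
    results = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        for key, score in zip(batch, predict(batch)):
            results[key] = score
    return results

# Stub model: score = key length (placeholder for real inference).
scores = run_partition(["a", "bb", "ccc"],
                       lambda batch: [len(k) for k in batch],
                       batch_size=2)
```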
Checkpoint Store (OCI NoSQL Database)
  • Handles flexible schema data at scale
  • Provides low-latency reads and writes
  • Scales horizontally with partitioning
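The checkpoint record itself can be minimal: keyed by job and partition, storing the last committed offset. A sketch with a dict standing in for the NoSQL table (key names are illustrative):

```python
checkpoints = {}  # stand-in for a NoSQL table keyed by (job_id, partition_id)

def save_checkpoint(job_id, partition_id, offset):
    # Low-latency upsert of the last committed record offset.
    checkpoints[(job_id, partition_id)] = offset

def load_checkpoint(job_id, partition_id):
    # No checkpoint yet means start the partition from the beginning.
    return checkpoints.get((job_id, partition_id), 0)
```

Keying on the partition lets checkpoints scale horizontally with the job: each worker only ever reads and writes its own row.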
Output Results (OCI Object Storage)
  • Stores unstructured data with high durability
  • Supports lifecycle rules for cost management
  • Serves as a data lake foundation
Completion Events (OCI Streaming)
  • Ingests and processes real-time event streams
  • Supports multiple consumer groups
  • Buffers data for reliable downstream delivery

Scaling Strategy

Input data is partitioned in Object Storage, and partition jobs are distributed via OCI Queue Service to OKE workers running on preemptible instances for cost savings. Each worker processes a partition independently, writing a checkpoint to NoSQL Database every N records. If a preemptible instance is reclaimed, only its current partition restarts, and it resumes from the last checkpoint rather than from the beginning. Results are aggregated back to Object Storage, and OCI Streaming publishes completion events for downstream consumers.
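The checkpoint-and-resume loop described above can be sketched as follows, with a dict standing in for the NoSQL table and a `fail_at` parameter simulating a preemptible-instance reclaim (all names are illustrative):

```python
def process_partition(records, score, store, key, checkpoint_every, fail_at=None):
    """Score a partition with periodic checkpoints, resuming from the
    last committed offset in `store` after a failure."""
    out = []
    start = store.get(key, 0)          # resume point; 0 on first attempt
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("instance reclaimed")
        out.append(score(records[i]))
        if (i + 1) % checkpoint_every == 0:
            store[key] = i + 1         # checkpoint every N records
    store[key] = len(records)          # final checkpoint marks partition done
    return out
```

On a reclaim, only the records since the last checkpoint are re-scored; everything before it is not repeated. This bounds rework to at most N records per failure, which is what makes cheap preemptible capacity viable for long-running batch jobs.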
