
RAG AI Knowledge Base


Retrieval-Augmented Generation (RAG) combines the power of large language models with your own data. This OCI-native architecture ingests documents into Autonomous Database's built-in vector store, generates embeddings via OCI Generative AI at both ingestion and query time, retrieves the most relevant context through similarity search, and feeds that context to an LLM for grounded responses with fewer hallucinations. It suits teams building enterprise AI assistants that need accurate, citation-backed answers from proprietary knowledge bases.
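The query path can be sketched end to end. Everything below is a toy stand-in, not the OCI SDK: `embed` fakes the Generative AI embedding call with a deterministic word-hash vector, and the "LLM call" is replaced by returning the assembled, citation-tagged prompt.

```python
import math

def embed(text: str, dims: int = 8) -> list[float]:
    # Toy embedder standing in for an OCI Generative AI embedding model:
    # buckets words into a small fixed-size vector, then L2-normalizes.
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def answer(question: str, documents: dict[str, str], top_k: int = 2) -> str:
    """Embed the question, retrieve the top-k most similar documents,
    and assemble a grounded prompt with inline citations."""
    q_vec = embed(question)
    scored = sorted(documents.items(),
                    key=lambda kv: cosine(q_vec, embed(kv[1])),
                    reverse=True)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in scored[:top_k])
    # In the real architecture this prompt goes to the OCI Generative AI
    # Service; here we just return it to show the grounding step.
    return f"Answer using only these sources:\n{context}\n\nQ: {question}"

docs = {
    "kb-1": "Autonomous Database includes a built-in vector store.",
    "kb-2": "OCI Functions scales from zero to peak load automatically.",
    "kb-3": "Object Storage keeps unstructured data with high durability.",
}
prompt = answer("How does the vector store work?", docs)
```

The grounding step is the point: the model is constrained to the retrieved, citation-tagged context rather than answering from parametric memory alone.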

Data Flow

RAG API
Ingestion Queue
Query Orchestrator
Embedding Generator
Document Store
Vector Search
Generative AI Service
Conversation History


Service Breakdown (8 services)

RAG API
  • Routes API traffic and enforces policies
  • Manages authentication and rate limiting
  • Provides a unified API endpoint
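In the deployed architecture, rate limiting is handled by the managed gateway in front of the RAG API; the underlying mechanism is typically a token bucket, sketched here as a minimal illustration (the `rate`/`capacity` numbers are arbitrary):

```python
import time

class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec,
    allows bursts up to `capacity` requests."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=3)
# A burst of 5 back-to-back requests against a burst capacity of 3.
results = [bucket.allow() for _ in range(5)]
```

The first three requests pass on the stored burst; subsequent ones are rejected until the bucket refills.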
Query Orchestrator
  • Runs event-driven code without servers
  • Scales instantly from zero to peak load
  • Cost-effective for sporadic workloads
Embedding Generator
  • Runs event-driven code without servers
  • Scales instantly from zero to peak load
  • Cost-effective for sporadic workloads
Vector Search
  • Self-tuning database with automatic scaling
  • Handles patching and backups autonomously
  • Optimizes queries with ML-driven indexing
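Under the hood, a vector search ranks rows by a distance metric such as cosine distance. The sketch below shows the brute-force version of that ranking in plain Python; a real vector index approximates the same result without scanning every row. The vectors and IDs are made up for illustration.

```python
import heapq
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity:
    0.0 means same direction, 2.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def top_k(query: list[float], rows, k: int = 2):
    """Exact nearest-neighbour scan over (id, vector) rows."""
    return heapq.nsmallest(k, rows,
                           key=lambda r: cosine_distance(query, r[1]))

rows = [
    ("doc-a", [1.0, 0.0]),
    ("doc-b", [0.9, 0.1]),
    ("doc-c", [0.0, 1.0]),
]
hits = top_k([1.0, 0.0], rows)
```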
Document Store
  • Stores unstructured data with high durability
  • Supports lifecycle rules for cost management
  • Serves as a data lake foundation
Generative AI Service
  • Invokes foundation models for text generation
  • Supports prompt engineering and response streaming
  • Manages model selection and fallback strategies
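A fallback strategy can be as simple as trying models in preference order. This is a hedged sketch: the model names are hypothetical, and `invoke` is an injected stand-in for the actual model call.

```python
def generate_with_fallback(prompt: str, models, invoke):
    """Try each model in preference order; return (model_used, text).
    `invoke(model, prompt)` stands in for the real model call and
    may raise when a model is unavailable."""
    errors = []
    for model in models:
        try:
            return model, invoke(model, prompt)
        except Exception as exc:  # production code would catch specific errors
            errors.append((model, str(exc)))
    raise RuntimeError(f"all models failed: {errors}")

# Simulated invoke: the primary model is "down", the fallback answers.
def fake_invoke(model, prompt):
    if model == "primary-llm":
        raise TimeoutError("model unavailable")
    return f"{model}: grounded answer"

used, text = generate_with_fallback(
    "What is RAG?", ["primary-llm", "fallback-llm"], fake_invoke)
```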
Conversation History
  • Handles flexible schema data at scale
  • Provides low-latency reads and writes
  • Scales horizontally with partitioning
Ingestion Queue
  • Buffers incoming documents for async processing
  • Ensures no data loss during high-volume ingestion
  • Prioritizes items by source and urgency
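The prioritization described for the ingestion queue can be illustrated with an in-memory priority buffer (a sketch only; the managed queue service handles durability and delivery in the real architecture). Lower urgency numbers dequeue first, and a monotonic counter keeps arrival order among equal priorities:

```python
import heapq
import itertools

class IngestionBuffer:
    """Priority buffer for queued documents: lower `urgency` dequeues
    first; the counter breaks ties in FIFO (arrival) order."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def enqueue(self, doc_id: str, source: str, urgency: int) -> None:
        # The counter guarantees the heap never compares `source` strings.
        heapq.heappush(self._heap,
                       (urgency, next(self._counter), source, doc_id))

    def dequeue(self) -> str:
        _, _, _, doc_id = heapq.heappop(self._heap)
        return doc_id

buf = IngestionBuffer()
buf.enqueue("d1", "wiki", urgency=5)
buf.enqueue("d2", "tickets", urgency=1)
buf.enqueue("d3", "wiki", urgency=5)
order = [buf.dequeue() for _ in range(3)]
```

The urgent ticket document jumps ahead, while the two wiki documents keep their arrival order.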

Scaling Strategy

The ingestion pipeline scales independently from the query path. OCI Queue Service buffers document uploads for batch embedding generation, while Functions handles bursty query traffic with automatic scaling. Autonomous Database provides built-in vector search that scales with OCPU auto-scaling, and NoSQL Database stores conversation history with on-demand capacity.
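Batch embedding generation on the ingestion path amounts to draining the queue in fixed-size groups so each embedding request carries many documents instead of one. A minimal sketch (batch size is an arbitrary example):

```python
def batches(items: list[str], size: int) -> list[list[str]]:
    """Split queued documents into fixed-size batches so embeddings
    can be generated in bulk rather than one call per document."""
    return [items[i:i + size] for i in range(0, len(items), size)]

queued = [f"doc-{n}" for n in range(7)]
groups = batches(queued, size=3)  # 7 docs -> batches of 3, 3, 1
```

Larger batches amortize per-request overhead on the embedding service; the last, possibly short batch is still flushed so no document waits indefinitely.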
