Retrieval-Augmented Generation (RAG) - An AI technique that improves a language model's responses by retrieving relevant information from a knowledge base and supplying it to the model before it generates an answer. It is like giving an AI access to a reference library before it answers questions.
Example: A customer service chatbot uses RAG to search product documentation before answering a question, which helps keep its responses accurate and up to date.
RAG is an architecture pattern rather than a single cloud service. All major cloud providers support it by combining (1) an LLM endpoint, (2) a vector database or search index for retrieval, (3) document ingestion pipelines, and (4) orchestration and guardrails. Equivalent building blocks exist across providers, but no single “RAG service” name maps one-to-one across all of them.
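The four building blocks can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production design: `ToyIndex`, `embed`, `build_prompt`, `fake_llm`, and the sample documents are all hypothetical names invented here, the "embedding" is a simple bag-of-words count standing in for a real embedding model and vector database, and `fake_llm` is a stub standing in for a real LLM endpoint.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    # Toy "embedding": term counts. A real system would call an embedding model.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyIndex:
    """Stand-in for a vector database or search index (hypothetical)."""

    def __init__(self):
        self.entries = []

    def ingest(self, docs):
        # Ingestion pipeline: store each document alongside its vector.
        for doc in docs:
            self.entries.append((doc, embed(doc)))

    def retrieve(self, query, k=2):
        # Retrieval: return the k documents most similar to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question, passages):
    # Orchestration and a simple guardrail: restrict the model to the context.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def fake_llm(prompt):
    # Stub for the LLM endpoint; a real system would send the prompt to a model.
    return f"[model response to a {len(prompt)}-character prompt]"


def answer(index, question, k=2):
    # Retrieve first, then generate from the augmented prompt.
    passages = index.retrieve(question, k)
    return fake_llm(build_prompt(question, passages))


# Made-up product documentation for the chatbot example.
DOCS = [
    "The X100 router supports firmware updates over USB.",
    "Refunds are available within 30 days of purchase.",
    "The X100 router ships with a two-year warranty.",
]

index = ToyIndex()
index.ingest(DOCS)
```

Calling `answer(index, "How long is the warranty on the X100 router?")` retrieves the most similar passages, places them in the prompt, and passes that prompt to the stubbed model. In a real deployment each piece would be replaced by the corresponding managed component from the chosen provider.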
Explore real-world architectures from our community that use RAG: