Vector Database
Definition
A database optimized for storing and searching high-dimensional vectors (embeddings). It is a core building block for AI applications such as semantic search and recommendations.
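To make "searching vectors" concrete, here is a minimal sketch of an exact (brute-force) similarity search using cosine similarity. The item names and three-dimensional vectors are made up for illustration; real embeddings come from a model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Score every stored vector against the query and return the k best names."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings (assumption: not produced by any real model).
ITEMS = {
    "cola": [0.9, 0.1, 0.0],
    "lemonade": [0.8, 0.3, 0.1],
    "spaghetti": [0.0, 0.2, 0.9],
}

QUERY = [1.0, 0.2, 0.0]  # hypothetical embedding of a drink-related query
print(top_k(QUERY, ITEMS))  # ['cola', 'lemonade']
```

Scanning every vector like this costs time proportional to the collection size, which is why vector databases build specialized indexes instead.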
Use Cases
- Pinterest: Visual discovery and recommendations (finding visually similar Pins and improving content recommendations). Pinterest has described using embedding-based similarity search with approximate nearest neighbor (ANN) techniques: images and other content are represented as vectors and retrieved by similarity at scale. Result: more relevant recommendations and discovery, because content is matched by similarity rather than exact keywords, which supports large-scale personalized retrieval.
- Spotify: Music recommendations and similarity (finding tracks or artists with similar audio or listening-context characteristics). Spotify is known for using learned representations (embeddings) for recommendation and retrieval problems; vector similarity search is a common way to run nearest-neighbor lookups over such embeddings. Result: more relevant, better-personalized recommendations, because the “nearest” items in embedding space are retrieved instead of relying only on metadata or keywords.
- Instacart: Semantic search for groceries (handling synonyms and intent, such as 'soda' vs. 'soft drink', or 'gluten free pasta'). E-commerce search commonly embeds both queries and products, then uses vector similarity search to retrieve relevant items even when the keywords differ; this pattern is widely used for semantic retrieval in retail catalogs. Result: higher search relevance and conversion, because returned items match user intent rather than only exact text.
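The approximate nearest neighbor (ANN) idea mentioned in the use cases above can be sketched with random-hyperplane locality-sensitive hashing. This is a toy illustration only, not the method used by any of the companies listed; the class name `HyperplaneLSH`, its parameters, and the random data are all assumptions, and production systems typically rely on more sophisticated index structures.

```python
import math
import random
from collections import defaultdict

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class HyperplaneLSH:
    """Toy ANN index: vectors with the same sign pattern share a bucket."""

    def __init__(self, dim, num_planes=4, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane splits the space into two halves.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        # One boolean per hyperplane: which side of the plane the vector is on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, item_id, vec):
        self.buckets[self._signature(vec)].append((item_id, vec))

    def query(self, vec, k=3):
        # Only the query's own bucket is scanned, so true neighbors that fell
        # into another bucket are missed: that is the "approximate" part.
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates, key=lambda c: cosine(vec, c[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]

# Hypothetical data: 200 random 8-dimensional vectors standing in for embeddings.
rng = random.Random(42)
vectors = {i: [rng.gauss(0, 1) for _ in range(8)] for i in range(200)}

index = HyperplaneLSH(dim=8)
for item_id, vec in vectors.items():
    index.add(item_id, vec)

print(index.query(vectors[17], k=3))  # the first result is item 17 itself
```

The trade-off is speed for recall: more hyperplanes mean smaller buckets and faster queries, but a higher chance that a genuinely similar item lands in a different bucket.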
Provider Equivalents
- Azure: Azure AI Search
- GCP: Vertex AI Vector Search
- OCI: Oracle Database 23ai (AI Vector Search)
Frequently Asked Questions
- What's the difference between a vector database and a traditional relational database?
- A relational database is optimized for structured data and exact matches (e.g., WHERE customer_id = 123). A vector database is optimized for similarity search over embeddings (high-dimensional vectors), answering questions like “find items most similar to this text/image.” Many systems combine both: relational tables for transactions plus a vector index for semantic retrieval.
- When should I use a vector database?
- Use a vector database when you need similarity-based retrieval: semantic search, recommendations, deduplication, RAG (retrieval-augmented generation) for chatbots, image/audio similarity, or anomaly detection. If your queries are mostly exact lookups, joins, and aggregations on structured fields, a traditional database or search engine may be a better fit.
- How much does a vector database cost?
- Cost depends on (1) how many vectors you store, (2) the dimensionality of each vector, (3) the index type and replication factor, (4) query volume and latency targets, and (5) whether the service is fully managed. You typically pay for compute (CPU/RAM), storage, and sometimes per-query or per-index operations. Embedding generation (calling an embedding model) is a separate cost from storing and searching vectors.
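As a rough illustration of how vector count, dimensionality, and replication drive storage, the back-of-the-envelope estimate below may help. It assumes 4-byte float32 values; the `index_overhead` multiplier of 1.5 and the 768-dimension example are hypothetical placeholders, since real overhead varies widely by index type and provider.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 assumed by default)."""
    return num_vectors * dimensions * bytes_per_value

def estimated_gib(num_vectors, dimensions, replicas=1, index_overhead=1.5,
                  bytes_per_value=4):
    """Rough footprint in GiB; index_overhead is an assumed multiplier."""
    total = raw_vector_bytes(num_vectors, dimensions, bytes_per_value)
    return total * index_overhead * replicas / 2**30

# Example: one million 768-dimensional vectors (hypothetical workload).
print(raw_vector_bytes(1_000_000, 768))  # 3072000000 bytes of raw float32 data
print(round(estimated_gib(1_000_000, 768, replicas=2), 2), "GiB with 2 replicas")
```

The estimate covers memory and storage only; query compute and embedding-generation charges would need to be added separately.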
Category: data
Difficulty: advanced
Related Terms
See Also