1. Overview
A vector database stores high-dimensional numerical representations of data and retrieves records by similarity. In AI applications, text, images, audio, code, and structured entities are converted into embeddings, indexed, and queried with approximate nearest-neighbor search.
2. Core Concepts
| Concept | Meaning | Why It Matters |
|---|---|---|
| Embedding | A dense vector produced by an embedding model. | Captures semantic similarity in numerical space. |
| Dimension | Number of values in each vector. | Impacts memory, index size, latency, and compatibility. |
| Distance metric | Cosine, dot product, or Euclidean distance. | Must match model training assumptions. |
| ANN index | Approximate nearest-neighbor structure. | Trades exactness for speed at scale. |
| Recall | How often true nearest neighbors are returned. | Low recall causes missed context in RAG. |
| Metadata filter | Structured constraints like tenant, document type, date, region. | Prevents irrelevant or unauthorized retrieval. |
Distance metrics
- Cosine similarity: compares vector angle; common for normalized text embeddings.
- Dot product: captures angle and magnitude; common when model output is trained for it.
- Euclidean distance: compares geometric distance; useful for some image and clustering workloads.
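As a sketch, the three metrics can be computed directly in pure Python (real systems use optimized vector libraries, but the math is the same):

```python
import math

def cosine_similarity(a, b):
    # Angle-only comparison: vector magnitude is divided out.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def dot_product(a, b):
    # Sensitive to both angle and magnitude.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Geometric distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that cosine similarity and dot product agree when vectors are normalized to unit length, which is why many text-embedding models ship pre-normalized output.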
3. Data Pipeline
Vector search quality depends more on the ingestion pipeline than the database alone. Poor parsing, bad chunking, missing metadata, or embedding-model drift will produce weak retrieval no matter which vector store is used.
```json
{
  "id": "doc_123#chunk_004",
  "vector": [0.012, -0.044, 0.287],
  "metadata": {
    "tenant": "acme",
    "source": "policy_manual",
    "document_id": "doc_123",
    "chunk_index": 4,
    "embedding_model": "text-embedding-v3",
    "created_at": "2026-04-27"
  }
}
```
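A minimal chunking sketch that produces records shaped like the example above. Fixed-size character chunking with overlap is an assumption here; production pipelines often split on sentence or heading boundaries instead, and the `chunk_size` and `overlap` values are illustrative:

```python
def make_records(document_id, tenant, text, chunk_size=200, overlap=40):
    # Slide a fixed-size window with overlap so context is not
    # cut off at chunk boundaries.
    step = chunk_size - overlap
    records = []
    index = 0
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        records.append({
            "id": f"{document_id}#chunk_{index:03d}",
            "text": piece,  # embedded downstream; vector omitted here
            "metadata": {
                "tenant": tenant,
                "document_id": document_id,
                "chunk_index": index,
            },
        })
        index += 1
        if start + chunk_size >= len(text):
            break
    return records
```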
4. Index Types
Vector databases use ANN structures to avoid scanning every vector. The right index depends on corpus size, update frequency, memory budget, latency target, and recall requirements.
| Index | Strength | Weakness | Good Fit |
|---|---|---|---|
| HNSW | High recall, fast queries | High memory use | Interactive RAG and semantic search |
| IVF | Scales well to large datasets | Needs training and tuning | Large catalogs and analytics search |
| PQ | Reduces memory dramatically | Can reduce recall | Very large collections and cost-sensitive serving |
| Flat | Exact nearest neighbors | Slow at scale | Small corpora and evaluation baselines |
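The Flat baseline from the table is simple enough to sketch directly: an exact brute-force scan that scores every vector and keeps the top-k. This is what ANN indexes approximate, and it is the reference against which their recall is measured:

```python
import heapq
import math

def flat_search(query, vectors, top_k=3):
    # Exact nearest-neighbor scan using cosine similarity.
    # vectors: dict of id -> vector; assumes non-zero vectors.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scored = [(cos(query, v), vid) for vid, v in vectors.items()]
    # O(n) per query: fine for small corpora and evaluation baselines,
    # too slow at scale, which is exactly the tradeoff in the table.
    return heapq.nlargest(top_k, scored)
```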
5. Query Path
Important query controls
- top_k: number of candidates returned from vector search.
- score threshold: minimum similarity needed to include a result.
- filter mode: pre-filter before ANN or post-filter after retrieval.
- reranking: optional cross-encoder or LLM step to improve final ordering.
- hydration: fetch canonical text from source storage rather than trusting stale index payloads.
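The controls above can be combined into one query function. This is a pre-filter sketch over an in-memory list of records; `score_fn` is a pluggable similarity function and the record shape follows the ingestion example:

```python
def run_query(index, query_vec, top_k, score_threshold,
              metadata_filter, score_fn):
    # Pre-filter: enforce metadata constraints before scoring, so
    # unauthorized or irrelevant records never become candidates.
    candidates = [
        r for r in index
        if all(r["metadata"].get(k) == v
               for k, v in metadata_filter.items())
    ]
    # Score, order best-first, truncate to top_k, then apply the
    # score threshold to drop weak matches.
    scored = sorted(
        ((score_fn(query_vec, r["vector"]), r["id"]) for r in candidates),
        reverse=True,
    )
    return [(s, rid) for s, rid in scored[:top_k] if s >= score_threshold]
```

Reranking and hydration would run after this step, on the returned IDs rather than on index payloads.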
6. Hybrid Search
Production retrieval often combines vector search with lexical search. Dense vectors are good for semantic similarity; lexical search is strong for exact terms, IDs, error codes, part numbers, legal clauses, and rare nouns.
Dense retrieval and lexical retrieval run in parallel, and their scores are normalized and fused with weights, for example:

```
final_score =
    0.55 * normalized_vector_score
  + 0.30 * normalized_bm25_score
  + 0.15 * freshness_or_authority_boost
```
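A minimal fusion sketch. Min-max normalization is one common choice (reciprocal rank fusion is another); the weights here mirror the example formula and would be tuned per workload:

```python
def min_max_normalize(scores):
    # Map raw scores into [0, 1] so dense and lexical scores,
    # which live on different scales, become comparable.
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def fuse(vector_scores, bm25_scores, boosts,
         w_vec=0.55, w_lex=0.30, w_boost=0.15):
    # Weighted sum over the union of candidates from both retrievers;
    # a document missing from one retriever contributes 0 for it.
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    ids = set(v) | set(b)
    return {doc: w_vec * v.get(doc, 0.0)
                 + w_lex * b.get(doc, 0.0)
                 + w_boost * boosts.get(doc, 0.0)
            for doc in ids}
```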
7. Schema Design
A vector record should carry enough metadata to enforce security, debug retrieval, and rebuild indexes. Keep raw source documents in durable storage and use vector records as searchable pointers.
| Field | Purpose | Example |
|---|---|---|
| id | Stable unique chunk or entity ID | policy_42#chunk_09 |
| vector | Dense embedding | 1536-dimensional float vector |
| tenant_id | Isolation and access control | acme |
| source_uri | Canonical document pointer | s3://docs/policy_42.pdf |
| embedding_model | Model version tracking | embedding-v3-large |
| acl | Permission boundary | finance-only |
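The schema above can be pinned down as a typed record. This is a sketch using a plain dataclass; field names follow the table, and types are assumptions about a reasonable encoding:

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    # A searchable pointer: the canonical document lives at source_uri
    # in durable storage, not in the vector store.
    id: str
    vector: list[float]
    tenant_id: str
    source_uri: str
    embedding_model: str
    acl: str
    metadata: dict = field(default_factory=dict)
```

Tracking `embedding_model` per record is what makes rollback possible: vectors from different model versions are not comparable, so a model change means a rebuild, and the field tells you which records need one.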
8. Operations
Index lifecycle
- Build: ingest sources, parse, chunk, embed, and write vectors with metadata.
- Validate: run recall, latency, access-control, and golden-query tests.
- Promote: move a tested index or namespace into production traffic.
- Refresh: handle source updates with upsert, delete, tombstone, or full rebuild.
- Rollback: preserve previous index versions when embedding model or chunking changes.
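The Refresh step can be sketched as upsert plus tombstoning, with a separate compaction pass. This is a toy in-memory model of the pattern, not any particular database's API:

```python
def refresh(index, updates, deletions):
    # Upsert new or changed records; tombstone deletions so they
    # stop serving immediately without an expensive rebuild.
    for record in updates:
        index[record["id"]] = dict(record, tombstoned=False)
    for rid in deletions:
        if rid in index:
            index[rid]["tombstoned"] = True
    return index

def compact(index):
    # Periodic compaction physically drops tombstoned records,
    # which is what right-to-delete workflows rely on.
    return {rid: r for rid, r in index.items() if not r.get("tombstoned")}
```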
Capacity planning
```
raw_vector_memory = vector_count * dimensions * bytes_per_dimension

example: 100,000,000 vectors * 1536 dims * 4 bytes
       = ~614 GB before index overhead and metadata
```
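The estimate is a one-line function; 4 bytes per dimension assumes float32 storage, and quantization (for example PQ) lowers it:

```python
def raw_vector_memory_gb(vector_count, dimensions, bytes_per_dimension=4):
    # Raw vector storage only: excludes index graph structures,
    # replicas, metadata, caches, and compaction overhead.
    return vector_count * dimensions * bytes_per_dimension / 1e9

# 100M vectors at 1536 float32 dims:
# raw_vector_memory_gb(100_000_000, 1536) -> 614.4 GB
```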
Real deployments also pay for index graph structures, replicas, metadata, write amplification, cache memory, backups, and compaction overhead.
9. Security
- Tenant isolation: separate namespaces, collections, or partitions by tenant and sensitivity.
- Metadata ACLs: always filter by access scope before retrieval candidates reach the user.
- Hydration checks: revalidate permissions when fetching source text or documents.
- PII controls: classify source data before embedding; embeddings can still leak sensitive information.
- Deletion: support tombstones, compaction, and rebuilds for data retention and right-to-delete workflows.
- Audit logs: log query metadata, retrieved IDs, score ranges, and caller identity without exposing secrets.
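Two of these controls are small enough to sketch: a deny-by-default ACL check applied at hydration time, and an audit entry that records retrieved IDs and the score range without exposing vectors or document text. Field names are illustrative:

```python
def authorized(record, caller_scopes):
    # Deny by default: the record's ACL must be among the
    # caller's granted scopes, re-checked at hydration time.
    return record["metadata"].get("acl") in caller_scopes

def audit_entry(caller_id, query_id, results):
    # results: list of (score, record_id). Logs identifiers and
    # score ranges only, never raw vectors or source text.
    scores = [s for s, _ in results]
    return {
        "caller": caller_id,
        "query": query_id,
        "retrieved_ids": [rid for _, rid in results],
        "score_min": min(scores) if scores else None,
        "score_max": max(scores) if scores else None,
    }
```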
10. Evaluation
Vector database evaluation should measure both search quality and end-to-end application quality. High ANN recall does not guarantee useful RAG answers if chunks are poor or reranking is weak.
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Recall@k | Whether known relevant items appear in top-k | Core retrieval coverage |
| MRR | Rank of first relevant result | Ordering quality |
| NDCG | Quality of ranked results with graded relevance | Useful for mixed relevance labels |
| Latency p95 | Tail query latency | User experience and SLO planning |
| Filter correctness | Access and metadata filter behavior | Security and compliance |
| Answer faithfulness | Whether generated answers match retrieved context | End-to-end RAG quality |
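Recall@k and MRR from the table are easy to compute once golden queries with labeled relevant items exist:

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of known-relevant items appearing in the top-k results.
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(queries):
    # Mean reciprocal rank of the first relevant result per query;
    # queries: list of (retrieved_ids, relevant_id_set) pairs.
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```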
11. Use Cases
- Retrieval-augmented generation: fetch relevant chunks as grounding context for LLM answers.
- Semantic search: match queries to documents by meaning rather than exact terms.
- Catalog and content discovery: similarity search over large product or media collections.
- Deduplication and clustering: find near-duplicate records or group similar items by embedding distance.
12. Tradeoffs
| Decision | Option A | Option B | Tradeoff |
|---|---|---|---|
| Index | HNSW | IVF/PQ | Higher recall and memory vs larger scale and compression. |
| Filtering | Pre-filter | Post-filter | Security and precision vs possible recall loss. |
| Chunking | Small chunks | Large chunks | Precise retrieval vs richer context. |
| Embedding model | General purpose | Domain tuned | Broad coverage vs domain-specific quality. |
| Storage | Managed service | Self-hosted | Operational simplicity vs control and cost tuning. |
The best vector database architecture is workload-specific. Optimize for measured recall, latency, update frequency, security constraints, and operational cost rather than benchmark numbers alone.
13. Implementation Roadmap
Phase 1: Baseline retrieval
Build a small index with representative documents, a pinned embedding model, basic metadata, and golden queries.
Phase 2: Production schema
Add tenant fields, source pointers, ACL metadata, model versions, timestamps, and deletion markers.
Phase 3: Hybrid retrieval
Combine vector search with lexical search, reranking, and source hydration.
Phase 4: Evaluation loop
Track recall@k, answer faithfulness, latency, filter correctness, and user feedback.
Phase 5: Scale and governance
Introduce sharding, replicas, backups, compaction, index rebuild automation, and access-control audits.