
Vector Databases: Architecture and Production Guide

A practical reference for vector databases in AI systems: embeddings, approximate nearest-neighbor indexes, metadata filtering, hybrid search, RAG retrieval, scaling, observability, security, and evaluation.

Version: 1.0 · Date: April 2026 · Audience: AI engineers, data engineers, and platform teams

1. Overview

A vector database stores high-dimensional numerical representations of data and retrieves records by similarity. In AI applications, text, images, audio, code, and structured entities are converted into embeddings, indexed, and queried with approximate nearest-neighbor search.

  • What it stores: Vectors, source text references, metadata, tenant scope, and versioning information.
  • What it optimizes: Fast top-k similarity search across millions or billions of vectors.
  • Why it matters: It powers semantic search, RAG, recommendations, deduplication, and memory retrieval.
Mental model: a vector database is an approximate semantic index, not a source of truth. Keep canonical content in durable storage and treat the vector DB as a retrieval layer.

2. Core Concepts

Concept | Meaning | Why It Matters
Embedding | A dense vector produced by an embedding model. | Captures semantic similarity in numerical space.
Dimension | Number of values in each vector. | Impacts memory, index size, latency, and compatibility.
Distance metric | Cosine, dot product, or Euclidean distance. | Must match model training assumptions.
ANN index | Approximate nearest-neighbor structure. | Trades exactness for speed at scale.
Recall | How often true nearest neighbors are returned. | Low recall causes missed context in RAG.
Metadata filter | Structured constraints like tenant, document type, date, region. | Prevents irrelevant or unauthorized retrieval.

Distance metrics

  • Cosine similarity: compares vector angle; common for normalized text embeddings.
  • Dot product: captures angle and magnitude; common when model output is trained for it.
  • Euclidean distance: compares geometric distance; useful for some image and clustering workloads.
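
A minimal numpy sketch of the three metrics (illustrative only, not tied to any particular database):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity; magnitude is normalized away.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot_product(a: np.ndarray, b: np.ndarray) -> float:
    # Sensitive to both angle and magnitude.
    return float(np.dot(a, b))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Geometric distance; smaller means more similar.
    return float(np.linalg.norm(a - b))

a = np.array([0.1, 0.3, 0.5])
b = np.array([0.2, 0.1, 0.4])
print(cosine_similarity(a, b), dot_product(a, b), euclidean_distance(a, b))

For unit-normalized embeddings, cosine similarity and dot product produce the same ranking, which is why many stores normalize vectors at write time.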

3. Data Pipeline

Vector search quality depends more on the ingestion pipeline than the database alone. Poor parsing, bad chunking, missing metadata, or embedding-model drift will produce weak retrieval no matter which vector store is used.

  • Source data: Docs, tickets, code, images, tables, product records.
  • Parse: Extract clean text, structure, tables, and metadata.
  • Chunk: Split by semantic boundaries with overlap where needed (sketched in code below).
  • Embed: Generate vectors with a pinned model version.
  • Index: Build the ANN index and metadata filters.
  • Serve: Query, rerank, hydrate content, and monitor quality.

A stored record might look like:
{
  "id": "doc_123#chunk_004",
  "vector": [0.012, -0.044, 0.287],
  "metadata": {
    "tenant": "acme",
    "source": "policy_manual",
    "document_id": "doc_123",
    "chunk_index": 4,
    "embedding_model": "text-embedding-v3",
    "created_at": "2026-04-27"
  }
}
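
A minimal chunking sketch, assuming fixed-size character windows with overlap; real pipelines often split on semantic boundaries such as headings or sentences first:

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Slide a fixed window across the text; consecutive chunks share
    # `overlap` characters so boundary sentences are not lost.
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks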

4. Index Types

Vector databases use ANN structures to avoid scanning every vector. The right index depends on corpus size, update frequency, memory budget, latency target, and recall requirements.

  • HNSW: Graph-based index with strong recall and low latency; memory heavy but popular for online search.
  • IVF: Partitions vector space into clusters; searches candidate clusters for scalable approximate search.
  • PQ / SQ: Compresses vectors to reduce memory and storage; trades precision for scale.
  • Flat: Exact search over all vectors; simple and accurate, but expensive at large scale.
Index | Strength | Weakness | Good Fit
HNSW | High recall, fast queries | High memory use | Interactive RAG and semantic search
IVF | Scales well to large datasets | Needs training and tuning | Large catalogs and analytics search
PQ | Reduces memory dramatically | Can reduce recall | Very large collections and cost-sensitive serving
Flat | Exact nearest neighbors | Slow at scale | Small corpora and evaluation baselines
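
As a concrete point of reference, here is a sketch using the open-source faiss library to compare an exact Flat baseline against HNSW; the class names and parameters are faiss-specific, and other stores expose the same knobs under different names:

import faiss
import numpy as np

d = 128
xb = np.random.random((10_000, d)).astype("float32")  # corpus vectors
xq = np.random.random((5, d)).astype("float32")       # query vectors

# Exact baseline: scans every vector, useful as a recall reference.
flat = faiss.IndexFlatL2(d)
flat.add(xb)

# HNSW: graph-based ANN; M=32 controls graph degree (memory vs recall).
hnsw = faiss.IndexHNSWFlat(d, 32)
hnsw.hnsw.efSearch = 64  # larger efSearch = higher recall, slower queries
hnsw.add(xb)

_, exact_ids = flat.search(xq, 10)
_, ann_ids = hnsw.search(xq, 10)

# Recall@10 of the ANN index measured against the exact baseline.
recall = np.mean([len(set(e) & set(a)) / 10 for e, a in zip(exact_ids, ann_ids)])
print(f"HNSW recall@10 vs exact: {recall:.2f}")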

5. Query Path

  • User query: Question or search request.
  • Embed query: Same embedding model family as the corpus.
  • Filter: Tenant, ACL, domain, date, language.
  • ANN search: Top-k candidate vectors.
  • Hydrate: Fetch source chunks and pass to the downstream app.

Important query controls

  • top_k: number of candidates returned from vector search.
  • score threshold: minimum similarity needed to include a result.
  • filter mode: pre-filter before ANN or post-filter after retrieval.
  • reranking: optional cross-encoder or LLM step to improve final ordering.
  • hydration: fetch canonical text from source storage rather than trusting stale index payloads.
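
A sketch of these controls in application code, assuming a hypothetical client with `embed`, `search`, `allowed_scopes`, and `fetch_chunk` helpers (the names are illustrative, not a specific vendor API):

def retrieve(query: str, tenant: str, top_k: int = 20, min_score: float = 0.35):
    qvec = embed(query)  # same embedding model family as the corpus

    # Pre-filter: tenant and ACL constraints are applied inside the ANN
    # search so forbidden vectors never become candidates.
    hits = search(
        vector=qvec,
        top_k=top_k,
        filters={"tenant": tenant, "acl": allowed_scopes(tenant)},
    )

    # Score threshold: drop weak matches instead of padding context with noise.
    hits = [h for h in hits if h.score >= min_score]

    # Hydrate from canonical storage rather than trusting stale index payloads.
    return [fetch_chunk(h.id) for h in hits]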

6. Hybrid Search

Production retrieval often combines vector search with lexical search. Dense vectors are good for semantic similarity; lexical search is strong for exact terms, IDs, error codes, part numbers, legal clauses, and rare nouns.

Dense retrieval: Embed query → Vector ANN search → Semantic candidates.
Lexical retrieval: Normalize query → BM25 / keyword search → Exact-match candidates.

Candidates from both paths are merged with a weighted score, for example:
final_score =
  0.55 * normalized_vector_score
  + 0.30 * normalized_bm25_score
  + 0.15 * freshness_or_authority_boost
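
A sketch of this fusion step in Python; the weights mirror the example above and would be tuned per workload, and min-max normalization is just one common choice:

def minmax(scores: dict[str, float]) -> dict[str, float]:
    # Rescale raw scores to [0, 1] so dense and lexical scores are comparable.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def fuse(vector_scores, bm25_scores, boosts):
    # Weighted sum of normalized scores; ids missing from one retriever
    # simply contribute zero from that component.
    v, b = minmax(vector_scores), minmax(bm25_scores)
    ids = set(v) | set(b)
    return sorted(
        ids,
        key=lambda i: 0.55 * v.get(i, 0.0)
                      + 0.30 * b.get(i, 0.0)
                      + 0.15 * boosts.get(i, 0.0),
        reverse=True,
    )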

7. Schema Design

A vector record should carry enough metadata to enforce security, debug retrieval, and rebuild indexes. Keep raw source documents in durable storage and use vector records as searchable pointers.

Field | Purpose | Example
id | Stable unique chunk or entity ID | policy_42#chunk_09
vector | Dense embedding | 1536-dimensional float vector
tenant_id | Isolation and access control | acme
source_uri | Canonical document pointer | s3://docs/policy_42.pdf
embedding_model | Model version tracking | embedding-v3-large
acl | Permission boundary | finance-only
Design rule: never rely on the vector DB alone for authorization. Filter by metadata and recheck permissions before returning hydrated content.
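
One way to pin the schema down in application code, as a minimal sketch using a plain dataclass (field names follow the table above):

from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    id: str                 # stable chunk ID, e.g. "policy_42#chunk_09"
    vector: list[float]     # dense embedding
    tenant_id: str          # isolation and access control
    source_uri: str         # canonical document pointer
    embedding_model: str    # model version for rebuilds and drift checks
    acl: list[str] = field(default_factory=list)  # permission boundary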

8. Operations

Index lifecycle

  • Build: ingest sources, parse, chunk, embed, and write vectors with metadata.
  • Validate: run recall, latency, access-control, and golden-query tests.
  • Promote: move a tested index or namespace into production traffic.
  • Refresh: handle source updates with upsert, delete, tombstone, or full rebuild.
  • Rollback: preserve previous index versions when embedding model or chunking changes.

Capacity planning

raw_vector_memory =
  vector_count * dimensions * bytes_per_dimension

example =
  100,000,000 vectors * 1536 dims * 4 bytes
  = ~614 GB before index overhead and metadata

Real deployments also pay for index graph structures, replicas, metadata, write amplification, cache memory, backups, and compaction overhead.
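
The same arithmetic as a sketch, with a rough overhead multiplier as an explicit assumption; real overhead varies by index type, replication factor, and metadata volume, so calibrate against a measured deployment:

def estimated_memory_gb(vector_count: int, dims: int,
                        bytes_per_dim: int = 4, overhead: float = 1.5) -> float:
    # Raw vectors plus an assumed 1.5x multiplier for index structures,
    # metadata, and caches.
    raw = vector_count * dims * bytes_per_dim
    return raw * overhead / 1e9

print(estimated_memory_gb(100_000_000, 1536))  # ~921 GB with 1.5x overhead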

9. Security

  • Tenant isolation: separate namespaces, collections, or partitions by tenant and sensitivity.
  • Metadata ACLs: always filter by access scope before retrieval candidates reach the user.
  • Hydration checks: revalidate permissions when fetching source text or documents.
  • PII controls: classify source data before embedding; embeddings can still leak sensitive information.
  • Deletion: support tombstones, compaction, and rebuilds for data retention and right-to-delete workflows.
  • Audit logs: log query metadata, retrieved IDs, score ranges, and caller identity without exposing secrets.
Security boundary: vector similarity is not authorization. A close neighbor can still be forbidden content.
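
A sketch of the hydration-time recheck; `user_can_read` and `fetch_chunk` stand in for your own authorization layer and document store:

def hydrate(hits, user):
    chunks = []
    for h in hits:
        # Similarity got the chunk into the candidate set; authorization
        # decides whether this user may actually see it.
        if not user_can_read(user, h.metadata["acl"]):
            continue  # drop forbidden neighbors instead of returning them
        chunks.append(fetch_chunk(h.id))
    return chunks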

10. Evaluation

Vector database evaluation should measure both search quality and end-to-end application quality. High ANN recall does not guarantee useful RAG answers if chunks are poor or reranking is weak.

Metric | What It Measures | Why It Matters
Recall@k | Whether known relevant items appear in the top-k | Core retrieval coverage
MRR | Rank of the first relevant result | Ordering quality
NDCG | Quality of ranked results with graded relevance | Useful for mixed relevance labels
Latency p95 | Tail query latency | User experience and SLO planning
Filter correctness | Access and metadata filter behavior | Security and compliance
Answer faithfulness | Whether generated answers match retrieved context | End-to-end RAG quality
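
Recall@k and MRR are straightforward to compute from labeled golden queries; a minimal sketch:

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of known-relevant items that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    # Reciprocal rank of the first relevant result (0 if none retrieved).
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0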

11. Use Cases

  • RAG search: Retrieve relevant context for LLM answers over enterprise documents.
  • Semantic search: Find documents by meaning instead of exact keyword matching.
  • Recommendations: Match users, products, content, or events by embedding similarity.
  • Deduplication: Detect near-duplicate documents, tickets, or product records.
  • Agent memory: Retrieve long-term memories, task history, and tool traces.
  • Multimodal search: Search images, audio, video, or text in a shared embedding space.

12. Tradeoffs

Decision | Option A | Option B | Tradeoff
Index | HNSW | IVF/PQ | Higher recall and memory vs larger scale and compression.
Filtering | Pre-filter | Post-filter | Security and precision vs possible recall loss.
Chunking | Small chunks | Large chunks | Precise retrieval vs richer context.
Embedding model | General purpose | Domain tuned | Broad coverage vs domain-specific quality.
Storage | Managed service | Self-hosted | Operational simplicity vs control and cost tuning.

The best vector database architecture is workload-specific. Optimize for measured recall, latency, update frequency, security constraints, and operational cost rather than benchmark numbers alone.

13. Implementation Roadmap

Phase 1: Baseline retrieval

Build a small index with representative documents, a pinned embedding model, basic metadata, and golden queries.

Phase 2: Production schema

Add tenant fields, source pointers, ACL metadata, model versions, timestamps, and deletion markers.

Phase 3: Hybrid retrieval

Combine vector search with lexical search, reranking, and source hydration.

Phase 4: Evaluation loop

Track recall@k, answer faithfulness, latency, filter correctness, and user feedback.

Phase 5: Scale and governance

Introduce sharding, replicas, backups, compaction, index rebuild automation, and access-control audits.