Vector Index Tuning Guide¶
Version: v7.0 Status: Production Ready
Overview¶
Proper index configuration is critical for vector search performance. This guide covers HNSW and IVF index tuning, memory optimization, and production best practices.
HNSW Index Configuration¶
HNSW (Hierarchical Navigable Small World) is the recommended index type for most workloads, offering high recall with low latency.
Core Parameters¶
M (Max Connections)¶
Controls the number of bidirectional links each node maintains in the graph.
-- Default: M = 16
CREATE INDEX idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16);
| M Value | Memory | Build Time | Query Speed | Recall |
|---|---|---|---|---|
| 8 | Low | Fast | Fast | ~90% |
| 16 | Medium | Medium | Medium | ~95% |
| 32 | High | Slow | Medium | ~97% |
| 48 | Very High | Very Slow | Slower | ~98% |
Recommendations:
- General purpose: M = 16
- High recall required: M = 32
- Memory constrained: M = 8
- Maximum accuracy: M = 48
ef_construction (Build Quality)¶
Controls search width during index construction. Higher values create better graph connectivity.
-- Default: ef_construction = 200
CREATE INDEX idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (ef_construction = 200);
| ef_construction | Build Time | Index Quality | Recall Impact |
|---|---|---|---|
| 50 | Very Fast | Low | -5 to -10% recall |
| 100 | Fast | Medium | -2 to -5% recall |
| 200 | Medium | Good | Baseline |
| 400 | Slow | High | +1 to +2% recall |
| 800 | Very Slow | Maximum | +2 to +3% recall |
Recommendations:
- Development/testing: ef_construction = 100
- Production (balanced): ef_construction = 200
- Production (high quality): ef_construction = 400
- Critical applications: ef_construction = 600+
ef_search (Query Quality)¶
Controls search width at query time. This is a runtime parameter, not an index setting.
-- Set for current session
SET hnsw.ef_search = 100;
-- Set globally
ALTER SYSTEM SET hnsw.ef_search = 100;
SELECT pg_reload_conf();
| ef_search | Query Time | Recall@10 | Use Case |
|---|---|---|---|
| 20 | ~1ms | ~85% | Ultra-low latency |
| 40 | ~2ms | ~92% | Default |
| 100 | ~5ms | ~96% | Balanced |
| 200 | ~10ms | ~98% | High accuracy |
| 500 | ~25ms | ~99% | Maximum recall |
Recommendations:
- Real-time search: ef_search = 40-50
- Balanced workload: ef_search = 100
- High accuracy: ef_search = 200
- Critical queries: ef_search = 300+
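Because ef_search is a runtime knob, it is the natural place to automate accuracy/latency trade-offs. A minimal sketch that picks the lowest ef_search expected to meet a recall target, using the approximate figures from the table above (ballpark numbers, not measured guarantees):

```python
# (ef_search, approximate recall@10) pairs taken from the table above.
# These are the guide's ballpark figures, not guarantees for your data.
EF_SEARCH_RECALL = [(20, 0.85), (40, 0.92), (100, 0.96), (200, 0.98), (500, 0.99)]

def choose_ef_search(target_recall: float) -> int:
    """Return the smallest ef_search expected to reach target_recall."""
    for ef, recall in EF_SEARCH_RECALL:
        if recall >= target_recall:
            return ef
    return EF_SEARCH_RECALL[-1][0]  # cap at the table maximum

print(choose_ef_search(0.95))  # 100 -> the "Balanced" row
```

The chosen value would then be applied per session with `SET hnsw.ef_search = ...;` as shown above.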
Combined Configuration Examples¶
-- Fast queries, acceptable recall (production web search)
CREATE INDEX idx_fast ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Use with: SET hnsw.ef_search = 50;
-- Balanced (recommended for most use cases)
CREATE INDEX idx_balanced ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 100;
-- High accuracy (recommendation systems, RAG)
CREATE INDEX idx_accurate ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 200;
-- Maximum recall (medical, legal, compliance)
CREATE INDEX idx_max_recall ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 48, ef_construction = 600);
-- Use with: SET hnsw.ef_search = 400;
IVF Index Configuration¶
IVF (Inverted File) index is recommended for memory-constrained environments or billion-scale datasets.
Core Parameters¶
lists (Number of Clusters)¶
Controls the number of partitions (clusters) in the index.
-- Default: lists = sqrt(N) where N is row count
CREATE INDEX idx ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);
| Dataset Size | Recommended lists | Memory Overhead |
|---|---|---|
| <100K | 100-500 | Minimal |
| 100K-1M | 500-2000 | Low |
| 1M-10M | 2000-8000 | Medium |
| 10M-100M | 8000-32000 | Higher |
| >100M | 32000-65536 | High |
Formula: lists ≈ sqrt(N) is a good starting point.
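The sqrt(N) starting point, clamped to the practical range from the table above, can be sketched as:

```python
import math

def recommended_lists(n_rows: int) -> int:
    """Starting point for the IVF 'lists' parameter: sqrt(N),
    clamped to the 100-65536 range suggested by the table above."""
    return max(100, min(65536, round(math.sqrt(n_rows))))

print(recommended_lists(1_000_000))  # 1000
```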
nprobe (Search Probes)¶
Controls how many clusters to search at query time.
| nprobe (% of lists) | Query Time | Recall | Use Case |
|---|---|---|---|
| 1% | Very Fast | ~80% | Ultra-low latency |
| 5% | Fast | ~90% | Default |
| 10% | Medium | ~95% | Balanced |
| 20% | Slow | ~98% | High accuracy |
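Since the table above expresses nprobe as a fraction of lists, converting a fraction into a concrete setting is a one-liner; a small sketch:

```python
def nprobe_for(lists: int, fraction: float) -> int:
    """Convert a 'percent of lists' target into a concrete nprobe value,
    never going below 1 probe."""
    return max(1, round(lists * fraction))

print(nprobe_for(1000, 0.05))  # 50 -> the "Default" row for lists = 1000
```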
IVF Configuration Examples¶
-- Memory efficient (large datasets)
CREATE INDEX idx_ivf ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 4096);
-- Use with: SET ivf.nprobe = 50;
-- Fast search, lower accuracy
CREATE INDEX idx_ivf_fast ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);
-- Use with: SET ivf.nprobe = 10;
Product Quantization (PQ)¶
For extreme memory reduction (8-16x), use IVF with product quantization:
-- IVF with PQ (memory efficient)
CREATE INDEX idx_ivf_pq ON documents
USING ivfpq (embedding vector_l2_ops)
WITH (
lists = 4096,
pq_segments = 32, -- Divide vector into 32 sub-vectors
pq_bits = 8 -- 8 bits per sub-vector (256 centroids)
);
| Configuration | Memory/Vector | Recall Impact |
|---|---|---|
| IVF-Flat | ~4 bytes/dim | Baseline |
| IVF-SQ8 | ~1 byte/dim | -2-5% |
| IVF-PQ (m=32) | ~32 bytes total | -5-10% |
Memory vs Accuracy Tradeoffs¶
Memory Usage Formula¶
HNSW:
Memory ≈ N × (D × 4 + M × 8) bytes
Where:
N = number of vectors
D = vector dimension
M = connections per node
Examples (HNSW):
| Vectors | Dimensions | M | Memory |
|---|---|---|---|
| 100K | 384 | 16 | ~160 MB |
| 1M | 768 | 16 | ~3.2 GB |
| 10M | 768 | 16 | ~32 GB |
| 10M | 768 | 32 | ~33 GB |
IVF:
Memory ≈ N × (D × 4) + lists × (D × 4) bytes [IVF-Flat]
Memory ≈ N × pq_segments + lists × (D × 4) bytes [IVF-PQ]
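The three formulas above can be checked directly; a minimal sketch that reproduces the HNSW example figures:

```python
def hnsw_memory_bytes(n: int, dim: int, m: int) -> int:
    # Memory ≈ N × (D × 4 + M × 8) bytes
    return n * (dim * 4 + m * 8)

def ivf_flat_memory_bytes(n: int, dim: int, lists: int) -> int:
    # Memory ≈ N × (D × 4) + lists × (D × 4) bytes
    return (n + lists) * dim * 4

def ivf_pq_memory_bytes(n: int, dim: int, lists: int, pq_segments: int) -> int:
    # Memory ≈ N × pq_segments + lists × (D × 4) bytes
    return n * pq_segments + lists * dim * 4

# 1M vectors, 768 dims, M = 16 -> 3.2 GB, matching the table above
print(hnsw_memory_bytes(1_000_000, 768, 16) / 1e9)
```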
Choosing the Right Index¶
Do you have < 1 million vectors?
├── YES → Use HNSW (better recall, simpler tuning)
└── NO
├── Do you have memory constraints?
│ ├── YES → Use IVF-PQ (lowest memory, ~90% recall)
│ └── NO → Use HNSW with M=16 or IVF-Flat
└── Do you need > 95% recall?
├── YES → HNSW with M=32, ef=200+
└── NO → IVF with sufficient nprobe
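The decision tree above can be encoded as a small helper. This is a simplification (real choices also depend on latency budgets and update patterns), and the return strings are just labels for the branches:

```python
def choose_index(n_vectors: int, memory_constrained: bool,
                 need_high_recall: bool) -> str:
    """Encode the decision tree above: small datasets get HNSW,
    large datasets branch on memory constraints and recall needs."""
    if n_vectors < 1_000_000:
        return "HNSW"                          # better recall, simpler tuning
    if memory_constrained:
        return "IVF-PQ"                        # lowest memory, ~90% recall
    if need_high_recall:
        return "HNSW (m=32, ef_search=200+)"   # > 95% recall
    return "HNSW (m=16) or IVF-Flat"

print(choose_index(10_000_000, memory_constrained=True, need_high_recall=False))
```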
Index Building Parameters¶
Build Time Optimization¶
-- Parallel index building (when supported)
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = '4GB';
-- Then create index
CREATE INDEX CONCURRENTLY idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
Incremental Building¶
For large datasets, consider batch loading:
-- 1. Create index on empty table
CREATE TABLE documents (
id SERIAL,
embedding VECTOR(768)
);
CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops);
-- 2. Batch insert data (index updates incrementally)
INSERT INTO documents (embedding)
SELECT embedding FROM source_table
ORDER BY id -- order by a stable key so batches do not overlap
LIMIT 100000 OFFSET 0;
-- Repeat for remaining batches...
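If the batch loop is driven from a script, the LIMIT/OFFSET pairs can be generated programmatically; a minimal sketch (the pairs would be interpolated into the INSERT above by whatever database driver you use):

```python
def batch_ranges(total_rows: int, batch_size: int):
    """Yield (limit, offset) pairs covering total_rows rows,
    for the LIMIT/OFFSET batching pattern shown above."""
    offset = 0
    while offset < total_rows:
        yield (min(batch_size, total_rows - offset), offset)
        offset += batch_size

print(list(batch_ranges(250_000, 100_000)))
# [(100000, 0), (100000, 100000), (50000, 200000)]
```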
Monitoring Build Progress¶
-- Check index build progress
SELECT
phase,
tuples_done,
tuples_total,
ROUND(100.0 * tuples_done / NULLIF(tuples_total, 0), 2) AS pct_complete
FROM pg_stat_progress_create_index
WHERE datname = current_database();
Rebuilding Indexes¶
When to Rebuild¶
Rebuild your vector index when: - Recall drops below acceptable threshold - Query latency increases significantly - After large bulk deletes (>20% of data) - After significant data distribution changes - When upgrading index parameters
Rebuild Commands¶
-- Offline rebuild (fastest, but blocks queries)
REINDEX INDEX idx_documents_embedding;
-- Online rebuild (no blocking)
REINDEX INDEX CONCURRENTLY idx_documents_embedding;
-- Rebuild with new parameters
DROP INDEX idx_documents_embedding;
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400); -- New parameters
Scheduled Rebuilds¶
For production systems, schedule periodic rebuilds:
-- Example: Weekly rebuild during maintenance window
-- Run via cron or scheduled job
-- 1. Check if rebuild needed
SELECT
pg_relation_size('idx_documents_embedding') AS current_size,
pg_table_size('documents') * 0.15 AS expected_overhead;
-- 2. Rebuild if fragmented
REINDEX INDEX CONCURRENTLY idx_documents_embedding;
-- 3. Analyze table for query optimizer
ANALYZE documents;
Performance Monitoring¶
Query Performance Analysis¶
-- Enable timing
\timing on
-- Check query plan and execution time
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR
LIMIT 10;
-- Look for:
-- - "Index Scan using idx_documents_embedding"
-- - Execution Time < 10ms for HNSW
-- - Rows returned matches LIMIT
Index Statistics¶
-- Index size and row estimates
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan AS scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
AND indexrelname LIKE '%embedding%';
-- Vector-specific stats
SELECT * FROM hnsw_index_stats('idx_documents_embedding');
Recall Measurement¶
-- Measure recall@10 (query_vector is a placeholder for your test vector)
-- 1. Exact ground truth: brute-force scan with the index disabled
SET enable_indexscan = off;
CREATE TEMP TABLE exact AS
SELECT id
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
SET enable_indexscan = on;
-- 2. Approximate results via the index, compared against ground truth
SELECT COUNT(*) * 10 AS recall_at_10_pct
FROM (
SELECT id
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10
) approx
WHERE approx.id IN (SELECT id FROM exact);
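If you export the two id lists from the database, the same metric can be computed client-side; a minimal sketch:

```python
def recall_at_k(exact_ids, approx_ids) -> float:
    """Fraction of the exact top-k that the approximate top-k retrieved."""
    exact, approx = set(exact_ids), set(approx_ids)
    return len(exact & approx) / len(exact)

# 4 of the 5 ground-truth ids were retrieved -> recall@5 = 0.8
print(recall_at_k([1, 2, 3, 4, 5], [1, 2, 3, 4, 9]))
```

Averaging this over a few hundred representative query vectors gives a more reliable estimate than a single query.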
Production Configuration Profiles¶
Web Search (Low Latency)¶
-- Index configuration
CREATE INDEX idx_web ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Runtime settings
SET hnsw.ef_search = 40;
-- Expected: <5ms p99, ~92% recall
E-Commerce (Balanced)¶
-- Index configuration
CREATE INDEX idx_ecom ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 300);
-- Runtime settings
SET hnsw.ef_search = 100;
-- Expected: <10ms p99, ~96% recall
RAG / Knowledge Base (High Accuracy)¶
-- Index configuration
CREATE INDEX idx_rag ON knowledge_base
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400);
-- Runtime settings
SET hnsw.ef_search = 200;
-- Expected: <15ms p99, ~98% recall
Legal / Medical (Maximum Recall)¶
-- Index configuration
CREATE INDEX idx_compliance ON legal_docs
USING hnsw (embedding vector_cosine_ops)
WITH (m = 48, ef_construction = 600);
-- Runtime settings
SET hnsw.ef_search = 400;
-- Expected: <30ms p99, ~99% recall
Billion-Scale (Memory Optimized)¶
-- Index configuration (IVF-PQ)
CREATE INDEX idx_billion ON large_dataset
USING ivfpq (embedding vector_l2_ops)
WITH (lists = 32768, pq_segments = 32);
-- Runtime settings
SET ivf.nprobe = 64;
-- Expected: <50ms p99, ~90% recall, 10x less memory
Troubleshooting¶
Problem: Low Recall¶
Symptoms: Missing relevant results
Solutions:
1. Increase ef_search (runtime): SET hnsw.ef_search = 200;
2. Increase ef_construction (requires rebuild)
3. Increase M (requires rebuild)
4. Check that vectors are normalized (for cosine similarity)
Problem: Slow Queries¶
Symptoms: High latency (>50ms)
Solutions:
1. Verify the index is being used (EXPLAIN ANALYZE should show an index scan, not a sequential scan)
2. Reduce ef_search
3. Check memory pressure (the index should fit in RAM)
4. Consider IVF for very large datasets
Problem: High Memory Usage¶
Symptoms: OOM errors, swap usage
Solutions:
1. Reduce the M parameter (requires rebuild)
2. Switch to IVF-PQ for 8-16x memory reduction
Problem: Index Build Too Slow¶
Symptoms: Index creation takes hours
Solutions:
1. Reduce ef_construction
2. Increase maintenance_work_mem and max_parallel_maintenance_workers (see Build Time Optimization above)
Best Practices Summary¶
- Start with defaults: M=16, ef_construction=200, ef_search=40
- Measure recall: Verify on representative queries before production
- Tune ef_search first: Cheapest change for accuracy vs speed
- Monitor regularly: Track latency percentiles and recall
- Rebuild periodically: After significant data changes
- Match distance metric: Cosine for text, L2 for images
- Consider hybrid indexes: Different settings for different access patterns
Status: Production Ready Version: v7.0 Last Updated: January 2026