Vector Index Tuning Guide¶
Version: v7.0 Status: Production Ready
Overview¶
Proper index configuration is critical for vector search performance. This guide covers HNSW and IVF index tuning, memory optimization, and production best practices.
HNSW Index Configuration¶
HNSW (Hierarchical Navigable Small World) is the recommended index type for most workloads, offering high recall with low latency.
Core Parameters¶
M (Max Connections)¶
Controls the number of bidirectional links each node maintains in the graph.
-- Default: M = 16
CREATE INDEX idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16);
| M Value | Memory | Build Time | Query Speed | Recall |
|---|---|---|---|---|
| 8 | Low | Fast | Fast | ~90% |
| 16 | Medium | Medium | Medium | ~95% |
| 32 | High | Slow | Medium | ~97% |
| 48 | Very High | Very Slow | Slower | ~98% |
Recommendations:
- General purpose: M = 16
- High recall required: M = 32
- Memory constrained: M = 8
- Maximum accuracy: M = 48
ef_construction (Build Quality)¶
Controls search width during index construction. Higher values create better graph connectivity.
-- Default: ef_construction = 200
CREATE INDEX idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (ef_construction = 200);
| ef_construction | Build Time | Index Quality | Recall Impact |
|---|---|---|---|
| 50 | Very Fast | Low | -5 to -10% recall |
| 100 | Fast | Medium | -2 to -5% recall |
| 200 | Medium | Good | Baseline |
| 400 | Slow | High | +1 to +2% recall |
| 800 | Very Slow | Maximum | +2 to +3% recall |
Recommendations:
- Development/testing: ef_construction = 100
- Production (balanced): ef_construction = 200
- Production (high quality): ef_construction = 400
- Critical applications: ef_construction = 600+
ef_search (Query Quality)¶
Controls search width at query time. This is a runtime parameter, not an index setting.
-- Set for current session
SET hnsw.ef_search = 100;
-- Set globally
ALTER SYSTEM SET hnsw.ef_search = 100;
SELECT pg_reload_conf();
| ef_search | Query Time | Recall@10 | Use Case |
|---|---|---|---|
| 20 | ~1ms | ~85% | Ultra-low latency |
| 40 | ~2ms | ~92% | Default |
| 100 | ~5ms | ~96% | Balanced |
| 200 | ~10ms | ~98% | High accuracy |
| 500 | ~25ms | ~99% | Maximum recall |
Recommendations:
- Real-time search: ef_search = 40-50
- Balanced workload: ef_search = 100
- High accuracy: ef_search = 200
- Critical queries: ef_search = 300+
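Because ef_search is a runtime knob, it is the natural place to automate accuracy/latency trade-offs. A minimal sketch that picks the lowest ef_search expected to meet a recall target, using the approximate figures from the table above (ballpark numbers, not measured guarantees):

```python
# (ef_search, approximate recall@10) pairs taken from the table above.
# These are the guide's ballpark figures, not guarantees for your data.
EF_SEARCH_RECALL = [(20, 0.85), (40, 0.92), (100, 0.96), (200, 0.98), (500, 0.99)]

def choose_ef_search(target_recall: float) -> int:
    """Return the smallest ef_search expected to reach target_recall."""
    for ef, recall in EF_SEARCH_RECALL:
        if recall >= target_recall:
            return ef
    return EF_SEARCH_RECALL[-1][0]  # cap at the table maximum

print(choose_ef_search(0.95))  # 100 -> the "Balanced" row
```

The chosen value would then be applied per session with `SET hnsw.ef_search = ...;` as shown above.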
Combined Configuration Examples¶
-- Fast queries, acceptable recall (production web search)
CREATE INDEX idx_fast ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Use with: SET hnsw.ef_search = 50;
-- Balanced (recommended for most use cases)
CREATE INDEX idx_balanced ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 100;
-- High accuracy (recommendation systems, RAG)
CREATE INDEX idx_accurate ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 200;
-- Maximum recall (medical, legal, compliance)
CREATE INDEX idx_max_recall ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 48, ef_construction = 600);
-- Use with: SET hnsw.ef_search = 400;
IVF Index Configuration¶
IVF (Inverted File) index is recommended for memory-constrained environments or billion-scale datasets.
Core Parameters¶
lists (Number of Clusters)¶
Controls the number of partitions (clusters) in the index.
-- Default: lists = sqrt(N) where N is row count
CREATE INDEX idx ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);
| Dataset Size | Recommended lists | Memory Overhead |
|---|---|---|
| <100K | 100-500 | Minimal |
| 100K-1M | 500-2000 | Low |
| 1M-10M | 2000-8000 | Medium |
| 10M-100M | 8000-32000 | Higher |
| >100M | 32000-65536 | High |
Formula: lists ≈ sqrt(N) is a good starting point.
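The sqrt(N) starting point, clamped to the practical range from the table above, can be sketched as:

```python
import math

def recommended_lists(n_rows: int) -> int:
    """Starting point for the IVF 'lists' parameter: sqrt(N),
    clamped to the 100-65536 range suggested by the table above."""
    return max(100, min(65536, round(math.sqrt(n_rows))))

print(recommended_lists(1_000_000))  # 1000
```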
nprobe (Search Probes)¶
Controls how many clusters to search at query time.
| nprobe (% of lists) | Query Time | Recall | Use Case |
|---|---|---|---|
| 1% | Very Fast | ~80% | Ultra-low latency |
| 5% | Fast | ~90% | Default |
| 10% | Medium | ~95% | Balanced |
| 20% | Slow | ~98% | High accuracy |
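Since the table above expresses nprobe as a fraction of lists, converting a fraction into a concrete setting is a one-liner; a small sketch:

```python
def nprobe_for(lists: int, fraction: float) -> int:
    """Convert a 'percent of lists' target into a concrete nprobe value,
    never going below 1 probe."""
    return max(1, round(lists * fraction))

print(nprobe_for(1000, 0.05))  # 50 -> the "Default" row for lists = 1000
```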
IVF Configuration Examples¶
-- Memory efficient (large datasets)
CREATE INDEX idx_ivf ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 4096);
-- Use with: SET ivf.nprobe = 50;
-- Fast search, lower accuracy
CREATE INDEX idx_ivf_fast ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);
-- Use with: SET ivf.nprobe = 10;
Product Quantization (PQ)¶
For extreme memory reduction (8-16x), use IVF with product quantization:
-- IVF with PQ (memory efficient)
CREATE INDEX idx_ivf_pq ON documents
USING ivfpq (embedding vector_l2_ops)
WITH (
lists = 4096,
pq_segments = 32, -- Divide vector into 32 sub-vectors
pq_bits = 8 -- 8 bits per sub-vector (256 centroids)
);
| Configuration | Memory/Vector | Recall Impact |
|---|---|---|
| IVF-Flat | ~4 bytes/dim | Baseline |
| IVF-SQ8 | ~1 byte/dim | -2-5% |
| IVF-PQ (m=32) | ~32 bytes total | -5-10% |
Memory vs Accuracy Tradeoffs¶
Memory Usage Formula¶
HNSW:
Memory ≈ N × (D × 4 + M × 8) bytes
Where:
N = number of vectors
D = vector dimension
M = connections per node
Examples (HNSW):
| Vectors | Dimensions | M | Memory |
|---|---|---|---|
| 100K | 384 | 16 | ~160 MB |
| 1M | 768 | 16 | ~3.2 GB |
| 10M | 768 | 16 | ~32 GB |
| 10M | 768 | 32 | ~33 GB |
IVF:
Memory ≈ N × (D × 4) + lists × (D × 4) bytes [IVF-Flat]
Memory ≈ N × pq_segments + lists × (D × 4) bytes [IVF-PQ]
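The three formulas above can be checked directly; a minimal sketch that reproduces the HNSW example figures:

```python
def hnsw_memory_bytes(n: int, dim: int, m: int) -> int:
    # Memory ≈ N × (D × 4 + M × 8) bytes
    return n * (dim * 4 + m * 8)

def ivf_flat_memory_bytes(n: int, dim: int, lists: int) -> int:
    # Memory ≈ N × (D × 4) + lists × (D × 4) bytes
    return (n + lists) * dim * 4

def ivf_pq_memory_bytes(n: int, dim: int, lists: int, pq_segments: int) -> int:
    # Memory ≈ N × pq_segments + lists × (D × 4) bytes
    return n * pq_segments + lists * dim * 4

# 1M vectors, 768 dims, M = 16 -> 3.2 GB, matching the table above
print(hnsw_memory_bytes(1_000_000, 768, 16) / 1e9)
```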
Choosing the Right Index¶
Do you have < 1 million vectors?
├── YES → Use HNSW (better recall, simpler tuning)
└── NO
├── Do you have memory constraints?
│ ├── YES → Use IVF-PQ (lowest memory, ~90% recall)
│ └── NO → Use HNSW with M=16 or IVF-Flat
└── Do you need > 95% recall?
├── YES → HNSW with M=32, ef=200+
└── NO → IVF with sufficient nprobe
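The decision tree above can be encoded as a small helper. This is a simplification (real choices also depend on latency budgets and update patterns), and the return strings are just labels for the branches:

```python
def choose_index(n_vectors: int, memory_constrained: bool,
                 need_high_recall: bool) -> str:
    """Encode the decision tree above: small datasets get HNSW,
    large datasets branch on memory constraints and recall needs."""
    if n_vectors < 1_000_000:
        return "HNSW"                          # better recall, simpler tuning
    if memory_constrained:
        return "IVF-PQ"                        # lowest memory, ~90% recall
    if need_high_recall:
        return "HNSW (m=32, ef_search=200+)"   # > 95% recall
    return "HNSW (m=16) or IVF-Flat"

print(choose_index(10_000_000, memory_constrained=True, need_high_recall=False))
```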
Index Building Parameters¶
Build Time Optimization¶
-- Parallel index building (when supported)
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = '4GB';
-- Then create index
CREATE INDEX CONCURRENTLY idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
Incremental Building¶
For large datasets, consider batch loading:
-- 1. Create index on empty table
CREATE TABLE documents (
id SERIAL,
embedding VECTOR(768)
);
CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops);
-- 2. Batch insert data (index updates incrementally)
INSERT INTO documents (embedding)
SELECT embedding FROM source_table
ORDER BY id -- order by a stable key so batches do not overlap
LIMIT 100000 OFFSET 0;
-- Repeat for remaining batches...
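If the batch loop is driven from a script, the LIMIT/OFFSET pairs can be generated programmatically; a minimal sketch (the pairs would be interpolated into the INSERT above by whatever database driver you use):

```python
def batch_ranges(total_rows: int, batch_size: int):
    """Yield (limit, offset) pairs covering total_rows rows,
    for the LIMIT/OFFSET batching pattern shown above."""
    offset = 0
    while offset < total_rows:
        yield (min(batch_size, total_rows - offset), offset)
        offset += batch_size

print(list(batch_ranges(250_000, 100_000)))
# [(100000, 0), (100000, 100000), (50000, 200000)]
```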
Monitoring Build Progress¶
-- Check index build progress
SELECT
phase,
tuples_done,
tuples_total,
ROUND(100.0 * tuples_done / NULLIF(tuples_total, 0), 2) AS pct_complete
FROM pg_stat_progress_create_index
WHERE datname = current_database();
Rebuilding Indexes¶
When to Rebuild¶
Rebuild your vector index when: - Recall drops below acceptable threshold - Query latency increases significantly - After large bulk deletes (>20% of data) - After significant data distribution changes - When upgrading index parameters
Rebuild Commands¶
-- Offline rebuild (fastest, but blocks queries)
REINDEX INDEX idx_documents_embedding;
-- Online rebuild (no blocking)
REINDEX INDEX CONCURRENTLY idx_documents_embedding;
-- Rebuild with new parameters
DROP INDEX idx_documents_embedding;
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400); -- New parameters
Scheduled Rebuilds¶
For production systems, schedule periodic rebuilds:
-- Example: Weekly rebuild during maintenance window
-- Run via cron or scheduled job
-- 1. Check if rebuild needed
SELECT
pg_relation_size('idx_documents_embedding') AS current_size,
pg_table_size('documents') * 0.15 AS expected_overhead;
-- 2. Rebuild if fragmented
REINDEX INDEX CONCURRENTLY idx_documents_embedding;
-- 3. Analyze table for query optimizer
ANALYZE documents;
Performance Monitoring¶
Query Performance Analysis¶
-- Enable timing
\timing on
-- Check query plan and execution time
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR
LIMIT 10;
-- Look for:
-- - "Index Scan using idx_documents_embedding"
-- - Execution Time < 10ms for HNSW
-- - Rows returned matches LIMIT
Index Statistics¶
-- Index size and row estimates
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan AS scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
AND indexrelname LIKE '%embedding%';
-- Vector-specific stats
SELECT * FROM hnsw_index_stats('idx_documents_embedding');
Recall Measurement¶
-- Measure recall@10 (query_vector is a placeholder for your test vector)
-- 1. Exact ground truth: brute-force scan with the index disabled
SET enable_indexscan = off;
CREATE TEMP TABLE exact AS
SELECT id
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
SET enable_indexscan = on;
-- 2. Approximate results via the index, compared against ground truth
SELECT COUNT(*) * 10 AS recall_at_10_pct
FROM (
SELECT id
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10
) approx
WHERE approx.id IN (SELECT id FROM exact);
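If you export the two id lists from the database, the same metric can be computed client-side; a minimal sketch:

```python
def recall_at_k(exact_ids, approx_ids) -> float:
    """Fraction of the exact top-k that the approximate top-k retrieved."""
    exact, approx = set(exact_ids), set(approx_ids)
    return len(exact & approx) / len(exact)

# 4 of the 5 ground-truth ids were retrieved -> recall@5 = 0.8
print(recall_at_k([1, 2, 3, 4, 5], [1, 2, 3, 4, 9]))
```

Averaging this over a few hundred representative query vectors gives a more reliable estimate than a single query.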
Production Configuration Profiles¶
Web Search (Low Latency)¶
-- Index configuration
CREATE INDEX idx_web ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Runtime settings
SET hnsw.ef_search = 40;
-- Expected: <5ms p99, ~92% recall
E-Commerce (Balanced)¶
-- Index configuration
CREATE INDEX idx_ecom ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 300);
-- Runtime settings
SET hnsw.ef_search = 100;
-- Expected: <10ms p99, ~96% recall
RAG / Knowledge Base (High Accuracy)¶
-- Index configuration
CREATE INDEX idx_rag ON knowledge_base
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 400);
-- Runtime settings
SET hnsw.ef_search = 200;
-- Expected: <15ms p99, ~98% recall
Legal / Medical (Maximum Recall)¶
-- Index configuration
CREATE INDEX idx_compliance ON legal_docs
USING hnsw (embedding vector_cosine_ops)
WITH (m = 48, ef_construction = 600);
-- Runtime settings
SET hnsw.ef_search = 400;
-- Expected: <30ms p99, ~99% recall
Billion-Scale (Memory Optimized)¶
-- Index configuration (IVF-PQ)
CREATE INDEX idx_billion ON large_dataset
USING ivfpq (embedding vector_l2_ops)
WITH (lists = 32768, pq_segments = 32);
-- Runtime settings
SET ivf.nprobe = 64;
-- Expected: <50ms p99, ~90% recall, 10x less memory
Troubleshooting¶
Problem: Low Recall¶
Symptoms: Missing relevant results
Solutions:
1. Increase ef_search (runtime): SET hnsw.ef_search = 200;
2. Increase ef_construction (requires rebuild)
3. Increase M (requires rebuild)
4. Check that vectors are normalized (for cosine similarity)
Problem: Slow Queries¶
Symptoms: High latency (>50ms)
Solutions:
1. Verify the index is being used (EXPLAIN ANALYZE should show an index scan, not a sequential scan)
2. Reduce ef_search
3. Check memory pressure (the index should fit in RAM)
4. Consider IVF for very large datasets
Problem: High Memory Usage¶
Symptoms: OOM errors, swap usage
Solutions:
1. Reduce the M parameter (requires rebuild)
2. Switch to IVF-PQ for 8-16x memory reduction
Problem: Index Build Too Slow¶
Symptoms: Index creation takes hours
Solutions:
1. Reduce ef_construction
2. Increase maintenance_work_mem and max_parallel_maintenance_workers (see Build Time Optimization above)
Best Practices Summary¶
- Start with defaults: M=16, ef_construction=200, ef_search=40
- Measure recall: Verify on representative queries before production
- Tune ef_search first: Cheapest change for accuracy vs speed
- Monitor regularly: Track latency percentiles and recall
- Rebuild periodically: After significant data changes
- Match distance metric: Cosine for text, L2 for images
- Consider hybrid indexes: Different settings for different access patterns
Status: Production Ready Version: v7.0 Last Updated: January 2026