Vector Index Tuning Guide

Version: v7.0 Status: Production Ready


Overview

Proper index configuration is critical for vector search performance. This guide covers HNSW and IVF index tuning, memory optimization, and production best practices.


HNSW Index Configuration

HNSW (Hierarchical Navigable Small World) is the recommended index type for most workloads, offering high recall with low latency.

Core Parameters

M (Max Connections)

Controls the number of bidirectional links each node maintains in the graph.

-- Default: M = 16
CREATE INDEX idx ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16);

| M Value | Memory    | Build Time | Query Speed | Recall |
|---------|-----------|------------|-------------|--------|
| 8       | Low       | Fast       | Fast        | ~90%   |
| 16      | Medium    | Medium     | Medium      | ~95%   |
| 32      | High      | Slow       | Medium      | ~97%   |
| 48      | Very High | Very Slow  | Slower      | ~98%   |

Recommendations:

- General purpose: M = 16
- High recall required: M = 32
- Memory constrained: M = 8
- Maximum accuracy: M = 48

ef_construction (Build Quality)

Controls search width during index construction. Higher values create better graph connectivity.

-- Default: ef_construction = 200
CREATE INDEX idx ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (ef_construction = 200);

| ef_construction | Build Time | Index Quality | Recall Impact     |
|-----------------|------------|---------------|-------------------|
| 50              | Very Fast  | Low           | -5 to -10% recall |
| 100             | Fast       | Medium        | -2 to -5% recall  |
| 200             | Medium     | Good          | Baseline          |
| 400             | Slow       | High          | +1 to +2% recall  |
| 800             | Very Slow  | Maximum       | +2 to +3% recall  |

Recommendations:

- Development/testing: ef_construction = 100
- Production (balanced): ef_construction = 200
- Production (high quality): ef_construction = 400
- Critical applications: ef_construction = 600+

ef_search (Query Quality)

Controls search width at query time. This is a runtime parameter, not an index setting.

-- Set for current session
SET hnsw.ef_search = 100;

-- Set globally
ALTER SYSTEM SET hnsw.ef_search = 100;
SELECT pg_reload_conf();

| ef_search | Query Time | Recall@10 | Use Case          |
|-----------|------------|-----------|-------------------|
| 20        | ~1ms       | ~85%      | Ultra-low latency |
| 40        | ~2ms       | ~92%      | Default           |
| 100       | ~5ms       | ~96%      | Balanced          |
| 200       | ~10ms      | ~98%      | High accuracy     |
| 500       | ~25ms      | ~99%      | Maximum recall    |

Recommendations:

- Real-time search: ef_search = 40-50
- Balanced workload: ef_search = 100
- High accuracy: ef_search = 200
- Critical queries: ef_search = 300+
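
Because ef_search is a runtime setting, it can also be scoped to a single transaction, so one high-accuracy query does not slow down the rest of the connection's workload. A minimal sketch, assuming the `hnsw.ef_search` GUC and `documents` table used above:

```sql
-- Raise ef_search for one critical query only; SET LOCAL reverts at COMMIT
BEGIN;
SET LOCAL hnsw.ef_search = 300;

SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR  -- placeholder query vector
LIMIT 10;

COMMIT;
-- Later queries in this session fall back to the session/global value
```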

Combined Configuration Examples

-- Fast queries, acceptable recall (production web search)
CREATE INDEX idx_fast ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);
-- Use with: SET hnsw.ef_search = 50;

-- Balanced (recommended for most use cases)
CREATE INDEX idx_balanced ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 100;

-- High accuracy (recommendation systems, RAG)
CREATE INDEX idx_accurate ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 400);
-- Use with: SET hnsw.ef_search = 200;

-- Maximum recall (medical, legal, compliance)
CREATE INDEX idx_max_recall ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 48, ef_construction = 600);
-- Use with: SET hnsw.ef_search = 400;

IVF Index Configuration

The IVF (Inverted File) index is recommended for memory-constrained environments or billion-scale datasets.

Core Parameters

lists (Number of Clusters)

Controls the number of partitions (clusters) in the index.

-- Default: lists = sqrt(N) where N is row count
CREATE INDEX idx ON documents
  USING ivfflat (embedding vector_l2_ops)
  WITH (lists = 1000);

| Dataset Size | Recommended lists | Memory Overhead |
|--------------|-------------------|-----------------|
| <100K        | 100-500           | Minimal         |
| 100K-1M      | 500-2000          | Low             |
| 1M-10M       | 2000-8000         | Medium          |
| 10M-100M     | 8000-32000        | Higher          |
| >100M        | 32000-65536       | High            |

Formula: lists ≈ sqrt(N) is a good starting point.
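The sqrt(N) starting point can be computed directly from the planner's row estimate rather than a full COUNT(*). An illustrative sketch, assuming the `documents` table from the earlier examples:

```sql
-- Suggest a starting lists value from the planner's row estimate,
-- floored at 100 per the table above for small datasets
SELECT GREATEST(100, round(sqrt(reltuples))::int) AS suggested_lists
FROM pg_class
WHERE relname = 'documents';
```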

nprobe (Search Probes)

Controls how many clusters to search at query time.

-- Set for current session
SET ivf.nprobe = 20;

| nprobe (% of lists) | Query Time | Recall | Use Case          |
|---------------------|------------|--------|-------------------|
| 1%                  | Very Fast  | ~80%   | Ultra-low latency |
| 5%                  | Fast       | ~90%   | Default           |
| 10%                 | Medium     | ~95%   | Balanced          |
| 20%                 | Slow       | ~98%   | High accuracy     |

IVF Configuration Examples

-- Memory efficient (large datasets)
CREATE INDEX idx_ivf ON documents
  USING ivfflat (embedding vector_l2_ops)
  WITH (lists = 4096);
-- Use with: SET ivf.nprobe = 50;

-- Fast search, lower accuracy
CREATE INDEX idx_ivf_fast ON documents
  USING ivfflat (embedding vector_l2_ops)
  WITH (lists = 1000);
-- Use with: SET ivf.nprobe = 10;

Product Quantization (PQ)

For extreme memory reduction (8-16x), use IVF with product quantization:

-- IVF with PQ (memory efficient)
CREATE INDEX idx_ivf_pq ON documents
  USING ivfpq (embedding vector_l2_ops)
  WITH (
    lists = 4096,
    pq_segments = 32,      -- Divide vector into 32 sub-vectors
    pq_bits = 8            -- 8 bits per sub-vector (256 centroids)
  );

| Configuration            | Memory/Vector  | Recall Impact |
|--------------------------|----------------|---------------|
| IVF-Flat                 | ~4 bytes/dim   | Baseline      |
| IVF-SQ8                  | ~1 byte/dim    | -2 to -5%     |
| IVF-PQ (pq_segments=32)  | ~32 bytes total | -5 to -10%   |

Memory vs Accuracy Tradeoffs

Memory Usage Formula

HNSW:

Memory ≈ N × (D × 4 + M × 8) bytes

Where:
  N = number of vectors
  D = vector dimension
  M = connections per node

Examples (HNSW):

| Vectors | Dimensions | M  | Memory  |
|---------|------------|----|---------|
| 100K    | 384        | 16 | ~160 MB |
| 1M      | 768        | 16 | ~3.2 GB |
| 10M     | 768        | 16 | ~32 GB  |
| 10M     | 768        | 32 | ~44 GB  |

IVF:

Memory ≈ N × (D × 4) + lists × (D × 4) bytes  [IVF-Flat]
Memory ≈ N × pq_segments + lists × (D × 4) bytes  [IVF-PQ]
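
The formulas above can be evaluated in SQL for a quick capacity estimate. The numbers below are illustrative (1M vectors, 768 dimensions, M = 16, lists = 1000):

```sql
-- HNSW estimate: N * (D*4 + M*8) bytes
SELECT pg_size_pretty(
    1000000::bigint * (768 * 4 + 16 * 8)
) AS hnsw_memory_estimate;
-- about 3.2 * 10^9 bytes, matching the ~3.2 GB row in the table above

-- IVF-Flat estimate: N * D*4 + lists * D*4 bytes
SELECT pg_size_pretty(
    1000000::bigint * 768 * 4 + 1000::bigint * 768 * 4
) AS ivf_flat_memory_estimate;
```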

Choosing the Right Index

Do you have < 1 million vectors?
├── YES → Use HNSW (better recall, simpler tuning)
└── NO
    ├── Do you have memory constraints?
    │   ├── YES → Use IVF-PQ (lowest memory, ~90% recall)
    │   └── NO → Use HNSW with M=16 or IVF-Flat
    └── Do you need > 95% recall?
        ├── YES → HNSW with M=32, ef=200+
        └── NO → IVF with sufficient nprobe
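
The first branch of the tree depends on the vector count, which can be read from planner statistics without an expensive COUNT(*). A quick check, assuming the `documents` table:

```sql
-- Fast row estimate for the "< 1 million vectors?" branch
SELECT reltuples::bigint AS approx_vector_count
FROM pg_class
WHERE relname = 'documents';
```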

Index Building Parameters

Build Time Optimization

-- Parallel index building (when supported)
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = '4GB';

-- Then create index
CREATE INDEX CONCURRENTLY idx ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

Incremental Building

For large datasets, consider batch loading:

-- 1. Create index on empty table
CREATE TABLE documents (
    id SERIAL,
    embedding VECTOR(768)
);

CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops);

-- 2. Batch insert data (index updates incrementally)
INSERT INTO documents (embedding)
SELECT embedding FROM source_table
ORDER BY id   -- stable ordering so OFFSET paging neither skips nor repeats rows
LIMIT 100000 OFFSET 0;

-- Repeat for remaining batches...
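
The manual batches above can be wrapped in a loop. The sketch below is illustrative: it assumes `source_table` has a monotonically increasing integer `id` and uses keyset pagination instead of OFFSET, which degrades as the offset grows:

```sql
DO $$
DECLARE
    batch_size integer := 100000;
    last_id    bigint  := 0;
    next_id    bigint;
BEGIN
    LOOP
        -- Find the upper id bound of the next batch
        SELECT max(id) INTO next_id
        FROM (
            SELECT id FROM source_table
            WHERE id > last_id
            ORDER BY id
            LIMIT batch_size
        ) b;

        EXIT WHEN next_id IS NULL;  -- no rows left

        INSERT INTO documents (embedding)
        SELECT embedding FROM source_table
        WHERE id > last_id AND id <= next_id;

        last_id := next_id;
        COMMIT;  -- release locks between batches (PostgreSQL 11+ allows COMMIT in DO)
    END LOOP;
END $$;
```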

Monitoring Build Progress

-- Check index build progress
SELECT
    phase,
    tuples_done,
    tuples_total,
    ROUND(100.0 * tuples_done / NULLIF(tuples_total, 0), 2) AS pct_complete
FROM pg_stat_progress_create_index
WHERE datname = current_database();

Rebuilding Indexes

When to Rebuild

Rebuild your vector index when:

- Recall drops below the acceptable threshold
- Query latency increases significantly
- After large bulk deletes (>20% of data)
- After significant data distribution changes
- When upgrading index parameters
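
The bulk-delete trigger in the list above can be checked from table statistics. A quick heuristic, assuming the `documents` table:

```sql
-- Fraction of dead tuples; above ~20%, consider a rebuild after VACUUM
SELECT
    n_live_tup,
    n_dead_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1)
        AS dead_pct
FROM pg_stat_user_tables
WHERE relname = 'documents';
```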

Rebuild Commands

-- Offline rebuild (fastest, but blocks queries)
REINDEX INDEX idx_documents_embedding;

-- Online rebuild (no blocking)
REINDEX INDEX CONCURRENTLY idx_documents_embedding;

-- Rebuild with new parameters
DROP INDEX idx_documents_embedding;
CREATE INDEX idx_documents_embedding ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 400);  -- New parameters

Scheduled Rebuilds

For production systems, schedule periodic rebuilds:

-- Example: Weekly rebuild during maintenance window
-- Run via cron or scheduled job

-- 1. Check if a rebuild may be needed
--    (compare index size against a rough expected overhead)
SELECT
    pg_relation_size('idx_documents_embedding') AS current_size,
    (pg_table_size('documents') * 0.15)::bigint AS expected_overhead;

-- 2. Rebuild if fragmented
REINDEX INDEX CONCURRENTLY idx_documents_embedding;

-- 3. Analyze table for query optimizer
ANALYZE documents;

Performance Monitoring

Query Performance Analysis

-- Enable timing (psql meta-command)
\timing on

-- Check query plan and execution time
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR
LIMIT 10;

-- Look for:
-- - "Index Scan using idx_documents_embedding"
-- - Execution Time < 10ms for HNSW
-- - Rows returned matches LIMIT

Index Statistics

-- Index size and row estimates
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    idx_scan AS scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
  AND indexrelname LIKE '%embedding%';

-- Vector-specific stats
SELECT * FROM hnsw_index_stats('idx_documents_embedding');

Recall Measurement

-- Create a test set with known ground truth
-- (query_vector below is a placeholder for a concrete test embedding)

-- 1. Exact top-10 via brute force: disable index scans for this transaction
BEGIN;
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_top10 AS
SELECT id
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
COMMIT;

-- 2. Approximate top-10 through the index, compared against ground truth
SELECT COUNT(*) * 10 AS recall_at_10_pct
FROM (
    SELECT id
    FROM documents
    ORDER BY embedding <=> query_vector
    LIMIT 10
) approx
WHERE approx.id IN (SELECT id FROM exact_top10);

Production Configuration Profiles

Web Search (Low Latency)

-- Index configuration
CREATE INDEX idx_web ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

-- Runtime settings
SET hnsw.ef_search = 40;

-- Expected: <5ms p99, ~92% recall

E-Commerce (Balanced)

-- Index configuration
CREATE INDEX idx_ecom ON products
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 24, ef_construction = 300);

-- Runtime settings
SET hnsw.ef_search = 100;

-- Expected: <10ms p99, ~96% recall

RAG / Knowledge Base (High Accuracy)

-- Index configuration
CREATE INDEX idx_rag ON knowledge_base
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 400);

-- Runtime settings
SET hnsw.ef_search = 200;

-- Expected: <15ms p99, ~98% recall

Legal / Compliance (Maximum Recall)

-- Index configuration
CREATE INDEX idx_compliance ON legal_docs
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 48, ef_construction = 600);

-- Runtime settings
SET hnsw.ef_search = 400;

-- Expected: <30ms p99, ~99% recall

Billion-Scale (Memory Optimized)

-- Index configuration (IVF-PQ)
CREATE INDEX idx_billion ON large_dataset
  USING ivfpq (embedding vector_l2_ops)
  WITH (lists = 32768, pq_segments = 32);

-- Runtime settings
SET ivf.nprobe = 64;

-- Expected: <50ms p99, ~90% recall, 10x less memory

Troubleshooting

Problem: Low Recall

Symptoms: Missing relevant results

Solutions:

1. Increase ef_search (runtime):

   SET hnsw.ef_search = 200;

2. Rebuild the index with a higher ef_construction:

   DROP INDEX idx;
   CREATE INDEX idx ... WITH (ef_construction = 400);

3. Increase M (requires rebuild):

   DROP INDEX idx;
   CREATE INDEX idx ... WITH (m = 32);

4. Check that vectors are normalized (for cosine similarity)
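
The normalization check can be run directly in SQL. This sketch assumes a pgvector-style `vector_norm()` function and flags vectors whose L2 norm is not approximately 1:

```sql
-- Count vectors that are not unit-normalized
SELECT COUNT(*) AS unnormalized
FROM documents
WHERE abs(vector_norm(embedding) - 1.0) > 1e-3;
```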

Problem: Slow Queries

Symptoms: High latency (>50ms)

Solutions:

1. Verify the index is being used:

   EXPLAIN SELECT ... ORDER BY embedding <=> ...;

2. Reduce ef_search:

   SET hnsw.ef_search = 50;

3. Check memory pressure:

   SELECT * FROM pg_stat_bgwriter;

4. Consider IVF for very large datasets

Problem: High Memory Usage

Symptoms: OOM errors, swap usage

Solutions:

1. Reduce the M parameter:

   DROP INDEX idx;
   CREATE INDEX idx ... WITH (m = 8);

2. Switch to IVF-PQ:

   CREATE INDEX idx USING ivfpq ... WITH (pq_segments = 32);

3. Shard across multiple nodes

Problem: Index Build Too Slow

Symptoms: Index creation takes hours

Solutions:

1. Reduce ef_construction:

   CREATE INDEX idx ... WITH (ef_construction = 100);

2. Increase parallelism:

   SET max_parallel_maintenance_workers = 8;

3. Build on a subset first, then extend


Best Practices Summary

  1. Start with defaults: M=16, ef_construction=200, ef_search=50
  2. Measure recall: Verify on representative queries before production
  3. Tune ef_search first: Cheapest change for accuracy vs speed
  4. Monitor regularly: Track latency percentiles and recall
  5. Rebuild periodically: After significant data changes
  6. Match distance metric: Cosine for text, L2 for images
  7. Consider hybrid indexes: Different settings for different access patterns

Last Updated: January 2026