Vector Search Architecture

Last Updated: November 1, 2025 | Version: v6.0 Phase 2 M1 | Feature: F6.9 Hybrid Vector Search

Comprehensive architecture documentation for HeliosDB's vector search capabilities, including HNSW indexing, hybrid search, and fusion algorithms.


Table of Contents

  1. Overview
  2. HNSW Index Architecture
  3. Hybrid Search Design
  4. Fusion Algorithms
  5. Query Processing
  6. Performance Optimizations
  7. Integration with Storage Layer
  8. Benchmarks

Overview

Vector Search Capabilities

HeliosDB provides state-of-the-art vector search with three primary modes:

  1. Dense Vector Search (HNSW): Semantic similarity using embeddings
  2. Sparse Vector Search (BM25): Keyword-based retrieval
  3. Hybrid Search: Combined dense + sparse with intelligent fusion

Performance Metrics (v6.0):

  - Search latency: <10ms on 100K vectors
  - Recall@10: 97%+ accuracy
  - Index build time: ~2 min for 1M vectors
  - Memory overhead: ~50 bytes per vector (HNSW)


HNSW Index Architecture

Hierarchical Navigable Small World (HNSW)

Core Concept: Multi-layer proximity graph for approximate nearest neighbor search

Layer 2:  A ─────────────────── F         (Longest connections)
          │                     │
          │                     │
Layer 1:  A ──── B ──── C ──── F ──── G   (Medium connections)
          │      │      │      │      │
          │      │      │      │      │
Layer 0:  A ─ B ─ C ─ D ─ E ─ F ─ G ─ H   (All nodes, shortest hops)

Key Properties:

  - Multi-layer structure: Higher layers have sparser connections
  - Navigable: Greedy search proceeds from the top layer to the bottom
  - Small world: Short paths exist between any two nodes
  - Logarithmic search: O(log N) average case

Data Structures

Node Structure:

struct HnswNode {
    id: VectorId,
    vector: Vec<f32>,           // Dense embedding (e.g., 768 dims)
    layer: usize,               // Maximum layer this node appears in
    neighbors: Vec<Vec<VectorId>>, // Neighbors per layer
}

Index Structure:

pub struct HnswIndex {
    nodes: HashMap<VectorId, HnswNode>,
    entry_point: VectorId,      // Top-layer entry node
    max_layer: usize,
    m: usize,                   // Max connections per layer (default: 16)
    ef_construction: usize,     // Build-time search width (default: 64)
    distance_fn: DistanceFunction,
}

pub enum DistanceFunction {
    Cosine,       // 1 - dot(a, b) / (||a|| * ||b||)
    Euclidean,    // sqrt(sum((a_i - b_i)^2))
    DotProduct,   // -dot(a, b)
}
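
For reference, a scalar sketch of the three metrics (the index would dispatch through its stored distance_fn; SIMD-accelerated variants appear under Performance Optimizations):

impl DistanceFunction {
    /// Scalar reference implementation; lower value = closer, for all variants.
    pub fn compute(&self, a: &[f32], b: &[f32]) -> f32 {
        match self {
            DistanceFunction::Cosine => {
                let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
                let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
                let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
                1.0 - dot / (norm_a * norm_b)
            }
            DistanceFunction::Euclidean => a
                .iter()
                .zip(b)
                .map(|(x, y)| (x - y) * (x - y))
                .sum::<f32>()
                .sqrt(),
            DistanceFunction::DotProduct => {
                -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
            }
        }
    }
}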

Index Construction Algorithm

Insertion Process:

fn insert(&mut self, vector: Vec<f32>, id: VectorId) {
    // 1. Determine layer for new node (exponential decay: P(layer >= l) = M^-l)
    let layer = self.select_layer();

    // 2. Greedily descend from the top layer down to layer+1 (ef = 1,
    //    no connections are added during this phase)
    let mut entry_point = self.entry_point;
    for level in (layer + 1..=self.max_layer).rev() {
        entry_point = self.search_layer(entry_point, &vector, level, 1)[0].0;
    }

    // 3. Insert node and create connections on layers layer..0
    for level in (0..=layer).rev() {
        // Find ef_construction nearest candidates, keep the best M
        let candidates = self.search_layer(entry_point, &vector, level, self.ef_construction);
        let neighbors = self.select_neighbors(&candidates, self.m);

        // Create bidirectional links
        for &neighbor in &neighbors {
            self.add_connection(id, neighbor, level);
            self.add_connection(neighbor, id, level);

            // Prune the neighbor's connections if they now exceed M
            self.prune_connections(neighbor, level);
        }

        // Continue the descent from the best candidate found on this layer
        entry_point = candidates[0].0;
    }
}

Layer Selection (exponential decay):

fn select_layer(&self) -> usize {
    // Normalization factor m_L = 1 / ln(M) yields P(layer >= l) = M^(-l)
    let m_l = 1.0 / (self.m as f64).ln();
    (-rand::random::<f64>().ln() * m_l).floor() as usize
}
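
For intuition: with the default M = 16, m_L = 1/ln 16 ≈ 0.36, so P(layer ≥ l) = 16^(-l), meaning about 93.75% of nodes stay on layer 0 and each higher layer holds roughly 1/16 of the one below. A quick simulation sketch (rand crate, same formula as select_layer above):

fn main() {
    let m = 16.0_f64;
    let m_l = 1.0 / m.ln();
    let mut counts = [0usize; 8];

    // Sample 1M layer assignments using the select_layer() formula
    for _ in 0..1_000_000 {
        let layer = (-rand::random::<f64>().ln() * m_l).floor() as usize;
        counts[layer.min(7)] += 1;
    }

    // Expect roughly 937_500, 58_594, 3_662, ... (each layer ~1/16 of the previous)
    for (l, c) in counts.iter().enumerate() {
        println!("layer {l}: {c}");
    }
}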

Complexity:

  - Insert: O(log N * M * ef_construction)
  - Search: O(log N * ef_search)
  - Space: O(N * M * avg_layer)

Search Algorithm

Greedy Search (from top to bottom):

fn search(&self, query: &[f32], k: usize, ef_search: usize) -> Vec<(VectorId, f32)> {
    let mut entry_point = self.entry_point;

    // Phase 1: Navigate top layers (greedy descent with ef = 1)
    for level in (1..=self.max_layer).rev() {
        entry_point = self.search_layer(entry_point, query, level, 1)[0].0;
    }

    // Phase 2: Search bottom layer with the full expansion width
    let candidates = self.search_layer(entry_point, query, 0, ef_search);

    // Phase 3: Return top-k results (already sorted by ascending distance)
    candidates.into_iter().take(k).collect()
}

// Returns up to ef results sorted by ascending distance to the query.
// NB: f32 has no total order, so the real implementation wraps distances
// in an Ord-capable type before putting them in a BinaryHeap.
fn search_layer(&self, entry: VectorId, query: &[f32], level: usize, ef: usize)
    -> Vec<(VectorId, f32)>
{
    let mut visited = HashSet::new();
    let mut candidates = BinaryHeap::new();  // Min-heap via Reverse (closest on top)
    let mut results = BinaryHeap::new();     // Max-heap (worst result on top)

    // Initialize with entry point
    let dist = distance(&self.nodes[&entry].vector, query);
    candidates.push((Reverse(dist), entry));
    results.push((dist, entry));
    visited.insert(entry);

    while let Some((Reverse(dist), current)) = candidates.pop() {
        // Stop once the closest open candidate is farther than the worst result
        if dist > results.peek().unwrap().0 && results.len() >= ef {
            break;
        }

        // Explore neighbors
        for &neighbor in &self.nodes[&current].neighbors[level] {
            if visited.insert(neighbor) {
                let neighbor_dist = distance(&self.nodes[&neighbor].vector, query);

                // Keep the neighbor if it beats the current worst, or results aren't full
                if neighbor_dist < results.peek().unwrap().0 || results.len() < ef {
                    candidates.push((Reverse(neighbor_dist), neighbor));
                    results.push((neighbor_dist, neighbor));

                    // Evict the worst result once we exceed ef
                    if results.len() > ef {
                        results.pop();
                    }
                }
            }
        }
    }

    // Drain the heap and sort ascending by distance
    let mut out: Vec<_> = results.into_iter().map(|(d, id)| (id, d)).collect();
    out.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    out
}

Search Parameters:

  - ef_search: Search expansion factor (higher = better recall, slower)
  - Typical values: 50-200
  - Trade-off: recall vs latency
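
As a usage sketch, assuming an already-built index and a hypothetical embed() helper for query embeddings:

let query: Vec<f32> = embed("wireless headphones");  // hypothetical embedding helper

// Same query at two expansion widths (recall figures from the benchmarks below)
let fast = index.search(&query, 10, 50);      // ~95% recall, lowest latency
let accurate = index.search(&query, 10, 200); // ~99% recall, roughly 2x slower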


Hybrid Search Design

Architecture Overview

Hybrid Search combines dense (semantic) and sparse (keyword) retrieval:

Query: "wireless headphones"
    ├─────────────────┬─────────────────┐
    │                 │                 │
    ▼                 ▼                 ▼
Dense Vector      Sparse Vector    Full-Text
(HNSW)            (BM25)           Search
    │                 │                 │
    │  [0.82, 0.91]   │  "wireless": 2  │
    │  [0.78, 0.89]   │  "headphones": 3│
    │  ...            │  ...            │
    │                 │                 │
    └─────────────────┴─────────────────┘
              Fusion Algorithm
              (RRF / Weighted / Learned)
              Merged Results
              [doc_1, doc_2, ...]

Data Storage

Table Schema:

CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,

    -- Dense vector (from embedding model)
    embedding VECTOR(768),

    -- Sparse vector (BM25 term weights)
    keywords SPARSEVEC(10000),

    -- Full-text search vector
    search_tsv TSVECTOR
);

-- Indexes
CREATE INDEX idx_embedding ON products USING hnsw(embedding vector_cosine_ops);
CREATE INDEX idx_keywords ON products USING gin(keywords);
CREATE INDEX idx_search_tsv ON products USING gin(search_tsv);


Fusion Algorithms

1. Reciprocal Rank Fusion (RRF)

Formula:

RRF_score(doc) = Σ_i 1 / (k + rank_i(doc)),  where i ranges over the retrievers and ranks start at 1
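
For example, with k = 60, a document ranked 1st by dense search and 3rd by sparse search scores 1/(60+1) + 1/(60+3) ≈ 0.0164 + 0.0159 ≈ 0.0323, while a document found only by dense search at rank 1 scores just 1/61 ≈ 0.0164.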

Implementation:

pub fn reciprocal_rank_fusion(
    dense_results: Vec<(DocId, f32)>,   // [(doc, score), ...] sorted by rank
    sparse_results: Vec<(DocId, f32)>,
    k: usize,  // Smoothing constant (typically 60)
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();

    // Add dense contributions (rank is 0-based, hence the +1)
    for (rank, (doc_id, _score)) in dense_results.into_iter().enumerate() {
        *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k as f32 + rank as f32 + 1.0);
    }

    // Add sparse contributions
    for (rank, (doc_id, _score)) in sparse_results.into_iter().enumerate() {
        *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k as f32 + rank as f32 + 1.0);
    }

    // Sort by descending RRF score
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}

Characteristics:

  - Simple, parameter-free (except k)
  - Handles different score scales
  - Good default choice
  - ⚠ Ignores original score magnitudes

2. Weighted Fusion

Formula:

score(doc) = α * norm(dense_score) + (1-α) * norm(sparse_score)

Implementation:

pub fn weighted_fusion(
    dense_results: Vec<(DocId, f32)>,   // Non-empty, sorted by descending score
    sparse_results: Vec<(DocId, f32)>,  // Non-empty, sorted by descending score
    alpha: f32,  // Weight for dense (0.0-1.0)
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();

    // Normalize dense scores to [0, 1] by the top score
    let max_dense = dense_results[0].1;
    for (doc_id, score) in dense_results {
        *scores.entry(doc_id).or_insert(0.0) += alpha * (score / max_dense);
    }

    // Normalize sparse scores to [0, 1] by the top score
    let max_sparse = sparse_results[0].1;
    for (doc_id, score) in sparse_results {
        *scores.entry(doc_id).or_insert(0.0) += (1.0 - alpha) * (score / max_sparse);
    }

    // Sort by descending weighted score
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}

Characteristics:

  - Preserves score information
  - Tunable weight (α)
  - ⚠ Requires score normalization
  - ⚠ Sensitive to score distributions

Optimal α selection:

  - Semantic queries: α = 0.7-0.9 (favor dense)
  - Keyword queries: α = 0.1-0.3 (favor sparse)
  - Mixed queries: α = 0.5 (balanced)
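
A minimal usage sketch of weighted_fusion with made-up scores (both lists must be non-empty and sorted by descending score):

// Hypothetical top-3 result lists, sorted by descending score
let dense_hits: Vec<(DocId, f32)> = vec![(101, 0.92), (205, 0.88), (307, 0.71)];
let sparse_hits: Vec<(DocId, f32)> = vec![(205, 14.2), (409, 11.8), (101, 9.5)];

// Semantic-leaning query: weight the dense side at alpha = 0.7
let fused = weighted_fusion(dense_hits, sparse_hits, 0.7);
// Doc 205 wins: 0.7 * (0.88 / 0.92) + 0.3 * (14.2 / 14.2) ≈ 0.97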

3. Pre-Filter Fusion

Strategy: Filter with sparse, then re-rank with dense

Implementation:

pub fn pre_filter_fusion(
    query_embedding: Vec<f32>,
    sparse_results: Vec<DocId>,  // Top-N from BM25
    top_k: usize,
) -> Vec<(DocId, f32)> {
    // Phase 1: Filter to top-N with sparse search (e.g., N=100)
    let candidates = sparse_results.into_iter().take(100).collect::<Vec<_>>();

    // Phase 2: Re-rank candidates with dense search
    let mut dense_scores = Vec::new();
    for doc_id in candidates {
        let embedding = get_embedding(doc_id);
        let score = cosine_similarity(&query_embedding, &embedding);
        dense_scores.push((doc_id, score));
    }

    // Phase 3: Sort by dense score
    dense_scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    dense_scores.into_iter().take(top_k).collect()
}

Characteristics:

  - Fast (only re-rank top-N)
  - Good for keyword + semantic queries
  - ⚠ May miss dense-only matches
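
A usage sketch (bm25_search and embed_query are assumed helper functions, not part of the API shown above):

// Hypothetical helpers: sparse candidate ids and a query embedding
let sparse_ids: Vec<DocId> = bm25_search("wireless headphones", 100);
let query_embedding: Vec<f32> = embed_query("wireless headphones");

// Re-rank the sparse candidates by dense similarity, keep the 10 best
let top10 = pre_filter_fusion(query_embedding, sparse_ids, 10);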

4. Learned Fusion (ML-based)

Strategy: Train ML model to predict optimal fusion weights

Model:

pub struct LearnedFusion {
    model: LinearRegression,  // Or LightGBM, neural network
}

impl LearnedFusion {
    pub fn predict_score(
        &self,
        dense_score: f32,
        sparse_score: f32,
        dense_rank: usize,
        sparse_rank: usize,
        query_features: Vec<f32>,  // Query length, term overlap, etc.
    ) -> f32 {
        let features = vec![
            dense_score,
            sparse_score,
            dense_rank as f32,
            sparse_rank as f32,
            query_features[0],  // Query length
            query_features[1],  // Term count
            // ... more features
        ];

        self.model.predict(&features)
    }
}

Training Data:

  - Query-document relevance labels
  - Features: scores, ranks, query characteristics
  - Loss: NDCG@10 or MRR

Characteristics:

  - Best performance (when trained)
  - Adapts to data distribution
  - ⚠ Requires training data
  - ⚠ More complex


Query Processing

SQL Interface

Basic Vector Search:

SELECT id, name, embedding <=> query_embedding AS distance
FROM products
ORDER BY distance
LIMIT 10;

Hybrid Search:

SELECT * FROM hybrid_search(
    table_name := 'products',
    query_text := 'wireless headphones',
    query_embedding := get_embedding('wireless headphones'),
    fusion_algorithm := 'rrf',  -- 'rrf' | 'weighted' | 'prefilter' | 'learned'
    k := 60,                    -- RRF parameter
    alpha := 0.7,               -- Weighted fusion parameter
    limit := 10
);

Query Execution Plan

HybridSearchExec
├─ DenseSearchExec (HNSW)
│  └─ IndexScan(products, idx_embedding)
├─ SparseSearchExec (BM25)
│  └─ IndexScan(products, idx_keywords)
└─ FusionExec (RRF)
   └─ TopK(10)

Performance Optimizations

1. SIMD Distance Computation

Cosine Similarity (AVX2):

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn cosine_similarity_simd(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;

    debug_assert!(a.len() == b.len() && a.len() % 8 == 0);

    let mut dot = _mm256_setzero_ps();
    let mut norm_a = _mm256_setzero_ps();
    let mut norm_b = _mm256_setzero_ps();

    // Process 8 floats at a time
    for i in (0..a.len()).step_by(8) {
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));

        dot = _mm256_fmadd_ps(va, vb, dot);
        norm_a = _mm256_fmadd_ps(va, va, norm_a);
        norm_b = _mm256_fmadd_ps(vb, vb, norm_b);
    }

    // Horizontal sums of the three accumulators
    let dot_sum = hsum_ps_avx(dot);
    let norm_a_sum = hsum_ps_avx(norm_a);
    let norm_b_sum = hsum_ps_avx(norm_b);

    // Cosine distance (1 - similarity), matching DistanceFunction::Cosine
    1.0 - (dot_sum / (norm_a_sum.sqrt() * norm_b_sum.sqrt()))
}

Speedup: 4-8x vs scalar
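
The SIMD path must be gated at runtime on CPUs that lack AVX2/FMA. A dispatch sketch using the standard is_x86_feature_detected! macro, falling back to the scalar DistanceFunction::compute shown earlier:

fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: required CPU features verified above; callers ensure
            // a.len() == b.len() and a length that is a multiple of 8
            return unsafe { cosine_similarity_simd(a, b) };
        }
    }
    DistanceFunction::Cosine.compute(a, b)  // scalar fallback
}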

2. Quantization

Product Quantization (compress vectors):

pub struct ProductQuantizer {
    codebooks: Vec<Vec<Vec<f32>>>,  // [subspace][centroid][dims]
    m: usize,                       // Number of subspaces
    ksub: usize,                    // Centroids per subspace
}

// Compress a 768-dim f32 vector (3,072 bytes) to 96 bytes (32x compression)
// m=96 subspaces, ksub=256 centroids → 1 byte per subspace

Speedup: 5-10x faster search, 32x less memory for raw vector data
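
A sketch of the encode step implied by this layout: split each vector into m subvectors and store the index of the nearest centroid per subspace; one byte suffices because ksub = 256 (a simplified version of the actual encoder):

impl ProductQuantizer {
    /// Compress one vector into m one-byte centroid codes.
    pub fn encode(&self, v: &[f32]) -> Vec<u8> {
        let sub_dim = v.len() / self.m; // e.g., 768 / 96 = 8 dims per subspace
        (0..self.m)
            .map(|s| {
                let sub = &v[s * sub_dim..(s + 1) * sub_dim];
                // Pick the centroid with the smallest squared distance
                let (best, _) = self.codebooks[s]
                    .iter()
                    .enumerate()
                    .map(|(i, c)| {
                        let d: f32 = sub
                            .iter()
                            .zip(c)
                            .map(|(x, y)| (x - y) * (x - y))
                            .sum();
                        (i, d)
                    })
                    .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                    .unwrap();
                best as u8
            })
            .collect()
    }
}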

3. Caching

Query Result Cache:

pub struct QueryCache {
    cache: LruCache<QueryHash, Vec<(DocId, f32)>>,
}

// Cache hot queries (e.g., "wireless headphones")
// Hit rate: 30-50% for production workloads
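
One way the cache key could be derived (a sketch; QueryHash and the exact key contents are assumptions): bit-cast the query floats so identical embeddings hash identically, and fold in the search parameters:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type QueryHash = u64;

fn hash_query(embedding: &[f32], k: usize, ef_search: usize) -> QueryHash {
    let mut h = DefaultHasher::new();
    // f32 is not Hash; hash the raw bits instead
    for x in embedding {
        x.to_bits().hash(&mut h);
    }
    k.hash(&mut h);
    ef_search.hash(&mut h);
    h.finish()
}

Lookups then check the LRU before touching the index and insert on miss; cached entries must be invalidated whenever the underlying index changes.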


Integration with Storage Layer

Index Persistence

File Format:

hnsw_index.bin
├─ Header (metadata)
│  ├─ version: u32
│  ├─ num_vectors: u64
│  ├─ dim: u32
│  ├─ m: u32
│  └─ max_layer: u32
├─ Node Data
│  ├─ Node 1: [id, layer, vector, neighbors]
│  ├─ Node 2: [id, layer, vector, neighbors]
│  └─ ...
└─ Entry Point: VectorId

Mmap Loading (zero-copy):

use std::fs::File;
use memmap2::Mmap;  // assuming the memmap2 crate

let file = File::open("hnsw_index.bin")?;
let mmap = unsafe { Mmap::map(&file)? };
let index = HnswIndex::from_bytes(&mmap)?;

Loading Time: ~1 second for 1M vectors
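
A sketch of how from_bytes might decode the documented header (field order matches the layout above; little-endian packing is an assumption):

struct IndexHeader {
    version: u32,
    num_vectors: u64,
    dim: u32,
    m: u32,
    max_layer: u32,
}

fn parse_header(bytes: &[u8]) -> IndexHeader {
    // Fields packed in the order shown in the file layout above
    let u32_at = |o: usize| u32::from_le_bytes(bytes[o..o + 4].try_into().unwrap());
    IndexHeader {
        version: u32_at(0),
        num_vectors: u64::from_le_bytes(bytes[4..12].try_into().unwrap()),
        dim: u32_at(12),
        m: u32_at(16),
        max_layer: u32_at(20),
    }
}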


Benchmarks

Latency (100K vectors, 768 dims)

| Operation                 | Latency (P50) | Latency (P99) |
|---------------------------|---------------|---------------|
| Dense search (k=10)       | 3.2 ms        | 8.1 ms        |
| Sparse search (k=10)      | 1.8 ms        | 4.2 ms        |
| Hybrid search (RRF, k=10) | 5.1 ms        | 11.3 ms       |

Recall@10 Accuracy

| Method        | Recall@10 | QPS (1 thread) |
|---------------|-----------|----------------|
| Exact search  | 100%      | 12             |
| HNSW (ef=50)  | 95.2%     | 420            |
| HNSW (ef=100) | 97.8%     | 310            |
| HNSW (ef=200) | 99.1%     | 180            |

Scalability (HNSW, ef=100)

| Vectors | Build Time | Index Size | Search Latency |
|---------|------------|------------|----------------|
| 10K     | 12 sec     | 4.8 MB     | 0.8 ms         |
| 100K    | 2.1 min    | 48 MB      | 3.2 ms         |
| 1M      | 22 min     | 480 MB     | 12.1 ms        |
| 10M     | 3.8 hr     | 4.8 GB     | 45.3 ms        |

Last Updated: November 1, 2025 | Version: v6.0 Phase 2 M1 | Maintained by: HeliosDB Vector Search Team

Related:

  - Hybrid Search User Guide
  - Main Architecture
  - Storage Architecture