Vector Search Architecture¶
Last Updated: November 1, 2025 · Version: v6.0 Phase 2 M1 · Feature: F6.9 Hybrid Vector Search
Comprehensive architecture documentation for HeliosDB's vector search capabilities, including HNSW indexing, hybrid search, and fusion algorithms.
Table of Contents¶
- Overview
- HNSW Index Architecture
- Hybrid Search Design
- Fusion Algorithms
- Query Processing
- Performance Optimizations
- Integration with Storage Layer
Overview¶
Vector Search Capabilities¶
HeliosDB provides state-of-the-art vector search with three primary modes:
- Dense Vector Search (HNSW): Semantic similarity using embeddings
- Sparse Vector Search (BM25): Keyword-based retrieval
- Hybrid Search: Combined dense + sparse with intelligent fusion
Performance Metrics (v6.0):
- Search latency: <10 ms on 100K vectors
- Recall@10: 97%+ accuracy
- Index build time: ~2 min for 1M vectors
- Memory overhead: ~50 bytes per vector (HNSW)
HNSW Index Architecture¶
Hierarchical Navigable Small World (HNSW)¶
Core Concept: Multi-layer proximity graph for approximate nearest neighbor search
Layer 2: A ─────────────────── F (Longest connections)
│ │
│ │
Layer 1: A ──── B ──── C ──── F ──── G (Medium connections)
│ │ │ │ │
│ │ │ │ │
Layer 0: A ─ B ─ C ─ D ─ E ─ F ─ G ─ H (All nodes, shortest hops)
Key Properties:
- Multi-layer structure: higher layers have sparser connections
- Navigable: greedy search from top layer to bottom
- Small world: short paths between any two nodes
- Logarithmic search: O(log N) average case
Data Structures¶
Node Structure:
struct HnswNode {
    id: VectorId,
    vector: Vec<f32>,              // Dense embedding (e.g., 768 dims)
    layer: usize,                  // Maximum layer this node appears in
    neighbors: Vec<Vec<VectorId>>, // Neighbors per layer
}
Index Structure:
pub struct HnswIndex {
    nodes: HashMap<VectorId, HnswNode>,
    entry_point: VectorId,  // Top-layer entry node
    max_layer: usize,
    m: usize,               // Max connections per layer (default: 16)
    ef_construction: usize, // Build-time search width (default: 64)
    distance_fn: DistanceFunction,
}
pub enum DistanceFunction {
    Cosine,     // 1 - dot(a, b) / (||a|| * ||b||)
    Euclidean,  // sqrt(sum((a_i - b_i)^2))
    DotProduct, // -dot(a, b)
}
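The comments above give each metric's definition. A scalar sketch of how evaluation might dispatch over the enum (the free `dot` and `distance` helpers here are illustrative, not HeliosDB's actual implementation; the enum is repeated so the snippet stands alone):

```rust
// Mirrors the DistanceFunction enum above so this snippet is self-contained
enum DistanceFunction {
    Cosine,
    Euclidean,
    DotProduct,
}

// Plain dot product; the SIMD section later shows the vectorized equivalent
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn distance(metric: &DistanceFunction, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        // 1 - dot(a, b) / (||a|| * ||b||)
        DistanceFunction::Cosine => {
            let norm_a = dot(a, a).sqrt();
            let norm_b = dot(b, b).sqrt();
            1.0 - dot(a, b) / (norm_a * norm_b)
        }
        // sqrt(sum((a_i - b_i)^2))
        DistanceFunction::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        // Negated so "smaller value = more similar" holds for every metric
        DistanceFunction::DotProduct => -dot(a, b),
    }
}
```

Negating the dot product keeps all three metrics on a "lower is closer" scale, which is what the HNSW heaps below assume.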
Index Construction Algorithm¶
Insertion Process:
fn insert(&mut self, vector: Vec<f32>, id: VectorId) {
    // 1. Determine the new node's top layer (geometric decay: P(layer >= l) = M^(-l))
    let layer = self.select_layer();
    // 2. Greedily descend from the global entry point down to layer + 1,
    //    tracking only the single nearest node (ef = 1, no links added yet)
    let mut entry_point = self.entry_point;
    for level in (layer + 1..=self.max_layer).rev() {
        entry_point = self.search_layer(entry_point, &vector, level, 1)[0].0;
    }
    // 3. Insert the node and create connections on layers layer..=0
    for level in (0..=layer).rev() {
        // Find candidate neighbors with a wider search
        let candidates = self.search_layer(entry_point, &vector, level, self.ef_construction);
        let neighbors = self.select_neighbors(candidates, self.m);
        // Create bidirectional links
        for neighbor in neighbors {
            self.add_connection(id, neighbor, level);
            self.add_connection(neighbor, id, level);
            // Prune the neighbor's links if they now exceed M
            self.prune_connections(neighbor, level);
        }
    }
}
Layer Selection (exponential decay):
fn select_layer(&self) -> usize {
    // Normalization m_l = 1/ln(M) yields a geometric distribution: P(layer >= l) = M^(-l)
    let m_l = 1.0 / (self.m as f64).ln();
    (-rand::random::<f64>().ln() * m_l).floor() as usize
}
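Taking the floor of an exponentially distributed draw gives a geometric layer distribution: P(layer ≥ l) = M^(−l), so with the default M = 16 roughly 1 in 16 nodes reaches layer 1. A self-contained simulation sketch of this property (a small LCG stands in for `rand` — an assumption made only to keep the example dependency-free and deterministic):

```rust
// Same formula as select_layer above, with the uniform draw passed in
fn select_layer(m: usize, u: f64) -> usize {
    let m_l = 1.0 / (m as f64).ln();
    (-u.ln() * m_l).floor() as usize
}

// Fraction of simulated nodes whose layer is >= 1; should approach 1/M
fn layer_fraction_above_zero(m: usize, samples: u64) -> f64 {
    let mut state: u64 = 0x2545F4914F6CDD1D;
    let mut above = 0u64;
    for _ in 0..samples {
        // Knuth's 64-bit LCG constants; high bits give a usable uniform draw
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Map the top 53 bits into (0, 1] so ln(u) is always finite
        let u = ((state >> 11) as f64 + 1.0) / (1u64 << 53) as f64;
        if select_layer(m, u) >= 1 {
            above += 1;
        }
    }
    above as f64 / samples as f64
}
```

With M = 16 the measured fraction lands near 1/16 = 0.0625, matching the geometric decay the complexity analysis below relies on.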
Complexity:
- Insert: O(log N * M * ef_construction)
- Search: O(log N * ef_search)
- Space: O(N * M * avg_layer)
Search Algorithm¶
Greedy Search (from top to bottom):
fn search(&self, query: &[f32], k: usize, ef_search: usize) -> Vec<(VectorId, f32)> {
    let mut entry_point = self.entry_point;
    // Phase 1: Navigate top layers (greedy descent, ef = 1)
    for level in (1..=self.max_layer).rev() {
        entry_point = self.search_layer(entry_point, query, level, 1)[0].0;
    }
    // Phase 2: Search bottom layer (expand search with ef = ef_search)
    let candidates = self.search_layer(entry_point, query, 0, ef_search);
    // Phase 3: Return top-k results (candidates are sorted by distance)
    candidates.into_iter().take(k).collect()
}

// Returns up to ef results sorted by distance, ascending. (Bare f32 distances
// are shown for clarity; a real BinaryHeap needs a totally ordered wrapper
// such as ordered_float::OrderedFloat.)
fn search_layer(&self, entry: VectorId, query: &[f32], level: usize, ef: usize)
    -> Vec<(VectorId, f32)>
{
    let mut visited = HashSet::new();
    let mut candidates = BinaryHeap::new(); // Min-heap by distance (via Reverse)
    let mut results = BinaryHeap::new();    // Max-heap (worst result on top)
    // Initialize with the entry point
    let dist = distance(&self.nodes[&entry].vector, query);
    candidates.push((Reverse(dist), entry));
    results.push((dist, entry));
    visited.insert(entry);
    while let Some((Reverse(dist), current)) = candidates.pop() {
        // Stop once the closest candidate is farther than the worst result
        if dist > results.peek().unwrap().0 {
            break;
        }
        // Explore neighbors
        for &neighbor in &self.nodes[&current].neighbors[level] {
            if visited.insert(neighbor) {
                let neighbor_dist = distance(&self.nodes[&neighbor].vector, query);
                // Keep if better than the current worst, or results not yet full
                if neighbor_dist < results.peek().unwrap().0 || results.len() < ef {
                    candidates.push((Reverse(neighbor_dist), neighbor));
                    results.push((neighbor_dist, neighbor));
                    // Evict the worst result once ef is exceeded
                    if results.len() > ef {
                        results.pop();
                    }
                }
            }
        }
    }
    // Drain the heap and sort ascending by distance
    let mut out: Vec<_> = results.into_iter().map(|(d, id)| (id, d)).collect();
    out.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    out
}
Search Parameters:
- ef_search: Search expansion factor (higher = better recall, slower)
- Typical values: 50-200
- Trade-off: recall vs latency
Hybrid Search Design¶
Architecture Overview¶
Hybrid Search combines dense (semantic) and sparse (keyword) retrieval:
Query: "wireless headphones"
│
├─────────────────┬─────────────────┐
│ │ │
▼ ▼ ▼
Dense Vector Sparse Vector Full-Text
(HNSW) (BM25) Search
│ │ │
│ [0.82, 0.91] │ "wireless": 2 │
│ [0.78, 0.89] │ "headphones": 3│
│ ... │ ... │
│ │ │
└─────────────────┴─────────────────┘
│
▼
Fusion Algorithm
(RRF / Weighted / Learned)
│
▼
Merged Results
[doc_1, doc_2, ...]
Data Storage¶
Table Schema:
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    -- Dense vector (from embedding model)
    embedding VECTOR(768),
    -- Sparse vector (BM25 term weights)
    keywords SPARSEVEC(10000),
    -- Full-text search vector
    search_tsv TSVECTOR
);
-- Indexes
CREATE INDEX idx_embedding ON products USING hnsw(embedding vector_cosine_ops);
CREATE INDEX idx_keywords ON products USING gin(keywords);
CREATE INDEX idx_search_tsv ON products USING gin(search_tsv);
Fusion Algorithms¶
1. Reciprocal Rank Fusion (RRF)¶
Formula:
RRF(d) = Σ_r 1 / (k + rank_r(d))
where rank_r(d) is the 1-based rank of document d in retriever r's result list, and the constant k (typically 60) damps the influence of lower-ranked documents.
Implementation:
pub fn reciprocal_rank_fusion(
    dense_results: Vec<(DocId, f32)>, // [(doc, score), ...], sorted by score desc
    sparse_results: Vec<(DocId, f32)>,
    k: usize, // Rank-damping constant (typically 60)
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    // Add dense contributions (rank is 0-based here, so rank + 1 is the 1-based rank)
    for (rank, (doc_id, _score)) in dense_results.into_iter().enumerate() {
        *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k as f32 + rank as f32 + 1.0);
    }
    // Add sparse contributions
    for (rank, (doc_id, _score)) in sparse_results.into_iter().enumerate() {
        *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k as f32 + rank as f32 + 1.0);
    }
    // Sort by fused RRF score, descending
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}
Characteristics:
- Simple, parameter-free (except k)
- Handles different score scales
- Good default choice
- ⚠ Ignores original score magnitudes
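As a worked example, here is a self-contained RRF sketch over two toy rankings (the `u64` doc IDs and the `rrf` helper are illustrative, not HeliosDB API). It shows the key behavior: a document ranked high in both lists outscores one that tops only a single list.

```rust
use std::collections::HashMap;

// Fuse any number of ranked doc-ID lists with reciprocal rank fusion (k = 60
// is the usual damping constant)
fn rrf(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // rank is 0-based, so rank + 1 is the 1-based rank from the formula
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    // Sort by fused score, descending
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

With dense = [7, 3, 9] and sparse = [3, 5, 7], doc 3 (rank 2 and rank 1) fuses ahead of doc 7 (rank 1 and rank 3), even though doc 7 tops the dense list.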
2. Weighted Fusion¶
Formula:
score(d) = α · dense_norm(d) + (1 − α) · sparse_norm(d)
where dense_norm and sparse_norm are each retriever's scores rescaled to [0, 1].
Implementation:
pub fn weighted_fusion(
    dense_results: Vec<(DocId, f32)>,
    sparse_results: Vec<(DocId, f32)>,
    alpha: f32, // Weight for dense (0.0-1.0)
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();
    // Normalize dense scores to [0, 1]; assumes both lists are non-empty and
    // sorted by score descending, so the first entry holds the maximum
    let max_dense = dense_results[0].1;
    for (doc_id, score) in dense_results {
        *scores.entry(doc_id).or_insert(0.0) += alpha * (score / max_dense);
    }
    // Same normalization for sparse scores
    let max_sparse = sparse_results[0].1;
    for (doc_id, score) in sparse_results {
        *scores.entry(doc_id).or_insert(0.0) += (1.0 - alpha) * (score / max_sparse);
    }
    // Sort by weighted score, descending
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}
Characteristics:
- Preserves score information
- Tunable weight (α)
- ⚠ Requires score normalization
- ⚠ Sensitive to score distributions
Optimal α selection:
- Semantic queries: α = 0.7-0.9 (favor dense)
- Keyword queries: α = 0.1-0.3 (favor sparse)
- Mixed queries: α = 0.5 (balanced)
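A self-contained sketch showing how α shifts the outcome (toy IDs and scores; the `weighted` helper is illustrative, not HeliosDB API). It assumes each input list is sorted by score descending and non-empty, as the implementation above does.

```rust
use std::collections::HashMap;

// Min-max-free normalization by the top score, then a convex combination
fn weighted(dense: &[(u64, f64)], sparse: &[(u64, f64)], alpha: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    // First entry holds each list's maximum (lists sorted descending)
    let max_d = dense[0].1;
    for &(doc, s) in dense {
        *scores.entry(doc).or_insert(0.0) += alpha * (s / max_d);
    }
    let max_s = sparse[0].1;
    for &(doc, s) in sparse {
        *scores.entry(doc).or_insert(0.0) += (1.0 - alpha) * (s / max_s);
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

With dense = [(1, 0.9), (2, 0.5)] and sparse = [(2, 12.0), (1, 3.0)], α = 0.9 ranks doc 1 first (the dense favorite) while α = 0.1 ranks doc 2 first (the sparse favorite), illustrating the guidance above.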
3. Pre-Filter Fusion¶
Strategy: Filter with sparse, then re-rank with dense
Implementation:
pub fn pre_filter_fusion(
    query_embedding: Vec<f32>,
    sparse_results: Vec<DocId>, // Top-N from BM25
    top_k: usize,
) -> Vec<(DocId, f32)> {
    // Phase 1: Filter to the top-N sparse candidates (e.g., N = 100)
    let candidates = sparse_results.into_iter().take(100).collect::<Vec<_>>();
    // Phase 2: Re-rank candidates with dense similarity
    let mut dense_scores = Vec::new();
    for doc_id in candidates {
        let embedding = get_embedding(doc_id);
        let score = cosine_similarity(&query_embedding, &embedding);
        dense_scores.push((doc_id, score));
    }
    // Phase 3: Sort by dense score, descending
    dense_scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    dense_scores.into_iter().take(top_k).collect()
}
Characteristics:
- Fast (only re-ranks the top-N candidates)
- Good for keyword + semantic queries
- ⚠ May miss dense-only matches
4. Learned Fusion (ML-based)¶
Strategy: Train ML model to predict optimal fusion weights
Model:
pub struct LearnedFusion {
    model: LinearRegression, // Or LightGBM, neural network
}

impl LearnedFusion {
    pub fn predict_score(
        &self,
        dense_score: f32,
        sparse_score: f32,
        dense_rank: usize,
        sparse_rank: usize,
        query_features: Vec<f32>, // Query length, term overlap, etc.
    ) -> f32 {
        let features = vec![
            dense_score,
            sparse_score,
            dense_rank as f32,
            sparse_rank as f32,
            query_features[0], // Query length
            query_features[1], // Term count
            // ... more features
        ];
        self.model.predict(&features)
    }
}
Training Data:
- Query-document relevance labels
- Features: scores, ranks, query characteristics
- Loss: NDCG@10 or MRR
Characteristics:
- Best performance (when trained)
- Adapts to the data distribution
- ⚠ Requires training data
- ⚠ More complex
Query Processing¶
SQL Interface¶
Basic Vector Search:
SELECT id, name, embedding <=> query_embedding AS distance
FROM products
ORDER BY distance
LIMIT 10;
Hybrid Search:
SELECT * FROM hybrid_search(
    table_name := 'products',
    query_text := 'wireless headphones',
    query_embedding := get_embedding('wireless headphones'),
    fusion_algorithm := 'rrf', -- 'rrf' | 'weighted' | 'prefilter' | 'learned'
    k := 60,        -- RRF parameter
    alpha := 0.7,   -- Weighted fusion parameter
    limit := 10
);
Query Execution Plan¶
HybridSearchExec
├─ DenseSearchExec (HNSW)
│ └─ IndexScan(products, idx_embedding)
├─ SparseSearchExec (BM25)
│ └─ IndexScan(products, idx_keywords)
└─ FusionExec (RRF)
└─ TopK(10)
Performance Optimizations¶
1. SIMD Distance Computation¶
Cosine Similarity (AVX2):
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn cosine_similarity_simd(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    // Assumes a.len() == b.len() and a length that is a multiple of 8;
    // the scalar remainder loop is elided for clarity
    let mut dot = _mm256_setzero_ps();
    let mut norm_a = _mm256_setzero_ps();
    let mut norm_b = _mm256_setzero_ps();
    // Process 8 floats at a time with fused multiply-add accumulators
    for i in (0..a.len()).step_by(8) {
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        dot = _mm256_fmadd_ps(va, vb, dot);
        norm_a = _mm256_fmadd_ps(va, va, norm_a);
        norm_b = _mm256_fmadd_ps(vb, vb, norm_b);
    }
    // Horizontal sums (hsum_ps_avx is a helper reducing a __m256 to one f32)
    let dot_sum = hsum_ps_avx(dot);
    let norm_a_sum = hsum_ps_avx(norm_a);
    let norm_b_sum = hsum_ps_avx(norm_b);
    // Note: returns cosine *distance* (1 - similarity), matching DistanceFunction::Cosine
    1.0 - (dot_sum / (norm_a_sum.sqrt() * norm_b_sum.sqrt()))
}
Speedup: 4-8x vs scalar
2. Quantization¶
Product Quantization (compress vectors):
pub struct ProductQuantizer {
    codebooks: Vec<Vec<Vec<f32>>>, // [subspace][centroid][dims]
    m: usize,    // Number of subspaces
    ksub: usize, // Centroids per subspace
}

// Compress a 768-dim vector to 96 bytes (8x compression):
// m = 96 subspaces, ksub = 256 centroids → 1 byte per subspace
Speedup: 5-10x faster search, 8x less memory
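The encode step can be sketched in a few lines (illustrative only, not the production quantizer): split the vector into m equal-width subspaces and store, per subspace, the index of the nearest centroid from that subspace's codebook.

```rust
// Encode a vector with product quantization: one centroid index per subspace.
// codebooks is [subspace][centroid][dims]; assumes v.len() divides evenly by
// the number of subspaces.
fn pq_encode(v: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let m = codebooks.len();
    let sub = v.len() / m; // subspace dimensionality
    (0..m)
        .map(|i| {
            let chunk = &v[i * sub..(i + 1) * sub];
            // Nearest centroid by squared Euclidean distance
            let (best, _) = codebooks[i]
                .iter()
                .enumerate()
                .map(|(c, cent)| {
                    let d: f32 = chunk
                        .iter()
                        .zip(cent)
                        .map(|(a, b)| (a - b) * (a - b))
                        .sum();
                    (c, d)
                })
                .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .unwrap();
            best as u8
        })
        .collect()
}
```

Search then compares codes against per-subspace distance tables instead of full vectors, which is where the memory and speed wins come from.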
3. Caching¶
Query Result Cache:
pub struct QueryCache {
    cache: LruCache<QueryHash, Vec<(DocId, f32)>>,
}
// Cache hot queries (e.g., "wireless headphones")
// Hit rate: 30-50% for production workloads
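A minimal bounded-cache sketch using only the standard library (FIFO eviction as a simplification of LRU; the `QueryCache` shape and plain `u64` hashes here are illustrative, not HeliosDB's actual cache):

```rust
use std::collections::{HashMap, VecDeque};

// Bounded query-result cache: query hash -> (doc id, score) list.
// Evicts the oldest inserted entry once capacity is exceeded (FIFO, a
// simplification of the LRU policy described above).
struct QueryCache {
    capacity: usize,
    map: HashMap<u64, Vec<(u64, f32)>>,
    order: VecDeque<u64>, // insertion order, for eviction
}

impl QueryCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&self, query_hash: u64) -> Option<&Vec<(u64, f32)>> {
        self.map.get(&query_hash)
    }

    fn put(&mut self, query_hash: u64, results: Vec<(u64, f32)>) {
        // Only track order for brand-new keys; updates keep their slot
        if self.map.insert(query_hash, results).is_none() {
            self.order.push_back(query_hash);
            if self.order.len() > self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
        }
    }
}
```

A production cache would bump entries on access (true LRU) and bound memory by result size rather than entry count.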
Integration with Storage Layer¶
Index Persistence¶
File Format:
hnsw_index.bin
├─ Header (metadata)
│ ├─ version: u32
│ ├─ num_vectors: u64
│ ├─ dim: u32
│ ├─ m: u32
│ └─ max_layer: u32
├─ Node Data
│ ├─ Node 1: [id, layer, vector, neighbors]
│ ├─ Node 2: [id, layer, vector, neighbors]
│ └─ ...
└─ Entry Point: VectorId
Mmap Loading (zero-copy):
let file = File::open("hnsw_index.bin")?;
let mmap = unsafe { Mmap::map(&file)? };
let index = HnswIndex::from_bytes(&mmap)?;
Loading Time: ~1 second for 1M vectors
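A sketch of parsing that fixed-size header from the mapped bytes (the field order follows the diagram above; little-endian encoding and the 24-byte layout are assumptions for illustration):

```rust
// Header fields in the order shown in the file-format diagram
#[derive(Debug, PartialEq)]
struct IndexHeader {
    version: u32,
    num_vectors: u64,
    dim: u32,
    m: u32,
    max_layer: u32,
}

// Parse the header from the start of a mapped byte slice; returns None if
// the slice is too short. Assumes little-endian on-disk encoding.
fn parse_header(bytes: &[u8]) -> Option<IndexHeader> {
    if bytes.len() < 24 {
        return None; // 4 + 8 + 4 + 4 + 4 = 24 bytes
    }
    let u32_at = |o: usize| u32::from_le_bytes(bytes[o..o + 4].try_into().unwrap());
    Some(IndexHeader {
        version: u32_at(0),
        num_vectors: u64::from_le_bytes(bytes[4..12].try_into().unwrap()),
        dim: u32_at(12),
        m: u32_at(16),
        max_layer: u32_at(20),
    })
}
```

Because parsing only reads borrowed slices, `HnswIndex::from_bytes` can keep referencing the mmap'd region directly, which is what makes loading effectively zero-copy.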
Benchmarks¶
Latency (100K vectors, 768 dims)¶
| Operation | Latency (P50) | Latency (P99) |
|---|---|---|
| Dense search (k=10) | 3.2 ms | 8.1 ms |
| Sparse search (k=10) | 1.8 ms | 4.2 ms |
| Hybrid search (RRF, k=10) | 5.1 ms | 11.3 ms |
Recall@10 Accuracy¶
| Method | Recall@10 | QPS (1 thread) |
|---|---|---|
| Exact search | 100% | 12 |
| HNSW (ef=50) | 95.2% | 420 |
| HNSW (ef=100) | 97.8% | 310 |
| HNSW (ef=200) | 99.1% | 180 |
Scalability (HNSW, ef=100)¶
| Vectors | Build Time | Index Size | Search Latency |
|---|---|---|---|
| 10K | 12 sec | 4.8 MB | 0.8 ms |
| 100K | 2.1 min | 48 MB | 3.2 ms |
| 1M | 22 min | 480 MB | 12.1 ms |
| 10M | 3.8 hr | 4.8 GB | 45.3 ms |
Last Updated: November 1, 2025 · Version: v6.0 Phase 2 M1 · Maintained by: HeliosDB Vector Search Team
Related:
- Hybrid Search User Guide
- Main Architecture
- Storage Architecture