HeliosDB Indexing Overview¶
Version: 7.0 Last Updated: 2026-01-04
Introduction¶
HeliosDB provides a comprehensive indexing system designed for high-performance data access across diverse workloads. From traditional B-tree indexes for range queries to advanced vector indexes for AI/ML applications, HeliosDB supports the full spectrum of indexing requirements for modern data-intensive applications.
This document provides an overview of all index types available in HeliosDB, their use cases, and key characteristics.
Index Types at a Glance¶
| Index Type | Best For | Access Pattern | Range Queries | Memory Footprint |
|---|---|---|---|---|
| B-tree | General purpose | O(log n) | Yes | Medium |
| Hash | Exact matches | O(1) | No | Low |
| LSM-Tree | Write-heavy workloads | O(log n) | Yes | Low (tiered) |
| GIN | Multi-valued columns | O(log n) | Partial | High |
| GiST | Spatial/geometric data | O(log n) | Yes | Medium |
| HNSW | Vector similarity | O(log n) approx | N/A | High |
| IVF | Large-scale vectors | O(n/k) approx | N/A | Medium |
| Full-Text | Text search | O(log n) | N/A | High |
B-tree Indexes¶
B-tree indexes are the default and most versatile index type in HeliosDB. They maintain sorted data and support efficient equality and range queries.
Key Characteristics¶
- Balanced tree structure with O(log n) lookup, insert, and delete
- Supports range queries:
<,<=,=,>=,>,BETWEEN - Supports ORDER BY operations efficiently
- Ideal for columns with high cardinality
Use Cases¶
- Primary key lookups
- Foreign key joins
- Date/timestamp range queries
- Sorted result retrieval
Example¶
-- Create a B-tree index (default)
CREATE INDEX idx_orders_date ON orders (order_date);
-- Composite B-tree index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
-- Query using the index
SELECT * FROM orders
WHERE customer_id = 12345
AND order_date BETWEEN '2025-01-01' AND '2025-12-31';
Hash Indexes¶
Hash indexes provide O(1) constant-time lookups for exact equality comparisons. They are optimized for high-speed point queries.
Key Characteristics¶
- O(1) lookup time for equality comparisons
- Does not support range queries
- Lower memory overhead than B-trees for equality-only access
- Ideal for lookup tables and exact-match scenarios
Use Cases¶
- Session ID lookups
- Cache key lookups
- Unique identifier searches
- Configuration value retrieval
Example¶
-- Create a hash index
CREATE INDEX idx_sessions_id ON sessions USING HASH (session_id);
-- Query using the hash index
SELECT user_id, created_at FROM sessions WHERE session_id = 'abc123xyz';
LSM-Tree Indexes¶
LSM-Tree (Log-Structured Merge-Tree) indexes are optimized for write-heavy workloads. They buffer writes in memory and periodically flush to sorted disk structures.
Key Characteristics¶
- Optimized for high write throughput
- Writes buffered in memtable before flushing to SSTables
- Background compaction merges and optimizes disk structures
- Supports range queries via sorted SSTables
Use Cases¶
- Event logging and time-series data
- IoT sensor data ingestion
- High-velocity insert workloads
- Append-heavy access patterns
Example¶
-- Create an LSM-Tree index for high-write workloads
CREATE INDEX idx_events_timestamp ON events USING LSMTREE (timestamp);
-- Configure memtable size
ALTER INDEX idx_events_timestamp SET (memtable_size_mb = 64);
GIN (Generalized Inverted) Indexes¶
GIN indexes are designed for columns containing multiple values per row, such as arrays, JSONB documents, and full-text search vectors.
Key Characteristics¶
- Inverted index structure mapping values to rows
- Efficient for containment queries (
@>,<@,&&) - Supports arrays, JSONB, and tsvector
- Higher build and update cost than B-tree
Use Cases¶
- JSONB document queries
- Array containment searches
- Tag-based filtering
- Full-text search
Example¶
-- Create a GIN index on JSONB column
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);
-- Query JSONB with the index
SELECT * FROM products
WHERE attributes @> '{"category": "electronics", "in_stock": true}';
-- Create GIN index on array column
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
-- Query array containment
SELECT * FROM posts WHERE tags && ARRAY['database', 'performance'];
GiST (Generalized Search Tree) Indexes¶
GiST indexes support complex data types like geometric shapes, ranges, and spatial data. They enable efficient spatial queries and nearest-neighbor searches.
Key Characteristics¶
- Supports geometric and range data types
- Enables spatial queries (contains, overlaps, nearest-neighbor)
- Balanced tree with pluggable operators
- Used for PostGIS-style geospatial queries
Use Cases¶
- Geospatial location queries
- Bounding box searches
- IP range lookups
- Temporal range overlaps
Example¶
-- Create a GiST index for geometric data
CREATE INDEX idx_locations_geo ON locations USING GIST (coordinates);
-- Spatial containment query
SELECT name FROM locations
WHERE coordinates && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0);
-- Create GiST index for IP ranges
CREATE INDEX idx_ip_ranges ON network_blocks USING GIST (ip_range);
-- Query IP containment
SELECT block_name FROM network_blocks WHERE ip_range >> '192.168.1.100';
Vector Indexes (HNSW)¶
HNSW (Hierarchical Navigable Small World) indexes provide fast approximate nearest-neighbor search for high-dimensional vector data, essential for AI/ML applications.
Key Characteristics¶
- Graph-based structure with navigable small-world properties
- Sub-linear search time for high-dimensional vectors
- Configurable accuracy vs speed tradeoff
- Supports cosine, L2, and inner product distance metrics
Configuration Parameters¶
| Parameter | Description | Default |
|---|---|---|
m |
Connections per layer | 32 |
ef_construction |
Build-time beam width | 40 |
ef_search |
Query-time beam width | 10 |
Use Cases¶
- Semantic search
- Image similarity
- Recommendation systems
- RAG (Retrieval-Augmented Generation)
Example¶
-- Create HNSW index for vector embeddings
CREATE INDEX idx_embeddings_hnsw ON documents USING HNSW (embedding vector_l2_ops)
WITH (m = 32, ef_construction = 64);
-- Similarity search
SELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
Vector Indexes (IVF)¶
IVF (Inverted File) indexes partition vectors into clusters for efficient large-scale similarity search. They offer a good balance between speed and memory usage.
Key Characteristics¶
- Cluster-based partitioning using k-means
- Searches only nearby clusters for efficiency
- Configurable number of clusters (nlist) and probes (nprobe)
- Better for very large vector datasets than HNSW
Configuration Parameters¶
| Parameter | Description | Default |
|---|---|---|
nlist |
Number of clusters | 100 |
nprobe |
Clusters to search | 10 |
Use Cases¶
- Large-scale vector search (millions of vectors)
- Memory-constrained environments
- Batch similarity processing
Example¶
-- Create IVF index with product quantization
CREATE INDEX idx_images_ivf ON images USING IVFPQ (feature_vector)
WITH (nlist = 1000, nprobe = 20);
-- Adjust probe count for accuracy
SET ivfpq.nprobe = 50;
-- Similarity search
SELECT id, image_url
FROM images
ORDER BY feature_vector <-> query_vector
LIMIT 20;
Full-Text Search Indexes¶
Full-text search indexes enable efficient text search with support for stemming, stop words, ranking, and phrase matching.
Key Characteristics¶
- Tokenization and stemming for natural language processing
- Stop word filtering to reduce noise
- Relevance ranking with ts_rank functions
- Boolean and phrase queries
Use Cases¶
- Document search
- Product catalog search
- Knowledge base queries
- Log analysis
Example¶
-- Create full-text index
CREATE INDEX idx_articles_fts ON articles USING FULLTEXT (title, content);
-- Basic search with ranking
SELECT id, title, ts_rank(document, query) AS relevance
FROM articles,
websearch_to_tsquery('english', 'database performance') AS query
WHERE document @@ query
ORDER BY relevance DESC
LIMIT 10;
-- Phrase search
SELECT * FROM articles
WHERE content @@ phraseto_tsquery('english', 'query optimization');
Partial Indexes¶
Partial indexes index only a subset of rows matching a WHERE condition, reducing index size and improving performance for filtered queries.
Key Characteristics¶
- Smaller index size than full indexes
- Faster maintenance for INSERT/UPDATE/DELETE
- Optimized for specific query patterns
Example¶
-- Index only active orders
CREATE INDEX idx_orders_active ON orders (customer_id, order_date)
WHERE status = 'active';
-- Index only recent data
CREATE INDEX idx_logs_recent ON logs (created_at, severity)
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';
Expression Indexes¶
Expression indexes index the result of an expression or function, enabling efficient queries on computed values.
Key Characteristics¶
- Indexes computed values rather than raw column data
- Supports functions, operators, and expressions
- Essential for case-insensitive searches and JSON path queries
Example¶
-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Query using the expression index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- JSONB path expression index
CREATE INDEX idx_orders_customer_name ON orders ((data->>'customer_name'));
-- Date extraction index
CREATE INDEX idx_events_year_month ON events (EXTRACT(YEAR FROM event_date), EXTRACT(MONTH FROM event_date));
Composite Indexes¶
Composite indexes index multiple columns together, optimizing queries that filter or sort on multiple columns.
Key Characteristics¶
- Column order matters: leftmost prefix is used for lookups
- Supports multiple access patterns with one index
- Reduces index count and maintenance overhead
Example¶
-- Composite index for common query pattern
CREATE INDEX idx_orders_cust_date_status ON orders (customer_id, order_date DESC, status);
-- This index supports:
-- WHERE customer_id = ?
-- WHERE customer_id = ? AND order_date = ?
-- WHERE customer_id = ? AND order_date BETWEEN ? AND ?
-- ORDER BY customer_id, order_date DESC
Related Documentation¶
- User Guide - Comprehensive guide to creating and managing indexes
- Quick Start - Get started with indexing in 5 minutes
- Online Rebuild - Zero-downtime index maintenance
- WAL Durability - Index crash recovery implementation
- Full-Text Search Tuning - Advanced full-text search configuration
- Vector Search - Multimodal vector search guide
Performance Considerations¶
Index Selection Guidelines¶
- Start with B-tree for general-purpose indexing
- Use Hash only for pure equality lookups with no range needs
- Choose LSM-Tree for write-heavy workloads with eventual reads
- Select HNSW for vector search with low latency requirements
- Pick IVF for large-scale vector search with memory constraints
- Apply GIN for JSONB and array columns
- Use GiST for spatial and range data types
Index Maintenance¶
- Monitor index bloat with
heliosdb_index_statsview - Rebuild fragmented indexes using online rebuild for zero downtime
- Update statistics regularly with
ANALYZE - Remove unused indexes to reduce write overhead
Summary¶
HeliosDB's indexing system provides:
- 9 index types covering all major access patterns
- Online index rebuild with zero downtime
- WAL-based durability for crash recovery
- Autonomous index recommendations via the Index Advisor
- GPU-accelerated vector search for AI/ML workloads
Choose the right index type for your workload, and leverage HeliosDB's advanced features like partial indexes, expression indexes, and online maintenance for optimal performance.