Skip to content

HeliosDB Indexing Overview

Version: 7.0 Last Updated: 2026-01-04


Introduction

HeliosDB provides a comprehensive indexing system designed for high-performance data access across diverse workloads. From traditional B-tree indexes for range queries to advanced vector indexes for AI/ML applications, HeliosDB supports the full spectrum of indexing requirements for modern data-intensive applications.

This document provides an overview of all index types available in HeliosDB, their use cases, and key characteristics.


Index Types at a Glance

Index Type Best For Access Pattern Range Queries Memory Footprint
B-tree General purpose O(log n) Yes Medium
Hash Exact matches O(1) No Low
LSM-Tree Write-heavy workloads O(log n) Yes Low (tiered)
GIN Multi-valued columns O(log n) Partial High
GiST Spatial/geometric data O(log n) Yes Medium
HNSW Vector similarity O(log n) approx N/A High
IVF Large-scale vectors O(n/k) approx N/A Medium
Full-Text Text search O(log n) N/A High

B-tree Indexes

B-tree indexes are the default and most versatile index type in HeliosDB. They maintain sorted data and support efficient equality and range queries.

Key Characteristics

  • Balanced tree structure with O(log n) lookup, insert, and delete
  • Supports range queries: <, <=, =, >=, >, BETWEEN
  • Supports ORDER BY operations efficiently
  • Ideal for columns with high cardinality

Use Cases

  • Primary key lookups
  • Foreign key joins
  • Date/timestamp range queries
  • Sorted result retrieval

Example

-- Create a B-tree index (default)
CREATE INDEX idx_orders_date ON orders (order_date);

-- Composite B-tree index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- Query using the index
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date BETWEEN '2025-01-01' AND '2025-12-31';

Hash Indexes

Hash indexes provide O(1) constant-time lookups for exact equality comparisons. They are optimized for high-speed point queries.

Key Characteristics

  • O(1) lookup time for equality comparisons
  • Does not support range queries
  • Lower memory overhead than B-trees for equality-only access
  • Ideal for lookup tables and exact-match scenarios

Use Cases

  • Session ID lookups
  • Cache key lookups
  • Unique identifier searches
  • Configuration value retrieval

Example

-- Create a hash index
CREATE INDEX idx_sessions_id ON sessions USING HASH (session_id);

-- Query using the hash index
SELECT user_id, created_at FROM sessions WHERE session_id = 'abc123xyz';

LSM-Tree Indexes

LSM-Tree (Log-Structured Merge-Tree) indexes are optimized for write-heavy workloads. They buffer writes in memory and periodically flush to sorted disk structures.

Key Characteristics

  • Optimized for high write throughput
  • Writes buffered in memtable before flushing to SSTables
  • Background compaction merges and optimizes disk structures
  • Supports range queries via sorted SSTables

Use Cases

  • Event logging and time-series data
  • IoT sensor data ingestion
  • High-velocity insert workloads
  • Append-heavy access patterns

Example

-- Create an LSM-Tree index for high-write workloads
CREATE INDEX idx_events_timestamp ON events USING LSMTREE (timestamp);

-- Configure memtable size
ALTER INDEX idx_events_timestamp SET (memtable_size_mb = 64);

GIN (Generalized Inverted) Indexes

GIN indexes are designed for columns containing multiple values per row, such as arrays, JSONB documents, and full-text search vectors.

Key Characteristics

  • Inverted index structure mapping values to rows
  • Efficient for containment queries (@>, <@, &&)
  • Supports arrays, JSONB, and tsvector
  • Higher build and update cost than B-tree

Use Cases

  • JSONB document queries
  • Array containment searches
  • Tag-based filtering
  • Full-text search

Example

-- Create a GIN index on JSONB column
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- Query JSONB with the index
SELECT * FROM products
WHERE attributes @> '{"category": "electronics", "in_stock": true}';

-- Create GIN index on array column
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

-- Query array containment
SELECT * FROM posts WHERE tags && ARRAY['database', 'performance'];

GiST (Generalized Search Tree) Indexes

GiST indexes support complex data types like geometric shapes, ranges, and spatial data. They enable efficient spatial queries and nearest-neighbor searches.

Key Characteristics

  • Supports geometric and range data types
  • Enables spatial queries (contains, overlaps, nearest-neighbor)
  • Balanced tree with pluggable operators
  • Used for PostGIS-style geospatial queries

Use Cases

  • Geospatial location queries
  • Bounding box searches
  • IP range lookups
  • Temporal range overlaps

Example

-- Create a GiST index for geometric data
CREATE INDEX idx_locations_geo ON locations USING GIST (coordinates);

-- Spatial containment query
SELECT name FROM locations
WHERE coordinates && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0);

-- Create GiST index for IP ranges
CREATE INDEX idx_ip_ranges ON network_blocks USING GIST (ip_range);

-- Query IP containment
SELECT block_name FROM network_blocks WHERE ip_range >> '192.168.1.100';

Vector Indexes (HNSW)

HNSW (Hierarchical Navigable Small World) indexes provide fast approximate nearest-neighbor search for high-dimensional vector data, essential for AI/ML applications.

Key Characteristics

  • Graph-based structure with navigable small-world properties
  • Sub-linear search time for high-dimensional vectors
  • Configurable accuracy vs speed tradeoff
  • Supports cosine, L2, and inner product distance metrics

Configuration Parameters

Parameter Description Default
m Connections per layer 32
ef_construction Build-time beam width 40
ef_search Query-time beam width 10

Use Cases

  • Semantic search
  • Image similarity
  • Recommendation systems
  • RAG (Retrieval-Augmented Generation)

Example

-- Create HNSW index for vector embeddings
CREATE INDEX idx_embeddings_hnsw ON documents USING HNSW (embedding vector_l2_ops)
WITH (m = 32, ef_construction = 64);

-- Similarity search
SELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

Vector Indexes (IVF)

IVF (Inverted File) indexes partition vectors into clusters for efficient large-scale similarity search. They offer a good balance between speed and memory usage.

Key Characteristics

  • Cluster-based partitioning using k-means
  • Searches only nearby clusters for efficiency
  • Configurable number of clusters (nlist) and probes (nprobe)
  • Better for very large vector datasets than HNSW

Configuration Parameters

Parameter Description Default
nlist Number of clusters 100
nprobe Clusters to search 10

Use Cases

  • Large-scale vector search (millions of vectors)
  • Memory-constrained environments
  • Batch similarity processing

Example

-- Create IVF index with product quantization
CREATE INDEX idx_images_ivf ON images USING IVFPQ (feature_vector)
WITH (nlist = 1000, nprobe = 20);

-- Adjust probe count for accuracy
SET ivfpq.nprobe = 50;

-- Similarity search
SELECT id, image_url
FROM images
ORDER BY feature_vector <-> query_vector
LIMIT 20;

Full-Text Search Indexes

Full-text search indexes enable efficient text search with support for stemming, stop words, ranking, and phrase matching.

Key Characteristics

  • Tokenization and stemming for natural language processing
  • Stop word filtering to reduce noise
  • Relevance ranking with ts_rank functions
  • Boolean and phrase queries

Use Cases

  • Document search
  • Product catalog search
  • Knowledge base queries
  • Log analysis

Example

-- Create full-text index
CREATE INDEX idx_articles_fts ON articles USING FULLTEXT (title, content);

-- Basic search with ranking
SELECT id, title, ts_rank(document, query) AS relevance
FROM articles,
     websearch_to_tsquery('english', 'database performance') AS query
WHERE document @@ query
ORDER BY relevance DESC
LIMIT 10;

-- Phrase search
SELECT * FROM articles
WHERE content @@ phraseto_tsquery('english', 'query optimization');

Partial Indexes

Partial indexes index only a subset of rows matching a WHERE condition, reducing index size and improving performance for filtered queries.

Key Characteristics

  • Smaller index size than full indexes
  • Faster maintenance for INSERT/UPDATE/DELETE
  • Optimized for specific query patterns

Example

-- Index only active orders
CREATE INDEX idx_orders_active ON orders (customer_id, order_date)
WHERE status = 'active';

-- Index only recent data
CREATE INDEX idx_logs_recent ON logs (created_at, severity)
WHERE created_at > CURRENT_DATE - INTERVAL '30 days';

Expression Indexes

Expression indexes index the result of an expression or function, enabling efficient queries on computed values.

Key Characteristics

  • Indexes computed values rather than raw column data
  • Supports functions, operators, and expressions
  • Essential for case-insensitive searches and JSON path queries

Example

-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Query using the expression index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- JSONB path expression index
CREATE INDEX idx_orders_customer_name ON orders ((data->>'customer_name'));

-- Date extraction index
CREATE INDEX idx_events_year_month ON events (EXTRACT(YEAR FROM event_date), EXTRACT(MONTH FROM event_date));

Composite Indexes

Composite indexes index multiple columns together, optimizing queries that filter or sort on multiple columns.

Key Characteristics

  • Column order matters: leftmost prefix is used for lookups
  • Supports multiple access patterns with one index
  • Reduces index count and maintenance overhead

Example

-- Composite index for common query pattern
CREATE INDEX idx_orders_cust_date_status ON orders (customer_id, order_date DESC, status);

-- This index supports:
-- WHERE customer_id = ?
-- WHERE customer_id = ? AND order_date = ?
-- WHERE customer_id = ? AND order_date BETWEEN ? AND ?
-- ORDER BY customer_id, order_date DESC


Performance Considerations

Index Selection Guidelines

  1. Start with B-tree for general-purpose indexing
  2. Use Hash only for pure equality lookups with no range needs
  3. Choose LSM-Tree for write-heavy workloads with eventual reads
  4. Select HNSW for vector search with low latency requirements
  5. Pick IVF for large-scale vector search with memory constraints
  6. Apply GIN for JSONB and array columns
  7. Use GiST for spatial and range data types

Index Maintenance

  • Monitor index bloat with heliosdb_index_stats view
  • Rebuild fragmented indexes using online rebuild for zero downtime
  • Update statistics regularly with ANALYZE
  • Remove unused indexes to reduce write overhead

Summary

HeliosDB's indexing system provides:

  • 9 index types covering all major access patterns
  • Online index rebuild with zero downtime
  • WAL-based durability for crash recovery
  • Autonomous index recommendations via the Index Advisor
  • GPU-accelerated vector search for AI/ML workloads

Choose the right index type for your workload, and leverage HeliosDB's advanced features like partial indexes, expression indexes, and online maintenance for optimal performance.