HeliosDB Indexing User Guide¶

Version: 7.0 Last Updated: 2026-01-04

Table of Contents¶

When to Use Which Index Type
Creating Indexes
Managing Indexes
Index Statistics and Analysis
Online Index Rebuild
Index Maintenance Best Practices
Performance Impact of Indexes
Troubleshooting

When to Use Which Index Type¶

Choosing the right index type is critical for query performance. Use this decision guide to select the optimal index for your use case.

Decision Matrix¶

Query Pattern	Recommended Index	Alternative
Equality lookup (`=`)	Hash	B-tree
Range query (`<`, `>`, `BETWEEN`)	B-tree	-
Sorting (`ORDER BY`)	B-tree	-
Pattern matching (`LIKE 'prefix%'`)	B-tree	Full-text
JSONB containment (`@>`)	GIN	-
Array operations (`&&`, `@>`)	GIN	-
Full-text search	Full-text	GIN
Geospatial queries	GiST	-
IP range lookup	GiST	-
Vector similarity (k-NN)	HNSW	IVF
High-write workload	LSM-Tree	B-tree

Detailed Guidelines¶

Use B-tree When:¶

You need range queries on numeric or date columns
Queries require sorted output
The column has high cardinality (many unique values)
You need composite indexes on multiple columns

Use Hash When:¶

All queries are exact equality matches
Column has high cardinality
You want minimal memory footprint
Range queries are never needed

Use GIN When:¶

Indexing JSONB documents
Indexing array columns
Supporting full-text search with tsvector
Queries use containment operators (@>, <@, &&)

Use GiST When:¶

Working with geometric or spatial data
Implementing nearest-neighbor queries on non-vector data
Indexing ranges (tsrange, int4range, etc.)
Performing IP address range lookups

Use HNSW When:¶

Vector similarity search is the primary use case
Low latency is critical (sub-millisecond)
Dataset fits in memory
Accuracy requirements allow approximate results

Use IVF When:¶

Vector dataset is very large (millions of vectors)
Memory is constrained
Batch processing is acceptable
Training data is available for clustering

Use LSM-Tree When:¶

Write throughput is the bottleneck
Reads are less frequent than writes
Data is append-mostly
Background compaction is acceptable

Creating Indexes¶

Basic Syntax¶

CREATE INDEX [CONCURRENTLY] [IF NOT EXISTS] index_name
ON table_name [USING method]
(column_expression [ASC | DESC] [NULLS {FIRST | LAST}], ...)
[INCLUDE (column, ...)]
[WHERE predicate]
[WITH (storage_parameter = value, ...)];

B-tree Index Examples¶

-- Simple single-column index
CREATE INDEX idx_users_email ON users (email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date DESC);

-- Index with INCLUDE for covering queries
CREATE INDEX idx_products_sku ON products (sku) INCLUDE (name, price);

-- Concurrent index creation (non-blocking)
CREATE INDEX CONCURRENTLY idx_logs_timestamp ON logs (created_at);

Hash Index Examples¶

-- Hash index for exact lookups
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);

-- Hash index on UUID column
CREATE INDEX idx_events_trace_id ON events USING HASH (trace_id);

GIN Index Examples¶

-- GIN index on JSONB column
CREATE INDEX idx_configs_data ON configurations USING GIN (config_data);

-- GIN index on array column
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);

-- GIN index for trigram similarity (requires pg_trgm)
CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);

GiST Index Examples¶

-- GiST index for geometric data
CREATE INDEX idx_stores_location ON stores USING GIST (location);

-- GiST index for IP ranges
CREATE INDEX idx_access_list_ip ON access_list USING GIST (ip_range inet_ops);

-- GiST index for timestamp ranges
CREATE INDEX idx_reservations_period ON reservations USING GIST (reservation_period);

Vector Index Examples¶

-- HNSW index for L2 distance
CREATE INDEX idx_docs_embedding_hnsw ON documents
USING HNSW (embedding vector_l2_ops)
WITH (m = 32, ef_construction = 64);

-- HNSW index for cosine similarity
CREATE INDEX idx_images_features ON images
USING HNSW (features vector_cosine_ops)
WITH (m = 48, ef_construction = 100);

-- IVF index for large-scale search
CREATE INDEX idx_products_vectors_ivf ON products
USING IVFFLAT (embedding vector_l2_ops)
WITH (lists = 1000);

Full-Text Index Examples¶

-- Full-text index on single column
CREATE INDEX idx_articles_body_fts ON articles USING FULLTEXT (body);

-- Full-text index on multiple columns
CREATE INDEX idx_docs_fts ON documents
USING FULLTEXT (title, abstract, content)
WITH (language = 'english');

Partial Index Examples¶

-- Index only active records
CREATE INDEX idx_users_active_email ON users (email)
WHERE is_active = true;

-- Index only recent orders
CREATE INDEX idx_orders_pending ON orders (customer_id, order_date)
WHERE status IN ('pending', 'processing');

-- Index only non-null values
CREATE INDEX idx_profiles_phone ON profiles (phone)
WHERE phone IS NOT NULL;

Expression Index Examples¶

-- Case-insensitive email search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- JSONB path extraction
CREATE INDEX idx_events_user_id ON events ((payload->>'user_id'));

-- Date part extraction
CREATE INDEX idx_orders_year_month ON orders (
    EXTRACT(YEAR FROM order_date),
    EXTRACT(MONTH FROM order_date)
);

-- Computed column
CREATE INDEX idx_orders_total ON orders ((quantity * unit_price));

Managing Indexes¶

Viewing Existing Indexes¶

-- List all indexes on a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';

-- View index size and usage statistics
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE idx_scan = 0
  AND NOT indisunique
ORDER BY pg_relation_size(i.indexrelid) DESC;

Modifying Indexes¶

-- Rename an index
ALTER INDEX idx_old_name RENAME TO idx_new_name;

-- Move index to different tablespace
ALTER INDEX idx_orders_date SET TABLESPACE fast_ssd;

-- Change storage parameters
ALTER INDEX idx_large_table SET (fillfactor = 90);

Dropping Indexes¶

-- Drop a single index
DROP INDEX idx_users_email;

-- Drop if exists (no error if missing)
DROP INDEX IF EXISTS idx_temporary;

-- Drop concurrently (non-blocking)
DROP INDEX CONCURRENTLY idx_old_index;

-- Drop multiple indexes
DROP INDEX idx_a, idx_b, idx_c;

Reindexing¶

-- Rebuild a single index
REINDEX INDEX idx_orders_date;

-- Rebuild all indexes on a table
REINDEX TABLE orders;

-- Rebuild all indexes in a schema
REINDEX SCHEMA public;

-- Concurrent reindex (non-blocking)
REINDEX INDEX CONCURRENTLY idx_users_email;

Index Statistics and Analysis¶

Gathering Statistics¶

-- Analyze a single table
ANALYZE orders;

-- Analyze specific columns
ANALYZE orders (customer_id, order_date);

-- Verbose analyze with output
ANALYZE VERBOSE orders;

-- Analyze all tables in schema
ANALYZE;

Viewing Index Statistics¶

-- Index usage statistics
SELECT
    indexrelname AS index_name,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

-- Index size information
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(quote_ident(indexname)::regclass)) AS size,
    pg_relation_size(quote_ident(indexname)::regclass) AS size_bytes
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(quote_ident(indexname)::regclass) DESC;

-- Index bloat estimation
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    ROUND(100.0 * pg_relation_size(indexrelid) /
          NULLIF(pg_relation_size(relid), 0), 1) AS index_to_table_ratio
FROM pg_stat_user_indexes
JOIN pg_index ON indexrelid = pg_index.indexrelid
ORDER BY pg_relation_size(indexrelid) DESC;

Using EXPLAIN to Verify Index Usage¶

-- Basic explain
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;

-- Explain with actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;

-- Detailed explain with buffers
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date > '2025-01-01';

-- JSON format for programmatic analysis
EXPLAIN (FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 12345;

Understanding EXPLAIN Output¶

-- Example output interpretation
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 12345 AND status = 'active';

/*
Output:
Index Scan using idx_orders_customer on orders  (cost=0.43..8.45 rows=1 width=120)
  Index Cond: (customer_id = 12345)
  Filter: (status = 'active')
  Rows Removed by Filter: 3
  Actual time: 0.021..0.025 ms
  Buffers: shared hit=4

Key metrics:
- "Index Scan" = Index is being used
- "cost=0.43..8.45" = Startup cost..total cost
- "Actual time: 0.021..0.025 ms" = Real execution time
- "Buffers: shared hit=4" = 4 buffer pages read from cache
*/

Online Index Rebuild¶

HeliosDB supports online index rebuilding with zero downtime using a shadow index approach.

How It Works¶

Shadow Index Creation: A new index is built with a _rebuild suffix
Background Building: Index is populated while original handles queries
Incremental Sync: Changes during rebuild are captured and applied
Atomic Cutover: Near-instant swap to new index (<50ms)
Cleanup: Old index is dropped

Manual Online Rebuild¶

-- Initiate online rebuild
REBUILD INDEX idx_orders_date ONLINE;

-- Check rebuild progress
SELECT * FROM heliosdb_index_rebuild_progress
WHERE index_name = 'idx_orders_date';

-- Cancel a rebuild
CANCEL REBUILD INDEX idx_orders_date;

-- View rebuild statistics
SELECT * FROM heliosdb_rebuild_stats;

Monitoring Rebuild Progress¶

-- Real-time progress monitoring
SELECT
    index_name,
    shadow_name,
    status,
    percent_complete,
    rows_processed,
    total_rows,
    incremental_changes,
    estimated_completion,
    current_phase
FROM heliosdb_index_rebuild_progress;

-- Example output:
-- index_name      | idx_orders_date
-- shadow_name     | idx_orders_date_rebuild
-- status          | Building
-- percent_complete| 45.5
-- rows_processed  | 4,550,000
-- total_rows      | 10,000,000
-- incremental_changes | 1,234
-- estimated_completion| 2026-01-04 15:32:00
-- current_phase   | Phase 2: Background building

Autonomous Rebuild¶

Configure automatic fragmentation detection and rebuild:

-- Enable autonomous rebuild
ALTER SYSTEM SET heliosdb.auto_rebuild_enabled = true;
ALTER SYSTEM SET heliosdb.fragmentation_threshold = 30;  -- 30%
ALTER SYSTEM SET heliosdb.rebuild_check_interval = '1 hour';
ALTER SYSTEM SET heliosdb.max_concurrent_rebuilds = 2;

-- Reload configuration
SELECT pg_reload_conf();

Index Maintenance Best Practices¶

1. Analyze Regularly¶

-- After bulk data loads
ANALYZE large_table;

-- Schedule periodic analysis (via cron or pg_cron)
SELECT cron.schedule('analyze-orders', '0 2 * * *', 'ANALYZE orders');

2. Monitor Bloat and Fragmentation¶

-- Create a monitoring view
CREATE VIEW index_health AS
SELECT
    schemaname,
    tablename,
    indexrelname AS index_name,
    idx_scan AS scans,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    CASE
        WHEN idx_scan = 0 THEN 'UNUSED'
        WHEN idx_scan < 100 THEN 'LOW_USAGE'
        ELSE 'ACTIVE'
    END AS status
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

3. Remove Unused Indexes¶

-- Identify candidates for removal
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(indexrelid)) AS wasted_space
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE idx_scan = 0
  AND NOT indisunique
  AND NOT indisprimary
  AND pg_relation_size(indexrelid) > 10 * 1024 * 1024;  -- > 10MB

4. Use Partial Indexes Where Appropriate¶

-- Replace full index with partial index
DROP INDEX idx_orders_status;

CREATE INDEX idx_orders_pending ON orders (customer_id, order_date)
WHERE status = 'pending';  -- Most queries filter for pending

5. Optimize Composite Index Column Order¶

-- Wrong: Low cardinality column first
CREATE INDEX idx_bad ON orders (status, customer_id);

-- Right: High cardinality column first
CREATE INDEX idx_good ON orders (customer_id, status);

6. Use INCLUDE for Covering Indexes¶

-- Avoid table lookups with INCLUDE
CREATE INDEX idx_orders_covering ON orders (customer_id)
INCLUDE (order_date, total_amount, status);

-- Query uses index-only scan
SELECT order_date, total_amount, status
FROM orders
WHERE customer_id = 12345;

Performance Impact of Indexes¶

Write Overhead¶

Every index adds overhead to INSERT, UPDATE, and DELETE operations:

Index Type	INSERT Overhead	UPDATE Overhead	DELETE Overhead
B-tree	Low-Medium	Medium	Low
Hash	Very Low	Low	Very Low
GIN	High	High	Medium
GiST	Medium	Medium	Medium
HNSW	High	High	Medium
Full-text	High	High	High

Guidelines for Index Count¶

-- Check index count per table
SELECT
    relname AS table_name,
    COUNT(indexrelid) AS index_count
FROM pg_stat_user_tables
LEFT JOIN pg_index ON indrelid = relid
GROUP BY relname
HAVING COUNT(indexrelid) > 5
ORDER BY index_count DESC;

-- Recommendation: Generally 3-8 indexes per table is optimal
-- More than 10 indexes may indicate over-indexing

Memory Considerations¶

-- Total index memory usage
SELECT
    pg_size_pretty(SUM(pg_relation_size(indexrelid))) AS total_index_size
FROM pg_stat_user_indexes;

-- Ensure indexes fit in shared_buffers for optimal performance
SHOW shared_buffers;

Troubleshooting¶

Index Not Being Used¶

Symptom: EXPLAIN shows Sequential Scan instead of Index Scan

Solutions:

-- 1. Check if statistics are current
ANALYZE problematic_table;

-- 2. Check if planner cost settings are appropriate
SET random_page_cost = 1.1;  -- For SSDs
SET seq_page_cost = 1.0;

-- 3. Force index usage (for testing only)
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;
SET enable_seqscan = on;

-- 4. Check selectivity
SELECT
    n_distinct,
    correlation,
    null_frac
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

Index Scan Is Slower Than Expected¶

Symptom: Index scan takes longer than anticipated

Solutions:

-- 1. Check for index bloat
SELECT
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_slow';

-- 2. Rebuild if bloated
REINDEX INDEX CONCURRENTLY idx_slow;

-- 3. Check for lock contention
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';

High Write Latency¶

Symptom: INSERT/UPDATE operations are slow

Solutions:

-- 1. Identify expensive indexes
SELECT
    indexrelname,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_tup_read DESC;

-- 2. Remove unused indexes
DROP INDEX idx_never_used;

-- 3. Consider partial indexes
CREATE INDEX idx_partial ON table (col) WHERE condition;

Expression Index Not Used¶

Symptom: Index on expression is ignored

Solutions:

-- Expression must match exactly
-- Wrong query (won't use LOWER index):
SELECT * FROM users WHERE lower(email) = 'test@example.com';

-- Right query (uses LOWER index):
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Verify expression index definition
SELECT indexdef FROM pg_indexes WHERE indexname = 'idx_users_email_lower';

README - Indexing overview
Quick Start - Get started quickly
EXPLAIN Plan Guide - Query plan analysis
Performance Tuning - Overall performance optimization
Full-Text Search - Full-text search tuning

Summary¶

Effective index management is crucial for database performance. Key takeaways:

Choose the right index type for your query patterns
Create indexes strategically - not too few, not too many
Monitor index usage and remove unused indexes
Maintain statistics with regular ANALYZE
Use online rebuild for zero-downtime maintenance
Verify with EXPLAIN that indexes are being used

Following these practices will ensure optimal query performance while minimizing write overhead and storage costs.

HeliosDB Indexing User Guide¶

Table of Contents¶

When to Use Which Index Type¶

Decision Matrix¶

Detailed Guidelines¶

Use B-tree When:¶

Use Hash When:¶

Use GIN When:¶

Use GiST When:¶

Use HNSW When:¶

Use IVF When:¶

Use LSM-Tree When:¶

Creating Indexes¶

Basic Syntax¶

B-tree Index Examples¶

Hash Index Examples¶

GIN Index Examples¶

GiST Index Examples¶

Vector Index Examples¶

Full-Text Index Examples¶

Partial Index Examples¶

Expression Index Examples¶

Managing Indexes¶

Viewing Existing Indexes¶

Modifying Indexes¶

Dropping Indexes¶

Reindexing¶

Index Statistics and Analysis¶

Gathering Statistics¶

Viewing Index Statistics¶

Using EXPLAIN to Verify Index Usage¶

Understanding EXPLAIN Output¶

Online Index Rebuild¶

How It Works¶

Manual Online Rebuild¶

Monitoring Rebuild Progress¶

Autonomous Rebuild¶

Index Maintenance Best Practices¶

1. Analyze Regularly¶

2. Monitor Bloat and Fragmentation¶

3. Remove Unused Indexes¶

4. Use Partial Indexes Where Appropriate¶

5. Optimize Composite Index Column Order¶

6. Use INCLUDE for Covering Indexes¶

Performance Impact of Indexes¶

Write Overhead¶

Guidelines for Index Count¶

Memory Considerations¶

Troubleshooting¶

Index Not Being Used¶

Index Scan Is Slower Than Expected¶

High Write Latency¶

Expression Index Not Used¶

Related Documentation¶

Summary¶