Skip to content

HeliosDB Indexing User Guide

Version: 7.0 Last Updated: 2026-01-04


Table of Contents

  1. When to Use Which Index Type
  2. Creating Indexes
  3. Managing Indexes
  4. Index Statistics and Analysis
  5. Online Index Rebuild
  6. Index Maintenance Best Practices
  7. Performance Impact of Indexes
  8. Troubleshooting

When to Use Which Index Type

Choosing the right index type is critical for query performance. Use this decision guide to select the optimal index for your use case.

Decision Matrix

Query Pattern Recommended Index Alternative
Equality lookup (=) Hash B-tree
Range query (<, >, BETWEEN) B-tree -
Sorting (ORDER BY) B-tree -
Pattern matching (LIKE 'prefix%') B-tree Full-text
JSONB containment (@>) GIN -
Array operations (&&, @>) GIN -
Full-text search Full-text GIN
Geospatial queries GiST -
IP range lookup GiST -
Vector similarity (k-NN) HNSW IVF
High-write workload LSM-Tree B-tree

Detailed Guidelines

Use B-tree When:

  • You need range queries on numeric or date columns
  • Queries require sorted output
  • The column has high cardinality (many unique values)
  • You need composite indexes on multiple columns

Use Hash When:

  • All queries are exact equality matches
  • Column has high cardinality
  • You want minimal memory footprint
  • Range queries are never needed

Use GIN When:

  • Indexing JSONB documents
  • Indexing array columns
  • Supporting full-text search with tsvector
  • Queries use containment operators (@>, <@, &&)

Use GiST When:

  • Working with geometric or spatial data
  • Implementing nearest-neighbor queries on non-vector data
  • Indexing ranges (tsrange, int4range, etc.)
  • Performing IP address range lookups

Use HNSW When:

  • Vector similarity search is the primary use case
  • Low latency is critical (sub-millisecond)
  • Dataset fits in memory
  • Accuracy requirements allow approximate results

Use IVF When:

  • Vector dataset is very large (millions of vectors)
  • Memory is constrained
  • Batch processing is acceptable
  • Training data is available for clustering

Use LSM-Tree When:

  • Write throughput is the bottleneck
  • Reads are less frequent than writes
  • Data is append-mostly
  • Background compaction is acceptable

Creating Indexes

Basic Syntax

CREATE INDEX [CONCURRENTLY] [IF NOT EXISTS] index_name
ON table_name [USING method]
(column_expression [ASC | DESC] [NULLS {FIRST | LAST}], ...)
[INCLUDE (column, ...)]
[WHERE predicate]
[WITH (storage_parameter = value, ...)];

B-tree Index Examples

-- Simple single-column index
CREATE INDEX idx_users_email ON users (email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date DESC);

-- Index with INCLUDE for covering queries
CREATE INDEX idx_products_sku ON products (sku) INCLUDE (name, price);

-- Concurrent index creation (non-blocking)
CREATE INDEX CONCURRENTLY idx_logs_timestamp ON logs (created_at);

Hash Index Examples

-- Hash index for exact lookups
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);

-- Hash index on UUID column
CREATE INDEX idx_events_trace_id ON events USING HASH (trace_id);

GIN Index Examples

-- GIN index on JSONB column
CREATE INDEX idx_configs_data ON configurations USING GIN (config_data);

-- GIN index on array column
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);

-- GIN index for trigram similarity (requires pg_trgm)
CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);

GiST Index Examples

-- GiST index for geometric data
CREATE INDEX idx_stores_location ON stores USING GIST (location);

-- GiST index for IP ranges
CREATE INDEX idx_access_list_ip ON access_list USING GIST (ip_range inet_ops);

-- GiST index for timestamp ranges
CREATE INDEX idx_reservations_period ON reservations USING GIST (reservation_period);

Vector Index Examples

-- HNSW index for L2 distance
CREATE INDEX idx_docs_embedding_hnsw ON documents
USING HNSW (embedding vector_l2_ops)
WITH (m = 32, ef_construction = 64);

-- HNSW index for cosine similarity
CREATE INDEX idx_images_features ON images
USING HNSW (features vector_cosine_ops)
WITH (m = 48, ef_construction = 100);

-- IVF index for large-scale search
CREATE INDEX idx_products_vectors_ivf ON products
USING IVFFLAT (embedding vector_l2_ops)
WITH (lists = 1000);

Full-Text Index Examples

-- Full-text index on single column
CREATE INDEX idx_articles_body_fts ON articles USING FULLTEXT (body);

-- Full-text index on multiple columns
CREATE INDEX idx_docs_fts ON documents
USING FULLTEXT (title, abstract, content)
WITH (language = 'english');

Partial Index Examples

-- Index only active records
CREATE INDEX idx_users_active_email ON users (email)
WHERE is_active = true;

-- Index only recent orders
CREATE INDEX idx_orders_pending ON orders (customer_id, order_date)
WHERE status IN ('pending', 'processing');

-- Index only non-null values
CREATE INDEX idx_profiles_phone ON profiles (phone)
WHERE phone IS NOT NULL;

Expression Index Examples

-- Case-insensitive email search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- JSONB path extraction
CREATE INDEX idx_events_user_id ON events ((payload->>'user_id'));

-- Date part extraction
CREATE INDEX idx_orders_year_month ON orders (
    EXTRACT(YEAR FROM order_date),
    EXTRACT(MONTH FROM order_date)
);

-- Computed column
CREATE INDEX idx_orders_total ON orders ((quantity * unit_price));

Managing Indexes

Viewing Existing Indexes

-- List all indexes on a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';

-- View index size and usage statistics
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
    idx_scan AS scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE idx_scan = 0
  AND NOT indisunique
ORDER BY pg_relation_size(i.indexrelid) DESC;

Modifying Indexes

-- Rename an index
ALTER INDEX idx_old_name RENAME TO idx_new_name;

-- Move index to different tablespace
ALTER INDEX idx_orders_date SET TABLESPACE fast_ssd;

-- Change storage parameters
ALTER INDEX idx_large_table SET (fillfactor = 90);

Dropping Indexes

-- Drop a single index
DROP INDEX idx_users_email;

-- Drop if exists (no error if missing)
DROP INDEX IF EXISTS idx_temporary;

-- Drop concurrently (non-blocking)
DROP INDEX CONCURRENTLY idx_old_index;

-- Drop multiple indexes
DROP INDEX idx_a, idx_b, idx_c;

Reindexing

-- Rebuild a single index
REINDEX INDEX idx_orders_date;

-- Rebuild all indexes on a table
REINDEX TABLE orders;

-- Rebuild all indexes in a schema
REINDEX SCHEMA public;

-- Concurrent reindex (non-blocking)
REINDEX INDEX CONCURRENTLY idx_users_email;

Index Statistics and Analysis

Gathering Statistics

-- Analyze a single table
ANALYZE orders;

-- Analyze specific columns
ANALYZE orders (customer_id, order_date);

-- Verbose analyze with output
ANALYZE VERBOSE orders;

-- Analyze all tables in schema
ANALYZE;

Viewing Index Statistics

-- Index usage statistics
SELECT
    indexrelname AS index_name,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

-- Index size information
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(quote_ident(indexname)::regclass)) AS size,
    pg_relation_size(quote_ident(indexname)::regclass) AS size_bytes
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(quote_ident(indexname)::regclass) DESC;

-- Index bloat estimation
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    ROUND(100.0 * pg_relation_size(indexrelid) /
          NULLIF(pg_relation_size(relid), 0), 1) AS index_to_table_ratio
FROM pg_stat_user_indexes
JOIN pg_index ON indexrelid = pg_index.indexrelid
ORDER BY pg_relation_size(indexrelid) DESC;

Using EXPLAIN to Verify Index Usage

-- Basic explain
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;

-- Explain with actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;

-- Detailed explain with buffers
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = 12345
  AND order_date > '2025-01-01';

-- JSON format for programmatic analysis
EXPLAIN (FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 12345;

Understanding EXPLAIN Output

-- Example output interpretation
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 12345 AND status = 'active';

/*
Output:
Index Scan using idx_orders_customer on orders  (cost=0.43..8.45 rows=1 width=120)
  Index Cond: (customer_id = 12345)
  Filter: (status = 'active')
  Rows Removed by Filter: 3
  Actual time: 0.021..0.025 ms
  Buffers: shared hit=4

Key metrics:
- "Index Scan" = Index is being used
- "cost=0.43..8.45" = Startup cost..total cost
- "Actual time: 0.021..0.025 ms" = Real execution time
- "Buffers: shared hit=4" = 4 buffer pages read from cache
*/

Online Index Rebuild

HeliosDB supports online index rebuilding with zero downtime using a shadow index approach.

How It Works

  1. Shadow Index Creation: A new index is built with a _rebuild suffix
  2. Background Building: Index is populated while original handles queries
  3. Incremental Sync: Changes during rebuild are captured and applied
  4. Atomic Cutover: Near-instant swap to new index (<50ms)
  5. Cleanup: Old index is dropped

Manual Online Rebuild

-- Initiate online rebuild
REBUILD INDEX idx_orders_date ONLINE;

-- Check rebuild progress
SELECT * FROM heliosdb_index_rebuild_progress
WHERE index_name = 'idx_orders_date';

-- Cancel a rebuild
CANCEL REBUILD INDEX idx_orders_date;

-- View rebuild statistics
SELECT * FROM heliosdb_rebuild_stats;

Monitoring Rebuild Progress

-- Real-time progress monitoring
SELECT
    index_name,
    shadow_name,
    status,
    percent_complete,
    rows_processed,
    total_rows,
    incremental_changes,
    estimated_completion,
    current_phase
FROM heliosdb_index_rebuild_progress;

-- Example output:
-- index_name      | idx_orders_date
-- shadow_name     | idx_orders_date_rebuild
-- status          | Building
-- percent_complete| 45.5
-- rows_processed  | 4,550,000
-- total_rows      | 10,000,000
-- incremental_changes | 1,234
-- estimated_completion| 2026-01-04 15:32:00
-- current_phase   | Phase 2: Background building

Autonomous Rebuild

Configure automatic fragmentation detection and rebuild:

-- Enable autonomous rebuild
ALTER SYSTEM SET heliosdb.auto_rebuild_enabled = true;
ALTER SYSTEM SET heliosdb.fragmentation_threshold = 30;  -- 30%
ALTER SYSTEM SET heliosdb.rebuild_check_interval = '1 hour';
ALTER SYSTEM SET heliosdb.max_concurrent_rebuilds = 2;

-- Reload configuration
SELECT pg_reload_conf();

Index Maintenance Best Practices

1. Analyze Regularly

-- After bulk data loads
ANALYZE large_table;

-- Schedule periodic analysis (via cron or pg_cron)
SELECT cron.schedule('analyze-orders', '0 2 * * *', 'ANALYZE orders');

2. Monitor Bloat and Fragmentation

-- Create a monitoring view
CREATE VIEW index_health AS
SELECT
    schemaname,
    tablename,
    indexrelname AS index_name,
    idx_scan AS scans,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    CASE
        WHEN idx_scan = 0 THEN 'UNUSED'
        WHEN idx_scan < 100 THEN 'LOW_USAGE'
        ELSE 'ACTIVE'
    END AS status
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

3. Remove Unused Indexes

-- Identify candidates for removal
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(indexrelid)) AS wasted_space
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE idx_scan = 0
  AND NOT indisunique
  AND NOT indisprimary
  AND pg_relation_size(indexrelid) > 10 * 1024 * 1024;  -- > 10MB

4. Use Partial Indexes Where Appropriate

-- Replace full index with partial index
DROP INDEX idx_orders_status;

CREATE INDEX idx_orders_pending ON orders (customer_id, order_date)
WHERE status = 'pending';  -- Most queries filter for pending

5. Optimize Composite Index Column Order

-- Wrong: Low cardinality column first
CREATE INDEX idx_bad ON orders (status, customer_id);

-- Right: High cardinality column first
CREATE INDEX idx_good ON orders (customer_id, status);

6. Use INCLUDE for Covering Indexes

-- Avoid table lookups with INCLUDE
CREATE INDEX idx_orders_covering ON orders (customer_id)
INCLUDE (order_date, total_amount, status);

-- Query uses index-only scan
SELECT order_date, total_amount, status
FROM orders
WHERE customer_id = 12345;

Performance Impact of Indexes

Write Overhead

Every index adds overhead to INSERT, UPDATE, and DELETE operations:

Index Type INSERT Overhead UPDATE Overhead DELETE Overhead
B-tree Low-Medium Medium Low
Hash Very Low Low Very Low
GIN High High Medium
GiST Medium Medium Medium
HNSW High High Medium
Full-text High High High

Guidelines for Index Count

-- Check index count per table
SELECT
    relname AS table_name,
    COUNT(indexrelid) AS index_count
FROM pg_stat_user_tables
LEFT JOIN pg_index ON indrelid = relid
GROUP BY relname
HAVING COUNT(indexrelid) > 5
ORDER BY index_count DESC;

-- Recommendation: Generally 3-8 indexes per table is optimal
-- More than 10 indexes may indicate over-indexing

Memory Considerations

-- Total index memory usage
SELECT
    pg_size_pretty(SUM(pg_relation_size(indexrelid))) AS total_index_size
FROM pg_stat_user_indexes;

-- Ensure indexes fit in shared_buffers for optimal performance
SHOW shared_buffers;

Troubleshooting

Index Not Being Used

Symptom: EXPLAIN shows Sequential Scan instead of Index Scan

Solutions:

-- 1. Check if statistics are current
ANALYZE problematic_table;

-- 2. Check if planner cost settings are appropriate
SET random_page_cost = 1.1;  -- For SSDs
SET seq_page_cost = 1.0;

-- 3. Force index usage (for testing only)
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;
SET enable_seqscan = on;

-- 4. Check selectivity
SELECT
    n_distinct,
    correlation,
    null_frac
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

Index Scan Is Slower Than Expected

Symptom: Index scan takes longer than anticipated

Solutions:

-- 1. Check for index bloat
SELECT
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_slow';

-- 2. Rebuild if bloated
REINDEX INDEX CONCURRENTLY idx_slow;

-- 3. Check for lock contention
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';

High Write Latency

Symptom: INSERT/UPDATE operations are slow

Solutions:

-- 1. Identify expensive indexes
SELECT
    indexrelname,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_tup_read DESC;

-- 2. Remove unused indexes
DROP INDEX idx_never_used;

-- 3. Consider partial indexes
CREATE INDEX idx_partial ON table (col) WHERE condition;

Expression Index Not Used

Symptom: Index on expression is ignored

Solutions:

-- Expression must match exactly
-- Wrong query (won't use LOWER index):
SELECT * FROM users WHERE lower(email) = 'test@example.com';

-- Right query (uses LOWER index):
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Verify expression index definition
SELECT indexdef FROM pg_indexes WHERE indexname = 'idx_users_email_lower';


Summary

Effective index management is crucial for database performance. Key takeaways:

  1. Choose the right index type for your query patterns
  2. Create indexes strategically - not too few, not too many
  3. Monitor index usage and remove unused indexes
  4. Maintain statistics with regular ANALYZE
  5. Use online rebuild for zero-downtime maintenance
  6. Verify with EXPLAIN that indexes are being used

Following these practices will ensure optimal query performance while minimizing write overhead and storage costs.