Query Optimizer Quick Start Guide¶
Version: 7.0 Status: Production Ready Last Updated: 2026-01-04
Introduction¶
This guide helps you quickly understand and use HeliosDB's query optimizer. You will learn how to analyze query plans, identify performance issues, and apply basic optimization techniques.
Using EXPLAIN ANALYZE¶
The most important tool for understanding query performance is EXPLAIN ANALYZE.
Basic EXPLAIN¶
-- See the query plan without executing
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
/*
Output:
Index Scan using idx_orders_customer on orders
(cost=0.43..8.45 rows=10 width=120)
Index Cond: (customer_id = 123)
Interpretation:
- Uses index scan (good!)
- cost=0.43..8.45: startup..total cost
- rows=10: estimated matching rows
- width=120: average row size in bytes
*/
EXPLAIN ANALYZE (with actual execution)¶
-- Execute the query and show actual vs estimated
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
/*
Output:
Index Scan using idx_orders_customer on orders
(cost=0.43..8.45 rows=10 width=120)
(actual time=0.025..0.032 ms rows=8 loops=1)
Index Cond: (customer_id = 123)
Planning Time: 0.15 ms
Execution Time: 0.08 ms
Key Metrics:
- actual time: 0.025..0.032 ms (startup..total)
- rows=8: actual rows returned (vs estimated 10)
- loops=1: number of executions
*/
EXPLAIN with All Options¶
-- Maximum detail
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;
/*
Output:
Index Scan using idx_orders_customer on public.orders o
(cost=0.43..8.45 rows=10 width=120)
(actual time=0.025..0.032 ms rows=8 loops=1)
Output: id, customer_id, order_date, amount, status
Index Cond: (o.customer_id = 123)
Buffers: shared hit=3
Planning:
Buffers: shared hit=10
Planning Time: 0.15 ms
Execution Time: 0.08 ms
Buffer Analysis:
- shared hit=3: 3 pages read from cache (fast)
- shared read=0: 0 pages read from disk (good)
*/
Reading Query Plans¶
Plan Node Types¶
| Node Type | Description | Performance |
|---|---|---|
| Seq Scan | Full table scan | Slow for large tables |
| Index Scan | B-tree index lookup + table fetch | Fast for selective queries |
| Index Only Scan | All data from index | Fastest (no table access) |
| Bitmap Index Scan | Multiple index merge | Good for OR conditions |
| Hash Join | Hash table join | Fast for large joins |
| Merge Join | Sorted merge join | Fast for pre-sorted data |
| Nested Loop | Iterative lookup join | Fast for small outer tables |
Cost Interpretation¶
cost=startup..total
Ranges:
0-100: Very fast
100-1000: Fast
1000-10000: Moderate
10000+: Slow (needs optimization)
Example: Comparing Good vs Bad Plans¶
Bad Plan (Sequential Scan):
-- Missing index on customer_id
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
/*
Seq Scan on orders
(cost=0.00..25000.00 rows=10 width=120)
(actual time=50.5..245.8 ms rows=8 loops=1)
Filter: (customer_id = 123)
Rows Removed by Filter: 999992
Problems:
- Seq Scan: Reading entire table
- 245.8 ms: Very slow
- Rows Removed: 999992 (scanning 1M rows for 8 results!)
*/
Good Plan (Index Scan):
-- With index on customer_id
CREATE INDEX idx_orders_customer ON orders(customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
/*
Index Scan using idx_orders_customer on orders
(cost=0.43..8.45 rows=10 width=120)
(actual time=0.025..0.032 ms rows=8 loops=1)
Index Cond: (customer_id = 123)
Improvements:
- Index Scan: Direct lookup
- 0.032 ms: 7,681x faster!
- No rows filtered (only reads needed rows)
*/
Basic Optimization Techniques¶
1. Create Indexes for WHERE Clauses¶
-- Identify slow query
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john@example.com';
-- Seq Scan (cost=5000.00, time=150ms)
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Verify improvement
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john@example.com';
-- Index Scan (cost=8.45, time=0.05ms)
2. Use Composite Indexes for Multiple Conditions¶
-- Query with multiple conditions
SELECT * FROM orders
WHERE customer_id = 123
AND order_date > '2025-01-01';
-- Create composite index (columns in order of selectivity)
CREATE INDEX idx_orders_cust_date ON orders(customer_id, order_date);
-- Verify both conditions use index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
AND order_date > '2025-01-01';
/*
Index Scan using idx_orders_cust_date on orders
Index Cond: ((customer_id = 123) AND (order_date > '2025-01-01'))
*/
3. Avoid Functions on Indexed Columns¶
-- BAD: Function prevents index use
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Seq Scan (function applied to every row)
-- GOOD: Store normalized data
SELECT * FROM users WHERE email_lower = 'john@example.com';
-- Index Scan
-- OR: Create functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Index Scan
4. Use LIMIT for Large Result Sets¶
-- BAD: Returns 1 million rows
SELECT * FROM logs WHERE level = 'ERROR';
-- Slow, memory intensive
-- GOOD: Limit results
SELECT * FROM logs
WHERE level = 'ERROR'
ORDER BY timestamp DESC
LIMIT 100;
-- Fast, returns only recent errors
5. Update Statistics Regularly¶
-- Check when table was last analyzed
SELECT
relname,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Update statistics if stale
ANALYZE orders;
-- Verify improved estimates
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
Common Performance Patterns¶
Pattern 1: Index-Only Scan (Fastest)¶
-- Create covering index
CREATE INDEX idx_orders_covering
ON orders(customer_id) INCLUDE (order_date, amount);
-- Query uses only indexed columns
SELECT customer_id, order_date, amount
FROM orders WHERE customer_id = 123;
/*
Index Only Scan using idx_orders_covering
Heap Fetches: 0 <-- No table access!
*/
Pattern 2: Bitmap Index Scan (Multiple Conditions)¶
-- Query with OR condition
SELECT * FROM products
WHERE category = 'Electronics' OR price < 10;
/*
Bitmap Heap Scan on products
Recheck Cond: ((category = 'Electronics') OR (price < 10))
-> BitmapOr
-> Bitmap Index Scan on idx_products_category
-> Bitmap Index Scan on idx_products_price
Explanation: Combines results from two index scans
*/
Pattern 3: Efficient Join (Hash Join)¶
-- Join two tables
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
/*
Hash Join
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o
-> Hash
-> Seq Scan on customers c
Explanation: Builds hash table from smaller table (customers),
probes with larger table (orders). O(n+m) complexity.
*/
Pattern 4: Parallel Query (Large Tables)¶
-- Aggregate query on large table
SELECT region, SUM(amount)
FROM sales
GROUP BY region;
/*
Finalize GroupAggregate
-> Gather Merge
Workers Planned: 4
-> Partial GroupAggregate
-> Parallel Seq Scan on sales
Explanation: Uses 4 parallel workers for 4x speedup
*/
Quick Optimization Checklist¶
Before Running a Query¶
- Does the WHERE clause have an index?
- Are JOIN columns indexed?
- Are statistics up to date?
- Is the result set limited?
After EXPLAIN ANALYZE¶
Check for these warning signs:
| Warning Sign | Meaning | Solution |
|---|---|---|
| Seq Scan on large table | No useful index | Create index |
| Rows Removed by Filter (high) | Low selectivity | More selective predicate |
| actual rows >> estimated | Stale statistics | Run ANALYZE |
| Nested Loop (large outer) | Wrong join type | Check memory, indexes |
| Sort (disk) | Insufficient memory | Increase work_mem |
Quick Fixes¶
-- 1. Missing index
CREATE INDEX idx_table_column ON table(column);
-- 2. Stale statistics
ANALYZE table_name;
-- 3. Need more sort/hash memory
SET work_mem = '256MB';
-- 4. Function on indexed column
CREATE INDEX idx_func ON table(FUNCTION(column));
-- 5. Low selectivity OR condition
-- Rewrite as UNION ALL
SELECT * FROM t WHERE a = 1
UNION ALL
SELECT * FROM t WHERE b = 2;
EXPLAIN Output Formats¶
Text (Default)¶
JSON (For Tools)¶
YAML (Configuration-Friendly)¶
AI Mode (Natural Language)¶
EXPLAIN (AI) SELECT * FROM orders WHERE customer_id = 123;
/*
This query searches the orders table for a specific customer.
It uses an index on customer_id, making the lookup very fast.
Performance: FAST (estimated 0.1ms)
No optimization suggestions - query is already optimal.
*/
Common Questions¶
Q: Why isn't my index being used?¶
EXPLAIN ANALYZE (WHY_NOT)
SELECT * FROM users WHERE age > 20;
/*
WHY_NOT Analysis:
Index idx_users_age exists but not used because:
- Selectivity: 80% (too many rows match)
- Sequential scan cost: 1000
- Index scan cost: 3200 (3.2x more expensive)
Suggestion: Index scans are efficient when <20% of rows match.
This query returns 80% of rows, so sequential scan is faster.
*/
Q: How do I optimize a slow JOIN?¶
-- Check if join columns are indexed
\d orders
-- Look for idx_orders_customer_id
-- If missing, create index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Ensure both tables have fresh statistics
ANALYZE orders;
ANALYZE customers;
Q: How do I know if my query is parallel?¶
EXPLAIN ANALYZE SELECT SUM(amount) FROM sales;
/*
Look for:
Gather or Gather Merge
Workers Planned: 4
Workers Launched: 4
Parallel Seq Scan
If missing, check:
SHOW max_parallel_workers_per_gather;
SHOW min_parallel_table_scan_size;
*/
Next Steps¶
- Read: User Guide for advanced techniques
- Study: Troubleshooting Guide for common issues
- Explore: Query Optimizer Overview for architecture details
- Practice: Run EXPLAIN ANALYZE on your own queries
Quick Reference¶
Essential Commands¶
-- Basic plan
EXPLAIN SELECT ...;
-- With execution stats
EXPLAIN ANALYZE SELECT ...;
-- Full detail
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ...;
-- AI explanation
EXPLAIN (AI) SELECT ...;
-- Why not analysis
EXPLAIN ANALYZE (WHY_NOT) SELECT ...;
-- Update statistics
ANALYZE table_name;
-- View index usage
SELECT * FROM pg_stat_user_indexes;
Cost Thresholds¶
| Cost Range | Performance | Action |
|---|---|---|
| < 100 | Excellent | None needed |
| 100-1000 | Good | Monitor |
| 1000-10000 | Moderate | Consider optimization |
| > 10000 | Poor | Optimize immediately |
Query Optimizer Quick Start - Master query performance in minutes.