Enhanced EXPLAIN PLAN User Guide¶
Version: 7.0 Last Updated: 2025-11-17 Status: Phase 1 Complete
Table of Contents¶
- Introduction
- Quick Start
- EXPLAIN Basics
- Configuration Tracking
- Feature Detection
- Optimizer Decisions
- Why Not Analysis
- Output Formats
- Query Optimization Workflow
- Best Practices
- Troubleshooting
- Advanced Topics
Introduction¶
HeliosDB's Enhanced EXPLAIN PLAN goes far beyond traditional query plan visualization. It provides:
- Configuration Snapshots: Capture 45+ configuration parameters affecting query planning
- Feature Detection: Automatically detect 8+ active optimization features (partition pruning, SIMD, etc.)
- Optimizer Decision Tracking: See why the optimizer chose specific strategies with cost comparisons
- "Why Not" Analysis: Understand why certain optimizations weren't applied
- Query Rewrite Suggestions: Get actionable recommendations to improve query performance
- Multiple Output Formats: Text, JSON, YAML, XML, HTML, GraphViz, Mermaid
Key Benefits¶
- Deep Visibility: Understand exactly what your queries are doing
- Performance Tuning: Identify bottlenecks and optimization opportunities
- Debugging: Diagnose unexpected query plans
- Education: Learn how the query optimizer works
- Production Monitoring: Track query plan changes over time
Quick Start¶
Basic EXPLAIN¶
-- Simple EXPLAIN
EXPLAIN SELECT * FROM users WHERE customer_id = 123;
-- Output:
EXPLAIN PLAN (Enhanced)
Estimated Cost: 50.00
Estimated Rows: 1
Planning Time: 2.34ms
Active Features (2):
✓ Predicate Pushdown (IMPLICIT)
Trigger: 1 predicates pushed to storage layer
Benefit: Filters applied early, reduces data transfer
✓ Index Scan on table 1 (IMPLICIT)
Trigger: Highly selective predicate on indexed column (0.1% selectivity)
Benefit: 100x faster than sequential scan
Savings: 99.0% (10000.00 -> 100.00)
Execution Plan:
-> IndexScan(table_id=1, predicates=1)
EXPLAIN ANALYZE with "Why Not"¶
Different Output Formats¶
-- JSON format for tools
EXPLAIN (FORMAT JSON) SELECT * FROM users WHERE id = 1;
-- HTML for web viewing
EXPLAIN (FORMAT HTML) SELECT * FROM orders WHERE date > '2024-01-01';
-- GraphViz for diagrams
EXPLAIN (FORMAT GRAPHVIZ) SELECT * FROM complex_join_query;
-- Mermaid for Markdown
EXPLAIN (FORMAT MERMAID) SELECT * FROM sales GROUP BY region;
EXPLAIN Basics¶
Syntax¶
EXPLAIN [ ANALYZE ] [ ( option [, ...] ) ] statement
Options:
FORMAT { TEXT | JSON | YAML | XML | HTML | GRAPHVIZ | MERMAID }
WHY_NOT { TRUE | FALSE }
VERBOSE { TRUE | FALSE }
COSTS { TRUE | FALSE }
BUFFERS { TRUE | FALSE }
TIMING { TRUE | FALSE }
Understanding Output¶
Estimated Cost¶
The cost is an arbitrary unit representing execution time and resource usage:
- Sequential Scan: High cost for large tables (1.0 per page)
- Index Scan: Lower cost for selective queries (0.005 per tuple)
- Hash Join: Depends on hash table size and probe complexity
- Sort: Depends on data volume and available memory
Rule of Thumb: - Cost < 100: Very fast - Cost 100-1000: Fast - Cost 1000-10000: Moderate - Cost > 10000: Slow (consider optimization)
Estimated Rows¶
Number of rows the optimizer expects this operation to return.
Important: If actual rows differ significantly from estimated rows, your statistics may be stale!
Configuration Tracking¶
Every EXPLAIN captures 45+ configuration parameters affecting the plan:
Memory Settings¶
work_mem: 256MB
- Controls memory for sorts and hash tables
- Increase if seeing "disk-based sort" warnings
shared_buffers: 1024MB
- Shared memory cache for data pages
- Typically 25% of system RAM
effective_cache_size: 4096MB
- Estimate of OS cache available
- Used for cost calculations
Join Strategy Toggles¶
enable_hashjoin: true
- Hash joins for equality conditions
- Fast for medium-sized tables
enable_mergejoin: true
- Merge joins for sorted data
- Efficient for pre-sorted inputs
enable_nestloop: true
- Nested loop joins for small tables
- Good when one side is very selective
Scan Strategy Toggles¶
enable_indexscan: true
- Use indexes when beneficial
enable_seqscan: true
- Sequential scans for large portions of table
enable_bitmapscan: true
- Bitmap scans for multiple index conditions
Parallelism Settings¶
max_parallel_workers: 8
- Maximum parallel workers across all queries
max_parallel_workers_per_gather: 4
- Maximum parallel workers for single operation
min_parallel_table_scan_size_mb: 8
- Minimum table size for parallelism
Cost Parameters¶
seq_page_cost: 1.0
- Cost to read one page sequentially
random_page_cost: 4.0
- Cost to read one page randomly
- Higher = optimizer prefers sequential scans
- Lower = optimizer prefers index scans
cpu_tuple_cost: 0.01
- Cost to process one tuple
cpu_operator_cost: 0.0025
- Cost to execute one operator/function
Advanced Features¶
enable_partition_pruning: true
- Skip irrelevant partitions based on predicates
enable_simd: true
- SIMD vectorization for aggregations
enable_jit: false
- JIT compilation for complex queries (experimental)
Feature Detection¶
The optimizer automatically detects and reports active features:
Partition Pruning¶
SELECT * FROM sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Feature Detected:
✓ Partition Pruning (IMPLICIT)
Trigger: Range predicates on partitioned table (1 predicates)
Benefit: Skipped 11 of 12 partitions (91.7% I/O savings)
Savings: 91.7% (12000.00 -> 1000.00)
Confidence: 70%
When it Activates: - Range predicates on partition key - Equality predicates matching partition boundaries - IN clauses with partition key values
How to Maximize: - Partition on frequently filtered columns (date, region, category) - Use partition-aligned predicates - Keep partition count reasonable (< 1000)
Predicate Pushdown¶
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'active';
-- Feature Detected:
✓ Predicate Pushdown (IMPLICIT)
Trigger: 2 predicates pushed to storage layer
Benefit: Filters applied early, reduces data transfer
Confidence: 95%
When it Activates: - WHERE clause predicates on base tables - JOIN conditions pushed to table scans - Subquery filters pushed down
Performance Impact: - 10-100x reduction in data transfer - Lower memory usage - Faster query execution
SIMD Vectorization¶
SELECT region, SUM(sales), AVG(price)
FROM sales
GROUP BY region;
-- Feature Detected:
✓ SIMD Vectorization (IMPLICIT)
Trigger: Numeric aggregation operations (SUM/AVG/COUNT)
Benefit: 4x speedup on aggregation operations using AVX2/AVX512
Savings: 75.0% (1000.00 -> 250.00)
Confidence: 90%
When it Activates: - SUM, AVG, COUNT aggregations - Numeric data types (INT, BIGINT, FLOAT, DOUBLE) - SIMD-capable hardware (AVX2/AVX512)
Performance Impact: - 4-8x speedup on supported operations - Automatic - no query changes needed
Index Usage¶
SELECT * FROM users WHERE email = 'test@example.com';
-- Feature Detected:
✓ Index Scan on table users (IMPLICIT)
Trigger: Highly selective predicate on indexed column (0.1% selectivity)
Benefit: 100x faster than sequential scan
Savings: 99.0% (10000.00 -> 100.00)
Confidence: 95%
Types of Index Scans: - Index Scan: Read index + fetch table rows - Index-Only Scan: Read index only (covering index) - Bitmap Index Scan: Multiple index conditions combined
Parallelism¶
SELECT * FROM large_table WHERE category = 'electronics';
-- Feature Detected:
✓ Parallel Sequential Scan (IMPLICIT)
Trigger: Table size (50GB) exceeds min_parallel_table_scan_size
Benefit: 4x speedup with 4 parallel workers
Savings: 75.0% (20000.00 -> 5000.00)
Workers: 4
Confidence: 85%
When it Activates: - Table size > min_parallel_table_scan_size_mb - Query cost > parallel_setup_cost - Sufficient parallel workers available
Optimizer Decisions¶
See exactly why the optimizer chose specific strategies:
Join Method Selection¶
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Optimizer Decision:
1. Join Strategy
CHOSEN: Hash Join (cost: 250.00, rows: 10000)
Reasoning: Medium-sized tables with equality condition.
Hash join is optimal.
REJECTED: Nested Loop Join (cost: 5000.00, 20.0x more expensive)
Reason: One table too large for nested loop
REJECTED: Merge Join (cost: 800.00, 3.2x more expensive)
Reason: Neither input is sorted on join key
Decision Factors: - Table sizes - Join condition types (equality vs range) - Available memory (work_mem) - Pre-sorted data - Index availability
Index Selection¶
SELECT * FROM users
WHERE email = 'test@example.com' AND status = 'active';
-- Optimizer Decision:
1. Index Selection for users table
CHOSEN: idx_users_email (cost: 50.00, rows: 1)
Reasoning: Email is highly selective (0.1%).
Index lookup is 100x faster than seq scan.
REJECTED: idx_users_status (cost: 2000.00, 40.0x more expensive)
Reason: Status has low selectivity (50%). Sequential scan cheaper.
REJECTED: Sequential Scan (cost: 5000.00, 100.0x more expensive)
Reason: Table is large (1M rows) and predicate is highly selective
Scan Method Selection¶
SELECT * FROM products WHERE category = 'electronics';
-- Optimizer Decision:
1. Scan Method for products
CHOSEN: Sequential Scan (cost: 1000.00, rows: 50000)
Reasoning: Low selectivity (50% of rows match).
Sequential scan more efficient.
REJECTED: Index Scan on idx_category (cost: 3000.00, 3.0x more expensive)
Reason: High number of matching rows causes excessive random I/O
Selectivity Thresholds: - < 1%: Index scan strongly preferred - 1-5%: Index scan usually preferred - 5-20%: Depends on index quality and random_page_cost - > 20%: Sequential scan usually preferred
Why Not Analysis¶
Understand why certain optimizations weren't applied:
Index Not Used¶
EXPLAIN ANALYZE (WHY_NOT)
SELECT * FROM orders WHERE customer_id = 123;
-- Why Not Analysis:
INDEX USAGE ANALYSIS (2 indexes)
────────────────────────────────────────────────────────
1. Index: idx_orders_date
Table: orders | Type: BTree | Severity: Low
Reason: Column 'order_date' not in WHERE clause
Suggestion: Not applicable to this query
💰 Cost Impact: Would add cost (unnecessary index scan)
2. Index: idx_orders_customer_date (covering index)
Table: orders | Type: BTree | Severity: High
Reason: Statistics outdated (analyzed 45 days ago)
Suggestion: Run ANALYZE orders; to update statistics
This covering index could eliminate table lookups.
Estimated cost reduction: 60.0% (50 -> 20)
💰 Cost Impact: 2.5x slower without fresh stats (20.00 vs 50.00)
Statistics Warnings¶
STATISTICS WARNINGS (1 tables)
────────────────────────────────────────────────────────
1. Table: orders
Last Updated: 2024-09-15 | Severity: High
Freshness: 45 days old (35.0% of data modified since)
Impact: Stale statistics cause suboptimal query plans
Action: ANALYZE orders;
Statistics Freshness Guidelines: - < 7 days: Fresh (Severity: Low) - 7-14 days: Aging (Severity: Low) - 14-30 days: Stale (Severity: Medium) - > 30 days: Critical (Severity: High)
Impact of Stale Statistics: - Incorrect cardinality estimates - Wrong join order - Suboptimal index selection - Parallel execution disabled
Cardinality Mismatches¶
CARDINALITY MISMATCHES (1 detected)
────────────────────────────────────────────────────────
1. Operation: Hash Join
Estimated: 1000 rows | Actual: 100000 rows | Severity: Critical
Error Rate: 9900.0% (100.0x underestimate)
Likely Cause: Outdated statistics on join columns or correlated predicates
Fix: Run ANALYZE on joined tables
Consider multi-column statistics for correlated columns
Common Causes: - Outdated statistics - Correlated columns (not modeled) - Skewed data distributions - Complex predicates - Missing histograms
Optimization Suggestions¶
OPTIMIZATION SUGGESTIONS (5 recommendations)
────────────────────────────────────────────────────────
1. Create covering index for GROUP BY
Category: Indexing | Priority: Medium | Effort: Medium
Description: A covering index on GROUP BY columns with INCLUDE
for aggregate columns could enable index-only scan.
Expected Benefit: 81.0% improvement (1050.00 -> 200.00)
Implementation:
• Identify aggregate and group-by columns
• CREATE INDEX with INCLUDE clause for aggregate columns
• Test with EXPLAIN ANALYZE
2. Rewrite OR conditions as UNION
Category: QueryRewrite | Priority: High | Effort: Medium
Description: OR conditions prevent index usage. Rewriting as UNION ALL
allows each branch to use appropriate indexes.
Expected Benefit: 90.0% improvement (5000.00 -> 500.00)
Implementation:
• Identify OR conditions in WHERE clause
• Rewrite: SELECT * FROM t WHERE a=1 OR b=2
• As: SELECT * FROM t WHERE a=1 UNION ALL SELECT * FROM t WHERE b=2
• Add DISTINCT if needed to remove duplicates
Output Formats¶
TEXT (Default)¶
Human-readable, terminal-friendly output.
Best For: Quick debugging, terminal use
JSON¶
Machine-readable, structured data for tools.
{
"plan": {
"node_type": "IndexScan",
"table_id": "users",
"estimated_cost": 50.0,
"estimated_rows": 1
},
"active_features": {
"implicit": [
{
"name": "Index Scan",
"category": "Indexing",
"savings": {
"percent_reduction": 99.0
}
}
]
}
}
Best For: Tool integration, automation, logging
YAML¶
Human-friendly structured format.
plan:
estimated_cost: 50.0
estimated_rows: 1
planning_time_ms: 2.34
active_features:
total: 2
implicit: 2
explicit: 0
Best For: Configuration files, documentation
XML¶
Enterprise tool integration.
<?xml version="1.0" encoding="UTF-8"?>
<explain-plan version="7.0">
<summary>
<estimated-cost>50.0</estimated-cost>
<estimated-rows>1</estimated-rows>
</summary>
<active-features>
<feature type="implicit" category="Indexing">
<name>Index Scan</name>
<confidence>0.95</confidence>
</feature>
</active-features>
</explain-plan>
Best For: Enterprise ETL tools, SOAP APIs, XML parsers
HTML¶
Web-based visualization with CSS styling.
Generates complete HTML page with: - Responsive design - Color-coded nodes - Interactive elements - Print-friendly layout
Best For: Dashboards, reports, presentations
GraphViz (DOT)¶
Generate diagrams with GraphViz.
digraph ExplainPlan {
rankdir=TB;
node [shape=box, style=rounded];
node_0 [label="Hash Join\nCost: 250.00", fillcolor="#ffccbc", style=filled];
node_1 [label="TableScan\nTable: users", fillcolor="#bbdefb", style=filled];
node_2 [label="IndexScan\nTable: orders", fillcolor="#bbdefb", style=filled];
node_0 -> node_1 [label="left"];
node_0 -> node_2 [label="right"];
}
Render with: dot -Tpng explain.dot -o plan.png
Best For: Architecture diagrams, documentation, presentations
Mermaid¶
Markdown-based diagrams.
graph TB
summary["EXPLAIN PLAN<br/>Cost: 250.00<br/>Rows: 10000"]
summary:::summaryStyle
node0["Hash Join"]
node0:::joinStyle
node1["TableScan<br/>Table: users"]
node1:::scanStyle
node2["IndexScan<br/>Table: orders"]
node2:::scanStyle
summary --> node0
node0 -->|left| node1
node0 -->|right| node2
classDef summaryStyle fill:#e8f5e9,stroke:#4caf50
classDef scanStyle fill:#bbdefb,stroke:#2196f3
classDef joinStyle fill:#ffccbc,stroke:#ff5722
Best For: GitHub/GitLab documentation, Markdown files
Query Optimization Workflow¶
Step 1: Identify Slow Queries¶
-- Find slow queries in production
SELECT query, total_time, calls, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
Step 2: EXPLAIN the Query¶
Step 3: Check Statistics¶
Fix: Run ANALYZE
Step 4: Review Active Features¶
Fix: Check Why Not Analysis
Step 5: Apply Suggestions¶
OPTIMIZATION SUGGESTIONS:
1. Create index on customer_id
CREATE INDEX idx_orders_customer ON orders(customer_id);
Fix: Create recommended index
Step 6: Verify Improvement¶
Expected: Lower cost, index scan used
Best Practices¶
1. Regular Statistics Updates¶
-- Automated statistics collection
CREATE EXTENSION pg_cron;
SELECT cron.schedule('nightly-analyze', '0 2 * * *', $$
ANALYZE VERBOSE;
$$);
2. Monitor Query Plans¶
-- Track plan changes over time
CREATE TABLE query_plan_history (
query_id TEXT,
plan_hash TEXT,
estimated_cost NUMERIC,
captured_at TIMESTAMP DEFAULT NOW()
);
-- Log explain output
INSERT INTO query_plan_history (query_id, plan_hash, estimated_cost)
SELECT
md5(query),
md5(explain_plan),
(explain_json->>'estimated_cost')::numeric
FROM pg_stat_statements;
3. Index Maintenance¶
-- Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Drop if truly unused
DROP INDEX IF EXISTS unused_index_name;
4. Configuration Tuning¶
-- For OLTP (many small queries)
SET work_mem = '64MB';
SET random_page_cost = 1.1; -- SSD storage
SET effective_cache_size = '8GB';
-- For OLAP (complex analytical queries)
SET work_mem = '512MB';
SET max_parallel_workers_per_gather = 4;
SET enable_partitionwise_join = ON;
5. Query Patterns to Avoid¶
Avoid:
-- Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Leading wildcard
SELECT * FROM products WHERE name LIKE '%phone%';
-- OR conditions preventing index use
SELECT * FROM orders WHERE status = 'pending' OR status = 'processing';
Instead:
-- Use functional index or normalize data
SELECT * FROM users WHERE email = 'test@example.com';
-- Use full-text search
SELECT * FROM products WHERE to_tsvector(name) @@ to_tsquery('phone');
-- Rewrite as UNION
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE status = 'processing';
Troubleshooting¶
Problem: Index Not Being Used¶
Symptoms: - EXPLAIN shows Sequential Scan - Query is slow despite having index
Diagnosis:
Common Causes:
-
Stale Statistics
-
Low Selectivity
-
Data Type Mismatch
-
Function on Column
Problem: Wrong Join Order¶
Symptoms: - Hash join on large tables first - Cardinality mismatches
Diagnosis:
Fix:
-- Update statistics
ANALYZE a, b, c;
-- Force different join order (last resort)
SET join_collapse_limit = 1;
SELECT * FROM a
JOIN (b JOIN c ON b.id = c.b_id) ON a.id = b.a_id;
Problem: Parallel Query Not Activating¶
Symptoms: - Expected parallel workers: 4 - Actual parallel workers: 0
Common Causes:
-
Table Too Small
-
Insufficient Workers
-
Query Too Cheap
Advanced Topics¶
Custom Statistics¶
-- Create extended statistics for correlated columns
CREATE STATISTICS orders_customer_date_stats (dependencies)
ON customer_id, order_date FROM orders;
ANALYZE orders;
Query Hints (Experimental)¶
-- Force index usage (PostgreSQL-style hint)
SELECT /*+ IndexScan(users idx_users_email) */
* FROM users WHERE email = 'test@example.com';
-- Force join order
SELECT /*+ Leading((a b) c) */
* FROM a, b, c WHERE ...;
Plan Stability¶
-- Freeze a good plan
CREATE EXTENSION pg_plan_freeze;
SELECT pg_plan_freeze('slow_query_md5_hash', 'good_plan_hash');
Cost Model Tuning¶
-- For SSD storage
SET random_page_cost = 1.1;
SET seq_page_cost = 1.0;
-- For HDD storage
SET random_page_cost = 4.0;
SET seq_page_cost = 1.0;
-- For in-memory data
SET random_page_cost = 0.1;
SET seq_page_cost = 0.1;
Appendix A: Configuration Reference¶
Complete Parameter List¶
| Parameter | Default | Range | Impact |
|---|---|---|---|
| work_mem | 256MB | 64KB-2GB | Sort/Hash memory |
| shared_buffers | 1024MB | 128MB-40% RAM | Page cache |
| effective_cache_size | 4096MB | 1GB-75% RAM | Cost calculations |
| random_page_cost | 4.0 | 0.1-10.0 | Index vs scan choice |
| cpu_tuple_cost | 0.01 | 0.001-0.1 | Processing cost |
| max_parallel_workers | 8 | 0-32 | Parallel queries |
Feature Flags¶
| Flag | Default | Description |
|---|---|---|
| enable_simd | true | SIMD vectorization |
| enable_hashjoin | true | Hash joins |
| enable_mergejoin | true | Merge joins |
| enable_nestloop | true | Nested loop joins |
| enable_indexscan | true | Index scans |
| enable_seqscan | true | Sequential scans |
| enable_partition_pruning | true | Partition pruning |
Appendix B: Error Codes¶
| Code | Message | Solution |
|---|---|---|
| E001 | Statistics too old | Run ANALYZE |
| E002 | No suitable index | Create index |
| E003 | Cardinality mismatch | Update statistics, check correlations |
| E004 | Out of memory (work_mem) | Increase work_mem |
| E005 | Query too complex | Simplify or break into CTEs |
Appendix C: Glossary¶
- Cardinality: Number of rows in a result set
- Selectivity: Fraction of rows matching a predicate (0.0 to 1.0)
- Cost: Arbitrary unit representing execution time
- Predicate: Filter condition (WHERE clause)
- Covering Index: Index containing all needed columns
- Sequential Scan: Read entire table sequentially
- Index Scan: Read via index structure
- Hash Join: Join using hash table
- Merge Join: Join pre-sorted inputs
- Nested Loop: Iterate outer, probe inner for each row
Appendix D: Further Reading¶
- PostgreSQL Query Performance Tuning
- Use The Index, Luke!
- HeliosDB Architecture Documentation
- Query Optimizer Internals
Document Version: 1.0 HeliosDB Version: 7.0 Feature Status: Phase 1 Complete (100%) Last Updated: 2025-11-17