Skip to content

Production Deployment: Troubleshooting

Part of: Production Deployment Guide


9.1 Common Issues

Issue: High Query Latency

Symptoms: - Slow query response times - Timeouts - High CPU usage

Diagnosis:

# Check slow queries
heliosdb-cli query slow-log --limit 10

# Check query plans
heliosdb-cli query explain --query-id <id>

# Check cache hit rate
heliosdb-cli metrics cache --component query-cache

Resolution: 1. Add appropriate indexes 2. Enable query cache 3. Increase compute nodes 4. Review and optimize query patterns

Issue: Replication Lag

Symptoms: - Stale reads - Data inconsistency - Replication lag alerts

Diagnosis:

# Check replication status
heliosdb-cli replication status

# Check network latency
heliosdb-cli network latency --nodes all

# Check WAL size
heliosdb-cli storage wal-status

Resolution: 1. Increase network bandwidth 2. Optimize WAL settings 3. Add more storage nodes 4. Check for slow nodes

Issue: Out of Memory

Symptoms: - OOM kills - Node crashes - Slow performance

Diagnosis:

# Check memory usage
heliosdb-cli metrics memory

# Check cache sizes
heliosdb-cli cache stats

# Review query memory usage
heliosdb-cli query memory-usage --top 10

Resolution: 1. Reduce cache sizes 2. Optimize query work_mem settings 3. Add more RAM or nodes 4. Enable memory limits per query

9.2 Debug Procedures

Enable Debug Logging:

# Set log level
heliosdb-cli config set logging.level debug

# Enable query logging
heliosdb-cli config set logging.query_log_enabled true

# Enable tracing
export RUST_LOG=trace
export RUST_BACKTRACE=full

Collect Diagnostics:

# Generate diagnostic report
heliosdb-cli diagnostics collect \
  --output /tmp/heliosdb-diagnostics.tar.gz \
  --include-logs \
  --include-metrics \
  --include-config \
  --time-range 1h

9.3 Performance Tuning

Query Performance:

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM large_table WHERE id = 1;

-- Update statistics
ANALYZE large_table;

-- Rebuild indexes
REINDEX TABLE large_table;

Storage Performance:

# Check I/O statistics
heliosdb-cli storage io-stats

# Optimize compaction
heliosdb-cli storage compact --level 0-1

# Check fragmentation
heliosdb-cli storage fragmentation

Network Performance:

# Test network bandwidth
heliosdb-cli network benchmark --nodes all

# Check connection pool
heliosdb-cli network connections --status

# Optimize TCP settings
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728