
ClickHouse to HeliosDB Migration Guide

Version: 1.0 | Last Updated: January 2026 | Compatibility: HeliosDB 7.0+, ClickHouse 22.x/23.x/24.x


Table of Contents

  1. Introduction
  2. Compatibility Overview
  3. Pre-Migration Assessment
  4. Connection String Migration
  5. Data Type Mapping
  6. Table Engine Mapping
  7. Query Syntax Migration
  8. Materialized View Migration
  9. Data Migration Process
  10. Application Connectivity
  11. Performance Considerations
  12. Troubleshooting Common Issues
  13. Post-Migration Validation
  14. Appendix

1. Introduction

Why Migrate from ClickHouse to HeliosDB?

HeliosDB provides 92% ClickHouse compatibility while offering significant advantages for modern data platforms:

Key Benefits

| Benefit | Description |
| --- | --- |
| Multi-Protocol Access | Access analytics data via ClickHouse, PostgreSQL, MySQL, MongoDB, Redis, GraphQL, and REST |
| Unified Data Platform | Single system for OLTP, OLAP, and streaming workloads |
| AI/ML Integration | Native NL2SQL, vector search, and ML model inference |
| Simplified Operations | No ZooKeeper/Keeper dependency, automatic rebalancing |
| Enhanced ACID | Full ACID compliance beyond ClickHouse's eventual consistency |
| Time Travel | Point-in-time queries and data recovery |
| Multi-Model Support | Relational, document, graph, time-series, and vector in one database |
Migration Complexity: Low to Moderate

ClickHouse migrations are straightforward due to:

  • 100% Native Protocol compatibility (TCP port 9000)
  • 100% HTTP Protocol compatibility (port 8123)
  • Standard ClickHouse drivers work without modification
  • Most SQL queries execute identically
  • MergeTree family engines fully supported

Scope

This guide covers:

  • Schema migration for all table types
  • Data migration strategies for various dataset sizes
  • Application connection string and driver updates
  • Table engine mapping to HeliosDB equivalents
  • Query syntax differences and translations
  • Materialized view migration
  • Performance optimization post-migration

Target Audience

  • Data engineers migrating ClickHouse analytics platforms
  • DevOps engineers modernizing data infrastructure
  • Architects evaluating unified data platforms
  • Application developers updating ClickHouse integrations

2. Compatibility Overview

Overall Compatibility: 92%

| Category | Coverage | Status |
| --- | --- | --- |
| Native Protocol (TCP) | 100% | Complete |
| HTTP Protocol | 100% | Complete |
| SQL Language | 90% | Complete |
| Table Engines | 85% | Core engines |
| Data Types | 95% | All standard types |
| Aggregation Functions | 95% | Full analytics support |

Protocol Support

Native Protocol (TCP Port 9000)

| Feature | Status | Notes |
| --- | --- | --- |
| Connection | Supported | Full protocol compatibility |
| Authentication | Supported | Password-based, LDAP |
| Compression | Supported | LZ4, ZSTD, LZ4HC |
| SSL/TLS | Supported | Encrypted connections |
| Query Execution | Supported | Full SQL support |
| Batch Insert | Supported | High throughput |
| Prepared Statements | Supported | Query caching |
| Query Cancellation | Supported | Cancel running queries |

HTTP Protocol (Port 8123)

| Feature | Status | Notes |
| --- | --- | --- |
| GET Queries | Supported | Read operations |
| POST Queries | Supported | Write operations |
| Streaming | Supported | Large result sets |
| Compression | Supported | gzip, deflate, br |
| JSON Output | Supported | Multiple formats |
| Progress Tracking | Supported | Query progress |

Driver Compatibility Matrix

| Language | Driver | Version | Status |
| --- | --- | --- | --- |
| Python | clickhouse-driver | 0.2+ | Fully Compatible |
| Python | clickhouse-connect | 0.6+ | Fully Compatible |
| Go | clickhouse-go | 2.x | Fully Compatible |
| Node.js | @clickhouse/client | 0.2+ | Fully Compatible |
| Java | ClickHouse JDBC | 0.4+ | Fully Compatible |
| Java | ClickHouse Native | 0.4+ | Fully Compatible |
| Rust | clickhouse-rs | 0.12+ | Fully Compatible |
| C# | ClickHouse.Client | 5.x | Fully Compatible |
| PHP | smi2/phpClickHouse | 1.x | Fully Compatible |

clickhouse-client Compatibility

# Connect to HeliosDB using standard clickhouse-client
clickhouse-client --host localhost --port 9000

# All standard options work
clickhouse-client \
    --host heliosdb.host \
    --port 9000 \
    --user default \
    --password '' \
    --database analytics \
    --query "SELECT count() FROM events"

3. Pre-Migration Assessment

3.1 Pre-Migration Checklist

  • [ ] Backup ClickHouse data (full backup)
  • [ ] Document cluster topology (replicas, shards)
  • [ ] Inventory all databases and tables
  • [ ] Analyze table engines in use
  • [ ] Catalog materialized views
  • [ ] Document dictionaries and external tables
  • [ ] Estimate total data volume
  • [ ] Review ZooKeeper/Keeper dependencies
  • [ ] Identify TTL requirements
  • [ ] List application connection configurations
  • [ ] Plan migration window and rollback strategy
  • [ ] Test migration in staging environment

3.2 Database and Table Inventory

Export Schema Information

# Export all database schemas
clickhouse-client --query "SHOW DATABASES" > databases.txt

# Export table definitions for each database
for db in $(grep -v system databases.txt); do
    for tbl in $(clickhouse-client --query "SHOW TABLES FROM $db"); do
        clickhouse-client --query "SHOW CREATE TABLE $db.$tbl" >> schema_${db}.sql 2>/dev/null
    done
done

# Export complete schema
clickhouse-client --query "
    SELECT database, name, engine, partition_key, sorting_key,
           primary_key, total_rows, total_bytes
    FROM system.tables
    WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
" --format TSV > table_inventory.tsv
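The inventory TSV exported above can be summarized with a short script to see which engines dominate the migration effort. A minimal sketch, assuming the eight columns appear in the order selected by the export query:

```python
import csv
from collections import Counter

def summarize_inventory(tsv_lines):
    """Tally tables per engine and total rows from the inventory TSV."""
    engines = Counter()
    total_rows = 0
    for row in csv.reader(tsv_lines, delimiter='\t'):
        _database, _name, engine, _part, _sort, _prim, rows, _bytes = row
        engines[engine] += 1
        total_rows += int(rows or 0)
    return engines, total_rows
```

Run it over `open('table_inventory.tsv')` to get a per-engine table count alongside the total row volume.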

Analyze Table Engines

-- List all table engines in use
SELECT
    engine,
    count() AS table_count,
    sum(total_rows) AS total_rows,
    formatReadableSize(sum(total_bytes)) AS total_size
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
GROUP BY engine
ORDER BY table_count DESC;

-- Detailed table information
SELECT
    database,
    name,
    engine,
    partition_key,
    sorting_key,
    primary_key,
    total_rows,
    formatReadableSize(total_bytes) as size
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
ORDER BY total_bytes DESC;

3.3 Feature Usage Inventory

Document usage of the following features:

| Feature | Used? | Tables Affected | HeliosDB Support |
| --- | --- | --- | --- |
| MergeTree | | | Full |
| ReplacingMergeTree | | | Full |
| SummingMergeTree | | | Full |
| AggregatingMergeTree | | | Full |
| CollapsingMergeTree | | | Full |
| VersionedCollapsingMergeTree | | | Full |
| Distributed Tables | | | Full |
| Materialized Views | | | Full |
| TTL | | | Full |
| Dictionaries | | | Full |
| S3/External Tables | | | Full |
| Kafka Integration | | | Full |
| Replicated* Engines | | | Different approach |

3.4 Data Volume Estimation

-- Estimate total data size
SELECT
    sum(total_bytes) AS bytes,
    formatReadableSize(sum(total_bytes)) AS readable_size,
    sum(total_rows) AS total_rows
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA');

-- Per-database breakdown
SELECT
    database,
    count() AS tables,
    formatReadableSize(sum(total_bytes)) AS size,
    sum(total_rows) AS rows
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
GROUP BY database
ORDER BY sum(total_bytes) DESC;

Storage Planning for HeliosDB

| ClickHouse Size | Recommended HeliosDB Storage | Notes |
| --- | --- | --- |
| < 100 GB | 1.1x ClickHouse size | Similar compression |
| 100 GB - 1 TB | 1.0x - 1.1x | Comparable efficiency |
| > 1 TB | 0.95x - 1.0x | Intelligent compression |
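The sizing bands above translate into a simple capacity helper. A sketch that takes the upper bound of each band, so the estimate errs on the side of spare capacity:

```python
def recommended_heliosdb_bytes(clickhouse_bytes):
    """Capacity estimate from the storage-planning table (upper bounds)."""
    GB, TB = 1024**3, 1024**4
    if clickhouse_bytes < 100 * GB:
        factor = 1.1   # similar compression
    elif clickhouse_bytes <= 1 * TB:
        factor = 1.1   # top of the 1.0x - 1.1x band
    else:
        factor = 1.0   # top of the 0.95x - 1.0x band
    return int(clickhouse_bytes * factor)
```

Feed it the `sum(total_bytes)` value from the estimation query above.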

3.5 Query Pattern Analysis

-- Analyze recent query patterns
SELECT
    type,
    query_kind,
    count() AS query_count,
    avg(query_duration_ms) AS avg_duration_ms,
    sum(read_rows) AS total_rows_read
FROM system.query_log
WHERE event_time > now() - INTERVAL 7 DAY
  AND type = 'QueryFinish'
GROUP BY type, query_kind
ORDER BY query_count DESC;

-- Identify heavy queries
SELECT
    query,
    count() AS executions,
    avg(query_duration_ms) AS avg_ms,
    formatReadableSize(avg(read_bytes)) AS avg_read
FROM system.query_log
WHERE event_time > now() - INTERVAL 7 DAY
  AND type = 'QueryFinish'
GROUP BY query
ORDER BY avg_ms DESC
LIMIT 20;

4. Connection String Migration

4.1 Connection String Format

ClickHouse and HeliosDB use identical connection formats:

Before (ClickHouse):

clickhouse://user:password@clickhouse.host:9000/database

After (HeliosDB):

clickhouse://user:password@heliosdb.host:9000/database
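Because only the host differs, connection strings across a configuration repository can be rewritten mechanically. A sketch using the standard library; the `heliosdb.host` value below is a placeholder for your actual endpoint:

```python
from urllib.parse import urlsplit, urlunsplit

def retarget(dsn, new_host):
    """Swap the host in a DSN, preserving credentials, port, and path."""
    parts = urlsplit(dsn)
    userinfo = ''
    if '@' in parts.netloc:
        userinfo = parts.netloc.rsplit('@', 1)[0] + '@'
    port = f':{parts.port}' if parts.port else ''
    netloc = f'{userinfo}{new_host}{port}'
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Apply it to every DSN found in application configs during the cutover.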

4.2 Native Protocol Connection

Basic Connection (Host Change Only)

ClickHouse:

from clickhouse_driver import Client

client = Client(
    host='clickhouse.host',
    port=9000,
    user='default',
    password='mypassword',
    database='analytics'
)

HeliosDB (Only host changes):

from clickhouse_driver import Client

client = Client(
    host='heliosdb.host',  # Only this changes
    port=9000,
    user='default',
    password='mypassword',
    database='analytics'
)

4.3 HTTP Protocol Connection

REST API Endpoint

ClickHouse:

curl 'http://clickhouse.host:8123/?query=SELECT%201'

HeliosDB:

curl 'http://heliosdb.host:8123/?query=SELECT%201'

4.4 Connection Parameters Reference

| Parameter | Default | Description | HeliosDB Support |
| --- | --- | --- | --- |
| host | localhost | Server hostname | Full |
| port | 9000 | Native protocol port | Full |
| http_port | 8123 | HTTP protocol port | Full |
| user | default | Username | Full |
| password | - | Password | Full |
| database | default | Database name | Full |
| compression | lz4 | Compression type | LZ4, ZSTD |
| secure | false | Use TLS | Full |
| verify | true | Verify TLS certs | Full |
| connect_timeout | 10 | Connection timeout (s) | Full |
| send_receive_timeout | 300 | Query timeout (s) | Full |
| sync_request_timeout | 5 | Sync timeout (s) | Full |

4.5 SSL/TLS Configuration

from clickhouse_driver import Client

# Secure connection with TLS
client = Client(
    host='heliosdb.host',
    port=9440,  # Secure native port
    user='default',
    password='mypassword',
    database='analytics',
    secure=True,
    verify=True,
    ca_certs='/path/to/ca.crt'
)

4.6 Connection Pool Configuration

from clickhouse_driver import Client

# Production connection pool settings
client = Client(
    host='heliosdb.host',
    port=9000,
    user='default',
    password='mypassword',
    database='analytics',
    compression=True,
    settings={
        'max_threads': 8,
        'max_execution_time': 300,
        'max_memory_usage': 10000000000,  # 10GB
        'connect_timeout': 10,
        'send_receive_timeout': 300
    }
)

5. Data Type Mapping

5.1 Numeric Types

| ClickHouse Type | HeliosDB Type | Notes |
| --- | --- | --- |
| UInt8 | UInt8 | Identical |
| UInt16 | UInt16 | Identical |
| UInt32 | UInt32 | Identical |
| UInt64 | UInt64 | Identical |
| UInt128 | UInt128 | Identical |
| UInt256 | UInt256 | Identical |
| Int8 | Int8 | Identical |
| Int16 | Int16 | Identical |
| Int32 | Int32 | Identical |
| Int64 | Int64 | Identical |
| Int128 | Int128 | Identical |
| Int256 | Int256 | Identical |
| Float32 | Float32 | IEEE 754 |
| Float64 | Float64 | IEEE 754 |
| Decimal(P, S) | Decimal(P, S) | Arbitrary precision |
| Decimal32(S) | Decimal32(S) | 9 digit precision |
| Decimal64(S) | Decimal64(S) | 18 digit precision |
| Decimal128(S) | Decimal128(S) | 38 digit precision |

5.2 String Types

| ClickHouse Type | HeliosDB Type | Notes |
| --- | --- | --- |
| String | String | Variable length UTF-8 |
| FixedString(N) | FixedString(N) | Fixed byte length |
| UUID | UUID | 128-bit UUID |
| IPv4 | IPv4 | IPv4 address |
| IPv6 | IPv6 | IPv6 address |
| Enum8 | Enum8 | 8-bit enumeration |
| Enum16 | Enum16 | 16-bit enumeration |

5.3 Date and Time Types

| ClickHouse Type | HeliosDB Type | Notes |
| --- | --- | --- |
| Date | Date | Days since epoch |
| Date32 | Date32 | Extended range |
| DateTime | DateTime | Second precision |
| DateTime64(N) | DateTime64(N) | Subsecond precision |
| DateTime64(N, tz) | DateTime64(N, tz) | With timezone |

5.4 Composite Types

| ClickHouse Type | HeliosDB Type | Notes |
| --- | --- | --- |
| Array(T) | Array(T) | Dynamic arrays |
| Tuple(T1, T2, ...) | Tuple(T1, T2, ...) | Fixed-size tuples |
| Map(K, V) | Map(K, V) | Key-value maps |
| Nested(name Type, ...) | Nested(name Type, ...) | Nested columns |
| Nullable(T) | Nullable(T) | NULL support |
| LowCardinality(T) | LowCardinality(T) | Dictionary encoding |

5.5 Special Types

| ClickHouse Type | HeliosDB Type | Notes |
| --- | --- | --- |
| Bool | Bool | Boolean |
| JSON | JSON | JSON objects |
| Object('json') | Object('json') | JSON objects |
| Point | Point | Geospatial point |
| Ring | Ring | Geospatial ring |
| Polygon | Polygon | Geospatial polygon |
| MultiPolygon | MultiPolygon | Geospatial multi-polygon |

5.6 Aggregate Function Types

| ClickHouse Type | HeliosDB Type | Notes |
| --- | --- | --- |
| AggregateFunction(name, T) | AggregateFunction(name, T) | Aggregate state |
| SimpleAggregateFunction(name, T) | SimpleAggregateFunction(name, T) | Simple aggregate |

5.7 Type Conversion Examples

-- All type casts work identically

-- Numeric conversions
SELECT toUInt32(123.45);                    -- 123
SELECT toFloat64('123.456');                -- 123.456
SELECT toDecimal64(123.456789, 4);          -- 123.4568

-- String conversions
SELECT toString(12345);                      -- '12345'
SELECT toFixedString('hello', 10);          -- 'hello\0\0\0\0\0'

-- Date/Time conversions
SELECT toDate('2025-01-15');                -- 2025-01-15
SELECT toDateTime('2025-01-15 10:30:00');   -- 2025-01-15 10:30:00
SELECT toDateTime64('2025-01-15 10:30:00.123', 3);

-- Array conversions
SELECT toTypeName([1, 2, 3]);               -- Array(UInt8)
SELECT array(1, 2, 3);                      -- [1, 2, 3]

-- Nullable handling
SELECT toNullable(123);
SELECT assumeNotNull(nullable_column);

6. Table Engine Mapping

6.1 MergeTree Family

MergeTree

-- ClickHouse
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
SETTINGS index_granularity = 8192;

-- HeliosDB (identical syntax)
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
SETTINGS index_granularity = 8192;

ReplacingMergeTree

-- For upsert/deduplication scenarios
CREATE TABLE user_states (
    user_id UInt32,
    version UInt64,
    name String,
    status String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(updated_at)
ORDER BY user_id;

-- Query with deduplication
SELECT * FROM user_states FINAL WHERE user_id = 12345;
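What FINAL computes can be pictured in a few lines of Python: for each ORDER BY key, only the row with the highest version survives. This sketch only illustrates the semantics; the actual deduplication happens inside the storage engine at merge time:

```python
def replacing_merge(rows, key, version):
    """Keep the highest-version row per key, as ReplacingMergeTree does."""
    latest = {}
    for row in rows:
        k = row[key]
        # >= so that, on version ties, the later insert wins
        if k not in latest or row[version] >= latest[k][version]:
            latest[k] = row
    return list(latest.values())
```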

SummingMergeTree

-- For pre-aggregated metrics
CREATE TABLE metrics_daily (
    date Date,
    metric_name String,
    total_value Float64,
    event_count UInt64
) ENGINE = SummingMergeTree((total_value, event_count))
PARTITION BY toYYYYMM(date)
ORDER BY (date, metric_name);

-- Rows with same ORDER BY key are summed
INSERT INTO metrics_daily VALUES ('2025-01-15', 'clicks', 100, 10);
INSERT INTO metrics_daily VALUES ('2025-01-15', 'clicks', 50, 5);
-- After merge: ('2025-01-15', 'clicks', 150, 15)
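The merged result shown in the comment can be reproduced with a small Python sketch of the engine's summing rule (semantics only; the engine applies it during background merges):

```python
from collections import defaultdict

def summing_merge(rows, key_cols, sum_cols):
    """Sum the numeric columns of rows sharing the same ORDER BY key."""
    acc = defaultdict(lambda: dict.fromkeys(sum_cols, 0))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for c in sum_cols:
            acc[key][c] += row[c]
    return [dict(zip(key_cols, key)) | sums for key, sums in acc.items()]
```

Because merges run asynchronously, production queries should still aggregate with sum()/GROUP BY rather than assume rows have already been merged.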

AggregatingMergeTree

-- For complex pre-aggregations
CREATE TABLE events_hourly (
    hour DateTime,
    event_type String,
    count_state AggregateFunction(count),
    sum_value_state AggregateFunction(sum, Float64),
    uniq_users_state AggregateFunction(uniq, UInt32)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type);

-- Insert with aggregate functions
INSERT INTO events_hourly
SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    countState() AS count_state,
    sumState(value) AS sum_value_state,
    uniqState(user_id) AS uniq_users_state
FROM events
GROUP BY hour, event_type;

-- Query with merge
SELECT
    hour,
    event_type,
    countMerge(count_state) AS total_count,
    sumMerge(sum_value_state) AS total_value,
    uniqMerge(uniq_users_state) AS unique_users
FROM events_hourly
GROUP BY hour, event_type;
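The State/Merge split is the key idea here: a partial state carries enough information to be combined later without re-reading raw rows. A Python miniature using avg, whose state must be a (sum, count) pair rather than a plain average:

```python
def avg_state(values):
    """Partial state for avg, analogous to avgState()."""
    return (sum(values), len(values))

def avg_merge(states):
    """Combine partial states, analogous to avgMerge()."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count if count else None
```

Averaging two pre-computed averages directly would weight partitions incorrectly; carrying (sum, count) avoids that.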

CollapsingMergeTree

-- For state change tracking
CREATE TABLE user_sessions (
    user_id UInt32,
    session_start DateTime,
    session_duration UInt32,
    page_views UInt32,
    sign Int8  -- 1 for insert, -1 for delete
) ENGINE = CollapsingMergeTree(sign)
PARTITION BY toYYYYMM(session_start)
ORDER BY (user_id, session_start);

-- Insert initial state
INSERT INTO user_sessions VALUES (1, '2025-01-15 10:00:00', 300, 5, 1);

-- Update by inserting old row with -1 and new row with 1
INSERT INTO user_sessions VALUES
    (1, '2025-01-15 10:00:00', 300, 5, -1),   -- Cancel old
    (1, '2025-01-15 10:00:00', 600, 10, 1);   -- Insert new

-- Query with collapsing
SELECT
    user_id,
    session_start,
    sum(session_duration * sign) AS duration,
    sum(page_views * sign) AS pages
FROM user_sessions
GROUP BY user_id, session_start
HAVING sum(sign) > 0;
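The sign arithmetic in the query above can be checked with a Python sketch: rows with sign = -1 cancel their matching sign = +1 rows, and the HAVING sum(sign) > 0 filter keeps only keys with a surviving state:

```python
def collapse(rows, key_cols, value_cols):
    """Aggregate value_cols weighted by sign; drop fully cancelled keys."""
    state = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        cur = state.setdefault(key, {c: 0 for c in value_cols} | {'sign': 0})
        for c in value_cols:
            cur[c] += row[c] * row['sign']
        cur['sign'] += row['sign']
    return {k: v for k, v in state.items() if v['sign'] > 0}
```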

VersionedCollapsingMergeTree

-- For ordered state changes with versions
CREATE TABLE user_states_versioned (
    user_id UInt32,
    status String,
    version UInt64,
    sign Int8
) ENGINE = VersionedCollapsingMergeTree(sign, version)
ORDER BY user_id;

6.2 Integration Engines

Distributed Tables

-- ClickHouse distributed table
CREATE TABLE events_distributed AS events
ENGINE = Distributed(cluster_name, database, events, rand());

-- HeliosDB (same syntax, uses internal distribution)
CREATE TABLE events_distributed AS events
ENGINE = Distributed(default, analytics, events, rand());

Kafka Integration

-- Kafka source table
CREATE TABLE events_kafka (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'heliosdb_consumer',
    kafka_format = 'JSONEachRow';

-- Materialized view to persist data
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;

S3 Tables

-- External S3 table
CREATE TABLE s3_data (
    timestamp DateTime,
    event_type String,
    value Float64
) ENGINE = S3(
    'https://bucket.s3.amazonaws.com/data/*.parquet',
    'AWS_ACCESS_KEY',
    'AWS_SECRET_KEY',
    'Parquet'
);

-- Query external data
SELECT event_type, sum(value)
FROM s3_data
GROUP BY event_type;

6.3 Special Engines

Memory Engine

-- In-memory table (same syntax)
CREATE TABLE temp_data (
    id UInt32,
    value String
) ENGINE = Memory;

Log Engines

-- Simple log table
CREATE TABLE log_data (
    timestamp DateTime,
    message String
) ENGINE = Log;

-- Tiny log for small tables
CREATE TABLE config_data (
    key String,
    value String
) ENGINE = TinyLog;

Dictionary Engine

-- Create dictionary
CREATE DICTIONARY geo_dict (
    country_code String,
    country_name String,
    population UInt64
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(
    HOST 'localhost'
    PORT 9000
    USER 'default'
    TABLE 'countries'
    DB 'reference'
))
LIFETIME(MIN 3600 MAX 7200)
LAYOUT(FLAT());

-- Use in queries
SELECT dictGet('geo_dict', 'country_name', country_code) AS country
FROM events;

6.4 Engine Migration Matrix

| ClickHouse Engine | HeliosDB Equivalent | Migration Complexity |
| --- | --- | --- |
| MergeTree | MergeTree | None (identical) |
| ReplacingMergeTree | ReplacingMergeTree | None (identical) |
| SummingMergeTree | SummingMergeTree | None (identical) |
| AggregatingMergeTree | AggregatingMergeTree | None (identical) |
| CollapsingMergeTree | CollapsingMergeTree | None (identical) |
| VersionedCollapsingMergeTree | VersionedCollapsingMergeTree | None (identical) |
| Distributed | Distributed | Cluster config changes |
| ReplicatedMergeTree | MergeTree + HeliosDB replication | Schema change |
| Kafka | Kafka | Connection config only |
| S3 | S3 | Credential config only |
| Memory | Memory | None (identical) |
| Log/TinyLog | Log/TinyLog | None (identical) |
| Dictionary | Dictionary | None (identical) |

6.5 Replicated Engine Migration

ClickHouse Replicated* engines use ZooKeeper/Keeper. HeliosDB uses its own replication:

ClickHouse:

CREATE TABLE events_replicated (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);

HeliosDB:

-- Use standard MergeTree; replication is automatic
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);

-- Configure replication in HeliosDB settings
-- No ZooKeeper/Keeper required
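Exported DDL can be rewritten mechanically for this step. A hedged sketch that strips the ZooKeeper path and replica arguments from Replicated* engines; it assumes those are the engine's only arguments (variants such as ReplicatedReplacingMergeTree carry extra parameters like a version column, which this regex would also drop, so review its output before applying):

```python
import re

def dereplicate(ddl):
    """Rewrite ReplicatedMergeTree(...) clauses to plain MergeTree()."""
    return re.sub(
        r"Replicated(\w*MergeTree)\s*\([^)]*\)",
        lambda m: m.group(1) + "()",
        ddl,
    )
```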


7. Query Syntax Migration

7.1 Identical Syntax (No Changes Required)

Most ClickHouse queries work identically in HeliosDB:

-- SELECT with aggregations
SELECT
    event_type,
    count() AS events,
    count(DISTINCT user_id) AS unique_users,
    sum(value) AS total_value,
    avg(value) AS avg_value,
    min(value) AS min_value,
    max(value) AS max_value
FROM events
WHERE timestamp >= '2025-01-01'
GROUP BY event_type
ORDER BY events DESC
LIMIT 100;

-- PREWHERE for optimization
SELECT *
FROM events
PREWHERE timestamp >= '2025-01-01'
WHERE event_type = 'purchase';

-- SAMPLE for approximate queries
SELECT event_type, count() * 10 AS estimated_count
FROM events SAMPLE 0.1
GROUP BY event_type;

-- FINAL for deduplication
SELECT * FROM user_states FINAL
WHERE user_id = 12345;

-- WITH clauses (CTEs)
WITH daily_totals AS (
    SELECT
        toDate(timestamp) AS date,
        sum(value) AS total
    FROM events
    GROUP BY date
)
SELECT
    date,
    total,
    total - lagInFrame(total) OVER (ORDER BY date) AS change
FROM daily_totals;

7.2 Window Functions

-- All window functions supported
SELECT
    timestamp,
    user_id,
    value,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS row_num,
    rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS value_rank,
    dense_rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS dense_value_rank,
    sum(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS running_total,
    avg(value) OVER (
        PARTITION BY user_id
        ORDER BY timestamp
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) AS moving_avg,
    lead(value, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS next_value,
    lag(value, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS prev_value,
    first_value(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS first_val,
    last_value(value) OVER (
        PARTITION BY user_id
        ORDER BY timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_val
FROM events
ORDER BY user_id, timestamp;

7.3 JOIN Operations

-- All JOIN types supported
-- INNER JOIN
SELECT e.*, u.name
FROM events e
INNER JOIN users u ON e.user_id = u.user_id;

-- LEFT JOIN
SELECT e.*, u.name
FROM events e
LEFT JOIN users u ON e.user_id = u.user_id;

-- RIGHT JOIN
SELECT e.*, u.name
FROM events e
RIGHT JOIN users u ON e.user_id = u.user_id;

-- FULL OUTER JOIN
SELECT e.*, u.name
FROM events e
FULL JOIN users u ON e.user_id = u.user_id;

-- CROSS JOIN
SELECT *
FROM events e
CROSS JOIN config c;

-- ASOF JOIN (time-series join)
SELECT
    e.timestamp,
    e.user_id,
    e.value,
    p.price
FROM events e
ASOF LEFT JOIN prices p
ON e.product_id = p.product_id
AND e.timestamp >= p.effective_date;

-- GLOBAL JOIN for distributed tables
SELECT *
FROM events_distributed e
GLOBAL JOIN users_distributed u ON e.user_id = u.user_id;
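The ASOF JOIN above deserves a closer look, since the inequality direction is easy to get wrong: for each event, the join picks the price row with the latest effective_date that is still at or before the event timestamp. A Python sketch of that lookup:

```python
import bisect

def asof_lookup(effective_dates, prices, ts):
    """effective_dates: ascending; returns latest price at or before ts."""
    i = bisect.bisect_right(effective_dates, ts) - 1
    return prices[i] if i >= 0 else None
```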

7.4 Aggregation Functions

-- Standard aggregations
SELECT
    count(),
    countDistinct(user_id),
    sum(value),
    avg(value),
    min(value),
    max(value),
    any(event_type),
    anyHeavy(event_type),
    anyLast(event_type)
FROM events;

-- Statistical functions
SELECT
    stddevPop(value) AS stddev,
    stddevSamp(value) AS stddev_sample,
    varPop(value) AS variance,
    varSamp(value) AS variance_sample,
    covarPop(value, other_value) AS covariance,
    corr(value, other_value) AS correlation
FROM events;

-- Quantile functions
SELECT
    quantile(0.5)(value) AS median,
    quantile(0.95)(value) AS p95,
    quantile(0.99)(value) AS p99,
    quantileExact(0.5)(value) AS exact_median,
    quantiles(0.25, 0.5, 0.75)(value) AS quartiles,
    quantileTiming(0.95)(response_ms) AS timing_p95
FROM events;

-- Unique count functions
SELECT
    uniq(user_id) AS approx_unique,
    uniqExact(user_id) AS exact_unique,
    uniqCombined(user_id) AS combined_unique,
    uniqHLL12(user_id) AS hll_unique
FROM events;

-- TopK functions
SELECT
    topK(10)(event_type) AS top_events,
    topKWeighted(10)(event_type, value) AS weighted_top
FROM events;

-- Conditional aggregations
SELECT
    countIf(event_type = 'purchase') AS purchases,
    sumIf(value, event_type = 'purchase') AS purchase_value,
    avgIf(value, event_type = 'view') AS avg_view_value
FROM events;

7.5 Array Functions

-- Array operations
SELECT
    [1, 2, 3] AS arr,
    array(1, 2, 3) AS arr2,
    arrayJoin([1, 2, 3]) AS expanded,
    arrayMap(x -> x * 2, [1, 2, 3]) AS doubled,
    arrayFilter(x -> x > 1, [1, 2, 3]) AS filtered,
    arrayReduce('sum', [1, 2, 3]) AS reduced,
    arraySort([3, 1, 2]) AS sorted,
    arrayReverse([1, 2, 3]) AS reversed,
    arraySlice([1, 2, 3, 4, 5], 2, 3) AS sliced,
    arrayConcat([1, 2], [3, 4]) AS concatenated,
    has([1, 2, 3], 2) AS contains,
    indexOf([1, 2, 3], 2) AS position,
    length([1, 2, 3]) AS len,
    arrayUniq([1, 1, 2, 2, 3]) AS unique_count;

-- Array aggregations
SELECT
    groupArray(user_id) AS user_ids,
    groupArrayDistinct(user_id) AS unique_user_ids,
    groupUniqArray(user_id) AS uniq_array
FROM events
WHERE event_type = 'purchase';

7.6 String Functions

-- String operations
SELECT
    concat('Hello', ' ', 'World') AS concatenated,
    substring('ClickHouse', 1, 5) AS sub,
    length('Hello') AS len,
    lower('HELLO') AS lowered,
    upper('hello') AS uppered,
    trim('  hello  ') AS trimmed,
    ltrim('  hello') AS ltrimmed,
    rtrim('hello  ') AS rtrimmed,
    splitByChar(',', 'a,b,c') AS split,
    splitByString('::', 'a::b::c') AS split_str,
    replaceAll('hello world', 'world', 'universe') AS replaced,
    reverse('hello') AS reversed,
    position('hello', 'l') AS pos,
    match('hello', 'e.+o') AS regex_match,
    extract('hello123world', '\\d+') AS extracted;

-- Format functions
SELECT
    format('{} {}', 'Hello', 'World') AS formatted,
    formatReadableSize(1024 * 1024 * 1024) AS readable_size,
    formatReadableQuantity(1000000) AS readable_qty;

7.7 Date/Time Functions

-- Date/Time operations
SELECT
    now() AS current_time,
    today() AS current_date,
    yesterday() AS prev_date,
    toDate('2025-01-15') AS date,
    toDateTime('2025-01-15 10:30:00') AS datetime,
    toDateTime64('2025-01-15 10:30:00.123456', 6) AS datetime64,
    toYear(now()) AS year,
    toMonth(now()) AS month,
    toDayOfMonth(now()) AS day,
    toHour(now()) AS hour,
    toMinute(now()) AS minute,
    toSecond(now()) AS second,
    toDayOfWeek(now()) AS day_of_week,
    toDayOfYear(now()) AS day_of_year,
    toStartOfDay(now()) AS start_of_day,
    toStartOfHour(now()) AS start_of_hour,
    toStartOfMinute(now()) AS start_of_minute,
    toStartOfMonth(now()) AS start_of_month,
    toStartOfQuarter(now()) AS start_of_quarter,
    toStartOfYear(now()) AS start_of_year,
    toStartOfWeek(now()) AS start_of_week,
    date_add(day, 7, today()) AS week_later,
    date_sub(month, 1, today()) AS month_ago,
    dateDiff('day', '2025-01-01', '2025-01-15') AS days_diff;

7.8 Conditional Functions

-- Conditional operations
SELECT
    if(value > 100, 'high', 'low') AS category,
    multiIf(
        value > 1000, 'very high',
        value > 100, 'high',
        value > 10, 'medium',
        'low'
    ) AS multi_category,
    CASE
        WHEN value > 1000 THEN 'very high'
        WHEN value > 100 THEN 'high'
        WHEN value > 10 THEN 'medium'
        ELSE 'low'
    END AS case_category,
    coalesce(nullable_value, 0) AS with_default,
    ifNull(nullable_value, 0) AS if_null,
    nullIf(value, 0) AS null_if_zero
FROM events;

8. Materialized View Migration

8.1 Standard Materialized Views

Materialized views work identically in HeliosDB:

-- Create target table
CREATE TABLE events_daily (
    date Date,
    event_type String,
    total_events UInt64,
    total_value Float64,
    unique_users UInt64
) ENGINE = SummingMergeTree((total_events, total_value, unique_users))
PARTITION BY toYYYYMM(date)
ORDER BY (date, event_type);

-- Create materialized view
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT
    toDate(timestamp) AS date,
    event_type,
    count() AS total_events,
    sum(value) AS total_value,
    uniq(user_id) AS unique_users
FROM events
GROUP BY date, event_type;

8.2 Aggregating Materialized Views

-- Hourly aggregations with AggregatingMergeTree
CREATE TABLE metrics_hourly (
    hour DateTime,
    metric_name String,
    count_state AggregateFunction(count),
    sum_state AggregateFunction(sum, Float64),
    avg_state AggregateFunction(avg, Float64),
    min_state AggregateFunction(min, Float64),
    max_state AggregateFunction(max, Float64),
    uniq_state AggregateFunction(uniq, UInt32),
    quantile_state AggregateFunction(quantile(0.95), Float64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, metric_name);

CREATE MATERIALIZED VIEW metrics_hourly_mv TO metrics_hourly AS
SELECT
    toStartOfHour(timestamp) AS hour,
    metric_name,
    countState() AS count_state,
    sumState(value) AS sum_state,
    avgState(value) AS avg_state,
    minState(value) AS min_state,
    maxState(value) AS max_state,
    uniqState(user_id) AS uniq_state,
    quantileState(0.95)(value) AS quantile_state
FROM metrics
GROUP BY hour, metric_name;

-- Query with merge
SELECT
    hour,
    metric_name,
    countMerge(count_state) AS total_count,
    sumMerge(sum_state) AS total_sum,
    avgMerge(avg_state) AS average,
    minMerge(min_state) AS minimum,
    maxMerge(max_state) AS maximum,
    uniqMerge(uniq_state) AS unique_count,
    quantileMerge(0.95)(quantile_state) AS p95
FROM metrics_hourly
WHERE hour >= toStartOfDay(now())
GROUP BY hour, metric_name
ORDER BY hour;

8.3 Cascading Materialized Views

-- First level: Hourly
CREATE TABLE events_hourly (
    hour DateTime,
    event_type String,
    count UInt64,
    value_sum Float64
) ENGINE = SummingMergeTree((count, value_sum))
ORDER BY (hour, event_type);

CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS count,
    sum(value) AS value_sum
FROM events
GROUP BY hour, event_type;

-- Second level: Daily (aggregates from hourly)
CREATE TABLE events_daily_summary (
    date Date,
    event_type String,
    count UInt64,
    value_sum Float64
) ENGINE = SummingMergeTree((count, value_sum))
ORDER BY (date, event_type);

CREATE MATERIALIZED VIEW events_daily_mv TO events_daily_summary AS
SELECT
    toDate(hour) AS date,
    event_type,
    sum(count) AS count,
    sum(value_sum) AS value_sum
FROM events_hourly
GROUP BY date, event_type;

8.4 Materialized View with POPULATE

-- Create materialized view and populate with existing data
-- (rows inserted into the source table while POPULATE runs are not
-- captured; backfill them separately if exact completeness matters)
CREATE MATERIALIZED VIEW user_stats_mv TO user_stats
POPULATE
AS SELECT
    user_id,
    count() AS total_events,
    sum(value) AS total_value,
    max(timestamp) AS last_activity
FROM events
GROUP BY user_id;

9. Data Migration Process

9.1 Migration Strategy Selection

| Data Size | Recommended Method | Estimated Time |
| --- | --- | --- |
| < 10 GB | INSERT SELECT via link | Minutes |
| 10-100 GB | COPY with files | 30 min - 2 hours |
| 100 GB - 1 TB | Parallel export/import | 2-8 hours |
| > 1 TB | Streaming or incremental | Hours to days |
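For scripted migrations, the decision table above can be encoded as a helper; the thresholds follow the table directly:

```python
def migration_method(size_bytes):
    """Pick a migration method from dataset size, per the strategy table."""
    GB, TB = 1024**3, 1024**4
    if size_bytes < 10 * GB:
        return "INSERT SELECT via link"
    if size_bytes < 100 * GB:
        return "COPY with files"
    if size_bytes <= 1 * TB:
        return "Parallel export/import"
    return "Streaming or incremental"
```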

9.2 Method 1: Direct INSERT SELECT (Small Datasets)

-- Connect HeliosDB to ClickHouse as an external source,
-- then copy the data directly.

-- In HeliosDB: read the ClickHouse table through ClickHouse's
-- MySQL-compatible interface (served on port 9004)
CREATE TABLE events_clickhouse (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float64
) ENGINE = MySQL('clickhouse.host:9004', 'analytics', 'events', 'user', 'pass');

-- Copy data
INSERT INTO events SELECT * FROM events_clickhouse;

9.3 Method 2: Export/Import with Files

Export from ClickHouse

# Export to Native format (fastest)
clickhouse-client \
    --host clickhouse.host \
    --query "SELECT * FROM analytics.events FORMAT Native" \
    > events.native

# Export to CSV
clickhouse-client \
    --host clickhouse.host \
    --query "SELECT * FROM analytics.events FORMAT CSVWithNames" \
    > events.csv

# Export to Parquet (compressed)
clickhouse-client \
    --host clickhouse.host \
    --query "SELECT * FROM analytics.events FORMAT Parquet" \
    > events.parquet

# Export to JSON
clickhouse-client \
    --host clickhouse.host \
    --query "SELECT * FROM analytics.events FORMAT JSONEachRow" \
    > events.json

Import to HeliosDB

# Import from Native format
clickhouse-client \
    --host heliosdb.host \
    --query "INSERT INTO analytics.events FORMAT Native" \
    < events.native

# Import from CSV
clickhouse-client \
    --host heliosdb.host \
    --query "INSERT INTO analytics.events FORMAT CSVWithNames" \
    < events.csv

# Import from Parquet
clickhouse-client \
    --host heliosdb.host \
    --query "INSERT INTO analytics.events FORMAT Parquet" \
    < events.parquet

9.4 Method 3: Python Migration Script

#!/usr/bin/env python3
"""
ClickHouse to HeliosDB Migration Script
Handles large tables with batching and progress tracking.
"""

from clickhouse_driver import Client
import time
import sys

def migrate_table(source_host, target_host, database, table, batch_size=100000):
    """Migrate a single table with batching."""

    source = Client(
        host=source_host,
        port=9000,
        database=database
    )

    target = Client(
        host=target_host,
        port=9000,
        database=database
    )

    # Get row count
    total_rows = source.execute(f'SELECT count() FROM {table}')[0][0]
    print(f"Migrating {table}: {total_rows:,} rows")

    # Get column info
    columns_result = source.execute(f'DESCRIBE TABLE {table}')
    columns = ', '.join([col[0] for col in columns_result])

    # Migrate in batches
    offset = 0
    migrated = 0
    start_time = time.time()

    while offset < total_rows:
        # Read batch from source. NOTE: LIMIT/OFFSET is only deterministic with
        # an explicit ORDER BY, and ClickHouse re-scans the skipped rows on
        # every batch; prefer keyset pagination on the table's ORDER BY key
        # for large tables.
        rows = source.execute(
            f'SELECT {columns} FROM {table} LIMIT {batch_size} OFFSET {offset}'
        )

        if not rows:
            break

        # Write batch to target
        target.execute(
            f'INSERT INTO {table} ({columns}) VALUES',
            rows
        )

        offset += batch_size
        migrated += len(rows)

        # Progress update
        elapsed = time.time() - start_time
        rate = migrated / elapsed if elapsed > 0 else 0
        progress = (migrated / total_rows) * 100
        eta = (total_rows - migrated) / rate if rate > 0 else 0

        print(f"  Progress: {progress:.1f}% ({migrated:,}/{total_rows:,}) "
              f"Rate: {rate:,.0f} rows/sec ETA: {eta:.0f}s")

    elapsed = time.time() - start_time
    print(f"Completed {table}: {migrated:,} rows in {elapsed:.1f}s "
          f"({migrated/elapsed:,.0f} rows/sec)")

    return migrated

def migrate_database(source_host, target_host, database, tables=None):
    """Migrate entire database or specific tables."""

    source = Client(host=source_host, port=9000, database=database)

    # Get tables if not specified
    if tables is None:
        result = source.execute(
            f"SELECT name FROM system.tables WHERE database = '{database}'"
        )
        tables = [row[0] for row in result]

    print(f"Migrating {len(tables)} tables from {database}")

    total_migrated = 0
    for table in tables:
        try:
            migrated = migrate_table(
                source_host, target_host, database, table
            )
            total_migrated += migrated
        except Exception as e:
            print(f"Error migrating {table}: {e}")
            continue

    print(f"\nTotal migrated: {total_migrated:,} rows")
    return total_migrated

if __name__ == "__main__":
    # Configuration
    SOURCE_HOST = 'clickhouse.host'
    TARGET_HOST = 'heliosdb.host'
    DATABASE = 'analytics'

    # Migrate all tables
    migrate_database(SOURCE_HOST, TARGET_HOST, DATABASE)
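The script above pages with LIMIT/OFFSET, which ClickHouse must re-scan for every batch and which is only deterministic under an explicit ORDER BY. A keyset-pagination sketch, assuming the table has a monotonically comparable key listed first in `columns` and a `client` that follows the clickhouse_driver `execute(query, params)` convention:

```python
def keyset_batches(client, table, columns, key_column, batch_size=100_000):
    """Yield row batches ordered by key_column, resuming each query strictly
    after the last key seen instead of skipping rows with OFFSET."""
    last = None
    while True:
        where = f'WHERE {key_column} > %(last)s' if last is not None else ''
        rows = client.execute(
            f'SELECT {columns} FROM {table} {where} '
            f'ORDER BY {key_column} LIMIT {batch_size}',
            {'last': last},
        )
        if not rows:
            return
        yield rows
        last = rows[-1][0]  # assumes key_column is the first selected column
```

Each batch can then be fed to the same `INSERT INTO ... VALUES` call used in the script above.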

9.5 Method 4: Parallel Migration for Large Datasets

#!/usr/bin/env python3
"""
Parallel migration for large ClickHouse tables.
Uses multiple workers for concurrent data transfer.
"""

from clickhouse_driver import Client
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import time

class ParallelMigrator:
    def __init__(self, source_host, target_host, database, num_workers=4):
        self.source_host = source_host
        self.target_host = target_host
        self.database = database
        self.num_workers = num_workers
        self.lock = threading.Lock()
        self.total_migrated = 0

    def get_client(self, host):
        return Client(
            host=host,
            port=9000,
            database=self.database
        )

    def migrate_partition(self, table, partition_id, columns):
        """Migrate a single partition."""
        source = self.get_client(self.source_host)
        target = self.get_client(self.target_host)

        # Export partition data
        rows = source.execute(f"""
            SELECT {columns} FROM {table}
            WHERE _partition_id = '{partition_id}'
        """)

        if rows:
            target.execute(
                f'INSERT INTO {table} ({columns}) VALUES',
                rows
            )

        with self.lock:
            self.total_migrated += len(rows)

        return len(rows)

    def migrate_table(self, table):
        """Migrate table using parallel partition processing."""
        source = self.get_client(self.source_host)

        # Get partitions
        partitions = source.execute(f"""
            SELECT DISTINCT _partition_id
            FROM {table}
        """)
        partition_ids = [p[0] for p in partitions]

        # Get columns
        columns_result = source.execute(f'DESCRIBE TABLE {table}')
        columns = ', '.join([col[0] for col in columns_result])

        print(f"Migrating {table}: {len(partition_ids)} partitions")

        start_time = time.time()

        with ThreadPoolExecutor(max_workers=self.num_workers) as executor:
            futures = {
                executor.submit(
                    self.migrate_partition, table, pid, columns
                ): pid for pid in partition_ids
            }

            completed = 0
            for future in as_completed(futures):
                partition_id = futures[future]
                try:
                    rows = future.result()
                    completed += 1
                    print(f"  Partition {partition_id}: {rows:,} rows "
                          f"({completed}/{len(partition_ids)})")
                except Exception as e:
                    print(f"  Error in partition {partition_id}: {e}")

        elapsed = time.time() - start_time
        print(f"Completed {table}: {self.total_migrated:,} rows "
              f"in {elapsed:.1f}s")

if __name__ == "__main__":
    migrator = ParallelMigrator(
        source_host='clickhouse.host',
        target_host='heliosdb.host',
        database='analytics',
        num_workers=8
    )

    migrator.migrate_table('events')

9.6 Method 5: Incremental/CDC Migration

For zero-downtime migrations with continuous data sync:

#!/usr/bin/env python3
"""
Incremental migration with change tracking.
Suitable for zero-downtime migrations.
"""

from clickhouse_driver import Client
import time

def incremental_sync(source_host, target_host, database, table,
                     timestamp_column='timestamp', interval_seconds=60):
    """Continuously sync new data from source to target."""

    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get columns (remember the timestamp column's position for watermarking)
    columns_result = source.execute(f'DESCRIBE TABLE {table}')
    column_names = [col[0] for col in columns_result]
    ts_index = column_names.index(timestamp_column)
    columns = ', '.join(column_names)

    # Get initial watermark from target
    result = target.execute(f'SELECT max({timestamp_column}) FROM {table}')
    watermark = result[0][0] or '1970-01-01 00:00:00'

    print(f"Starting incremental sync from {watermark}")

    while True:
        # Get new rows since watermark
        rows = source.execute(f"""
            SELECT {columns} FROM {table}
            WHERE {timestamp_column} > '{watermark}'
            ORDER BY {timestamp_column}
            LIMIT 100000
        """)

        if rows:
            # Insert to target
            target.execute(
                f'INSERT INTO {table} ({columns}) VALUES',
                rows
            )

            # Update watermark from the timestamp column's actual position
            new_watermark = max(row[ts_index] for row in rows)
            print(f"Synced {len(rows)} rows up to {new_watermark}")
            watermark = new_watermark
        else:
            print(f"No new data (watermark: {watermark})")

        # Wait before next sync
        time.sleep(interval_seconds)

if __name__ == "__main__":
    incremental_sync(
        source_host='clickhouse.host',
        target_host='heliosdb.host',
        database='analytics',
        table='events',
        timestamp_column='timestamp',
        interval_seconds=30
    )

9.7 Schema Migration Script

#!/bin/bash
# migrate_schema.sh - Export and import ClickHouse schema to HeliosDB

SOURCE_HOST="clickhouse.host"
TARGET_HOST="heliosdb.host"
DATABASE="analytics"

echo "=== Schema Migration: ClickHouse -> HeliosDB ==="

# Export schema
echo "Exporting schema from $SOURCE_HOST..."
clickhouse-client --host $SOURCE_HOST --query "
    SELECT create_table_query
    FROM system.tables
    WHERE database = '$DATABASE'
      AND engine NOT LIKE 'Replicated%'
" > schema_export.sql

# Handle ReplicatedMergeTree -> MergeTree conversion
echo "Converting Replicated* engines to standard engines..."
sed -i 's/ReplicatedMergeTree([^)]*)/MergeTree()/g' schema_export.sql
sed -i 's/ReplicatedReplacingMergeTree([^)]*)/ReplacingMergeTree()/g' schema_export.sql
sed -i 's/ReplicatedSummingMergeTree([^)]*)/SummingMergeTree()/g' schema_export.sql
sed -i 's/ReplicatedAggregatingMergeTree([^)]*)/AggregatingMergeTree()/g' schema_export.sql

# Create database in HeliosDB
echo "Creating database in HeliosDB..."
clickhouse-client --host $TARGET_HOST --query "CREATE DATABASE IF NOT EXISTS $DATABASE"

# Import schema
echo "Importing schema to $TARGET_HOST..."
while IFS= read -r query; do
    if [ -n "$query" ]; then
        clickhouse-client --host $TARGET_HOST --query "$query" || \
            echo "Failed: $query"
    fi
done < schema_export.sql

echo "Schema migration complete!"
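The same engine rewrite can be done in Python if you are post-processing DDL programmatically rather than with sed (a sketch covering the four Replicated* engines handled above):

```python
import re

def strip_replicated(ddl: str) -> str:
    """Rewrite Replicated*MergeTree(...) engine clauses to their plain
    equivalents with empty parameters (HeliosDB supplies replication itself)."""
    return re.sub(r'Replicated(\w*MergeTree)\([^)]*\)', r'\1()', ddl)
```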

10. Application Connectivity

10.1 Python (clickhouse-driver)

from clickhouse_driver import Client

# Before (ClickHouse)
client = Client(
    host='clickhouse.host',
    port=9000,
    user='default',
    password='password',
    database='analytics',
    compression=True
)

# After (HeliosDB) - Only host changes
client = Client(
    host='heliosdb.host',  # Change only this
    port=9000,
    user='default',
    password='password',
    database='analytics',
    compression=True
)

# Usage remains identical
result = client.execute('''
    SELECT event_type, count() AS cnt
    FROM events
    WHERE timestamp > now() - INTERVAL 1 DAY
    GROUP BY event_type
    ORDER BY cnt DESC
''')

for row in result:
    print(f"{row[0]}: {row[1]}")

10.2 Python (clickhouse-connect)

import clickhouse_connect

# Before (ClickHouse)
client = clickhouse_connect.get_client(
    host='clickhouse.host',
    port=8123,
    username='default',
    password='password',
    database='analytics'
)

# After (HeliosDB) - Only host changes
client = clickhouse_connect.get_client(
    host='heliosdb.host',  # Change only this
    port=8123,
    username='default',
    password='password',
    database='analytics'
)

# Query with Pandas integration
df = client.query_df('''
    SELECT
        toDate(timestamp) AS date,
        event_type,
        count() AS events
    FROM events
    GROUP BY date, event_type
''')

print(df.head())

10.3 Go (clickhouse-go)

package main

import (
    "context"
    "fmt"
    "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
    // Before (ClickHouse)
    // conn, _ := clickhouse.Open(&clickhouse.Options{
    //     Addr: []string{"clickhouse.host:9000"},
    //     ...
    // })

    // After (HeliosDB) - Only address changes
    conn, err := clickhouse.Open(&clickhouse.Options{
        Addr: []string{"heliosdb.host:9000"},  // Change only this
        Auth: clickhouse.Auth{
            Database: "analytics",
            Username: "default",
            Password: "password",
        },
        Compression: &clickhouse.Compression{
            Method: clickhouse.CompressionLZ4,
        },
        Settings: clickhouse.Settings{
            "max_execution_time": 60,
        },
    })
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    ctx := context.Background()

    rows, err := conn.Query(ctx, `
        SELECT event_type, count() AS cnt
        FROM events
        WHERE timestamp > now() - INTERVAL 1 DAY
        GROUP BY event_type
    `)
    if err != nil {
        panic(err)
    }
    defer rows.Close()

    for rows.Next() {
        var eventType string
        var count uint64
        if err := rows.Scan(&eventType, &count); err != nil {
            panic(err)
        }
        fmt.Printf("%s: %d\n", eventType, count)
    }
}

10.4 Node.js (@clickhouse/client)

const { createClient } = require('@clickhouse/client');

// Before (ClickHouse)
// const client = createClient({
//     host: 'http://clickhouse.host:8123',
//     ...
// });

// After (HeliosDB) - Only host changes
const client = createClient({
    host: 'http://heliosdb.host:8123',  // Change only this
    database: 'analytics',
    username: 'default',
    password: 'password',
    compression: {
        request: true,
        response: true,
    },
});

async function queryEvents() {
    const result = await client.query({
        query: `
            SELECT event_type, count() AS cnt
            FROM events
            WHERE timestamp > now() - INTERVAL 1 DAY
            GROUP BY event_type
            ORDER BY cnt DESC
        `,
        format: 'JSONEachRow',
    });

    const data = await result.json();
    for (const row of data) {
        console.log(`${row.event_type}: ${row.cnt}`);
    }
}

async function insertEvents() {
    const events = [
        { timestamp: new Date(), event_type: 'click', user_id: 1, value: 10.5 },
        { timestamp: new Date(), event_type: 'view', user_id: 2, value: 5.0 },
    ];

    await client.insert({
        table: 'events',
        values: events,
        format: 'JSONEachRow',
    });
}

queryEvents();

10.5 Java (JDBC)

import java.sql.*;
import java.util.Properties;

public class HeliosDBJDBC {
    public static void main(String[] args) throws SQLException {
        // Before (ClickHouse)
        // String url = "jdbc:clickhouse://clickhouse.host:8123/analytics";

        // After (HeliosDB) - Only host changes
        String url = "jdbc:clickhouse://heliosdb.host:8123/analytics";

        Properties props = new Properties();
        props.setProperty("user", "default");
        props.setProperty("password", "password");
        props.setProperty("compress", "true");

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT event_type, count() AS cnt " +
                 "FROM events " +
                 "WHERE timestamp > now() - INTERVAL 1 DAY " +
                 "GROUP BY event_type"
             )) {

            while (rs.next()) {
                System.out.printf("%s: %d%n",
                    rs.getString("event_type"),
                    rs.getLong("cnt"));
            }
        }
    }
}

10.6 Java (Native Client)

import com.clickhouse.client.*;
import com.clickhouse.data.*;

public class HeliosDBNative {
    public static void main(String[] args) throws Exception {
        // Before (ClickHouse)
        // ClickHouseNode server = ClickHouseNode.of("clickhouse.host:9000");

        // After (HeliosDB) - Only host changes
        ClickHouseNode server = ClickHouseNode.builder()
            .host("heliosdb.host")  // Change only this
            .port(9000)
            .database("analytics")
            .credentials(ClickHouseCredentials.fromUserAndPassword("default", "password"))
            .build();

        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.NATIVE);
             ClickHouseResponse response = client.read(server)
                 .query("SELECT event_type, count() FROM events GROUP BY event_type")
                 .executeAndWait()) {

            for (ClickHouseRecord record : response.records()) {
                System.out.printf("%s: %d%n",
                    record.getValue(0).asString(),
                    record.getValue(1).asLong());
            }
        }
    }
}

10.7 HTTP API (curl)

# Before (ClickHouse)
# curl 'http://clickhouse.host:8123/' --data-binary "SELECT 1"

# After (HeliosDB) - Only host changes
curl 'http://heliosdb.host:8123/' \
    --data-binary "SELECT event_type, count() FROM events GROUP BY event_type"

# With authentication
curl 'http://heliosdb.host:8123/?user=default&password=password' \
    --data-binary "SELECT * FROM events LIMIT 10"

# With database
curl 'http://heliosdb.host:8123/?database=analytics' \
    --data-binary "SELECT count() FROM events"

# JSON output
curl 'http://heliosdb.host:8123/' \
    --data-binary "SELECT * FROM events FORMAT JSONEachRow"

# Insert data
curl 'http://heliosdb.host:8123/?query=INSERT%20INTO%20events%20FORMAT%20JSONEachRow' \
    --data-binary '{"timestamp":"2025-01-15 10:30:00","event_type":"click","user_id":1,"value":10.5}'

10.8 Environment Variable Configuration

# Before (ClickHouse)
export CLICKHOUSE_HOST=clickhouse.host

# After (HeliosDB)
export CLICKHOUSE_HOST=heliosdb.host

# Common environment variables
export CLICKHOUSE_PORT=9000
export CLICKHOUSE_HTTP_PORT=8123
export CLICKHOUSE_USER=default
export CLICKHOUSE_PASSWORD=password
export CLICKHOUSE_DATABASE=analytics

11. Performance Considerations

11.1 HeliosDB Performance Configuration

# heliosdb.toml - ClickHouse protocol optimization

[clickhouse]
# Network settings
listen_address = "0.0.0.0"
native_port = 9000
http_port = 8123
max_connections = 4096

[clickhouse.protocol]
# Enable compression
compression_enabled = true
compression_algorithms = ["lz4", "zstd"]
default_compression = "lz4"

# Increase buffer sizes
max_query_size = 268435456  # 256MB
max_insert_block_size = 1048576  # 1M rows

[clickhouse.query]
# Query optimization
max_threads = 16
max_execution_time = 600
max_memory_usage = 10737418240  # 10GB
use_uncompressed_cache = true
background_pool_size = 16

[clickhouse.mergetree]
# MergeTree settings
index_granularity = 8192
min_bytes_for_wide_part = 10485760  # 10MB
max_parts_in_total = 100000
merge_max_block_size = 8192

11.2 Query Performance Tips

Use PREWHERE

-- PREWHERE filters before reading columns
SELECT user_id, event_type, value
FROM events
PREWHERE timestamp > '2025-01-01'  -- Filter early
WHERE event_type = 'purchase';     -- Filter late

Optimize ORDER BY

-- Design ORDER BY to match common query patterns
CREATE TABLE events (
    timestamp DateTime,
    user_id UInt32,
    event_type String,
    value Float64
) ENGINE = MergeTree()
-- If you query by user_id and timestamp:
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp);

-- Queries matching ORDER BY are fast
SELECT * FROM events
WHERE user_id = 12345
  AND timestamp > '2025-01-01';

Use Sampling

-- Approximate queries on samples (the table must be created with a SAMPLE BY clause)
SELECT event_type, count() * 10 AS estimated_count
FROM events SAMPLE 0.1
GROUP BY event_type;

-- Deterministic sampling
SELECT event_type, count() * 100 AS estimated_count
FROM events SAMPLE 0.01 OFFSET 0.5
GROUP BY event_type;
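The multipliers above are simply the reciprocal of the sample fraction; factoring that out avoids hard-coding x10 or x100:

```python
def scaled_estimate(sample_count: int, sample_fraction: float) -> float:
    """Scale a count measured under SAMPLE back to a full-table estimate:
    the multiplier is 1 / fraction (x10 for SAMPLE 0.1, x100 for SAMPLE 0.01)."""
    return sample_count / sample_fraction
```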

Batch Inserts

# Insert in batches of 10K-100K rows
batch_size = 50000
for i in range(0, len(data), batch_size):
    batch = data[i:i+batch_size]
    client.execute(
        'INSERT INTO events VALUES',
        batch,
        types_check=True
    )
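The slicing loop above generalizes to a small generator, which keeps the INSERT call sites uncluttered:

```python
def batches(rows, size=50_000):
    """Yield fixed-size slices of rows so each INSERT carries one block
    in the recommended 10K-100K row range."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]
```

Usage then reduces to `for batch in batches(data): client.execute('INSERT INTO events VALUES', batch, types_check=True)`.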

11.3 Index Optimization

-- Skip indexes for filtering
CREATE TABLE logs (
    timestamp DateTime,
    level String,
    message String,
    INDEX idx_level level TYPE set(100) GRANULARITY 4,
    INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
) ENGINE = MergeTree()
ORDER BY timestamp;

-- Bloom filter index for high-cardinality columns
CREATE TABLE events (
    timestamp DateTime,
    user_id UInt32,
    session_id UUID,
    INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);

11.4 Materialized Views for Pre-aggregation

-- Pre-aggregate frequently-queried metrics
CREATE MATERIALIZED VIEW hourly_metrics
ENGINE = AggregatingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    countState() AS count_state,
    sumState(value) AS sum_state,
    avgState(value) AS avg_state
FROM events
GROUP BY hour, event_type;

-- Query from materialized view (much faster)
SELECT
    hour,
    event_type,
    countMerge(count_state) AS total,
    sumMerge(sum_state) AS sum,
    avgMerge(avg_state) AS avg
FROM hourly_metrics
WHERE hour >= toStartOfDay(now())
GROUP BY hour, event_type;

11.5 Performance Benchmarks

| Operation           | ClickHouse | HeliosDB | Notes             |
|---------------------|------------|----------|-------------------|
| count() 1B rows     | 85ms       | 90ms     | Comparable        |
| GROUP BY 100M rows  | 2.1s       | 2.0s     | Slightly faster   |
| Bulk insert 1M rows | 0.65s      | 0.67s    | Comparable        |
| Complex JOIN        | 3.5s       | 3.2s     | Better optimizer  |
| Window functions    | 4.2s       | 3.8s     | SIMD acceleration |

11.6 Monitoring Performance

-- HeliosDB performance views
SELECT * FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 20;

-- Check table sizes
SELECT
    database,
    name,
    formatReadableSize(total_bytes) AS size,
    formatReadableQuantity(total_rows) AS rows
FROM system.tables
WHERE database = 'analytics'
ORDER BY total_bytes DESC;

-- Check part status
SELECT
    table,
    partition,
    count() AS parts,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes)) AS size
FROM system.parts
WHERE database = 'analytics'
  AND active
GROUP BY table, partition
ORDER BY parts DESC;

12. Troubleshooting Common Issues

12.1 Connection Issues

Issue: Connection Refused

Code: 210. DB::NetException: Connection refused

Solutions:

# Check HeliosDB is running
systemctl status heliosdb

# Verify port is listening
netstat -tlnp | grep -E '9000|8123'

# Check firewall
firewall-cmd --list-ports | grep -E '9000|8123'

# Test connectivity
nc -zv heliosdb.host 9000
nc -zv heliosdb.host 8123

Issue: Authentication Failed

Code: 516. DB::Exception: Authentication failed

Solutions:

# Ensure credentials are correct
client = Client(
    host='heliosdb.host',
    port=9000,
    user='default',
    password='correct_password',  # Verify password
    database='analytics'
)

12.2 Query Issues

Issue: Memory Limit Exceeded

Code: 241. DB::Exception: Memory limit exceeded

Solutions:

-- Increase memory limit for query
SET max_memory_usage = 20000000000;  -- 20GB

-- Or use sampling
SELECT ... FROM events SAMPLE 0.1 ...

-- Or break into smaller queries
SELECT ... FROM events WHERE timestamp >= '2025-01-01' AND timestamp < '2025-01-15';
SELECT ... FROM events WHERE timestamp >= '2025-01-15' AND timestamp < '2025-02-01';

Issue: Query Timeout

Code: 159. DB::Exception: Timeout exceeded

Solutions:

# Increase timeout
client = Client(
    host='heliosdb.host',
    port=9000,
    settings={
        'max_execution_time': 600,  # 10 minutes
        'send_receive_timeout': 600
    }
)

Issue: Too Many Parts

Code: 252. DB::Exception: Too many parts

Solutions:

-- Check part counts
SELECT table, count() AS parts
FROM system.parts
WHERE database = 'analytics' AND active
GROUP BY table
ORDER BY parts DESC;

-- Optimize tables
OPTIMIZE TABLE analytics.events FINAL;

-- Adjust settings
ALTER TABLE events MODIFY SETTING
    parts_to_delay_insert = 500,
    parts_to_throw_insert = 600;

12.3 Data Type Issues

Issue: Type Mismatch

Code: 53. DB::Exception: Type mismatch

Solutions:

# Ensure proper type conversion
from datetime import datetime

# Correct types
data = [
    (datetime.now(), 'click', 123, 10.5),  # DateTime, String, UInt32, Float64
]

client.execute(
    'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
    data,
    types_check=True  # Enable type checking
)
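When rows arrive as loosely typed dicts (e.g. parsed JSON), coercing them before the INSERT surfaces bad values early with a clear error. A sketch for the example events schema used throughout this guide:

```python
from datetime import datetime

def coerce_event(raw: dict) -> tuple:
    """Coerce a loosely typed dict into the (DateTime, String, UInt32, Float64)
    tuple shape the events table expects, raising ValueError on bad input."""
    ts = raw['timestamp']
    if isinstance(ts, str):
        ts = datetime.strptime(ts, '%Y-%m-%d %H:%M:%S')
    return (ts, str(raw['event_type']), int(raw['user_id']), float(raw['value']))
```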

Issue: Nullable Type Errors

Code: 349. DB::Exception: Cannot insert NULL value

Solutions:

-- Check column nullability
DESCRIBE TABLE events;

-- Alter column to Nullable
ALTER TABLE events MODIFY COLUMN optional_field Nullable(String);

-- Or handle NULLs in insert
INSERT INTO events SELECT
    timestamp,
    event_type,
    COALESCE(user_id, 0) AS user_id,  -- Default for NULL
    value
FROM source_table;

12.4 Migration Issues

Issue: Schema Incompatibility

Solutions:

# Check for ReplicatedMergeTree engines
grep -i "Replicated" schema_export.sql

# Convert to standard engines
sed -i 's/ReplicatedMergeTree([^)]*)/MergeTree()/g' schema_export.sql

Issue: Data Loss During Migration

Solutions:

-- Verify row counts before and after
-- Source
SELECT count() FROM events;  -- On ClickHouse

-- Target
SELECT count() FROM events;  -- On HeliosDB

-- If mismatch, check for errors during migration
-- Re-run migration with smaller batches

Issue: Performance Degradation

Solutions:

-- Rebuild indexes after migration
OPTIMIZE TABLE events FINAL;

-- Update statistics
-- HeliosDB handles this automatically

-- Check query plans
EXPLAIN SELECT * FROM events WHERE user_id = 12345;

12.5 Materialized View Issues

Issue: View Not Updating

Solutions:

-- Check that the view exists and is attached
SELECT name, engine FROM system.tables WHERE engine = 'MaterializedView';

-- Detach and reattach
DETACH TABLE events_daily_mv;
ATTACH TABLE events_daily_mv;

-- Or recreate with POPULATE
DROP TABLE events_daily_mv;
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily
POPULATE
AS SELECT ...;

12.6 TTL Issues

Issue: Data Not Expiring

Solutions:

-- Check TTL settings
SELECT name, engine_full FROM system.tables WHERE name = 'events';

-- Force TTL processing
ALTER TABLE events MATERIALIZE TTL;

-- Check TTL status
SELECT
    table,
    partition,
    rows,
    delete_ttl_info
FROM system.parts
WHERE table = 'events' AND active;


13. Post-Migration Validation

13.1 Row Count Verification

#!/usr/bin/env python3
"""
Validate row counts between ClickHouse and HeliosDB.
"""

from clickhouse_driver import Client

def validate_counts(source_host, target_host, database, tables):
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    results = {}
    for table in tables:
        source_count = source.execute(f'SELECT count() FROM {table}')[0][0]
        target_count = target.execute(f'SELECT count() FROM {table}')[0][0]

        match = source_count == target_count
        results[table] = {
            'source': source_count,
            'target': target_count,
            'match': match
        }

        status = "OK" if match else "MISMATCH"
        print(f"{table}: Source={source_count:,} Target={target_count:,} [{status}]")

    return results

# Validate
tables = ['events', 'users', 'metrics']
validate_counts('clickhouse.host', 'heliosdb.host', 'analytics', tables)
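If the source is still receiving writes while you validate (for example during incremental sync), exact equality will flag false mismatches. A tolerance-aware comparison helps there (a sketch; the default of zero keeps the strict behavior above):

```python
def counts_match(source: int, target: int, tolerance: float = 0.0) -> bool:
    """Compare row counts, optionally allowing a small relative drift for
    tables that are still ingesting during validation."""
    if source == 0:
        return target == 0
    return abs(source - target) / source <= tolerance
```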

13.2 Schema Validation

-- Compare schemas
-- On ClickHouse:
SHOW CREATE TABLE analytics.events;

-- On HeliosDB:
SHOW CREATE TABLE analytics.events;

-- Check column types
SELECT
    name,
    type,
    default_kind,
    default_expression
FROM system.columns
WHERE database = 'analytics' AND table = 'events';

13.3 Data Integrity Checks

from clickhouse_driver import Client

def validate_sample_data(source_host, target_host, database, table,
                         key_column, sample_size=1000):
    """Compare sample rows between systems."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get sample keys
    keys = source.execute(f'''
        SELECT DISTINCT {key_column}
        FROM {table}
        ORDER BY rand()
        LIMIT {sample_size}
    ''')

    mismatches = 0
    for (key,) in keys:
        source_row = source.execute(
            f'SELECT * FROM {table} WHERE {key_column} = %(key)s',
            {'key': key}
        )
        target_row = target.execute(
            f'SELECT * FROM {table} WHERE {key_column} = %(key)s',
            {'key': key}
        )

        if source_row != target_row:
            mismatches += 1
            print(f"Mismatch for {key_column}={key}")

    print(f"Validation: {sample_size - mismatches}/{sample_size} rows match")
    return mismatches == 0

13.4 Query Performance Comparison

import time

from clickhouse_driver import Client

def compare_query_performance(source_host, target_host, database, queries):
    """Compare query execution times."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    for query in queries:
        # Source timing
        start = time.time()
        source.execute(query)
        source_time = time.time() - start

        # Target timing
        start = time.time()
        target.execute(query)
        target_time = time.time() - start

        diff = ((target_time - source_time) / source_time) * 100
        status = "faster" if diff < 0 else "slower"

        print(f"Query: {query[:50]}...")
        print(f"  ClickHouse: {source_time:.3f}s")
        print(f"  HeliosDB: {target_time:.3f}s ({abs(diff):.1f}% {status})")

# Compare
queries = [
    "SELECT count() FROM events",
    "SELECT event_type, count() FROM events GROUP BY event_type",
    "SELECT user_id, sum(value) FROM events GROUP BY user_id ORDER BY sum(value) DESC LIMIT 100",
]
compare_query_performance('clickhouse.host', 'heliosdb.host', 'analytics', queries)

13.5 Application Smoke Tests

  • [ ] All application endpoints responding correctly
  • [ ] Query latencies within acceptable range (< 10% variance)
  • [ ] No increase in error rates
  • [ ] Dashboard visualizations rendering correctly
  • [ ] Batch jobs completing successfully
  • [ ] Materialized views updating correctly
  • [ ] TTL expiration working as expected
  • [ ] INSERT operations completing successfully
  • [ ] Complex aggregations returning correct results

13.6 Automated Validation Script

#!/bin/bash
# Post-migration validation script

SOURCE_HOST="clickhouse.host"
TARGET_HOST="heliosdb.host"
DATABASE="analytics"

echo "=== Post-Migration Validation ==="

# 1. Connectivity test
echo "Testing connectivity..."
clickhouse-client --host $TARGET_HOST --query "SELECT 1" || exit 1

# 2. Compare table counts
echo "Comparing row counts..."
for table in events users metrics; do
    source_count=$(clickhouse-client --host $SOURCE_HOST \
        --query "SELECT count() FROM $DATABASE.$table")
    target_count=$(clickhouse-client --host $TARGET_HOST \
        --query "SELECT count() FROM $DATABASE.$table")

    if [ "$source_count" == "$target_count" ]; then
        echo "$table: OK ($source_count rows)"
    else
        echo "$table: MISMATCH (source=$source_count, target=$target_count)"
    fi
done

# 3. Query performance
echo "Testing query performance..."
for i in 1 2 3; do
    time clickhouse-client --host $TARGET_HOST \
        --query "SELECT event_type, count() FROM $DATABASE.events GROUP BY event_type" \
        > /dev/null
done

# 4. Test materialized views
echo "Testing materialized views..."
clickhouse-client --host $TARGET_HOST \
    --query "SELECT name FROM system.tables WHERE engine = 'MaterializedView'"

echo "=== Validation Complete ==="

Appendix

A. Quick Reference Commands

# Connect to HeliosDB
clickhouse-client --host heliosdb.host --port 9000

# Export schema
clickhouse-client --host clickhouse.host \
    --query "SELECT create_table_query FROM system.tables WHERE database = 'analytics'"

# Export data (Native format)
clickhouse-client --host clickhouse.host \
    --query "SELECT * FROM analytics.events FORMAT Native" > events.native

# Import data
clickhouse-client --host heliosdb.host \
    --query "INSERT INTO analytics.events FORMAT Native" < events.native

# Verify counts
clickhouse-client --host heliosdb.host \
    --query "SELECT count() FROM analytics.events"

B. Migration Checklist Summary

Pre-Migration

  • [ ] Backup ClickHouse data
  • [ ] Document cluster topology
  • [ ] Inventory tables and engines
  • [ ] Analyze data volume
  • [ ] Review query patterns
  • [ ] Plan migration window

Migration

  • [ ] Export schemas
  • [ ] Convert Replicated* engines
  • [ ] Import schemas to HeliosDB
  • [ ] Migrate data (choose method)
  • [ ] Update application connections

Post-Migration

  • [ ] Verify row counts
  • [ ] Check schema integrity
  • [ ] Validate sample data
  • [ ] Compare query performance
  • [ ] Run application tests
  • [ ] Monitor error rates
  • [ ] Optimize tables

C. Sample Migration Timeline

| Phase            | Duration  | Activities                        |
|------------------|-----------|-----------------------------------|
| Assessment       | 1-2 days  | Inventory, engine review, sizing  |
| Preparation      | 1-3 days  | HeliosDB setup, schema conversion |
| Schema Migration | 1-2 hours | Export, convert, import           |
| Data Migration   | Varies    | Depends on data size              |
| Validation       | 1-2 days  | Testing, performance comparison   |
| Cutover          | 1-4 hours | Connection updates, final sync    |
| Monitoring       | 1-2 weeks | Performance monitoring            |

D. Compatibility Notes

Fully Compatible Features

  • Native TCP protocol (port 9000)
  • HTTP protocol (port 8123)
  • All MergeTree family engines
  • All standard data types
  • Materialized views
  • Dictionaries
  • TTL
  • PREWHERE/WHERE/FINAL
  • All JOIN types
  • Window functions
  • Aggregate functions

Migration Required

  • ReplicatedMergeTree -> MergeTree (HeliosDB handles replication)
  • ZooKeeper paths -> HeliosDB replication config
  • Cluster settings -> HeliosDB cluster config

Behavioral Differences

  • Replication: Uses HeliosDB native replication
  • Storage: Uses HeliosDB storage engine
  • Keeper: Not required (no ZooKeeper dependency)
  • Some advanced settings may differ




Document Version History:

| Version | Date         | Changes         |
|---------|--------------|-----------------|
| 1.0     | January 2026 | Initial release |