ClickHouse to HeliosDB Migration Guide¶
Version: 1.0 Last Updated: January 2026 Compatibility: HeliosDB 7.0+, ClickHouse 22.x/23.x/24.x
Table of Contents¶
- Introduction
- Compatibility Overview
- Pre-Migration Assessment
- Connection String Migration
- Data Type Mapping
- Table Engine Mapping
- Query Syntax Migration
- Materialized View Migration
- Data Migration Process
- Application Connectivity
- Performance Considerations
- Troubleshooting Common Issues
- Post-Migration Validation
- Appendix
1. Introduction¶
Why Migrate from ClickHouse to HeliosDB?¶
HeliosDB provides 92% ClickHouse compatibility while offering significant advantages for modern data platforms:
Key Benefits¶
| Benefit | Description |
|---|---|
| Multi-Protocol Access | Access analytics data via ClickHouse, PostgreSQL, MySQL, MongoDB, Redis, GraphQL, and REST |
| Unified Data Platform | Single system for OLTP, OLAP, and streaming workloads |
| AI/ML Integration | Native NL2SQL, vector search, and ML model inference |
| Simplified Operations | No ZooKeeper/Keeper dependency, automatic rebalancing |
| Enhanced ACID | Full ACID compliance beyond ClickHouse's eventual consistency |
| Time Travel | Point-in-time queries and data recovery |
| Multi-Model Support | Relational, document, graph, time-series, and vector in one database |
Migration Complexity: Low to Moderate¶
Migrations from ClickHouse are straightforward because of:
- 100% Native Protocol compatibility (TCP port 9000)
- 100% HTTP Protocol compatibility (port 8123)
- Standard ClickHouse drivers work without modification
- Most SQL queries execute identically
- MergeTree family engines fully supported
Scope¶
This guide covers:
- Schema migration for all table types
- Data migration strategies for various dataset sizes
- Application connection string and driver updates
- Table engine mapping to HeliosDB equivalents
- Query syntax differences and translations
- Materialized view migration
- Performance optimization post-migration
Target Audience¶
- Data engineers migrating ClickHouse analytics platforms
- DevOps engineers modernizing data infrastructure
- Architects evaluating unified data platforms
- Application developers updating ClickHouse integrations
2. Compatibility Overview¶
Overall Compatibility: 92%¶
| Category | Coverage | Status |
|---|---|---|
| Native Protocol (TCP) | 100% | Complete |
| HTTP Protocol | 100% | Complete |
| SQL Language | 90% | Minor gaps |
| Table Engines | 85% | Core engines |
| Data Types | 95% | All standard types |
| Aggregation Functions | 95% | Full analytics support |
Protocol Support¶
Native Protocol (TCP Port 9000)¶
| Feature | Status | Notes |
|---|---|---|
| Connection | Supported | Full protocol compatibility |
| Authentication | Supported | Password-based, LDAP |
| Compression | Supported | LZ4, ZSTD, LZ4HC |
| SSL/TLS | Supported | Encrypted connections |
| Query Execution | Supported | Full SQL support |
| Batch Insert | Supported | High throughput |
| Prepared Statements | Supported | Query caching |
| Query Cancellation | Supported | Cancel running queries |
HTTP Protocol (Port 8123)¶
| Feature | Status | Notes |
|---|---|---|
| GET Queries | Supported | Read operations |
| POST Queries | Supported | Write operations |
| Streaming | Supported | Large result sets |
| Compression | Supported | gzip, deflate, br |
| JSON Output | Supported | Multiple formats |
| Progress Tracking | Supported | Query progress |
Driver Compatibility Matrix¶
| Language | Driver | Version | Status |
|---|---|---|---|
| Python | clickhouse-driver | 0.2+ | Fully Compatible |
| Python | clickhouse-connect | 0.6+ | Fully Compatible |
| Go | clickhouse-go | 2.x | Fully Compatible |
| Node.js | @clickhouse/client | 0.2+ | Fully Compatible |
| Java | ClickHouse JDBC | 0.4+ | Fully Compatible |
| Java | ClickHouse Native | 0.4+ | Fully Compatible |
| Rust | clickhouse-rs | 0.12+ | Fully Compatible |
| C# | ClickHouse.Client | 5.x | Fully Compatible |
| PHP | smi2/phpClickHouse | 1.x | Fully Compatible |
clickhouse-client Compatibility¶
# Connect to HeliosDB using standard clickhouse-client
clickhouse-client --host localhost --port 9000
# All standard options work
clickhouse-client \
--host heliosdb.host \
--port 9000 \
--user default \
--password '' \
--database analytics \
--query "SELECT count() FROM events"
3. Pre-Migration Assessment¶
3.1 Pre-Migration Checklist¶
- [ ] Backup ClickHouse data (full backup)
- [ ] Document cluster topology (replicas, shards)
- [ ] Inventory all databases and tables
- [ ] Analyze table engines in use
- [ ] Catalog materialized views
- [ ] Document dictionaries and external tables
- [ ] Estimate total data volume
- [ ] Review ZooKeeper/Keeper dependencies
- [ ] Identify TTL requirements
- [ ] List application connection configurations
- [ ] Plan migration window and rollback strategy
- [ ] Test migration in staging environment
3.2 Database and Table Inventory¶
Export Schema Information¶
# Export all database schemas
clickhouse-client --query "SHOW DATABASES" > databases.txt
# Export table definitions for each database
# (SHOW CREATE TABLE takes a single table, so iterate over each one)
for db in $(grep -v -e '^system$' -e '^INFORMATION_SCHEMA$' databases.txt); do
  for table in $(clickhouse-client --query "SHOW TABLES FROM $db"); do
    clickhouse-client --query "SHOW CREATE TABLE $db.$table" >> "schema_${db}.sql"
    echo ';' >> "schema_${db}.sql"
  done
done
# Export complete schema
clickhouse-client --query "
SELECT database, name, engine, partition_key, sorting_key,
primary_key, total_rows, total_bytes
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
" --format TSV > table_inventory.tsv
Analyze Table Engines¶
-- List all table engines in use
SELECT
engine,
count() AS table_count,
sum(total_rows) AS total_rows,
formatReadableSize(sum(total_bytes)) AS total_size
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
GROUP BY engine
ORDER BY table_count DESC;
-- Detailed table information
SELECT
database,
name,
engine,
partition_key,
sorting_key,
primary_key,
total_rows,
formatReadableSize(total_bytes) as size
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
ORDER BY total_bytes DESC;
3.3 Feature Usage Inventory¶
Document usage of the following features:
| Feature | Used? | Tables Affected | HeliosDB Support |
|---|---|---|---|
| MergeTree | Full | ||
| ReplacingMergeTree | Full | ||
| SummingMergeTree | Full | ||
| AggregatingMergeTree | Full | ||
| CollapsingMergeTree | Full | ||
| VersionedCollapsingMergeTree | Full | ||
| Distributed Tables | Full | ||
| Materialized Views | Full | ||
| TTL | Full | ||
| Dictionaries | Full | ||
| S3/External Tables | Full | ||
| Kafka Integration | Full | ||
| Replicated* Engines | Different approach |
3.4 Data Volume Estimation¶
-- Estimate total data size
SELECT
sum(total_bytes) AS bytes,
formatReadableSize(sum(total_bytes)) AS readable_size,
sum(total_rows) AS total_rows
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA');
-- Per-database breakdown
SELECT
database,
count() AS tables,
formatReadableSize(sum(total_bytes)) AS size,
sum(total_rows) AS rows
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA')
GROUP BY database
ORDER BY sum(total_bytes) DESC;
Storage Planning for HeliosDB¶
| ClickHouse Size | Recommended HeliosDB Storage | Notes |
|---|---|---|
| < 100 GB | 1.1x ClickHouse size | Similar compression |
| 100 GB - 1 TB | 1.0x - 1.1x | Comparable efficiency |
| > 1 TB | 0.95x - 1.0x | Intelligent compression |
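The multipliers in the table above can be folded into a quick capacity-planning helper. A minimal sketch, using the guide's planning figures (not measured values):

```python
# Rough storage estimate for HeliosDB capacity planning, using the
# multipliers from the table above. Thresholds and factors are the
# guide's planning figures, not measurements.
GB = 1024 ** 3
TB = 1024 ** 4

def heliosdb_storage_estimate(clickhouse_bytes: int) -> tuple:
    """Return a (low, high) byte range of recommended HeliosDB storage."""
    if clickhouse_bytes < 100 * GB:
        factors = (1.1, 1.1)    # < 100 GB: similar compression
    elif clickhouse_bytes <= 1 * TB:
        factors = (1.0, 1.1)    # 100 GB - 1 TB: comparable efficiency
    else:
        factors = (0.95, 1.0)   # > 1 TB: intelligent compression
    return (int(clickhouse_bytes * factors[0]),
            int(clickhouse_bytes * factors[1]))

low, high = heliosdb_storage_estimate(500 * GB)
print(f"Plan for {low / GB:.0f}-{high / GB:.0f} GB")  # Plan for 500-550 GB
```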
3.5 Query Pattern Analysis¶
-- Analyze recent query patterns
SELECT
type,
query_kind,
count() AS query_count,
avg(query_duration_ms) AS avg_duration_ms,
sum(read_rows) AS total_rows_read
FROM system.query_log
WHERE event_time > now() - INTERVAL 7 DAY
AND type = 'QueryFinish'
GROUP BY type, query_kind
ORDER BY query_count DESC;
-- Identify heavy queries
SELECT
query,
count() AS executions,
avg(query_duration_ms) AS avg_ms,
formatReadableSize(avg(read_bytes)) AS avg_read
FROM system.query_log
WHERE event_time > now() - INTERVAL 7 DAY
AND type = 'QueryFinish'
GROUP BY query
ORDER BY avg_ms DESC
LIMIT 20;
4. Connection String Migration¶
4.1 Connection String Format¶
ClickHouse and HeliosDB accept the same `clickhouse://user:password@host:port/database` connection format, so only the host changes. For example:
Before (ClickHouse):
clickhouse://default:mypassword@clickhouse.host:9000/analytics
After (HeliosDB):
clickhouse://default:mypassword@heliosdb.host:9000/analytics
4.2 Native Protocol Connection¶
Basic Connection (Host Change Only)¶
ClickHouse:
from clickhouse_driver import Client
client = Client(
host='clickhouse.host',
port=9000,
user='default',
password='mypassword',
database='analytics'
)
HeliosDB (Only host changes):
from clickhouse_driver import Client
client = Client(
host='heliosdb.host', # Only this changes
port=9000,
user='default',
password='mypassword',
database='analytics'
)
4.3 HTTP Protocol Connection¶
REST API Endpoint¶
ClickHouse:
http://clickhouse.host:8123/
HeliosDB:
http://heliosdb.host:8123/
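Because both systems expose the same HTTP interface, a query URL differs only in the host. A sketch that builds such URLs with Python's standard library (`http_query_url` is a hypothetical helper, not part of any driver):

```python
# Build a ClickHouse-compatible HTTP query URL for port 8123. Host names
# are placeholders; the query parameters follow the standard HTTP
# interface shared by ClickHouse and HeliosDB.
from urllib.parse import urlencode

def http_query_url(host, query, database="default",
                   port=8123, fmt="JSONEachRow"):
    params = {"query": query, "database": database, "default_format": fmt}
    return f"http://{host}:{port}/?{urlencode(params)}"

# Migrating only changes the host:
before = http_query_url("clickhouse.host", "SELECT count() FROM events")
after = http_query_url("heliosdb.host", "SELECT count() FROM events")
print(after)
```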
4.4 Connection Parameters Reference¶
| Parameter | Default | Description | HeliosDB Support |
|---|---|---|---|
| `host` | localhost | Server hostname | Full |
| `port` | 9000 | Native protocol port | Full |
| `http_port` | 8123 | HTTP protocol port | Full |
| `user` | default | Username | Full |
| `password` | - | Password | Full |
| `database` | default | Database name | Full |
| `compression` | lz4 | Compression type | LZ4, ZSTD |
| `secure` | false | Use TLS | Full |
| `verify` | true | Verify TLS certs | Full |
| `connect_timeout` | 10 | Connection timeout (s) | Full |
| `send_receive_timeout` | 300 | Query timeout (s) | Full |
| `sync_request_timeout` | 5 | Sync timeout (s) | Full |
4.5 SSL/TLS Configuration¶
from clickhouse_driver import Client
# Secure connection with TLS
client = Client(
host='heliosdb.host',
port=9440, # Secure native port
user='default',
password='mypassword',
database='analytics',
secure=True,
verify=True,
ca_certs='/path/to/ca.crt'
)
4.6 Connection Pool Configuration¶
from clickhouse_driver import Client
# Production connection pool settings
client = Client(
host='heliosdb.host',
port=9000,
user='default',
password='mypassword',
database='analytics',
compression=True,
settings={
'max_threads': 8,
'max_execution_time': 300,
'max_memory_usage': 10000000000, # 10GB
'connect_timeout': 10,
'send_receive_timeout': 300
}
)
5. Data Type Mapping¶
5.1 Numeric Types¶
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| UInt8 | UInt8 | Identical |
| UInt16 | UInt16 | Identical |
| UInt32 | UInt32 | Identical |
| UInt64 | UInt64 | Identical |
| UInt128 | UInt128 | Identical |
| UInt256 | UInt256 | Identical |
| Int8 | Int8 | Identical |
| Int16 | Int16 | Identical |
| Int32 | Int32 | Identical |
| Int64 | Int64 | Identical |
| Int128 | Int128 | Identical |
| Int256 | Int256 | Identical |
| Float32 | Float32 | IEEE 754 |
| Float64 | Float64 | IEEE 754 |
| Decimal(P, S) | Decimal(P, S) | Arbitrary precision |
| Decimal32(S) | Decimal32(S) | 9 digit precision |
| Decimal64(S) | Decimal64(S) | 18 digit precision |
| Decimal128(S) | Decimal128(S) | 38 digit precision |
5.2 String Types¶
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| String | String | Variable length UTF-8 |
| FixedString(N) | FixedString(N) | Fixed byte length |
| UUID | UUID | 128-bit UUID |
| IPv4 | IPv4 | IPv4 address |
| IPv6 | IPv6 | IPv6 address |
| Enum8 | Enum8 | 8-bit enumeration |
| Enum16 | Enum16 | 16-bit enumeration |
5.3 Date and Time Types¶
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| Date | Date | Days since epoch |
| Date32 | Date32 | Extended range |
| DateTime | DateTime | Second precision |
| DateTime64(N) | DateTime64(N) | Subsecond precision |
| DateTime64(N, tz) | DateTime64(N, tz) | With timezone |
5.4 Composite Types¶
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| Array(T) | Array(T) | Dynamic arrays |
| Tuple(T1, T2, ...) | Tuple(T1, T2, ...) | Fixed-size tuples |
| Map(K, V) | Map(K, V) | Key-value maps |
| Nested(name Type, ...) | Nested(name Type, ...) | Nested columns |
| Nullable(T) | Nullable(T) | NULL support |
| LowCardinality(T) | LowCardinality(T) | Dictionary encoding |
5.5 Special Types¶
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| Bool | Bool | Boolean |
| JSON | JSON | JSON objects |
| Object('json') | Object('json') | JSON objects |
| Point | Point | Geospatial point |
| Ring | Ring | Geospatial ring |
| Polygon | Polygon | Geospatial polygon |
| MultiPolygon | MultiPolygon | Geospatial multi-polygon |
5.6 Aggregate Function Types¶
| ClickHouse Type | HeliosDB Type | Notes |
|---|---|---|
| AggregateFunction(name, T) | AggregateFunction(name, T) | Aggregate state |
| SimpleAggregateFunction(name, T) | SimpleAggregateFunction(name, T) | Simple aggregate |
5.7 Type Conversion Examples¶
-- All type casts work identically
-- Numeric conversions
SELECT toUInt32(123.45); -- 123
SELECT toFloat64('123.456'); -- 123.456
SELECT toDecimal64(123.456789, 4); -- 123.4568
-- String conversions
SELECT toString(12345); -- '12345'
SELECT toFixedString('hello', 10); -- 'hello\0\0\0\0\0'
-- Date/Time conversions
SELECT toDate('2025-01-15'); -- 2025-01-15
SELECT toDateTime('2025-01-15 10:30:00'); -- 2025-01-15 10:30:00
SELECT toDateTime64('2025-01-15 10:30:00.123', 3);
-- Array conversions
SELECT toTypeName([1, 2, 3]); -- Array(UInt8)
SELECT array(1, 2, 3); -- [1, 2, 3]
-- Nullable handling
SELECT toNullable(123);
SELECT assumeNotNull(nullable_column);
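Since the standard types map one-to-one, a pre-migration schema audit mostly needs to flag anything outside the tables above. A simplified sketch (the type sets are transcribed from sections 5.1-5.6; it does not handle named Tuple elements or validate Enum value lists):

```python
# Schema audit sketch: flag column types that fall outside the mapping
# tables above. Wrapper types are unwrapped recursively.
import re

BASE_TYPES = {
    "UInt8", "UInt16", "UInt32", "UInt64", "UInt128", "UInt256",
    "Int8", "Int16", "Int32", "Int64", "Int128", "Int256",
    "Float32", "Float64", "String", "UUID", "IPv4", "IPv6",
    "Date", "Date32", "DateTime", "Bool", "JSON",
}
PARAMETRIC = {"Decimal", "Decimal32", "Decimal64", "Decimal128",
              "FixedString", "DateTime64", "Enum8", "Enum16"}
WRAPPERS = {"Nullable", "Array", "LowCardinality", "Map", "Tuple"}

def _split_args(inner):
    """Split on top-level commas only (Map/Tuple take several args)."""
    depth, args, current = 0, [], ""
    for ch in inner:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    args.append(current.strip())
    return args

def is_supported(type_str):
    type_str = type_str.strip()
    match = re.match(r"^(\w+)\((.*)\)$", type_str)
    if not match:
        return type_str in BASE_TYPES
    name, inner = match.groups()
    if name in WRAPPERS:
        return all(is_supported(arg) for arg in _split_args(inner))
    return name in PARAMETRIC

print(is_supported("Nullable(Array(UInt32))"))      # True
print(is_supported("Map(String, Decimal(18, 4))"))  # True
```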
6. Table Engine Mapping¶
6.1 MergeTree Family¶
MergeTree¶
-- ClickHouse
CREATE TABLE events (
timestamp DateTime,
event_type String,
user_id UInt32,
value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
SETTINGS index_granularity = 8192;
-- HeliosDB (identical syntax)
CREATE TABLE events (
timestamp DateTime,
event_type String,
user_id UInt32,
value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
SETTINGS index_granularity = 8192;
ReplacingMergeTree¶
-- For upsert/deduplication scenarios
CREATE TABLE user_states (
user_id UInt32,
version UInt64,
name String,
status String,
updated_at DateTime
) ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(updated_at)
ORDER BY user_id;
-- Query with deduplication
SELECT * FROM user_states FINAL WHERE user_id = 12345;
SummingMergeTree¶
-- For pre-aggregated metrics
CREATE TABLE metrics_daily (
date Date,
metric_name String,
total_value Float64,
event_count UInt64
) ENGINE = SummingMergeTree((total_value, event_count))
PARTITION BY toYYYYMM(date)
ORDER BY (date, metric_name);
-- Rows with same ORDER BY key are summed
INSERT INTO metrics_daily VALUES ('2025-01-15', 'clicks', 100, 10);
INSERT INTO metrics_daily VALUES ('2025-01-15', 'clicks', 50, 5);
-- After merge: ('2025-01-15', 'clicks', 150, 15)
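The merge behavior can be modeled in a few lines of Python; `summing_merge` is an illustrative stand-in for what the engine does at part-merge time, not an API:

```python
# Illustrative model of SummingMergeTree's merge step: rows that share
# the ORDER BY key (date, metric_name) collapse into one row whose
# numeric columns are summed.
from collections import defaultdict

def summing_merge(rows):
    """rows: (date, metric_name, total_value, event_count) tuples."""
    merged = defaultdict(lambda: [0.0, 0])
    for date, name, value, count in rows:
        merged[(date, name)][0] += value
        merged[(date, name)][1] += count
    return {key: tuple(totals) for key, totals in merged.items()}

rows = [
    ("2025-01-15", "clicks", 100.0, 10),
    ("2025-01-15", "clicks", 50.0, 5),
]
print(summing_merge(rows))  # {('2025-01-15', 'clicks'): (150.0, 15)}
```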
AggregatingMergeTree¶
-- For complex pre-aggregations
CREATE TABLE events_hourly (
hour DateTime,
event_type String,
count_state AggregateFunction(count, UInt64),
sum_value_state AggregateFunction(sum, Float64),
uniq_users_state AggregateFunction(uniq, UInt32)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type);
-- Insert with aggregate functions
INSERT INTO events_hourly
SELECT
toStartOfHour(timestamp) AS hour,
event_type,
countState() AS count_state,
sumState(value) AS sum_value_state,
uniqState(user_id) AS uniq_users_state
FROM events
GROUP BY hour, event_type;
-- Query with merge
SELECT
hour,
event_type,
countMerge(count_state) AS total_count,
sumMerge(sum_value_state) AS total_value,
uniqMerge(uniq_users_state) AS unique_users
FROM events_hourly
GROUP BY hour, event_type;
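The State/Merge split works because partial aggregate states combine associatively. An illustrative Python model (a set stands in for `uniqState`, which in ClickHouse is an approximate sketch):

```python
# Each part keeps partial states (count, sum, distinct users) that merge
# associatively, so AggregatingMergeTree can combine parts without
# reprocessing raw rows.
def new_state():
    return {"count": 0, "sum": 0.0, "users": set()}

def add_row(state, value, user_id):      # what ...State() does at insert
    state["count"] += 1
    state["sum"] += value
    state["users"].add(user_id)
    return state

def merge_states(a, b):                  # what ...Merge() does at query time
    return {
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "users": a["users"] | b["users"],
    }

part1 = add_row(add_row(new_state(), 10.0, 1), 20.0, 2)
part2 = add_row(new_state(), 5.0, 2)
total = merge_states(part1, part2)
print(total["count"], total["sum"], len(total["users"]))  # 3 35.0 2
```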
CollapsingMergeTree¶
-- For state change tracking
CREATE TABLE user_sessions (
user_id UInt32,
session_start DateTime,
session_duration UInt32,
page_views UInt32,
sign Int8 -- 1 for insert, -1 for delete
) ENGINE = CollapsingMergeTree(sign)
PARTITION BY toYYYYMM(session_start)
ORDER BY (user_id, session_start);
-- Insert initial state
INSERT INTO user_sessions VALUES (1, '2025-01-15 10:00:00', 300, 5, 1);
-- Update by inserting old row with -1 and new row with 1
INSERT INTO user_sessions VALUES
(1, '2025-01-15 10:00:00', 300, 5, -1), -- Cancel old
(1, '2025-01-15 10:00:00', 600, 10, 1); -- Insert new
-- Query with collapsing
SELECT
user_id,
session_start,
sum(session_duration * sign) AS duration,
sum(page_views * sign) AS pages
FROM user_sessions
GROUP BY user_id, session_start
HAVING sum(sign) > 0;
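The sign arithmetic in the query above can be modeled directly: multiply each numeric column by the sign, sum per key, and keep keys whose sign sum stays positive (the surviving state):

```python
# Illustrative model of the collapsing query above, not an API.
from collections import defaultdict

def collapse(rows):
    """rows: (user_id, session_start, duration, page_views, sign)."""
    acc = defaultdict(lambda: [0, 0, 0])  # [duration, pages, sign_sum]
    for user_id, start, duration, pages, sign in rows:
        key = (user_id, start)
        acc[key][0] += duration * sign
        acc[key][1] += pages * sign
        acc[key][2] += sign
    return {k: (d, p) for k, (d, p, s) in acc.items() if s > 0}

rows = [
    (1, "2025-01-15 10:00:00", 300, 5, 1),    # initial state
    (1, "2025-01-15 10:00:00", 300, 5, -1),   # cancel old
    (1, "2025-01-15 10:00:00", 600, 10, 1),   # new state
]
print(collapse(rows))  # {(1, '2025-01-15 10:00:00'): (600, 10)}
```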
VersionedCollapsingMergeTree¶
-- For ordered state changes with versions
CREATE TABLE user_states_versioned (
user_id UInt32,
status String,
version UInt64,
sign Int8
) ENGINE = VersionedCollapsingMergeTree(sign, version)
ORDER BY user_id;
6.2 Integration Engines¶
Distributed Tables¶
-- ClickHouse distributed table
CREATE TABLE events_distributed AS events
ENGINE = Distributed(cluster_name, database, events, rand());
-- HeliosDB (same syntax, uses internal distribution)
CREATE TABLE events_distributed AS events
ENGINE = Distributed(default, analytics, events, rand());
Kafka Integration¶
-- Kafka source table
CREATE TABLE events_kafka (
timestamp DateTime,
event_type String,
user_id UInt32,
value Float64
) ENGINE = Kafka()
SETTINGS
kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'events',
kafka_group_name = 'heliosdb_consumer',
kafka_format = 'JSONEachRow';
-- Materialized view to persist data
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;
S3 Tables¶
-- External S3 table
CREATE TABLE s3_data (
timestamp DateTime,
event_type String,
value Float64
) ENGINE = S3(
'https://bucket.s3.amazonaws.com/data/*.parquet',
'AWS_ACCESS_KEY',
'AWS_SECRET_KEY',
'Parquet'
);
-- Query external data
SELECT event_type, sum(value)
FROM s3_data
GROUP BY event_type;
6.3 Special Engines¶
Memory Engine¶
-- In-memory table (same syntax)
CREATE TABLE temp_data (
id UInt32,
value String
) ENGINE = Memory;
Log Engines¶
-- Simple log table
CREATE TABLE log_data (
timestamp DateTime,
message String
) ENGINE = Log;
-- Tiny log for small tables
CREATE TABLE config_data (
key String,
value String
) ENGINE = TinyLog;
Dictionary Engine¶
-- Create dictionary
CREATE DICTIONARY geo_dict (
country_code String,
country_name String,
population UInt64
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(
HOST 'localhost'
PORT 9000
USER 'default'
TABLE 'countries'
DB 'reference'
))
LIFETIME(MIN 3600 MAX 7200)
LAYOUT(FLAT());
-- Use in queries
SELECT dictGet('geo_dict', 'country_name', country_code) AS country
FROM events;
6.4 Engine Migration Matrix¶
| ClickHouse Engine | HeliosDB Equivalent | Migration Complexity |
|---|---|---|
| MergeTree | MergeTree | None (identical) |
| ReplacingMergeTree | ReplacingMergeTree | None (identical) |
| SummingMergeTree | SummingMergeTree | None (identical) |
| AggregatingMergeTree | AggregatingMergeTree | None (identical) |
| CollapsingMergeTree | CollapsingMergeTree | None (identical) |
| VersionedCollapsingMergeTree | VersionedCollapsingMergeTree | None (identical) |
| Distributed | Distributed | Cluster config changes |
| ReplicatedMergeTree | MergeTree + HeliosDB replication | Schema change |
| Kafka | Kafka | Connection config only |
| S3 | S3 | Credential config only |
| Memory | Memory | None (identical) |
| Log/TinyLog | Log/TinyLog | None (identical) |
| Dictionary | Dictionary | None (identical) |
6.5 Replicated Engine Migration¶
ClickHouse Replicated* engines use ZooKeeper/Keeper. HeliosDB uses its own replication:
ClickHouse:
CREATE TABLE events_replicated (
timestamp DateTime,
event_type String,
user_id UInt32,
value Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);
HeliosDB:
-- Use standard MergeTree; replication is automatic
CREATE TABLE events (
timestamp DateTime,
event_type String,
user_id UInt32,
value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);
-- Configure replication in HeliosDB settings
-- No ZooKeeper/Keeper required
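When many tables need this rewrite, the engine clause can be transformed mechanically. A hypothetical regex-based sketch for bulk-editing dumped DDL (review its output before applying it to production schemas):

```python
# Strip the ZooKeeper path and replica arguments from Replicated*
# engine clauses so the DDL uses the plain engine; HeliosDB handles
# replication itself. Regex-based sketch, not a full SQL parser.
import re

REPLICATED = re.compile(
    r"Replicated(\w*MergeTree)\(\s*'[^']*'\s*,\s*'[^']*'\s*(?:,\s*)?")

def dereplicate(ddl):
    return REPLICATED.sub(r"\1(", ddl)

ddl = ("CREATE TABLE events (...) ENGINE = "
       "ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}') "
       "ORDER BY timestamp")
print(dereplicate(ddl))
```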
7. Query Syntax Migration¶
7.1 Identical Syntax (No Changes Required)¶
Most ClickHouse queries work identically in HeliosDB:
-- SELECT with aggregations
SELECT
event_type,
count() AS events,
count(DISTINCT user_id) AS unique_users,
sum(value) AS total_value,
avg(value) AS avg_value,
min(value) AS min_value,
max(value) AS max_value
FROM events
WHERE timestamp >= '2025-01-01'
GROUP BY event_type
ORDER BY events DESC
LIMIT 100;
-- PREWHERE for optimization
SELECT *
FROM events
PREWHERE timestamp >= '2025-01-01'
WHERE event_type = 'purchase';
-- SAMPLE for approximate queries
SELECT event_type, count() * 10 AS estimated_count
FROM events SAMPLE 0.1
GROUP BY event_type;
-- FINAL for deduplication
SELECT * FROM user_states FINAL
WHERE user_id = 12345;
-- WITH clauses (CTEs)
WITH daily_totals AS (
SELECT
toDate(timestamp) AS date,
sum(value) AS total
FROM events
GROUP BY date
)
SELECT
date,
total,
total - lagInFrame(total) OVER (ORDER BY date) AS change
FROM daily_totals;
7.2 Window Functions¶
-- All window functions supported
SELECT
timestamp,
user_id,
value,
row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS row_num,
rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS value_rank,
dense_rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS dense_value_rank,
sum(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS running_total,
avg(value) OVER (
PARTITION BY user_id
ORDER BY timestamp
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) AS moving_avg,
lead(value, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS next_value,
lag(value, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS prev_value,
first_value(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS first_val,
last_value(value) OVER (
PARTITION BY user_id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_val
FROM events
ORDER BY user_id, timestamp;
7.3 JOIN Operations¶
-- All JOIN types supported
-- INNER JOIN
SELECT e.*, u.name
FROM events e
INNER JOIN users u ON e.user_id = u.user_id;
-- LEFT JOIN
SELECT e.*, u.name
FROM events e
LEFT JOIN users u ON e.user_id = u.user_id;
-- RIGHT JOIN
SELECT e.*, u.name
FROM events e
RIGHT JOIN users u ON e.user_id = u.user_id;
-- FULL OUTER JOIN
SELECT e.*, u.name
FROM events e
FULL JOIN users u ON e.user_id = u.user_id;
-- CROSS JOIN
SELECT *
FROM events e
CROSS JOIN config c;
-- ASOF JOIN (time-series join)
SELECT
e.timestamp,
e.user_id,
e.value,
p.price
FROM events e
ASOF LEFT JOIN prices p
ON e.product_id = p.product_id
AND e.timestamp >= p.effective_date;
-- GLOBAL JOIN for distributed tables
SELECT *
FROM events_distributed e
GLOBAL JOIN users_distributed u ON e.user_id = u.user_id;
7.4 Aggregation Functions¶
-- Standard aggregations
SELECT
count(),
count(DISTINCT user_id),
sum(value),
avg(value),
min(value),
max(value),
any(event_type),
anyHeavy(event_type),
anyLast(event_type)
FROM events;
-- Statistical functions
SELECT
stddevPop(value) AS stddev,
stddevSamp(value) AS stddev_sample,
varPop(value) AS variance,
varSamp(value) AS variance_sample,
covarPop(value, other_value) AS covariance,
corr(value, other_value) AS correlation
FROM events;
-- Quantile functions
SELECT
quantile(0.5)(value) AS median,
quantile(0.95)(value) AS p95,
quantile(0.99)(value) AS p99,
quantileExact(0.5)(value) AS exact_median,
quantiles(0.25, 0.5, 0.75)(value) AS quartiles,
quantileTiming(0.95)(response_ms) AS timing_p95
FROM events;
-- Unique count functions
SELECT
uniq(user_id) AS approx_unique,
uniqExact(user_id) AS exact_unique,
uniqCombined(user_id) AS combined_unique,
uniqHLL12(user_id) AS hll_unique
FROM events;
-- TopK functions
SELECT
topK(10)(event_type) AS top_events,
topKWeighted(10)(event_type, value) AS weighted_top
FROM events;
-- Conditional aggregations
SELECT
countIf(event_type = 'purchase') AS purchases,
sumIf(value, event_type = 'purchase') AS purchase_value,
avgIf(value, event_type = 'view') AS avg_view_value
FROM events;
7.5 Array Functions¶
-- Array operations
SELECT
[1, 2, 3] AS arr,
array(1, 2, 3) AS arr2,
arrayJoin([1, 2, 3]) AS expanded,
arrayMap(x -> x * 2, [1, 2, 3]) AS doubled,
arrayFilter(x -> x > 1, [1, 2, 3]) AS filtered,
arrayReduce('sum', [1, 2, 3]) AS reduced,
arraySort([3, 1, 2]) AS sorted,
arrayReverse([1, 2, 3]) AS reversed,
arraySlice([1, 2, 3, 4, 5], 2, 3) AS sliced,
arrayConcat([1, 2], [3, 4]) AS concatenated,
has([1, 2, 3], 2) AS contains,
indexOf([1, 2, 3], 2) AS position,
length([1, 2, 3]) AS len,
arrayUniq([1, 1, 2, 2, 3]) AS unique_count;
-- Array aggregations
SELECT
groupArray(user_id) AS user_ids,
groupArrayDistinct(user_id) AS unique_user_ids,
groupUniqArray(user_id) AS uniq_array
FROM events
WHERE event_type = 'purchase';
7.6 String Functions¶
-- String operations
SELECT
concat('Hello', ' ', 'World') AS concatenated,
substring('ClickHouse', 1, 5) AS sub,
length('Hello') AS len,
lower('HELLO') AS lowered,
upper('hello') AS uppered,
trim(' hello ') AS trimmed,
ltrim(' hello') AS ltrimmed,
rtrim('hello ') AS rtrimmed,
splitByChar(',', 'a,b,c') AS split,
splitByString('::', 'a::b::c') AS split_str,
replaceAll('hello world', 'world', 'universe') AS replaced,
reverse('hello') AS reversed,
position('hello', 'l') AS pos,
match('hello', 'e.+o') AS regex_match,
extract('hello123world', '\\d+') AS extracted;
-- Format functions
SELECT
format('{} {}', 'Hello', 'World') AS formatted,
formatReadableSize(1024 * 1024 * 1024) AS readable_size,
formatReadableQuantity(1000000) AS readable_qty;
7.7 Date/Time Functions¶
-- Date/Time operations
SELECT
now() AS current_time,
today() AS current_date,
yesterday() AS prev_date,
toDate('2025-01-15') AS date,
toDateTime('2025-01-15 10:30:00') AS datetime,
toDateTime64('2025-01-15 10:30:00.123456', 6) AS datetime64,
toYear(now()) AS year,
toMonth(now()) AS month,
toDayOfMonth(now()) AS day,
toHour(now()) AS hour,
toMinute(now()) AS minute,
toSecond(now()) AS second,
toDayOfWeek(now()) AS day_of_week,
toDayOfYear(now()) AS day_of_year,
toStartOfDay(now()) AS start_of_day,
toStartOfHour(now()) AS start_of_hour,
toStartOfMinute(now()) AS start_of_minute,
toStartOfMonth(now()) AS start_of_month,
toStartOfQuarter(now()) AS start_of_quarter,
toStartOfYear(now()) AS start_of_year,
toStartOfWeek(now()) AS start_of_week,
date_add(day, 7, today()) AS week_later,
date_sub(month, 1, today()) AS month_ago,
dateDiff('day', '2025-01-01', '2025-01-15') AS days_diff;
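When spot-checking migrated results outside the database, Python's `datetime` module covers several of these functions. A few equivalents (assuming ClickHouse's default Monday=1 numbering for toDayOfWeek):

```python
# Python equivalents of a few date/time functions above, handy for
# cross-validating query results during migration testing.
from datetime import date, datetime

ts = datetime(2025, 1, 15, 10, 30, 45)

start_of_hour = ts.replace(minute=0, second=0, microsecond=0)  # toStartOfHour
start_of_month = ts.date().replace(day=1)                      # toStartOfMonth
day_of_week = ts.isoweekday()                                  # toDayOfWeek
days_diff = (date(2025, 1, 15) - date(2025, 1, 1)).days        # dateDiff('day', ...)

print(start_of_hour, start_of_month, day_of_week, days_diff)
```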
7.8 Conditional Functions¶
-- Conditional operations
SELECT
if(value > 100, 'high', 'low') AS category,
multiIf(
value > 1000, 'very high',
value > 100, 'high',
value > 10, 'medium',
'low'
) AS multi_category,
CASE
WHEN value > 1000 THEN 'very high'
WHEN value > 100 THEN 'high'
WHEN value > 10 THEN 'medium'
ELSE 'low'
END AS case_category,
coalesce(nullable_value, 0) AS with_default,
ifNull(nullable_value, 0) AS if_null,
nullIf(value, 0) AS null_if_zero
FROM events;
8. Materialized View Migration¶
8.1 Standard Materialized Views¶
Materialized views work identically in HeliosDB:
-- Create target table
CREATE TABLE events_daily (
date Date,
event_type String,
total_events UInt64,
total_value Float64,
unique_users UInt64
) ENGINE = SummingMergeTree((total_events, total_value, unique_users))
PARTITION BY toYYYYMM(date)
ORDER BY (date, event_type);
-- Create materialized view
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT
toDate(timestamp) AS date,
event_type,
count() AS total_events,
sum(value) AS total_value,
uniq(user_id) AS unique_users
FROM events
GROUP BY date, event_type;
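What the view maintains per group can be modeled in Python; note that ClickHouse's `uniq()` is approximate, while the set used here is exact:

```python
# What events_daily_mv maintains, modeled per (date, event_type) group:
# an event count, a value sum, and a distinct-user count.
from collections import defaultdict

def daily_rollup(events):
    """events: (timestamp 'YYYY-MM-DD hh:mm:ss', event_type, user_id, value)."""
    groups = defaultdict(lambda: [0, 0.0, set()])
    for ts, event_type, user_id, value in events:
        group = groups[(ts[:10], event_type)]   # ts[:10] mimics toDate()
        group[0] += 1
        group[1] += value
        group[2].add(user_id)
    return {k: (count, total, len(users))
            for k, (count, total, users) in groups.items()}

events = [
    ("2025-01-15 10:00:00", "purchase", 1, 10.0),
    ("2025-01-15 11:30:00", "purchase", 1, 5.0),
    ("2025-01-15 12:00:00", "view", 2, 0.0),
]
print(daily_rollup(events))
```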
8.2 Aggregating Materialized Views¶
-- Hourly aggregations with AggregatingMergeTree
CREATE TABLE metrics_hourly (
hour DateTime,
metric_name String,
count_state AggregateFunction(count, UInt64),
sum_state AggregateFunction(sum, Float64),
avg_state AggregateFunction(avg, Float64),
min_state AggregateFunction(min, Float64),
max_state AggregateFunction(max, Float64),
uniq_state AggregateFunction(uniq, UInt32),
quantile_state AggregateFunction(quantile(0.95), Float64)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, metric_name);
CREATE MATERIALIZED VIEW metrics_hourly_mv TO metrics_hourly AS
SELECT
toStartOfHour(timestamp) AS hour,
metric_name,
countState() AS count_state,
sumState(value) AS sum_state,
avgState(value) AS avg_state,
minState(value) AS min_state,
maxState(value) AS max_state,
uniqState(user_id) AS uniq_state,
quantileState(0.95)(value) AS quantile_state
FROM metrics
GROUP BY hour, metric_name;
-- Query with merge
SELECT
hour,
metric_name,
countMerge(count_state) AS total_count,
sumMerge(sum_state) AS total_sum,
avgMerge(avg_state) AS average,
minMerge(min_state) AS minimum,
maxMerge(max_state) AS maximum,
uniqMerge(uniq_state) AS unique_count,
quantileMerge(0.95)(quantile_state) AS p95
FROM metrics_hourly
WHERE hour >= toStartOfDay(now())
GROUP BY hour, metric_name
ORDER BY hour;
8.3 Cascading Materialized Views¶
-- First level: Hourly
CREATE TABLE events_hourly (
hour DateTime,
event_type String,
count UInt64,
value_sum Float64
) ENGINE = SummingMergeTree((count, value_sum))
ORDER BY (hour, event_type);
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT
toStartOfHour(timestamp) AS hour,
event_type,
count() AS count,
sum(value) AS value_sum
FROM events
GROUP BY hour, event_type;
-- Second level: Daily (aggregates from hourly)
CREATE TABLE events_daily_summary (
date Date,
event_type String,
count UInt64,
value_sum Float64
) ENGINE = SummingMergeTree((count, value_sum))
ORDER BY (date, event_type);
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily_summary AS
SELECT
toDate(hour) AS date,
event_type,
sum(count) AS count,
sum(value_sum) AS value_sum
FROM events_hourly
GROUP BY date, event_type;
8.4 Materialized View with POPULATE¶
-- Create materialized view and populate with existing data.
-- Note: rows inserted into the source table while POPULATE runs may be
-- missed; pause writes during creation or backfill afterwards.
CREATE MATERIALIZED VIEW user_stats_mv TO user_stats
POPULATE
AS SELECT
user_id,
count() AS total_events,
sum(value) AS total_value,
max(timestamp) AS last_activity
FROM events
GROUP BY user_id;
9. Data Migration Process¶
9.1 Migration Strategy Selection¶
| Data Size | Recommended Method | Estimated Time |
|---|---|---|
| < 10 GB | INSERT SELECT via link | Minutes |
| 10-100 GB | COPY with files | 30 min - 2 hours |
| 100 GB - 1 TB | Parallel export/import | 2-8 hours |
| > 1 TB | Streaming or incremental | Hours to days |
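The selection rules in the table reduce to a small helper; `pick_method` is a hypothetical name that returns the guide's method labels, not CLI commands:

```python
# Pick a migration method per table based on size, using the thresholds
# from the table above.
GB = 1024 ** 3
TB = 1024 ** 4

def pick_method(total_bytes):
    if total_bytes < 10 * GB:
        return "INSERT SELECT via link"
    if total_bytes < 100 * GB:
        return "COPY with files"
    if total_bytes < 1 * TB:
        return "Parallel export/import"
    return "Streaming or incremental"

for size in (2 * GB, 50 * GB, 500 * GB, 3 * TB):
    print(f"{size / GB:>6.0f} GB -> {pick_method(size)}")
```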
9.2 Method 1: Direct INSERT SELECT (Small Datasets)¶
-- Connect HeliosDB to ClickHouse as an external source through
-- ClickHouse's MySQL-compatibility interface (port 9004), then copy
-- data directly.
-- In HeliosDB:
CREATE TABLE events_clickhouse (
timestamp DateTime,
event_type String,
user_id UInt32,
value Float64
) ENGINE = MySQL('clickhouse.host:9004', 'analytics', 'events', 'user', 'pass');
-- Copy data
INSERT INTO events SELECT * FROM events_clickhouse;
9.3 Method 2: Export/Import with Files¶
Export from ClickHouse¶
# Export to Native format (fastest)
clickhouse-client \
--host clickhouse.host \
--query "SELECT * FROM analytics.events FORMAT Native" \
> events.native
# Export to CSV
clickhouse-client \
--host clickhouse.host \
--query "SELECT * FROM analytics.events FORMAT CSVWithNames" \
> events.csv
# Export to Parquet (compressed)
clickhouse-client \
--host clickhouse.host \
--query "SELECT * FROM analytics.events FORMAT Parquet" \
> events.parquet
# Export to JSON
clickhouse-client \
--host clickhouse.host \
--query "SELECT * FROM analytics.events FORMAT JSONEachRow" \
> events.json
Import to HeliosDB¶
# Import from Native format
clickhouse-client \
--host heliosdb.host \
--query "INSERT INTO analytics.events FORMAT Native" \
< events.native
# Import from CSV
clickhouse-client \
--host heliosdb.host \
--query "INSERT INTO analytics.events FORMAT CSVWithNames" \
< events.csv
# Import from Parquet
clickhouse-client \
--host heliosdb.host \
--query "INSERT INTO analytics.events FORMAT Parquet" \
< events.parquet
9.4 Method 3: Python Migration Script¶
#!/usr/bin/env python3
"""
ClickHouse to HeliosDB Migration Script
Handles large tables with batching and progress tracking.
"""
from clickhouse_driver import Client
import time
import sys
def migrate_table(source_host, target_host, database, table, batch_size=100000):
    """Migrate a single table with batching."""
    source = Client(
        host=source_host,
        port=9000,
        database=database
    )
    target = Client(
        host=target_host,
        port=9000,
        database=database
    )

    # Get row count
    total_rows = source.execute(f'SELECT count() FROM {table}')[0][0]
    print(f"Migrating {table}: {total_rows:,} rows")

    # Get column info
    columns_result = source.execute(f'DESCRIBE TABLE {table}')
    columns = ', '.join([col[0] for col in columns_result])

    # Migrate in batches
    # NOTE: LIMIT/OFFSET paging is only deterministic over a stable ordering;
    # add ORDER BY <sorting key> to the SELECT if the table can receive
    # writes while the copy is running.
    offset = 0
    migrated = 0
    start_time = time.time()
    while offset < total_rows:
        # Read batch from source
        rows = source.execute(
            f'SELECT {columns} FROM {table} LIMIT {batch_size} OFFSET {offset}'
        )
        if not rows:
            break

        # Write batch to target
        target.execute(
            f'INSERT INTO {table} ({columns}) VALUES',
            rows
        )

        offset += batch_size
        migrated += len(rows)

        # Progress update
        elapsed = time.time() - start_time
        rate = migrated / elapsed if elapsed > 0 else 0
        progress = (migrated / total_rows) * 100
        eta = (total_rows - migrated) / rate if rate > 0 else 0
        print(f"  Progress: {progress:.1f}% ({migrated:,}/{total_rows:,}) "
              f"Rate: {rate:,.0f} rows/sec  ETA: {eta:.0f}s")

    elapsed = time.time() - start_time
    print(f"Completed {table}: {migrated:,} rows in {elapsed:.1f}s "
          f"({migrated/elapsed:,.0f} rows/sec)")
    return migrated
def migrate_database(source_host, target_host, database, tables=None):
    """Migrate entire database or specific tables."""
    source = Client(host=source_host, port=9000, database=database)

    # Get tables if not specified
    if tables is None:
        result = source.execute(
            f"SELECT name FROM system.tables WHERE database = '{database}'"
        )
        tables = [row[0] for row in result]

    print(f"Migrating {len(tables)} tables from {database}")
    total_migrated = 0
    for table in tables:
        try:
            migrated = migrate_table(
                source_host, target_host, database, table
            )
            total_migrated += migrated
        except Exception as e:
            print(f"Error migrating {table}: {e}")
            continue

    print(f"\nTotal migrated: {total_migrated:,} rows")
    return total_migrated

if __name__ == "__main__":
    # Configuration
    SOURCE_HOST = 'clickhouse.host'
    TARGET_HOST = 'heliosdb.host'
    DATABASE = 'analytics'

    # Migrate all tables
    migrate_database(SOURCE_HOST, TARGET_HOST, DATABASE)
9.5 Method 4: Parallel Migration for Large Datasets¶
#!/usr/bin/env python3
"""
Parallel migration for large ClickHouse tables.
Uses multiple workers for concurrent data transfer.
"""
from clickhouse_driver import Client
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import time
class ParallelMigrator:
    def __init__(self, source_host, target_host, database, num_workers=4):
        self.source_host = source_host
        self.target_host = target_host
        self.database = database
        self.num_workers = num_workers
        self.lock = threading.Lock()
        self.total_migrated = 0

    def get_client(self, host):
        return Client(
            host=host,
            port=9000,
            database=self.database
        )

    def migrate_partition(self, table, partition_id, columns):
        """Migrate a single partition."""
        source = self.get_client(self.source_host)
        target = self.get_client(self.target_host)

        # Export partition data via the _partition_id virtual column
        rows = source.execute(f"""
            SELECT {columns} FROM {table}
            WHERE _partition_id = '{partition_id}'
        """)
        if rows:
            target.execute(
                f'INSERT INTO {table} ({columns}) VALUES',
                rows
            )
            with self.lock:
                self.total_migrated += len(rows)
        return len(rows)

    def migrate_table(self, table):
        """Migrate table using parallel partition processing."""
        source = self.get_client(self.source_host)

        # Get partitions
        partitions = source.execute(f"""
            SELECT DISTINCT _partition_id
            FROM {table}
        """)
        partition_ids = [p[0] for p in partitions]

        # Get columns
        columns_result = source.execute(f'DESCRIBE TABLE {table}')
        columns = ', '.join([col[0] for col in columns_result])

        print(f"Migrating {table}: {len(partition_ids)} partitions")
        start_time = time.time()

        with ThreadPoolExecutor(max_workers=self.num_workers) as executor:
            futures = {
                executor.submit(
                    self.migrate_partition, table, pid, columns
                ): pid for pid in partition_ids
            }
            completed = 0
            for future in as_completed(futures):
                partition_id = futures[future]
                try:
                    rows = future.result()
                    completed += 1
                    print(f"  Partition {partition_id}: {rows:,} rows "
                          f"({completed}/{len(partition_ids)})")
                except Exception as e:
                    print(f"  Error in partition {partition_id}: {e}")

        elapsed = time.time() - start_time
        print(f"Completed {table}: {self.total_migrated:,} rows "
              f"in {elapsed:.1f}s")

if __name__ == "__main__":
    migrator = ParallelMigrator(
        source_host='clickhouse.host',
        target_host='heliosdb.host',
        database='analytics',
        num_workers=8
    )
    migrator.migrate_table('events')
9.6 Method 5: Incremental/CDC Migration¶
For zero-downtime migrations with continuous data sync:
#!/usr/bin/env python3
"""
Incremental migration with change tracking.
Suitable for zero-downtime migrations.
"""
from clickhouse_driver import Client
import time
def incremental_sync(source_host, target_host, database, table,
                     timestamp_column='timestamp', interval_seconds=60):
    """Continuously sync new data from source to target."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get columns and locate the watermark column
    columns_result = source.execute(f'DESCRIBE TABLE {table}')
    column_names = [col[0] for col in columns_result]
    columns = ', '.join(column_names)
    ts_idx = column_names.index(timestamp_column)

    # Get initial watermark from target
    result = target.execute(f'SELECT max({timestamp_column}) FROM {table}')
    watermark = result[0][0] or '1970-01-01 00:00:00'
    print(f"Starting incremental sync from {watermark}")

    while True:
        # Get new rows since watermark
        rows = source.execute(f"""
            SELECT {columns} FROM {table}
            WHERE {timestamp_column} > '{watermark}'
            ORDER BY {timestamp_column}
            LIMIT 100000
        """)
        if rows:
            # Insert to target
            target.execute(
                f'INSERT INTO {table} ({columns}) VALUES',
                rows
            )
            # Update watermark from the last (newest) row of the
            # timestamp-ordered batch
            new_watermark = rows[-1][ts_idx]
            print(f"Synced {len(rows)} rows up to {new_watermark}")
            watermark = new_watermark
        else:
            print(f"No new data (watermark: {watermark})")

        # Wait before next sync
        time.sleep(interval_seconds)

if __name__ == "__main__":
    incremental_sync(
        source_host='clickhouse.host',
        target_host='heliosdb.host',
        database='analytics',
        table='events',
        timestamp_column='timestamp',
        interval_seconds=30
    )
9.7 Schema Migration Script¶
#!/bin/bash
# migrate_schema.sh - Export and import ClickHouse schema to HeliosDB
SOURCE_HOST="clickhouse.host"
TARGET_HOST="heliosdb.host"
DATABASE="analytics"
echo "=== Schema Migration: ClickHouse -> HeliosDB ==="
# Export schema
echo "Exporting schema from $SOURCE_HOST..."
clickhouse-client --host $SOURCE_HOST --query "
SELECT create_table_query
FROM system.tables
WHERE database = '$DATABASE'
AND engine NOT LIKE 'Replicated%'
" > schema_export.sql
# Handle ReplicatedMergeTree -> MergeTree conversion
echo "Converting Replicated* engines to standard engines..."
sed -i 's/ReplicatedMergeTree([^)]*)/MergeTree()/g' schema_export.sql
sed -i 's/ReplicatedReplacingMergeTree([^)]*)/ReplacingMergeTree()/g' schema_export.sql
sed -i 's/ReplicatedSummingMergeTree([^)]*)/SummingMergeTree()/g' schema_export.sql
sed -i 's/ReplicatedAggregatingMergeTree([^)]*)/AggregatingMergeTree()/g' schema_export.sql
# Create database in HeliosDB
echo "Creating database in HeliosDB..."
clickhouse-client --host $TARGET_HOST --query "CREATE DATABASE IF NOT EXISTS $DATABASE"
# Import schema
echo "Importing schema to $TARGET_HOST..."
while IFS= read -r query; do
if [ -n "$query" ]; then
clickhouse-client --host $TARGET_HOST --query "$query" || \
echo "Failed: $query"
fi
done < schema_export.sql
echo "Schema migration complete!"
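If the schema contains many engine variants, the `sed` rules above can become awkward to maintain. The same conversion can be sketched in Python (the `convert_engines` helper is illustrative; it mirrors the sed patterns, including their limitation that engine arguments must not contain nested parentheses):

```python
import re

# Map each Replicated* engine to its standalone equivalent; HeliosDB
# supplies replication itself, so the ZooKeeper path arguments are dropped.
REPLICATED_ENGINES = [
    ('ReplicatedMergeTree', 'MergeTree'),
    ('ReplicatedReplacingMergeTree', 'ReplacingMergeTree'),
    ('ReplicatedSummingMergeTree', 'SummingMergeTree'),
    ('ReplicatedAggregatingMergeTree', 'AggregatingMergeTree'),
]


def convert_engines(ddl: str) -> str:
    """Rewrite Replicated* engine clauses in a CREATE TABLE statement."""
    for replicated, plain in REPLICATED_ENGINES:
        # \b avoids partial name matches; [^)]* mirrors the sed rule and
        # assumes the engine arguments contain no nested parentheses.
        ddl = re.sub(rf'\b{replicated}\([^)]*\)', f'{plain}()', ddl)
    return ddl
```

Applied line by line to `schema_export.sql`, this produces the same output as the four `sed -i` commands.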
10. Application Connectivity¶
10.1 Python (clickhouse-driver)¶
from clickhouse_driver import Client
# Before (ClickHouse)
client = Client(
host='clickhouse.host',
port=9000,
user='default',
password='password',
database='analytics',
compression=True
)
# After (HeliosDB) - Only host changes
client = Client(
host='heliosdb.host', # Change only this
port=9000,
user='default',
password='password',
database='analytics',
compression=True
)
# Usage remains identical
result = client.execute('''
SELECT event_type, count() AS cnt
FROM events
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY event_type
ORDER BY cnt DESC
''')
for row in result:
print(f"{row[0]}: {row[1]}")
10.2 Python (clickhouse-connect)¶
import clickhouse_connect
# Before (ClickHouse)
client = clickhouse_connect.get_client(
host='clickhouse.host',
port=8123,
username='default',
password='password',
database='analytics'
)
# After (HeliosDB) - Only host changes
client = clickhouse_connect.get_client(
host='heliosdb.host', # Change only this
port=8123,
username='default',
password='password',
database='analytics'
)
# Query with Pandas integration
df = client.query_df('''
SELECT
toDate(timestamp) AS date,
event_type,
count() AS events
FROM events
GROUP BY date, event_type
''')
print(df.head())
10.3 Go (clickhouse-go)¶
package main
import (
"context"
"fmt"
"github.com/ClickHouse/clickhouse-go/v2"
)
func main() {
// Before (ClickHouse)
// conn, _ := clickhouse.Open(&clickhouse.Options{
// Addr: []string{"clickhouse.host:9000"},
// ...
// })
// After (HeliosDB) - Only address changes
conn, err := clickhouse.Open(&clickhouse.Options{
Addr: []string{"heliosdb.host:9000"}, // Change only this
Auth: clickhouse.Auth{
Database: "analytics",
Username: "default",
Password: "password",
},
Compression: &clickhouse.Compression{
Method: clickhouse.CompressionLZ4,
},
Settings: clickhouse.Settings{
"max_execution_time": 60,
},
})
if err != nil {
panic(err)
}
defer conn.Close()
ctx := context.Background()
rows, err := conn.Query(ctx, `
SELECT event_type, count() AS cnt
FROM events
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY event_type
`)
if err != nil {
panic(err)
}
defer rows.Close()
for rows.Next() {
var eventType string
var count uint64
rows.Scan(&eventType, &count)
fmt.Printf("%s: %d\n", eventType, count)
}
}
10.4 Node.js (@clickhouse/client)¶
const { createClient } = require('@clickhouse/client');
// Before (ClickHouse)
// const client = createClient({
// host: 'http://clickhouse.host:8123',
// ...
// });
// After (HeliosDB) - Only host changes
const client = createClient({
host: 'http://heliosdb.host:8123', // Change only this
database: 'analytics',
username: 'default',
password: 'password',
compression: {
request: true,
response: true,
},
});
async function queryEvents() {
const result = await client.query({
query: `
SELECT event_type, count() AS cnt
FROM events
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY event_type
ORDER BY cnt DESC
`,
format: 'JSONEachRow',
});
const data = await result.json();
for (const row of data) {
console.log(`${row.event_type}: ${row.cnt}`);
}
}
async function insertEvents() {
const events = [
{ timestamp: new Date(), event_type: 'click', user_id: 1, value: 10.5 },
{ timestamp: new Date(), event_type: 'view', user_id: 2, value: 5.0 },
];
await client.insert({
table: 'events',
values: events,
format: 'JSONEachRow',
});
}
queryEvents();
10.5 Java (JDBC)¶
import java.sql.*;
import java.util.Properties;
public class HeliosDBJDBC {
public static void main(String[] args) throws SQLException {
// Before (ClickHouse)
// String url = "jdbc:clickhouse://clickhouse.host:8123/analytics";
// After (HeliosDB) - Only host changes
String url = "jdbc:clickhouse://heliosdb.host:8123/analytics";
Properties props = new Properties();
props.setProperty("user", "default");
props.setProperty("password", "password");
props.setProperty("compress", "true");
try (Connection conn = DriverManager.getConnection(url, props);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(
"SELECT event_type, count() AS cnt " +
"FROM events " +
"WHERE timestamp > now() - INTERVAL 1 DAY " +
"GROUP BY event_type"
)) {
while (rs.next()) {
System.out.printf("%s: %d%n",
rs.getString("event_type"),
rs.getLong("cnt"));
}
}
}
}
10.6 Java (Native Client)¶
import com.clickhouse.client.*;
import com.clickhouse.data.*;
public class HeliosDBNative {
public static void main(String[] args) throws Exception {
// Before (ClickHouse)
// ClickHouseNode server = ClickHouseNode.of("clickhouse.host:9000");
// After (HeliosDB) - Only host changes
ClickHouseNode server = ClickHouseNode.builder()
.host("heliosdb.host") // Change only this
.port(9000)
.database("analytics")
.credentials(ClickHouseCredentials.fromUserAndPassword("default", "password"))
.build();
try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.NATIVE);
ClickHouseResponse response = client.read(server)
.query("SELECT event_type, count() FROM events GROUP BY event_type")
.executeAndWait()) {
for (ClickHouseRecord record : response.records()) {
System.out.printf("%s: %d%n",
record.getValue(0).asString(),
record.getValue(1).asLong());
}
}
}
}
10.7 HTTP API (curl)¶
# Before (ClickHouse)
# curl 'http://clickhouse.host:8123/' --data-binary "SELECT 1"
# After (HeliosDB) - Only host changes
curl 'http://heliosdb.host:8123/' \
--data-binary "SELECT event_type, count() FROM events GROUP BY event_type"
# With authentication
curl 'http://heliosdb.host:8123/?user=default&password=password' \
--data-binary "SELECT * FROM events LIMIT 10"
# With database
curl 'http://heliosdb.host:8123/?database=analytics' \
--data-binary "SELECT count() FROM events"
# JSON output
curl 'http://heliosdb.host:8123/' \
--data-binary "SELECT * FROM events FORMAT JSONEachRow"
# Insert data
curl 'http://heliosdb.host:8123/?query=INSERT%20INTO%20events%20FORMAT%20JSONEachRow' \
--data-binary '{"timestamp":"2025-01-15 10:30:00","event_type":"click","user_id":1,"value":10.5}'
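As the INSERT example shows, queries passed in the URL must be percent-encoded. When generating such URLs from code, it helps to centralize that. A minimal sketch (the `build_http_url` helper is hypothetical, not part of any client library):

```python
from urllib.parse import urlencode, quote


def build_http_url(host, query=None, user=None, password=None,
                   database=None, port=8123):
    """Assemble a HeliosDB/ClickHouse HTTP endpoint URL with query params."""
    params = {}
    if query is not None:
        params['query'] = query      # percent-encoded below (spaces -> %20)
    if user is not None:
        params['user'] = user
    if password is not None:
        params['password'] = password
    if database is not None:
        params['database'] = database
    qs = urlencode(params, quote_via=quote)
    return f'http://{host}:{port}/' + (f'?{qs}' if qs else '')
```

The request body (e.g. the `JSONEachRow` payload) is then sent separately, as in the curl examples.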
10.8 Environment Variable Configuration¶
# Before (ClickHouse)
export CLICKHOUSE_HOST=clickhouse.host
# After (HeliosDB)
export CLICKHOUSE_HOST=heliosdb.host
# Common environment variables
export CLICKHOUSE_PORT=9000
export CLICKHOUSE_HTTP_PORT=8123
export CLICKHOUSE_USER=default
export CLICKHOUSE_PASSWORD=password
export CLICKHOUSE_DATABASE=analytics
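Because only the host changes during cutover, reading connection settings from these variables keeps the migration to a one-line environment change. A sketch, assuming the `clickhouse-driver` `Client` (the helper name is illustrative, not part of the driver):

```python
import os


def client_kwargs_from_env(env=None):
    """Build clickhouse-driver Client kwargs from CLICKHOUSE_* variables,
    falling back to the driver's usual defaults when a variable is unset."""
    env = os.environ if env is None else env
    return {
        'host': env.get('CLICKHOUSE_HOST', 'localhost'),
        'port': int(env.get('CLICKHOUSE_PORT', '9000')),
        'user': env.get('CLICKHOUSE_USER', 'default'),
        'password': env.get('CLICKHOUSE_PASSWORD', ''),
        'database': env.get('CLICKHOUSE_DATABASE', 'default'),
    }


# Usage: client = Client(**client_kwargs_from_env())
```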
11. Performance Considerations¶
11.1 HeliosDB Performance Configuration¶
# heliosdb.toml - ClickHouse protocol optimization
[clickhouse]
# Network settings
listen_address = "0.0.0.0"
native_port = 9000
http_port = 8123
max_connections = 4096
[clickhouse.protocol]
# Enable compression
compression_enabled = true
compression_algorithms = ["lz4", "zstd"]
default_compression = "lz4"
# Increase buffer sizes
max_query_size = 268435456 # 256MB
max_insert_block_size = 1048576 # 1M rows
[clickhouse.query]
# Query optimization
max_threads = 16
max_execution_time = 600
max_memory_usage = 10737418240 # 10GB
use_uncompressed_cache = true
background_pool_size = 16
[clickhouse.mergetree]
# MergeTree settings
index_granularity = 8192
min_bytes_for_wide_part = 10485760 # 10MB
max_parts_in_total = 100000
merge_max_block_size = 8192
11.2 Query Performance Tips¶
Use PREWHERE¶
-- PREWHERE reads only the filter columns first, discards non-matching
-- rows, and only then reads the remaining columns for surviving rows
SELECT user_id, event_type, value
FROM events
PREWHERE timestamp > '2025-01-01'  -- Cheap, selective filter first
WHERE event_type = 'purchase';     -- Remaining filter afterwards
Optimize ORDER BY¶
-- Design ORDER BY to match common query patterns
CREATE TABLE events (
timestamp DateTime,
user_id UInt32,
event_type String,
value Float64
) ENGINE = MergeTree()
-- If you query by user_id and timestamp:
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp);
-- Queries matching ORDER BY are fast
SELECT * FROM events
WHERE user_id = 12345
AND timestamp > '2025-01-01';
Use Sampling¶
-- Approximate queries on samples
SELECT event_type, count() * 10 AS estimated_count
FROM events SAMPLE 0.1
GROUP BY event_type;
-- Deterministic sampling
SELECT event_type, count() * 100 AS estimated_count
FROM events SAMPLE 0.01 OFFSET 0.5
GROUP BY event_type;
Batch Inserts¶
# Insert in batches of 10K-100K rows
batch_size = 50000
for i in range(0, len(data), batch_size):
    batch = data[i:i+batch_size]
    client.execute(
        'INSERT INTO events VALUES',
        batch,
        types_check=True
    )
11.3 Index Optimization¶
-- Skip indexes for filtering
CREATE TABLE logs (
timestamp DateTime,
level String,
message String,
INDEX idx_level level TYPE set(100) GRANULARITY 4,
INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
) ENGINE = MergeTree()
ORDER BY timestamp;
-- Bloom filter index for high-cardinality columns
CREATE TABLE events (
timestamp DateTime,
user_id UInt32,
session_id UUID,
INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);
11.4 Materialized Views for Pre-aggregation¶
-- Pre-aggregate frequently-queried metrics
CREATE MATERIALIZED VIEW hourly_metrics
ENGINE = AggregatingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
toStartOfHour(timestamp) AS hour,
event_type,
countState() AS count_state,
sumState(value) AS sum_state,
avgState(value) AS avg_state
FROM events
GROUP BY hour, event_type;
-- Query from materialized view (much faster)
SELECT
hour,
event_type,
countMerge(count_state) AS total,
sumMerge(sum_state) AS sum,
avgMerge(avg_state) AS avg
FROM hourly_metrics
WHERE hour >= toStartOfDay(now())
GROUP BY hour, event_type;
11.5 Performance Benchmarks¶
| Operation | ClickHouse | HeliosDB | Notes |
|---|---|---|---|
| count() 1B rows | 85ms | 90ms | Comparable |
| GROUP BY 100M rows | 2.1s | 2.0s | Slightly faster |
| Bulk insert 1M rows | 0.65s | 0.67s | Comparable |
| Complex JOIN | 3.5s | 3.2s | Better optimizer |
| Window functions | 4.2s | 3.8s | SIMD acceleration |
11.6 Monitoring Performance¶
-- HeliosDB performance views
SELECT * FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 20;
-- Check table sizes
SELECT
database,
name,
formatReadableSize(total_bytes) AS size,
formatReadableQuantity(total_rows) AS rows
FROM system.tables
WHERE database = 'analytics'
ORDER BY total_bytes DESC;
-- Check part status
SELECT
table,
partition,
count() AS parts,
sum(rows) AS rows,
formatReadableSize(sum(bytes)) AS size
FROM system.parts
WHERE database = 'analytics'
AND active
GROUP BY table, partition
ORDER BY parts DESC;
12. Troubleshooting Common Issues¶
12.1 Connection Issues¶
Issue: Connection Refused¶
Solutions:
# Check HeliosDB is running
systemctl status heliosdb
# Verify port is listening
netstat -tlnp | grep -E '9000|8123'
# Check firewall
firewall-cmd --list-ports | grep -E '9000|8123'
# Test connectivity
nc -zv heliosdb.host 9000
nc -zv heliosdb.host 8123
Issue: Authentication Failed¶
Solutions:
# Ensure credentials are correct
client = Client(
host='heliosdb.host',
port=9000,
user='default',
password='correct_password', # Verify password
database='analytics'
)
12.2 Query Issues¶
Issue: Memory Limit Exceeded¶
Solutions:
-- Increase memory limit for query
SET max_memory_usage = 20000000000; -- 20GB
-- Or use sampling
SELECT ... FROM events SAMPLE 0.1 ...
-- Or break into smaller queries
SELECT ... FROM events WHERE timestamp >= '2025-01-01' AND timestamp < '2025-01-15';
SELECT ... FROM events WHERE timestamp >= '2025-01-15' AND timestamp < '2025-02-01';
Issue: Query Timeout¶
Solutions:
# Increase timeout
client = Client(
host='heliosdb.host',
port=9000,
settings={
'max_execution_time': 600, # 10 minutes
'send_receive_timeout': 600
}
)
Issue: Too Many Parts¶
Solutions:
-- Check part counts
SELECT table, count() AS parts
FROM system.parts
WHERE database = 'analytics' AND active
GROUP BY table
ORDER BY parts DESC;
-- Optimize tables
OPTIMIZE TABLE analytics.events FINAL;
-- Adjust settings
ALTER TABLE events MODIFY SETTING
parts_to_delay_insert = 500,
parts_to_throw_insert = 600;
12.3 Data Type Issues¶
Issue: Type Mismatch¶
Solutions:
# Ensure proper type conversion
from datetime import datetime
# Correct types
data = [
(datetime.now(), 'click', 123, 10.5), # DateTime, String, UInt32, Float64
]
client.execute(
'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
data,
types_check=True # Enable type checking
)
Issue: Nullable Type Errors¶
Solutions:
-- Check column nullability
DESCRIBE TABLE events;
-- Alter column to Nullable
ALTER TABLE events MODIFY COLUMN optional_field Nullable(String);
-- Or handle NULLs in insert
INSERT INTO events SELECT
timestamp,
event_type,
COALESCE(user_id, 0) AS user_id, -- Default for NULL
value
FROM source_table;
12.4 Migration Issues¶
Issue: Schema Incompatibility¶
Solutions:
# Check for ReplicatedMergeTree engines
grep -i "Replicated" schema_export.sql
# Convert to standard engines
sed -i 's/ReplicatedMergeTree([^)]*)/MergeTree()/g' schema_export.sql
Issue: Data Loss During Migration¶
Solutions:
-- Verify row counts before and after
-- Source
SELECT count() FROM events; -- On ClickHouse
-- Target
SELECT count() FROM events; -- On HeliosDB
-- If mismatch, check for errors during migration
-- Re-run migration with smaller batches
Issue: Performance Degradation¶
Solutions:
-- Rebuild indexes after migration
OPTIMIZE TABLE events FINAL;
-- Update statistics
-- HeliosDB handles this automatically
-- Check query plans
EXPLAIN SELECT * FROM events WHERE user_id = 12345;
12.5 Materialized View Issues¶
Issue: View Not Updating¶
Solutions:
-- Check that the view exists and is attached
SELECT name FROM system.tables WHERE engine = 'MaterializedView';
-- Detach and reattach
DETACH TABLE events_daily_mv;
ATTACH TABLE events_daily_mv;
-- Or recreate with POPULATE
DROP TABLE events_daily_mv;
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily
POPULATE
AS SELECT ...;
12.6 TTL Issues¶
Issue: Data Not Expiring¶
Solutions:
-- Check TTL settings
SELECT name, engine_full FROM system.tables WHERE name = 'events';
-- Force TTL processing
ALTER TABLE events MATERIALIZE TTL;
-- Check TTL status
SELECT
table,
partition,
rows,
delete_ttl_info
FROM system.parts
WHERE table = 'events' AND active;
13. Post-Migration Validation¶
13.1 Row Count Verification¶
#!/usr/bin/env python3
"""
Validate row counts between ClickHouse and HeliosDB.
"""
from clickhouse_driver import Client
def validate_counts(source_host, target_host, database, tables):
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    results = {}
    for table in tables:
        source_count = source.execute(f'SELECT count() FROM {table}')[0][0]
        target_count = target.execute(f'SELECT count() FROM {table}')[0][0]
        match = source_count == target_count
        results[table] = {
            'source': source_count,
            'target': target_count,
            'match': match
        }
        status = "OK" if match else "MISMATCH"
        print(f"{table}: Source={source_count:,} Target={target_count:,} [{status}]")

    return results

# Validate
tables = ['events', 'users', 'metrics']
validate_counts('clickhouse.host', 'heliosdb.host', 'analytics', tables)
13.2 Schema Validation¶
-- Compare schemas
-- On ClickHouse:
SHOW CREATE TABLE analytics.events;
-- On HeliosDB:
SHOW CREATE TABLE analytics.events;
-- Check column types
SELECT
name,
type,
default_kind,
default_expression
FROM system.columns
WHERE database = 'analytics' AND table = 'events';
13.3 Data Integrity Checks¶
from clickhouse_driver import Client

def validate_sample_data(source_host, target_host, database, table,
                         key_column, sample_size=1000):
    """Compare sample rows between systems."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    # Get sample keys
    keys = source.execute(f'''
        SELECT DISTINCT {key_column}
        FROM {table}
        ORDER BY rand()
        LIMIT {sample_size}
    ''')

    mismatches = 0
    for (key,) in keys:
        source_row = source.execute(
            f'SELECT * FROM {table} WHERE {key_column} = %(key)s',
            {'key': key}
        )
        target_row = target.execute(
            f'SELECT * FROM {table} WHERE {key_column} = %(key)s',
            {'key': key}
        )
        if source_row != target_row:
            mismatches += 1
            print(f"Mismatch for {key_column}={key}")

    print(f"Validation: {len(keys) - mismatches}/{len(keys)} rows match")
    return mismatches == 0
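Row-by-row sampling is thorough but slow for large tables; a single aggregate checksum per table makes a cheaper first pass. A sketch of a hypothetical query builder (it assumes both systems support ClickHouse's `cityHash64` and `groupBitXor`, which the compatibility notes in this guide cover under aggregate functions):

```python
def checksum_query(table, columns):
    """Build an order-independent whole-table checksum query.

    cityHash64 hashes each row's column values; groupBitXor folds the
    hashes so the result does not depend on read order. Run the same
    query on source and target and compare the two numbers.
    """
    col_list = ', '.join(columns)
    return (f'SELECT count() AS rows, '
            f'groupBitXor(cityHash64({col_list})) AS checksum '
            f'FROM {table}')
```

Note that XOR folding cancels rows duplicated an even number of times; if that matters for your data, `sum(cityHash64(...))` is an alternative fold.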
13.4 Query Performance Comparison¶
import time
from clickhouse_driver import Client

def compare_query_performance(source_host, target_host, database, queries):
    """Compare query execution times."""
    source = Client(host=source_host, port=9000, database=database)
    target = Client(host=target_host, port=9000, database=database)

    for query in queries:
        # Source timing
        start = time.time()
        source.execute(query)
        source_time = time.time() - start

        # Target timing
        start = time.time()
        target.execute(query)
        target_time = time.time() - start

        diff = ((target_time - source_time) / source_time) * 100
        status = "faster" if diff < 0 else "slower"
        print(f"Query: {query[:50]}...")
        print(f"  ClickHouse: {source_time:.3f}s")
        print(f"  HeliosDB:   {target_time:.3f}s ({abs(diff):.1f}% {status})")

# Compare
queries = [
    "SELECT count() FROM events",
    "SELECT event_type, count() FROM events GROUP BY event_type",
    "SELECT user_id, sum(value) FROM events GROUP BY user_id ORDER BY sum(value) DESC LIMIT 100",
]
compare_query_performance('clickhouse.host', 'heliosdb.host', 'analytics', queries)
13.5 Application Smoke Tests¶
- [ ] All application endpoints responding correctly
- [ ] Query latencies within acceptable range (< 10% variance)
- [ ] No increase in error rates
- [ ] Dashboard visualizations rendering correctly
- [ ] Batch jobs completing successfully
- [ ] Materialized views updating correctly
- [ ] TTL expiration working as expected
- [ ] INSERT operations completing successfully
- [ ] Complex aggregations returning correct results
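The latency item in this checklist can be enforced mechanically in CI. A minimal sketch of the variance check (the function name and the one-sided policy of only counting regressions are choices, not a HeliosDB API):

```python
def within_variance(baseline_s, candidate_s, max_pct=10.0):
    """Return True if the candidate latency is within max_pct% of baseline.

    Only regressions count against the budget: a faster candidate
    always passes.
    """
    if candidate_s <= baseline_s:
        return True
    return (candidate_s - baseline_s) / baseline_s * 100.0 <= max_pct
```

Feeding it the per-query timings from the comparison script in 13.4 turns the "< 10% variance" item into a pass/fail gate.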
13.6 Automated Validation Script¶
#!/bin/bash
# Post-migration validation script
SOURCE_HOST="clickhouse.host"
TARGET_HOST="heliosdb.host"
DATABASE="analytics"
echo "=== Post-Migration Validation ==="
# 1. Connectivity test
echo "Testing connectivity..."
clickhouse-client --host $TARGET_HOST --query "SELECT 1" || exit 1
# 2. Compare table counts
echo "Comparing row counts..."
for table in events users metrics; do
source_count=$(clickhouse-client --host $SOURCE_HOST \
--query "SELECT count() FROM $DATABASE.$table")
target_count=$(clickhouse-client --host $TARGET_HOST \
--query "SELECT count() FROM $DATABASE.$table")
if [ "$source_count" == "$target_count" ]; then
echo "$table: OK ($source_count rows)"
else
echo "$table: MISMATCH (source=$source_count, target=$target_count)"
fi
done
# 3. Query performance
echo "Testing query performance..."
for i in 1 2 3; do
time clickhouse-client --host $TARGET_HOST \
--query "SELECT event_type, count() FROM $DATABASE.events GROUP BY event_type" \
> /dev/null
done
# 4. Test materialized views
echo "Testing materialized views..."
clickhouse-client --host $TARGET_HOST \
--query "SELECT name FROM system.tables WHERE engine = 'MaterializedView'"
echo "=== Validation Complete ==="
Appendix¶
A. Quick Reference Commands¶
# Connect to HeliosDB
clickhouse-client --host heliosdb.host --port 9000
# Export schema
clickhouse-client --host clickhouse.host \
--query "SELECT create_table_query FROM system.tables WHERE database = 'analytics'"
# Export data (Native format)
clickhouse-client --host clickhouse.host \
--query "SELECT * FROM analytics.events FORMAT Native" > events.native
# Import data
clickhouse-client --host heliosdb.host \
--query "INSERT INTO analytics.events FORMAT Native" < events.native
# Verify counts
clickhouse-client --host heliosdb.host \
--query "SELECT count() FROM analytics.events"
B. Migration Checklist Summary¶
Pre-Migration¶
- [ ] Backup ClickHouse data
- [ ] Document cluster topology
- [ ] Inventory tables and engines
- [ ] Analyze data volume
- [ ] Review query patterns
- [ ] Plan migration window
Migration¶
- [ ] Export schemas
- [ ] Convert Replicated* engines
- [ ] Import schemas to HeliosDB
- [ ] Migrate data (choose method)
- [ ] Update application connections
Post-Migration¶
- [ ] Verify row counts
- [ ] Check schema integrity
- [ ] Validate sample data
- [ ] Compare query performance
- [ ] Run application tests
- [ ] Monitor error rates
- [ ] Optimize tables
C. Sample Migration Timeline¶
| Phase | Duration | Activities |
|---|---|---|
| Assessment | 1-2 days | Inventory, engine review, sizing |
| Preparation | 1-3 days | HeliosDB setup, schema conversion |
| Schema Migration | 1-2 hours | Export, convert, import |
| Data Migration | Varies | Depends on data size |
| Validation | 1-2 days | Testing, performance comparison |
| Cutover | 1-4 hours | Connection updates, final sync |
| Monitoring | 1-2 weeks | Performance monitoring |
D. Compatibility Notes¶
Fully Compatible Features¶
- Native TCP protocol (port 9000)
- HTTP protocol (port 8123)
- All MergeTree family engines
- All standard data types
- Materialized views
- Dictionaries
- TTL
- PREWHERE/WHERE/FINAL
- All JOIN types
- Window functions
- Aggregate functions
Migration Required¶
- ReplicatedMergeTree -> MergeTree (HeliosDB handles replication)
- ZooKeeper paths -> HeliosDB replication config
- Cluster settings -> HeliosDB cluster config
Behavioral Differences¶
- Replication: Uses HeliosDB native replication
- Storage: Uses HeliosDB storage engine
- Keeper: Not required (no ZooKeeper dependency)
- Some advanced settings may differ
Related Documentation¶
- ClickHouse Protocol README
- ClickHouse Compatibility Matrix
- ClickHouse Configuration
- ClickHouse Examples
- General Migration Guide
- Protocol Compatibility Matrix
- PostgreSQL Migration Guide
- Cassandra Migration Guide
Need Help?
- Documentation: docs.heliosdb.io
- Community: community.heliosdb.io
- Support: support@heliosdb.io
Document Version History:
| Version | Date | Changes |
|---|---|---|
| 1.0 | January 2026 | Initial release |