
Cassandra to HeliosDB Migration Guide

Version: 1.0 Last Updated: January 2026


Table of Contents

  1. Introduction
  2. Compatibility Overview
  3. Pre-Migration Assessment
  4. Conceptual Mapping
  5. Step-by-Step Migration
  6. CQL Query Compatibility
  7. Feature Mapping
  8. Consistency and Replication
  9. Post-Migration Validation
  10. Performance Tuning
  11. Common Issues and Troubleshooting

1. Introduction

Why Migrate from Cassandra to HeliosDB?

HeliosDB provides full CQL wire protocol compatibility while offering significant advantages over Apache Cassandra:

Performance Benefits

  • Lower Latency: HeliosDB's unified storage engine provides sub-millisecond query response times
  • Higher Throughput: Optimized query execution delivers 100K+ operations per second
  • Efficient Compression: Advanced compression algorithms (LZ4, Snappy) with better compression ratios
  • Intelligent Caching: Automatic query and data caching without manual tuning

Operational Benefits

  • Simplified Operations: No need for nodetool, repair operations, or complex tuning
  • Zero-Downtime Scaling: Automatic rebalancing without manual intervention
  • Unified Management: Single interface for all protocols (SQL, CQL, MongoDB, etc.)
  • Built-in Monitoring: Comprehensive metrics without JMX configuration

Feature Enhancements

  • Multi-Protocol Access: Access same data via SQL, CQL, REST, or GraphQL
  • Advanced Analytics: Full SQL support including window functions and CTEs
  • ACID Transactions: True ACID compliance beyond Cassandra's lightweight transactions
  • AI Integration: Built-in ML/AI capabilities and natural language queries

Cost Reduction

  • Lower Hardware Requirements: More efficient resource utilization
  • Reduced Operational Overhead: Fewer administrators needed
  • No Licensing Fees: Open-source with enterprise features included

Migration Compatibility

HeliosDB supports Apache Cassandra Native Protocol v4/v5 with 100% CQL language compatibility. This means:

  • Existing Cassandra drivers work without modification
  • CQL queries execute without changes
  • Application code requires minimal to no changes
  • Schema can be migrated directly

2. Compatibility Overview

CQL Wire Protocol Support

| Feature | Compatibility | Notes |
|---|---|---|
| Native Protocol v4 | 100% | Full support |
| Native Protocol v5 | 100% | Full support |
| Frame Compression (LZ4) | 100% | Recommended |
| Frame Compression (Snappy) | 100% | Supported |
| SASL Authentication | 100% | Password, LDAP |
| Prepared Statements | 100% | Full caching |
| Batch Operations | 100% | All batch types |
| Paging | 100% | Cursor-based |
| Event Notifications | 100% | SCHEMA_CHANGE, etc. |

Driver Compatibility Matrix

| Language | Driver | Version | Status |
|---|---|---|---|
| Java | DataStax Java Driver | 4.x | Fully Compatible |
| Python | cassandra-driver | 3.x | Fully Compatible |
| Node.js | cassandra-driver | 4.x | Fully Compatible |
| Go | gocql | 1.x | Fully Compatible |
| C# | CassandraCSharpDriver | 3.x | Fully Compatible |
| C++ | cpp-driver | 2.x | Fully Compatible |
| Rust | scylla-rust-driver | 0.x | Fully Compatible |

cqlsh Compatibility

# Connect to HeliosDB using standard cqlsh
cqlsh localhost 9042

# All cqlsh commands work identically
DESCRIBE KEYSPACES;
DESCRIBE TABLES;
TRACING ON;

3. Pre-Migration Assessment

3.1 Pre-Migration Checklist

  • [ ] Complete backup of Cassandra cluster
  • [ ] Document current cluster topology
  • [ ] Inventory all keyspaces and tables
  • [ ] Analyze partition key distribution
  • [ ] Estimate total data volume
  • [ ] Identify data with TTL requirements
  • [ ] Review materialized views
  • [ ] Catalog user-defined types (UDTs)
  • [ ] List secondary indexes
  • [ ] Document consistency level requirements
  • [ ] Inventory application connection configurations
  • [ ] Plan migration window and rollback strategy
  • [ ] Test migration in development environment

3.2 Keyspace and Table Analysis

Export Keyspace Schema

# Export all keyspace schemas
cqlsh -e "DESCRIBE KEYSPACES" > keyspace_list.txt

# Export detailed schema for each non-system keyspace
for ks in $(cqlsh -e "DESCRIBE KEYSPACES" | tr ' ' '\n' | grep -v '^$' | grep -Ev '^system'); do
    cqlsh -e "DESCRIBE KEYSPACE $ks" > "schema_${ks}.cql"
done

# Export full cluster schema
cqlsh -e "DESCRIBE CLUSTER" > cluster_info.txt
cqlsh -e "DESCRIBE SCHEMA" > full_schema.cql
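
The same keyspace inventory can be scripted in Python; a minimal sketch (the keyspace names and the `system` prefix convention are illustrative assumptions) that parses cqlsh's whitespace-separated DESCRIBE KEYSPACES output and drops system keyspaces:

```python
def user_keyspaces(describe_output: str) -> list[str]:
    """Parse `DESCRIBE KEYSPACES` output, excluding system keyspaces."""
    # system, system_schema, system_auth, ... all share the "system" prefix
    names = describe_output.split()
    return sorted(n for n in names if not n.startswith("system"))

# Example: typical cqlsh output is whitespace-separated keyspace names
raw = "system  system_auth  production  system_schema  analytics"
print(user_keyspaces(raw))  # ['analytics', 'production']
```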

Analyze Table Structure

-- List all tables in a keyspace
SELECT table_name, id
FROM system_schema.tables
WHERE keyspace_name = 'your_keyspace';

-- Get table details
SELECT *
FROM system_schema.columns
WHERE keyspace_name = 'your_keyspace'
AND table_name = 'your_table';

-- Check for clustering columns
SELECT column_name, clustering_order, kind
FROM system_schema.columns
WHERE keyspace_name = 'your_keyspace'
AND table_name = 'your_table'
AND kind IN ('partition_key', 'clustering');

3.3 Data Volume Estimation

Cassandra Data Size Analysis

# Check data directory size
du -sh /var/lib/cassandra/data/*

# Per-table statistics using nodetool
nodetool tablestats your_keyspace.your_table

Query Table Statistics

-- Count rows (full table scan; expensive and may time out on large tables)
SELECT count(*) FROM your_keyspace.your_table;

-- For large tables, use token ranges
SELECT count(*) FROM your_keyspace.your_table
WHERE token(partition_key) > token(0)
AND token(partition_key) <= token(1000000);

Storage Planning for HeliosDB

| Cassandra Size | Recommended HeliosDB Storage | Notes |
|---|---|---|
| < 100 GB | 1.2x Cassandra size | Compression efficiency |
| 100 GB - 1 TB | 1.0x - 1.1x | Index optimization |
| > 1 TB | 0.9x - 1.0x | Better compression ratio |
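
As a back-of-the-envelope check, the multipliers above can be encoded directly; the thresholds and factors below simply restate the table (using the upper bound of each range) and are planning estimates, not guarantees:

```python
def helios_storage_estimate_gb(cassandra_gb: float) -> float:
    """Estimate HeliosDB storage from Cassandra data size (per the table above)."""
    if cassandra_gb < 100:
        factor = 1.2    # compression efficiency dominates on small datasets
    elif cassandra_gb <= 1024:
        factor = 1.1    # upper bound of the 1.0x - 1.1x range
    else:
        factor = 1.0    # upper bound of the 0.9x - 1.0x range
    return round(cassandra_gb * factor, 1)

print(helios_storage_estimate_gb(50))    # 60.0
print(helios_storage_estimate_gb(500))   # 550.0
print(helios_storage_estimate_gb(2048))  # 2048.0
```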

3.4 Partition Key Analysis

Understanding partition key distribution is critical for migration planning:

-- Analyze partition key cardinality
SELECT partition_key, count(*) as row_count
FROM your_keyspace.your_table
GROUP BY partition_key
LIMIT 1000;

-- Identify hot partitions
-- Note: CQL has no HAVING clause, so pull the grouped counts
-- and filter for high row counts (e.g. > 10,000) client-side
SELECT partition_key, count(*) as row_count
FROM your_keyspace.your_table
GROUP BY partition_key;
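
Since CQL cannot filter aggregates server-side, hot partitions are typically identified by filtering the grouped counts on the client; a minimal sketch over (partition_key, row_count) pairs:

```python
def hot_partitions(counts, threshold=10_000):
    """Return (key, row_count) pairs exceeding the threshold, largest first."""
    hot = [(k, c) for k, c in counts if c > threshold]
    return sorted(hot, key=lambda kc: kc[1], reverse=True)

# Example counts as returned by the GROUP BY query above
rows = [("user:1", 120), ("user:2", 45_000), ("user:3", 12_500)]
print(hot_partitions(rows))  # [('user:2', 45000), ('user:3', 12500)]
```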

Identify Large Partitions

# Use nodetool to find large partitions
nodetool tablehistograms your_keyspace your_table

# Check for partition tombstones
nodetool tablestats your_keyspace.your_table | grep -E "Tombstones|Partition"

3.5 Feature Usage Inventory

Document usage of the following features:

| Feature | Used? | Tables Affected | Notes |
|---|---|---|---|
| Materialized Views | | | |
| Secondary Indexes | | | |
| User-Defined Types | | | |
| Counter Columns | | | |
| TTL (Time-To-Live) | | | |
| Lightweight Transactions | | | |
| Collections (List/Set/Map) | | | |
| Custom Types | | | |
| UDFs/UDAs | | | |

4. Conceptual Mapping

4.1 Keyspaces to Schemas

Cassandra keyspaces map directly to HeliosDB schemas:

| Cassandra | HeliosDB | Notes |
|---|---|---|
| Keyspace | Schema | 1:1 mapping |
| Replication Strategy | Replication Config | Simplified management |
| Durable Writes | Transaction Mode | Always durable |

Keyspace Migration Example

-- Cassandra keyspace definition
CREATE KEYSPACE production WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 2
} AND DURABLE_WRITES = true;

-- HeliosDB equivalent (same syntax works)
CREATE KEYSPACE production WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 2
} AND DURABLE_WRITES = true;

-- Note: HeliosDB manages replication automatically
-- Replication strategy is honored for compatibility

4.2 Tables with Clustering Columns

Cassandra's wide-column model maps to HeliosDB's relational storage:

| Cassandra Concept | HeliosDB Mapping | Description |
|---|---|---|
| Partition Key | Primary Key (first column) | Hash distribution |
| Clustering Columns | Composite Primary Key | Sort order within partition |
| Regular Columns | Standard Columns | Data columns |
| Static Columns | Indexed Columns | Shared within partition |

Table Migration Example

-- Cassandra table with clustering columns
CREATE TABLE user_events (
    user_id UUID,
    event_date DATE,
    event_time TIMESTAMP,
    event_type TEXT,
    event_data MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND default_time_to_live = 86400
  AND gc_grace_seconds = 864000;

-- HeliosDB: Same DDL works identically
CREATE TABLE user_events (
    user_id UUID,
    event_date DATE,
    event_time TIMESTAMP,
    event_type TEXT,
    event_data MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND default_time_to_live = 86400
  AND gc_grace_seconds = 864000;

4.3 Materialized Views Mapping

Cassandra materialized views are fully supported:

-- Cassandra materialized view
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users
    WHERE email IS NOT NULL
    AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

-- HeliosDB: Identical syntax, automatic maintenance
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users
    WHERE email IS NOT NULL
    AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

-- HeliosDB advantage: Views can also be created as SQL views
-- for more complex transformations
CREATE VIEW users_with_orders AS
    SELECT u.*, COUNT(o.order_id) as order_count
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    GROUP BY u.user_id;

4.4 Data Type Mapping

| Cassandra Type | HeliosDB Type | Notes |
|---|---|---|
| ASCII | TEXT | UTF-8 normalized |
| BIGINT | BIGINT | Identical |
| BLOB | BYTEA | Binary storage |
| BOOLEAN | BOOLEAN | Identical |
| COUNTER | BIGINT + trigger | Counter semantics |
| DATE | DATE | Identical |
| DECIMAL | DECIMAL/NUMERIC | Arbitrary precision |
| DOUBLE | DOUBLE PRECISION | 64-bit IEEE |
| DURATION | INTERVAL | Time intervals |
| FLOAT | REAL | 32-bit IEEE |
| INET | INET | IPv4/IPv6 |
| INT | INTEGER | 32-bit signed |
| SMALLINT | SMALLINT | 16-bit signed |
| TEXT | TEXT | UTF-8 string |
| TIME | TIME | Time of day |
| TIMESTAMP | TIMESTAMPTZ | With timezone |
| TIMEUUID | UUID | Time-based UUID |
| TINYINT | SMALLINT | Mapped to 16-bit |
| UUID | UUID | 128-bit UUID |
| VARCHAR | TEXT | UTF-8 string |
| VARINT | NUMERIC | Arbitrary integer |
| LIST | JSONB array | Ordered collection |
| SET | JSONB array | Unique values |
| MAP | JSONB object | Key-value pairs |
| TUPLE<...> | JSONB array | Fixed-length |
| FROZEN<...> | JSONB | Immutable collection/UDT |
| UDT | JSONB | User-defined type |
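
For scripted schema translation, the mapping above can be captured as a lookup table; the dict below simply mirrors the table, and parameterized types (which carry type arguments) are matched by prefix. A hedged sketch, not a complete CQL type parser:

```python
CASSANDRA_TO_HELIOS = {
    "ASCII": "TEXT", "BIGINT": "BIGINT", "BLOB": "BYTEA", "BOOLEAN": "BOOLEAN",
    "COUNTER": "BIGINT", "DATE": "DATE", "DECIMAL": "DECIMAL",
    "DOUBLE": "DOUBLE PRECISION", "DURATION": "INTERVAL", "FLOAT": "REAL",
    "INET": "INET", "INT": "INTEGER", "SMALLINT": "SMALLINT", "TEXT": "TEXT",
    "TIME": "TIME", "TIMESTAMP": "TIMESTAMPTZ", "TIMEUUID": "UUID",
    "TINYINT": "SMALLINT", "UUID": "UUID", "VARCHAR": "TEXT", "VARINT": "NUMERIC",
}

def map_type(cql_type: str) -> str:
    """Map a CQL type name to its HeliosDB storage type (per the table above)."""
    t = cql_type.strip().upper()
    # LIST<...>, SET<...>, MAP<...>, TUPLE<...>, FROZEN<...> all land in JSONB
    for prefix in ("LIST<", "SET<", "MAP<", "TUPLE<", "FROZEN<"):
        if t.startswith(prefix):
            return "JSONB"
    return CASSANDRA_TO_HELIOS.get(t, "JSONB")  # unknown names: assume UDT

print(map_type("timestamp"))        # TIMESTAMPTZ
print(map_type("map<text, text>"))  # JSONB
```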

5. Step-by-Step Migration

5.1 Phase 1: Schema Translation

Export Cassandra Schema

# Export complete schema
cqlsh cassandra-host -e "DESCRIBE SCHEMA" > cassandra_schema.cql

# Export specific keyspace
cqlsh cassandra-host -e "DESCRIBE KEYSPACE production" > production_schema.cql

Schema Validation Script

#!/usr/bin/env python3
"""
Cassandra to HeliosDB Schema Validator
Validates CQL schema compatibility before migration.
"""

import re
import sys

def validate_schema(schema_file):
    """Validate CQL schema for HeliosDB compatibility."""
    with open(schema_file, 'r') as f:
        schema = f.read()

    warnings = []

    # Check for unsupported features
    if 'CUSTOM INDEX' in schema.upper():
        warnings.append("CUSTOM INDEX: SASI indexes have partial support")

    if 'TINYINT' in schema.upper():
        warnings.append("TINYINT: Mapped to SMALLINT in HeliosDB")

    if 'COUNTER' in schema.upper():
        warnings.append("COUNTER: Uses trigger-based implementation")

    # Report findings
    if warnings:
        print("Schema Compatibility Warnings:")
        for w in warnings:
            print(f"  - {w}")
    else:
        print("Schema is fully compatible with HeliosDB")

    return len(warnings) == 0

if __name__ == "__main__":
    ok = validate_schema(sys.argv[1])
    sys.exit(0 if ok else 1)

Import Schema to HeliosDB

# Connect to HeliosDB and import schema
cqlsh heliosdb-host 9042 -f cassandra_schema.cql

# Verify schema import
cqlsh heliosdb-host 9042 -e "DESCRIBE KEYSPACES"
cqlsh heliosdb-host 9042 -e "DESCRIBE KEYSPACE production"

5.2 Phase 2: Data Export from Cassandra

Method 1: cqlsh COPY TO

-- Export table data to CSV
COPY production.users TO '/data/export/users.csv'
WITH HEADER = true
AND DELIMITER = '|'
AND NULL = ''
AND MAXREQUESTS = 6
AND PAGESIZE = 5000;

-- Export with specific columns
COPY production.events (user_id, event_time, event_type, event_data)
TO '/data/export/events.csv'
WITH HEADER = true;

Method 2: Spark for Large Datasets

// Spark DataFrame export for large tables
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._

val spark = SparkSession.builder()
  .appName("CassandraExport")
  .config("spark.cassandra.connection.host", "cassandra-host")
  .getOrCreate()

// Read from Cassandra
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "production", "table" -> "users"))
  .load()

// Export to Parquet for efficient transfer
df.write
  .mode("overwrite")
  .parquet("/data/export/users_parquet")

Method 3: SSTable Export (Advanced)

# Export SSTables (for very large datasets)
# Note: This requires direct access to Cassandra data directory

# First, flush memtables
nodetool flush production users

# Copy SSTable files
cp -r /var/lib/cassandra/data/production/users-*/* /data/export/sstables/

# For HeliosDB import, convert to CQL or CSV format
# HeliosDB provides conversion utility
heliosdb-tools convert-sstable \
    --input /data/export/sstables \
    --output /data/export/users.cql \
    --keyspace production \
    --table users

5.3 Phase 3: Data Import into HeliosDB

Method 1: COPY FROM

-- Import data from CSV
USE production;

COPY users FROM '/data/export/users.csv'
WITH HEADER = true
AND DELIMITER = '|'
AND NULL = ''
AND CHUNKSIZE = 1000;

-- Verify import
SELECT COUNT(*) FROM users;

Method 2: Bulk Insert with Python

#!/usr/bin/env python3
"""
Bulk data migration from Cassandra to HeliosDB.
"""

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType
import csv
import uuid

# Source: Cassandra
source_cluster = Cluster(['cassandra-host'])
source_session = source_cluster.connect('production')

# Target: HeliosDB
target_cluster = Cluster(['heliosdb-host'])
target_session = target_cluster.connect('production')

def migrate_table(table_name, batch_size=100):
    """Migrate a single table with batching."""

    # Get all rows from source
    rows = source_session.execute(f"SELECT * FROM {table_name}")

    # Prepare insert statement
    columns = rows.column_names
    placeholders = ', '.join(['?' for _ in columns])
    insert_cql = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"
    prepared = target_session.prepare(insert_cql)

    # Batch insert
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    count = 0
    total = 0

    for row in rows:
        batch.add(prepared, list(row))
        count += 1

        if count >= batch_size:
            target_session.execute(batch)
            total += count
            print(f"Migrated {total} rows...")
            batch = BatchStatement(batch_type=BatchType.UNLOGGED)
            count = 0

    # Final batch
    if count > 0:
        target_session.execute(batch)
        total += count

    print(f"Migration complete: {total} rows migrated to {table_name}")
    return total

# Migrate tables
tables = ['users', 'events', 'orders']
for table in tables:
    migrate_table(table)

source_cluster.shutdown()
target_cluster.shutdown()

Method 3: Parallel Migration for Large Datasets

#!/usr/bin/env python3
"""
Parallel token-range migration for large datasets.
"""

from cassandra.cluster import Cluster
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_token_ranges(num_ranges=16):
    """Generate token ranges for parallel processing."""
    min_token = -2**63
    max_token = 2**63 - 1
    range_size = (max_token - min_token) // num_ranges

    ranges = []
    current = min_token
    for i in range(num_ranges):
        end = current + range_size if i < num_ranges - 1 else max_token
        ranges.append((current, end))
        current = end
    return ranges

def migrate_token_range(source_session, target_session, table, pk_column, token_range, prepared):
    """Migrate a single token range."""
    start_token, end_token = token_range

    query = f"""
        SELECT * FROM {table}
        WHERE token({pk_column}) >= {start_token}
        AND token({pk_column}) < {end_token}
    """

    rows = source_session.execute(query)
    count = 0

    for row in rows:
        target_session.execute(prepared, list(row))
        count += 1

    return count

def parallel_migrate(table_name, partition_key_column, num_workers=8):
    """Migrate table using parallel token ranges."""
    source = Cluster(['cassandra-host']).connect('production')
    target = Cluster(['heliosdb-host']).connect('production')

    # Prepare insert
    # ... (similar to previous example)

    token_ranges = get_token_ranges(num_workers * 2)
    total_migrated = 0

    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = [
            executor.submit(migrate_token_range, source, target, table_name,
                            partition_key_column, tr, prepared)
            for tr in token_ranges
        ]

        for future in as_completed(futures):
            total_migrated += future.result()
            print(f"Progress: {total_migrated} rows migrated")

    return total_migrated

5.4 Phase 4: Application CQL Driver Configuration

Java Application Configuration

// Before (Cassandra)
CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("cassandra-host", 9042))
    .withLocalDatacenter("dc1")
    .withKeyspace("production")
    .build();

// After (HeliosDB) - Minimal changes
CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("heliosdb-host", 9042))  // Change host
    .withLocalDatacenter("datacenter1")  // HeliosDB default datacenter
    .withKeyspace("production")
    .build();

Python Application Configuration

# Before (Cassandra)
from cassandra.cluster import Cluster

cluster = Cluster(['cassandra-node1', 'cassandra-node2', 'cassandra-node3'])
session = cluster.connect('production')

# After (HeliosDB) - Change contact points only
cluster = Cluster(['heliosdb-host'])  # Single endpoint, HeliosDB handles routing
session = cluster.connect('production')

Node.js Application Configuration

// Before (Cassandra)
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
    contactPoints: ['cassandra-node1', 'cassandra-node2'],
    localDataCenter: 'dc1',
    keyspace: 'production'
});

// After (HeliosDB)
const client = new cassandra.Client({
    contactPoints: ['heliosdb-host'],
    localDataCenter: 'datacenter1',
    keyspace: 'production'
});

Go Application Configuration

// Before (Cassandra)
cluster := gocql.NewCluster("cassandra-node1", "cassandra-node2")
cluster.Keyspace = "production"
cluster.Consistency = gocql.Quorum

// After (HeliosDB)
cluster := gocql.NewCluster("heliosdb-host")
cluster.Keyspace = "production"
cluster.Consistency = gocql.Quorum  // Consistency levels work identically

6. CQL Query Compatibility

6.1 Supported CQL Statements

Data Definition Language (DDL)

| Statement | Support | Example |
|---|---|---|
| CREATE KEYSPACE | Full | CREATE KEYSPACE ks WITH REPLICATION = {...} |
| ALTER KEYSPACE | Full | ALTER KEYSPACE ks WITH REPLICATION = {...} |
| DROP KEYSPACE | Full | DROP KEYSPACE IF EXISTS ks |
| CREATE TABLE | Full | CREATE TABLE t (id UUID PRIMARY KEY, ...) |
| ALTER TABLE | Full | ALTER TABLE t ADD column TEXT |
| DROP TABLE | Full | DROP TABLE IF EXISTS t |
| TRUNCATE | Full | TRUNCATE TABLE t |
| CREATE INDEX | Full | CREATE INDEX ON t (column) |
| DROP INDEX | Full | DROP INDEX idx_name |
| CREATE TYPE | Full | CREATE TYPE address (street TEXT, ...) |
| DROP TYPE | Full | DROP TYPE address |
| CREATE FUNCTION | Full | CREATE FUNCTION fn(...) ... |
| DROP FUNCTION | Full | DROP FUNCTION fn |
| CREATE MATERIALIZED VIEW | Full | CREATE MATERIALIZED VIEW mv AS ... |
| DROP MATERIALIZED VIEW | Full | DROP MATERIALIZED VIEW mv |

Data Manipulation Language (DML)

-- All standard DML operations are supported

-- SELECT with all clauses
SELECT * FROM users
WHERE user_id = ?
AND event_time >= ?
AND event_time < ?
ORDER BY event_time DESC
LIMIT 100;

-- SELECT with IN clause
SELECT * FROM users WHERE user_id IN (?, ?, ?);

-- SELECT with token range
SELECT * FROM users WHERE token(user_id) > ? LIMIT 1000;

-- SELECT with ALLOW FILTERING
SELECT * FROM users WHERE email = ? ALLOW FILTERING;

-- SELECT as JSON
SELECT JSON * FROM users WHERE user_id = ?;

-- INSERT with options
INSERT INTO users (user_id, name, email)
VALUES (?, ?, ?)
USING TTL 86400 AND TIMESTAMP 1234567890;

-- INSERT IF NOT EXISTS
INSERT INTO users (user_id, name) VALUES (?, ?)
IF NOT EXISTS;

-- UPDATE with conditions
UPDATE users SET email = ?
WHERE user_id = ?
IF name = ?;

-- DELETE with conditions
DELETE FROM users WHERE user_id = ? IF EXISTS;

-- BATCH operations
BEGIN BATCH
    INSERT INTO users (user_id, name) VALUES (?, ?);
    UPDATE user_stats SET login_count = login_count + 1 WHERE user_id = ?;
APPLY BATCH;

6.2 Lightweight Transactions (LWT)

HeliosDB fully supports Cassandra's lightweight transactions:

-- Insert if not exists
INSERT INTO users (user_id, username, email)
VALUES (uuid(), 'newuser', 'new@example.com')
IF NOT EXISTS;

-- Update with condition
UPDATE users SET email = 'verified@example.com'
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
IF email = 'unverified@example.com';

-- Multiple conditions
UPDATE users SET status = 'active'
WHERE user_id = ?
IF status = 'pending' AND created_at < ?;

-- Delete with existence check
DELETE FROM users
WHERE user_id = ?
IF EXISTS;

-- LWT with IN operator
UPDATE users SET verified = true
WHERE user_id = ?
IF status IN ('pending', 'unverified');

-- LWT returns applied status
-- [applied] | user_id | ...
-- True      | ...     | ...
-- False     | ...     | ... (shows current values if not applied)
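
In application code, the LWT result is typically inspected via its `[applied]` column; a hedged sketch of that pattern, assuming the row is exposed as a dict (as with the Python driver's dict_factory):

```python
def lwt_applied(row: dict):
    """Interpret an LWT result row: (applied?, current values if rejected)."""
    applied = row.get("[applied]", False)
    if applied:
        return True, None
    # On rejection, the current column values are returned alongside the flag
    current = {k: v for k, v in row.items() if k != "[applied]"}
    return False, current

rejected = {"[applied]": False, "email": "other@example.com"}
print(lwt_applied(rejected))  # (False, {'email': 'other@example.com'})
```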

6.3 Time-Series Data Handling

HeliosDB excels at time-series workloads common in Cassandra:

-- Time-series table design (identical to Cassandra best practices)
CREATE TABLE sensor_readings (
    sensor_id UUID,
    bucket_date DATE,
    reading_time TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    pressure DOUBLE,
    PRIMARY KEY ((sensor_id, bucket_date), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND default_time_to_live = 2592000;  -- 30 days

-- Efficient time-range queries
SELECT * FROM sensor_readings
WHERE sensor_id = ?
AND bucket_date = ?
AND reading_time >= ?
AND reading_time < ?
ORDER BY reading_time DESC;

-- Aggregations (HeliosDB enhancement: more aggregate functions)
SELECT
    sensor_id,
    bucket_date,
    AVG(temperature) as avg_temp,
    MAX(temperature) as max_temp,
    MIN(temperature) as min_temp,
    COUNT(*) as reading_count
FROM sensor_readings
WHERE sensor_id = ? AND bucket_date = ?;

-- HeliosDB bonus: Window functions for time-series analysis
-- (Via SQL interface, same data accessible)
SELECT
    reading_time,
    temperature,
    AVG(temperature) OVER (
        PARTITION BY sensor_id
        ORDER BY reading_time
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as moving_avg
FROM sensor_readings
WHERE sensor_id = ? AND bucket_date = ?;
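
The (sensor_id, bucket_date) partition key above depends on the writer computing a day bucket from each reading's timestamp; a small helper for that (the UTC-day bucketing policy is an illustrative assumption):

```python
from datetime import datetime, date, timezone

def bucket_date(ts: datetime) -> date:
    """Day bucket for the (sensor_id, bucket_date) partition key above."""
    # Normalize to UTC so readings near midnight land in a consistent bucket
    return ts.astimezone(timezone.utc).date()

reading_time = datetime(2026, 1, 15, 23, 30, tzinfo=timezone.utc)
print(bucket_date(reading_time))  # 2026-01-15
```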

7. Feature Mapping

7.1 TTL Support

TTL (Time-To-Live) is fully supported at both row and column level:

-- Table-level default TTL
CREATE TABLE sessions (
    session_id UUID PRIMARY KEY,
    user_id UUID,
    data BLOB
) WITH default_time_to_live = 3600;  -- 1 hour

-- Row-level TTL on INSERT
INSERT INTO sessions (session_id, user_id, data)
VALUES (uuid(), ?, ?)
USING TTL 7200;  -- 2 hours

-- Row-level TTL on UPDATE
UPDATE sessions USING TTL 1800
SET data = ?
WHERE session_id = ?;

-- Query TTL value
SELECT session_id, TTL(data) FROM sessions WHERE session_id = ?;

-- HeliosDB extension: Modify TTL without changing data
UPDATE sessions USING TTL 3600
SET data = data  -- No-op assignment
WHERE session_id = ?;

7.2 Collections (Sets, Lists, Maps)

All collection types are supported with full operation semantics:

SET Operations

-- Create table with SET
CREATE TABLE user_tags (
    user_id UUID PRIMARY KEY,
    tags SET<TEXT>
);

-- Add to set
UPDATE user_tags SET tags = tags + {'premium', 'verified'}
WHERE user_id = ?;

-- Remove from set
UPDATE user_tags SET tags = tags - {'trial'}
WHERE user_id = ?;

-- Replace entire set
UPDATE user_tags SET tags = {'member'}
WHERE user_id = ?;

-- Query set contents
SELECT tags FROM user_tags WHERE user_id = ?;

LIST Operations

-- Create table with LIST
CREATE TABLE user_history (
    user_id UUID PRIMARY KEY,
    recent_pages LIST<TEXT>
);

-- Append to list
UPDATE user_history SET recent_pages = recent_pages + ['/dashboard']
WHERE user_id = ?;

-- Prepend to list
UPDATE user_history SET recent_pages = ['/home'] + recent_pages
WHERE user_id = ?;

-- Remove by value
UPDATE user_history SET recent_pages = recent_pages - ['/old-page']
WHERE user_id = ?;

-- Update by index
UPDATE user_history SET recent_pages[0] = '/new-home'
WHERE user_id = ?;

-- Delete by index
DELETE recent_pages[0] FROM user_history WHERE user_id = ?;

MAP Operations

-- Create table with MAP
CREATE TABLE user_preferences (
    user_id UUID PRIMARY KEY,
    settings MAP<TEXT, TEXT>
);

-- Add/update map entries
UPDATE user_preferences SET settings['theme'] = 'dark'
WHERE user_id = ?;

UPDATE user_preferences SET settings = settings + {'lang': 'en', 'tz': 'UTC'}
WHERE user_id = ?;

-- Remove map entries
DELETE settings['old_setting'] FROM user_preferences WHERE user_id = ?;

UPDATE user_preferences SET settings = settings - {'deprecated'}
WHERE user_id = ?;

-- Query map
SELECT settings['theme'] FROM user_preferences WHERE user_id = ?;

7.3 User-Defined Types (UDT)

UDTs are fully supported and stored efficiently as JSONB:

-- Create UDT
CREATE TYPE address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip_code TEXT,
    country TEXT
);

CREATE TYPE contact_info (
    email TEXT,
    phone TEXT,
    address FROZEN<address>
);

-- Use UDT in table
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY,
    name TEXT,
    primary_contact FROZEN<contact_info>,
    shipping_addresses LIST<FROZEN<address>>
);

-- Insert with UDT
INSERT INTO customers (customer_id, name, primary_contact, shipping_addresses)
VALUES (
    uuid(),
    'John Doe',
    {
        email: 'john@example.com',
        phone: '+1-555-1234',
        address: {
            street: '123 Main St',
            city: 'Springfield',
            state: 'IL',
            zip_code: '62701',
            country: 'USA'
        }
    },
    [
        {street: '456 Oak Ave', city: 'Chicago', state: 'IL', zip_code: '60601', country: 'USA'}
    ]
);

-- Query UDT fields
SELECT
    name,
    primary_contact.email,
    primary_contact.address.city
FROM customers
WHERE customer_id = ?;

-- Update UDT field
UPDATE customers
SET primary_contact.phone = '+1-555-5678'
WHERE customer_id = ?;

7.4 Counter Columns

Counter columns are supported with atomic increment/decrement:

-- Counter table
CREATE TABLE page_views (
    page_id TEXT PRIMARY KEY,
    view_count COUNTER,
    unique_visitors COUNTER
);

-- Increment counters
UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/home';
UPDATE page_views SET unique_visitors = unique_visitors + 1 WHERE page_id = '/home';

-- Decrement counters
UPDATE page_views SET view_count = view_count - 1 WHERE page_id = '/home';

-- Counter batches
BEGIN COUNTER BATCH
    UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/home';
    UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/about';
APPLY BATCH;

-- Note: Counter limitations (same as Cassandra)
-- - Counter columns cannot be indexed
-- - Counter tables can only contain counter columns plus the primary key
-- - Cannot mix counter and non-counter columns
-- - Cannot use TTL with counters

7.5 Secondary Indexes

-- Create secondary index
CREATE INDEX users_by_email ON users (email);

-- Create index with custom name
CREATE INDEX idx_users_country ON users (country);

-- Index on collection elements
CREATE INDEX ON users (KEYS(preferences));  -- Index map keys
CREATE INDEX ON users (VALUES(tags));       -- Index set/list values
CREATE INDEX ON users (ENTRIES(preferences)); -- Index map entries

-- Query using index
SELECT * FROM users WHERE email = 'user@example.com';

-- Drop index
DROP INDEX IF EXISTS users_by_email;

8. Consistency and Replication

8.1 Consistency Level Mapping

All Cassandra consistency levels are supported:

| Consistency Level | HeliosDB Behavior | Recommendation |
|---|---|---|
| ANY | Write acknowledged | Development only |
| ONE | Single replica | Low-latency reads |
| TWO | Two replicas | Moderate durability |
| THREE | Three replicas | High durability |
| QUORUM | Majority replicas | Recommended default |
| ALL | All replicas | Maximum consistency |
| LOCAL_QUORUM | Local DC quorum | Multi-DC deployments |
| EACH_QUORUM | Quorum in each DC | Strong multi-DC |
| LOCAL_ONE | Single local replica | DC-aware reads |
| SERIAL | Linearizable (LWT) | LWT operations |
| LOCAL_SERIAL | Local linearizable | Local LWT |
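
For availability planning, the number of replicas each level waits on follows directly from the replication factor; a sketch of the standard single-datacenter arithmetic (ANY is treated as 1 here, though in Cassandra a hint alone can satisfy it):

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge at a given consistency level (single DC)."""
    fixed = {"ANY": 1, "ONE": 1, "LOCAL_ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level in ("QUORUM", "LOCAL_QUORUM", "SERIAL", "LOCAL_SERIAL"):
        return rf // 2 + 1   # strict majority of replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")

print(replicas_required("QUORUM", 3))  # 2
print(replicas_required("ALL", 3))     # 3
```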

Setting Consistency Levels

from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Per-query consistency
query = SimpleStatement(
    "SELECT * FROM users WHERE user_id = ?",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)
session.execute(query, [user_id])

# Session default consistency
from cassandra.cluster import ExecutionProfile
from cassandra.policies import RoundRobinPolicy

profile = ExecutionProfile(
    load_balancing_policy=RoundRobinPolicy(),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL
)

cluster = Cluster(
    ['heliosdb-host'],
    execution_profiles={'default': profile}
)

8.2 Replication Strategy Configuration

-- SimpleStrategy (single datacenter)
CREATE KEYSPACE development WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

-- NetworkTopologyStrategy (multi-datacenter)
CREATE KEYSPACE production WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3,
    'dc3': 2
};

-- Modify replication
ALTER KEYSPACE production WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3,
    'dc3': 3
};

8.3 HeliosDB Replication Advantages

Unlike Cassandra, HeliosDB provides:

  • Automatic Rebalancing: No manual repair operations needed
  • Faster Replica Sync: Streaming replication with parallel recovery
  • Cross-Protocol Replication: Data replicated across protocol interfaces
  • Integrated Backup: Point-in-time recovery built-in

9. Post-Migration Validation

9.1 Data Integrity Validation

Row Count Verification

#!/usr/bin/env python3
"""
Validate data integrity after migration.
"""

from cassandra.cluster import Cluster

def validate_row_counts(keyspace, tables):
    """Compare row counts between Cassandra and HeliosDB."""
    source = Cluster(['cassandra-host']).connect(keyspace)
    target = Cluster(['heliosdb-host']).connect(keyspace)

    results = {}
    for table in tables:
        source_count = source.execute(f"SELECT COUNT(*) FROM {table}").one()[0]
        target_count = target.execute(f"SELECT COUNT(*) FROM {table}").one()[0]

        results[table] = {
            'source': source_count,
            'target': target_count,
            'match': source_count == target_count
        }

        status = "OK" if results[table]['match'] else "MISMATCH"
        print(f"{table}: Source={source_count}, Target={target_count} [{status}]")

    source.cluster.shutdown()
    target.cluster.shutdown()
    return results

# Validate all tables
tables = ['users', 'events', 'orders', 'products']
validate_row_counts('production', tables)

Data Sampling Validation

def validate_sample_data(keyspace, table, primary_key_column, sample_size=100):
    """Compare sample rows between systems."""
    source = Cluster(['cassandra-host']).connect(keyspace)
    target = Cluster(['heliosdb-host']).connect(keyspace)

    # Get sample primary keys from source
    sample_query = f"SELECT {primary_key_column} FROM {table} LIMIT {sample_size}"
    sample_keys = [row[0] for row in source.execute(sample_query)]

    # The Python driver uses %s placeholders in simple statements;
    # '?' placeholders require prepared statements, so prepare the lookups once
    lookup = f"SELECT * FROM {table} WHERE {primary_key_column} = ?"
    source_stmt = source.prepare(lookup)
    target_stmt = target.prepare(lookup)

    mismatches = []
    for key in sample_keys:
        source_row = source.execute(source_stmt, [key]).one()
        target_row = target.execute(target_stmt, [key]).one()

        if source_row != target_row:
            mismatches.append({
                'key': key,
                'source': dict(source_row._asdict()) if source_row else None,
                'target': dict(target_row._asdict()) if target_row else None
            })

    if mismatches:
        print(f"Found {len(mismatches)} mismatches in {table}")
        for m in mismatches[:5]:  # Show first 5
            print(f"  Key: {m['key']}")
    else:
        print(f"{table}: All {sample_size} samples match")

    source.cluster.shutdown()
    target.cluster.shutdown()
    return mismatches

9.2 Schema Validation

-- Compare schema definitions
-- On Cassandra:
DESCRIBE KEYSPACE production;

-- On HeliosDB:
DESCRIBE KEYSPACE production;

-- Verify specific table
DESCRIBE TABLE production.users;

-- Check indexes
SELECT * FROM system_schema.indexes WHERE keyspace_name = 'production';

-- Check materialized views
SELECT * FROM system_schema.views WHERE keyspace_name = 'production';

-- Check UDTs
SELECT * FROM system_schema.types WHERE keyspace_name = 'production';

9.3 Application Validation Checklist

  • [ ] All application endpoints responding correctly
  • [ ] Query latencies within acceptable range
  • [ ] No increase in error rates
  • [ ] Authentication working for all users
  • [ ] Batch operations completing successfully
  • [ ] TTL expiration working correctly
  • [ ] Counter increments accurate
  • [ ] Collection operations (add/remove) working
  • [ ] LWT operations returning expected results
  • [ ] Prepared statements caching effectively

9.4 Automated Validation Script

#!/bin/bash
# Post-migration validation script

CASSANDRA_HOST="cassandra-host"
HELIOSDB_HOST="heliosdb-host"
KEYSPACE="production"

echo "=== Post-Migration Validation ==="

# 1. Connectivity test
echo "Testing connectivity..."
cqlsh $HELIOSDB_HOST -e "SELECT now() FROM system.local" || exit 1

# 2. Schema comparison
echo "Comparing schemas..."
cqlsh $CASSANDRA_HOST -e "DESCRIBE KEYSPACE $KEYSPACE" > /tmp/cassandra_schema.cql
cqlsh $HELIOSDB_HOST -e "DESCRIBE KEYSPACE $KEYSPACE" > /tmp/heliosdb_schema.cql
diff /tmp/cassandra_schema.cql /tmp/heliosdb_schema.cql

# 3. Row count comparison
echo "Comparing row counts..."
for table in users events orders; do
    cassandra_count=$(cqlsh $CASSANDRA_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$table" | grep -Eo '[0-9]+' | head -1)
    heliosdb_count=$(cqlsh $HELIOSDB_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$table" | grep -Eo '[0-9]+' | head -1)
    echo "$table: Cassandra=$cassandra_count, HeliosDB=$heliosdb_count"
done

# 4. Query performance test
echo "Testing query performance..."
time cqlsh $HELIOSDB_HOST -e "SELECT * FROM $KEYSPACE.users LIMIT 1000"

echo "=== Validation Complete ==="

10. Performance Tuning

10.1 HeliosDB Configuration Optimization

# heliosdb.toml - Cassandra protocol optimization

[cassandra]
# Network settings
listen_address = "0.0.0.0"
port = 9042
max_connections = 2000

[cassandra.protocol]
# Enable compression for better throughput
compression_enabled = true
compression_algorithms = ["lz4"]

# Increase frame size for large batches
max_frame_size = 536870912  # 512MB

[cassandra.query]
# Optimize prepared statement cache
prepared_cache_enabled = true
prepared_cache_size = 5000

# Adjust page size for your workload
default_page_size = 10000
max_page_size = 100000

[cassandra.batch]
# Increase batch limits
max_batch_size = 100000
batch_timeout = 30000

[cassandra.performance]
# Enable query cache for repeated queries
query_cache_enabled = true
query_cache_size = 512  # MB

# Log slow queries for optimization
log_slow_queries = true
slow_query_threshold_ms = 100

10.2 Client-Side Optimization

Connection Pool Tuning (Java)

CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("heliosdb-host", 9042))
    .withLocalDatacenter("datacenter1")
    .withConfigLoader(
        DriverConfigLoader.programmaticBuilder()
            .withInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE, 4)
            .withInt(DefaultDriverOption.CONNECTION_POOL_REMOTE_SIZE, 2)
            .withInt(DefaultDriverOption.CONNECTION_MAX_REQUESTS, 32768)
            .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(10))
            .build()
    )
    .build();

Connection Pool Tuning (Python)

from cassandra.cluster import Cluster, ExecutionProfile
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra.query import tuple_factory

# Optimize execution profile
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='datacenter1')
    ),
    request_timeout=10.0,
    row_factory=tuple_factory  # More efficient than named tuples
)

cluster = Cluster(
    ['heliosdb-host'],
    execution_profiles={'default': profile},
    protocol_version=5,
    compression=True,
    idle_heartbeat_interval=30
)

10.3 Query Optimization Tips

Use Prepared Statements

# Always use prepared statements for repeated queries
prepared = session.prepare(
    "SELECT * FROM users WHERE user_id = ? AND event_time >= ? AND event_time < ?"
)

# Execute with parameters
for user_id in user_ids:
    result = session.execute(prepared, [user_id, start_time, end_time])

Optimize Batch Operations

from cassandra.query import BatchStatement, BatchType

# Use UNLOGGED batches for better performance when atomicity not required
batch = BatchStatement(batch_type=BatchType.UNLOGGED)

# Keep batches to a reasonable size (100-500 statements)
pending = 0
for row in rows:
    batch.add(prepared, row)
    pending += 1
    if pending == 100:
        session.execute(batch)
        batch = BatchStatement(batch_type=BatchType.UNLOGGED)
        pending = 0

# Flush the final partial batch
if pending:
    session.execute(batch)

Avoid Anti-Patterns

-- AVOID: Full table scans
SELECT * FROM users;  -- Bad
SELECT * FROM users WHERE status = 'active' ALLOW FILTERING;  -- Bad

-- BETTER: Use partition keys
SELECT * FROM users WHERE user_id = ?;  -- Good
SELECT * FROM users WHERE user_id IN (?, ?, ?);  -- Good

-- AVOID: Large IN clauses
SELECT * FROM users WHERE user_id IN (/* 1000 values */);  -- Bad

-- BETTER: Multiple queries or token range
SELECT * FROM users WHERE token(user_id) >= ? AND token(user_id) < ?;
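When a full scan truly is required (for example, during validation), the token-range pattern above can be parallelized by splitting the ring into contiguous subranges, one per worker. A sketch assuming the default Murmur3Partitioner, whose token range is [-2^63, 2^63 - 1]:

```python
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_subranges(n):
    """Split the Murmur3 token ring into n contiguous, inclusive (start, end) ranges."""
    step = (MAX_TOKEN - MIN_TOKEN) // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # Force the last range to end exactly at MAX_TOKEN
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Each range maps to one bounded query, e.g.:
#   SELECT * FROM users WHERE token(user_id) >= %s AND token(user_id) <= %s
```

Each worker then runs one bounded query per range, so no single query has to scan the whole table.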

10.4 Monitoring Performance

-- HeliosDB performance views (accessible via SQL interface)
SELECT * FROM heliosdb_stats.cql_query_latency ORDER BY avg_latency_ms DESC LIMIT 10;

SELECT * FROM heliosdb_stats.cql_throughput WHERE timestamp > now() - interval '1 hour';

-- Via CQL
SELECT * FROM system.local;  -- Connection info
SELECT * FROM system.peers;  -- Cluster topology

11. Common Issues and Troubleshooting

11.1 Connection Issues

Issue: Connection Refused

NoHostAvailable: Unable to connect to any servers

Solutions:

# Check HeliosDB is running
systemctl status heliosdb

# Verify port is listening
netstat -tlnp | grep 9042

# Check firewall
firewall-cmd --list-ports | grep 9042

# Test connectivity
nc -zv heliosdb-host 9042

Issue: Authentication Failed

AuthenticationFailed: Authentication failed

Solutions:

# Ensure credentials are correct
from cassandra.auth import PlainTextAuthProvider

auth = PlainTextAuthProvider(
    username='your_username',
    password='your_password'
)
cluster = Cluster(['heliosdb-host'], auth_provider=auth)

11.2 Schema Issues

Issue: Keyspace Not Found

InvalidRequest: Keyspace 'xxx' does not exist

Solutions:

-- Create keyspace if missing
CREATE KEYSPACE IF NOT EXISTS xxx WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

-- Or import schema first
cqlsh heliosdb-host -f schema.cql

Issue: Type Mismatch

InvalidRequest: Expected type X but got type Y

Solutions:

# Ensure proper type conversion
import uuid

# For UUID columns
user_id = uuid.UUID('550e8400-e29b-41d4-a716-446655440000')

# For TIMESTAMP columns
from datetime import datetime, timezone
event_time = datetime.now(timezone.utc)  # timezone-aware; utcnow() is deprecated

# For collections
tags = {'tag1', 'tag2'}  # Set
prefs = {'key': 'value'}  # Map
items = ['item1', 'item2']  # List
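Many type-mismatch errors boil down to passing a string where the driver expects a `uuid.UUID`. A small coercion helper can normalize values at the application boundary (`coerce_uuid` is illustrative, not part of the driver):

```python
import uuid

def coerce_uuid(value):
    """Return value as a uuid.UUID, accepting UUID objects or UUID strings."""
    if isinstance(value, uuid.UUID):
        return value
    return uuid.UUID(str(value))
```

Applying such a helper to every UUID-typed bind parameter removes a whole class of "Expected type X but got type Y" errors.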

11.3 Query Issues

Issue: Timeout on Large Queries

OperationTimedOut: errors={}, last_host=xxx

Solutions:

# Increase timeout
from cassandra.cluster import Cluster, ExecutionProfile

profile = ExecutionProfile(request_timeout=30.0)  # 30 seconds

cluster = Cluster(
    ['heliosdb-host'],
    execution_profiles={'default': profile}
)

# Or use paging for large results
from cassandra.query import SimpleStatement

query = SimpleStatement(
    "SELECT * FROM large_table",
    fetch_size=1000  # Page size
)
for row in session.execute(query):
    process(row)

Issue: ALLOW FILTERING Warning

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance

Solutions:

-- Add index for frequently filtered columns
CREATE INDEX ON users (email);

-- Or redesign schema with proper partition keys
-- Instead of:
SELECT * FROM users WHERE status = 'active' ALLOW FILTERING;

-- Create a table partitioned by status:
CREATE TABLE users_by_status (
    status TEXT,
    user_id UUID,
    name TEXT,
    PRIMARY KEY (status, user_id)
);

11.4 Data Migration Issues

Issue: Migration Performance Degradation

Solutions:

# Use parallel migration (select_stmt / insert_stmt are prepared statements,
# source_session / target_session are open sessions to each cluster)
from concurrent.futures import ThreadPoolExecutor

def migrate_partition(partition_key):
    """Copy a single partition from Cassandra to HeliosDB."""
    rows = source_session.execute(select_stmt, [partition_key])
    for row in rows:
        target_session.execute(insert_stmt, row)

with ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(migrate_partition, partition_keys)

Issue: Out of Memory During Migration

Solutions:

# Use streaming instead of loading all data
rows = session.execute("SELECT * FROM large_table")

# Process in chunks
batch = []
for row in rows:
    batch.append(row)
    if len(batch) >= 1000:
        process_batch(batch)
        batch = []

# Don't drop the final partial chunk
if batch:
    process_batch(batch)
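This chunking pattern is easy to get subtly wrong (the final partial batch is the usual casualty). A reusable generator keeps the logic in one place and can be tested in isolation:

```python
def chunked(iterable, size):
    """Yield lists of up to `size` items each, including the final partial chunk."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch

# Usage (sketch):
# for batch in chunked(session.execute("SELECT * FROM large_table"), 1000):
#     process_batch(batch)
```

Because the driver's result set is itself an iterator that pages lazily, this keeps memory bounded regardless of table size.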

11.5 Performance Issues

Issue: High Latency After Migration

Diagnosis:

-- Enable tracing
TRACING ON;
SELECT * FROM users WHERE user_id = ?;
TRACING OFF;

Solutions:

  1. Check that indexes were properly recreated after migration
  2. Verify queries follow partition key access patterns
  3. Enable query caching
  4. Increase the connection pool size

# heliosdb.toml
[cassandra.performance]
query_cache_enabled = true
query_cache_size = 512

Issue: Inconsistent Query Results

Solutions:

# Use appropriate consistency level
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

query = SimpleStatement(
    "SELECT * FROM users WHERE user_id = ?",
    consistency_level=ConsistencyLevel.QUORUM
)

11.6 Collection Operation Issues

Issue: Collection Update Failures

InvalidRequest: Invalid operation on frozen collection

Solutions:

-- Frozen collections are immutable; replace entirely
-- Instead of:
UPDATE users SET address.street = 'New St' WHERE user_id = ?;  -- Fails if frozen

-- Replace the entire frozen field:
UPDATE users SET address = {street: 'New St', city: 'Chicago', ...}
WHERE user_id = ?;

-- Or declare the column with a non-frozen UDT (supported since Cassandra 3.6)
-- so that individual fields can be updated in place. Note that ALTER TYPE
-- can only ADD or RENAME fields; dropping a field is not supported.

11.7 TTL Issues

Issue: Data Expiring Unexpectedly

Diagnosis:

-- Check TTL values
SELECT user_id, TTL(name), TTL(email) FROM users WHERE user_id = ?;

-- Check table default TTL
DESCRIBE TABLE users;

Solutions:

-- Reset TTL by re-writing the columns (CQL cannot reference a column's
-- own value in SET, so the values must be supplied explicitly)
UPDATE users USING TTL 86400
SET name = ?, email = ?
WHERE user_id = ?;

-- Remove TTL (set to 0)
UPDATE users USING TTL 0
SET name = ?
WHERE user_id = ?;
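When auditing the remaining-seconds values returned by TTL(column), it helps to translate them into a wall-clock expiry time. A minimal sketch (a TTL of None or 0 means the cell never expires; `expiry_time` is an illustrative helper, not a driver function):

```python
from datetime import datetime, timedelta, timezone

def expiry_time(ttl_seconds, now=None):
    """Return the UTC datetime at which a cell expires, or None if it never does."""
    if not ttl_seconds:
        return None
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=ttl_seconds)
```

Comparing these computed expiry times against the table's default_time_to_live quickly reveals rows that were written with an unintended per-query TTL.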


Quick Reference Card

Essential Commands

Task             Command
--------------   --------------------------------
Connect          cqlsh heliosdb-host 9042
List keyspaces   DESCRIBE KEYSPACES;
Use keyspace     USE keyspace_name;
List tables      DESCRIBE TABLES;
Table schema     DESCRIBE TABLE table_name;
Enable tracing   TRACING ON;
Import schema    cqlsh -f schema.cql

Migration Checklist Summary

  1. [ ] Pre-migration assessment complete
  2. [ ] Schema exported and validated
  3. [ ] Schema imported to HeliosDB
  4. [ ] Data exported from Cassandra
  5. [ ] Data imported to HeliosDB
  6. [ ] Row counts validated
  7. [ ] Sample data verified
  8. [ ] Application configuration updated
  9. [ ] Application tested end-to-end
  10. [ ] Performance benchmarked
  11. [ ] Monitoring configured
  12. [ ] Rollback plan documented


Document Version: 1.0 Last Updated: January 2026 Compatibility: HeliosDB 7.0+, Apache Cassandra 3.x/4.x