MongoDB to HeliosDB Migration Guide

Version: 1.0 Last Updated: 2026-01-04


Table of Contents

  1. Introduction
  2. Compatibility Overview
  3. Pre-Migration Checklist
  4. Conceptual Mapping
  5. Step-by-Step Migration Process
  6. Query Translation Examples
  7. Feature Mapping
  8. Post-Migration Validation
  9. Performance Considerations
  10. Common Issues and Troubleshooting

1. Introduction

1.1 Why Migrate from MongoDB to HeliosDB?

HeliosDB offers significant advantages over MongoDB while maintaining wire protocol compatibility:

Aspect MongoDB HeliosDB
Data Model Document-only Unified multi-model (Document, Relational, Graph, Vector, Time-Series)
Query Language MongoDB Query Language only MQL, SQL, GraphQL, CQL, Redis commands
Transactions Snapshot isolation SERIALIZABLE isolation with zero-maintenance MVCC
Joins $lookup only Native SQL JOINs + $lookup
Analytics Limited aggregation Full OLAP + OLTP (HTAP)
Full-Text Search External (Atlas Search) Built-in with GraphRAG integration
Change Streams Standard CDC-backed with lower latency
Vector Search Atlas Vector Search Native HNSW with multi-modal embeddings
Licensing SSPL Enterprise-friendly
Sharding Manual configuration Transparent horizontal scaling

1.2 Migration Benefits

  • Unified Platform: Consolidate multiple database systems into one
  • SQL Access: Query MongoDB-style documents with standard SQL
  • Enhanced Analytics: Run complex analytical queries on document data
  • Better Transactions: ACID guarantees with serializable isolation
  • Cost Reduction: Single database license, reduced operational overhead
  • GraphRAG Integration: AI-powered semantic search on document content

1.3 Wire Protocol Compatibility

HeliosDB implements the MongoDB wire protocol (OP_MSG), allowing existing MongoDB applications to connect without code changes:

# Existing MongoDB code works unchanged
from pymongo import MongoClient

# Simply change the hostname
client = MongoClient("mongodb://heliosdb-server:27017/")
db = client.my_database
collection = db.my_collection

# All operations work as expected
result = collection.find({"status": "active"})

2. Compatibility Overview

2.1 Wire Protocol Support

Protocol Feature Status Notes
OP_MSG 100% Modern protocol (MongoDB 3.6+)
OP_QUERY 100% Legacy support
OP_INSERT/UPDATE/DELETE 100% Legacy operations
BSON Encoding 100% All 19 BSON types
SCRAM-SHA-256 100% Recommended authentication
TLS/SSL 100% Encrypted connections

2.2 CRUD Operations

Operation Compatibility Notes
insertOne/insertMany 100% Full support
find/findOne 100% All query operators
updateOne/updateMany 100% All update operators
deleteOne/deleteMany 100% Full support
replaceOne 100% Document replacement
bulkWrite 100% Ordered and unordered

2.3 Query Operators

Category Coverage Examples
Comparison 100% $eq, $gt, $lt, $in, $nin
Logical 100% $and, $or, $not, $nor
Element 100% $exists, $type
Evaluation 100% $regex, $text, $expr
Array 100% $all, $elemMatch, $size
Geospatial 100% $near, $geoWithin
Bitwise 100% $bitsAllSet, $bitsAnyClear
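
As a concrete mental model for the operator semantics above, here is a minimal, purely illustrative evaluator for a handful of them. This is our own sketch for exposition, not HeliosDB or driver code, and it handles only flat (non-nested) fields.

```python
# Illustrative sketch only: evaluates a few MongoDB-style query operators
# against a plain dict, to make the operator table above concrete.

def matches(doc, query):
    """Return True if `doc` satisfies the flat MongoDB-style `query`."""
    for field, cond in query.items():
        value = doc.get(field)
        if not isinstance(cond, dict):
            if value != cond:          # bare value means implicit $eq
                return False
            continue
        for op, arg in cond.items():
            if op == "$eq":
                ok = value == arg
            elif op == "$gt":
                ok = value is not None and value > arg
            elif op == "$lt":
                ok = value is not None and value < arg
            elif op == "$in":
                ok = value in arg
            elif op == "$nin":
                ok = value not in arg
            elif op == "$exists":
                ok = (field in doc) == arg
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True
```

For example, `matches({"age": 30}, {"age": {"$gt": 21}})` is true, mirroring `db.users.find({ age: { $gt: 21 } })`.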

2.4 Aggregation Pipeline

Stage Status Notes
$match Supported Filter documents
$project Supported Reshape documents
$group Supported Group with accumulators
$sort Supported Multi-field sorting
$lookup Supported Left outer join
$graphLookup Supported Recursive lookup
$unwind Supported Deconstruct arrays
$facet Supported Multi-faceted results
$setWindowFields Supported Window functions
$merge Supported Write to collection

2.5 Known Limitations

Feature Status Alternative
GridFS Not supported Use blob storage or S3 integration
Capped collections Not supported Use TTL indexes
Server-side JavaScript Not supported Use SQL procedures
Map-reduce Deprecated Use aggregation pipeline
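
Capped collections illustrate the substitution: a capped collection discards the oldest documents once a size cap is hit, while a TTL index discards documents past a fixed age. The sketch below is our own conceptual illustration of the age-based sweep, not HeliosDB code.

```python
from datetime import datetime, timedelta, timezone

# Conceptual illustration (not HeliosDB code): the semantics of a TTL sweep,
# the suggested replacement for capped collections. Documents whose
# 'createdAt' is older than expire_after_seconds are dropped.

def ttl_survivors(docs, expire_after_seconds, now=None):
    """Return the documents a TTL sweep would keep."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [d for d in docs if d["createdAt"] >= cutoff]
```

Note the difference in bound: TTL limits age, not total size, so a high write rate can still grow the table within the expiry window.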

3. Pre-Migration Checklist

3.1 Assessment Phase

  • [ ] Inventory databases and collections

    # List all databases
    mongosh --eval "db.adminCommand('listDatabases')"
    
    # List collections in each database
    mongosh mydb --eval "db.getCollectionNames()"
    

  • [ ] Document data volume

    // Get collection statistics
    db.collection.stats()
    
    // Count documents
    db.collection.countDocuments({})
    

  • [ ] Identify GridFS usage

    // Check for GridFS collections
    db.getCollectionNames().filter(n => n.endsWith('.files') || n.endsWith('.chunks'))
    

  • [ ] Review index definitions

    // List all indexes
    db.collection.getIndexes()
    

  • [ ] Analyze query patterns

    // Enable profiler
    db.setProfilingLevel(1, { slowms: 100 })
    
    // Review slow queries
    db.system.profile.find().sort({ ts: -1 }).limit(20)
    

3.2 Technical Requirements

  • [ ] HeliosDB version: 7.0 or later recommended
  • [ ] Storage capacity: 1.5x current MongoDB storage (for import headroom)
  • [ ] Network connectivity: MongoDB port 27017 accessible
  • [ ] Authentication: Prepare HeliosDB credentials
  • [ ] Backup verified: Recent MongoDB backup tested for restore

3.3 Application Assessment

  • [ ] Driver versions: Document current MongoDB driver versions
  • [ ] Connection strings: Inventory all application connection strings
  • [ ] Custom code: Review any server-side JavaScript or map-reduce usage
  • [ ] Change stream consumers: Document change stream applications
  • [ ] Test environment: Prepare isolated test environment

3.4 Risk Assessment

Risk Mitigation
Data loss Comprehensive backup before migration
Application downtime Plan maintenance window or dual-write strategy
Query incompatibility Pre-test all application queries
Performance regression Benchmark before and after
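
The dual-write strategy above can be sketched as a thin wrapper that applies every write to both systems while reads stay on the source of truth. `DualWriter` is a hypothetical name of ours, and the error handling is deliberately simplified.

```python
# Hypothetical sketch of a dual-write wrapper for the cutover window.
# `primary` and `secondary` are any objects exposing insert_one/find
# (e.g. PyMongo Collection handles for MongoDB and HeliosDB).

class DualWriter:
    def __init__(self, primary, secondary):
        self.primary = primary      # MongoDB: source of truth during cutover
        self.secondary = secondary  # HeliosDB: being populated

    def insert_one(self, doc):
        result = self.primary.insert_one(doc)   # must succeed
        try:
            self.secondary.insert_one(doc)      # best-effort mirror
        except Exception as exc:
            # Record for later reconciliation instead of failing the request.
            print(f"dual-write to secondary failed: {exc}")
        return result

    def find(self, *args, **kwargs):
        return self.primary.find(*args, **kwargs)  # reads stay on primary
```

Once validation passes, reads are flipped to HeliosDB and the wrapper is removed.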

4. Conceptual Mapping

4.1 Collections to Tables

MongoDB collections map to HeliosDB tables with JSONB storage:

MongoDB Collection:

// MongoDB: users collection
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "age": 30
  },
  "tags": ["developer", "mongodb"]
}

HeliosDB Table (Internal Representation):

-- Automatic table structure created by MongoDB protocol
CREATE TABLE users (
    _id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    data JSONB NOT NULL
);

-- The document is stored in the 'data' JSONB column
-- _id is extracted and indexed for fast lookups

Hybrid Schema (Optional Optimization):

-- For frequently queried fields, create hybrid schema
CREATE TABLE users (
    _id UUID PRIMARY KEY,
    username VARCHAR(100),
    email VARCHAR(255),
    profile JSONB,
    tags TEXT[],
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create indexes on extracted fields
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tags ON users USING GIN(tags);

4.2 Documents to Rows with JSONB

MongoDB Concept HeliosDB Equivalent
Document Row with JSONB column
Embedded document Nested JSONB object
Array JSONB array or PostgreSQL array
ObjectId UUID (auto-converted)
ISODate TIMESTAMP/TIMESTAMPTZ
NumberLong BIGINT
NumberDecimal NUMERIC
BinData BYTEA

Type Conversion Examples:

// MongoDB document with various types
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "price": NumberDecimal("19.99"),
  "quantity": NumberLong(100),
  "created": ISODate("2025-01-01T00:00:00Z"),
  "metadata": BinData(0, "base64data=="),
  "location": {
    "type": "Point",
    "coordinates": [-73.97, 40.77]
  }
}

-- HeliosDB JSONB representation
{
  "_id": "507f1f77-bcf8-6cd7-9943-9011507f1f77",
  "price": 19.99,
  "quantity": 100,
  "created": "2025-01-01T00:00:00Z",
  "metadata": "\\x626173653634646174613d3d",
  "location": {
    "type": "Point",
    "coordinates": [-73.97, 40.77]
  }
}

4.3 Indexes

MongoDB Index HeliosDB Equivalent
Single field B-tree index on JSONB path
Compound Composite B-tree index
Unique Unique constraint/index
TTL TTL extension on JSONB
Text GIN index with full-text search
2dsphere GiST geospatial index
Hashed Hash index
Wildcard GIN index on JSONB

Index Migration Examples:

// MongoDB indexes
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "profile.age": 1, "status": 1 })
db.users.createIndex({ "location": "2dsphere" })
db.users.createIndex({ "content": "text", "title": "text" })
db.users.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
-- HeliosDB equivalent indexes
CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
CREATE INDEX idx_users_age_status ON users ((data->'profile'->>'age'), (data->>'status'));
CREATE INDEX idx_users_location ON users USING GIST ((data->'location'));
CREATE INDEX idx_users_content ON users USING GIN (to_tsvector('english',
    COALESCE(data->>'content', '') || ' ' || COALESCE(data->>'title', '')));
-- TTL implemented via background job or HeliosDB TTL extension

4.4 Database and Collection Namespaces

MongoDB HeliosDB
Database Schema
Collection Table
mydb.users mydb.users (schema.table)
admin heliosdb_admin (system schema)
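
The namespace mapping is mechanical; the small helper below (our own illustration, not part of HeliosDB) shows how a MongoDB namespace string resolves to a schema-qualified table name:

```python
# Illustrative only: resolve "database.collection" to (schema, table) per
# the mapping table above. The admin rename is taken from that table.

SYSTEM_SCHEMAS = {"admin": "heliosdb_admin"}

def namespace_to_table(namespace):
    """Split at the first dot; collection names may themselves contain dots."""
    database, _, collection = namespace.partition(".")
    return SYSTEM_SCHEMAS.get(database, database), collection
```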

5. Step-by-Step Migration Process

5.1 Schema Design in HeliosDB

Option A: Pure Document Mode (Simplest)

-- HeliosDB automatically creates this structure
-- when accessed via MongoDB protocol
CREATE SCHEMA IF NOT EXISTS mydb;

CREATE TABLE mydb.users (
    _id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    data JSONB NOT NULL
);

CREATE INDEX idx_users_data ON mydb.users USING GIN (data jsonb_path_ops);

Option B: Hybrid Schema (Balanced)

-- Design schema with frequently queried fields extracted
CREATE TABLE mydb.users (
    _id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    status VARCHAR(20) DEFAULT 'active',
    profile JSONB,
    tags TEXT[],
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes for common query patterns
CREATE INDEX idx_users_status ON mydb.users(status);
CREATE INDEX idx_users_created ON mydb.users(created_at DESC);
CREATE INDEX idx_users_profile ON mydb.users USING GIN (profile);
CREATE INDEX idx_users_tags ON mydb.users USING GIN (tags);

Option C: Full Relational Conversion (Best for Analytics)

-- Normalize embedded documents into separate tables
CREATE TABLE mydb.users (
    _id UUID PRIMARY KEY,
    username VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE mydb.user_profiles (
    user_id UUID PRIMARY KEY REFERENCES mydb.users(_id),
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    age INTEGER,
    bio TEXT
);

CREATE TABLE mydb.user_tags (
    id SERIAL PRIMARY KEY,
    user_id UUID REFERENCES mydb.users(_id),
    tag VARCHAR(50)
);

CREATE INDEX idx_user_tags_user ON mydb.user_tags(user_id);
CREATE INDEX idx_user_tags_tag ON mydb.user_tags(tag);

5.2 Data Export Using mongodump/mongoexport

Method 1: mongodump (BSON, Full Fidelity)

# Export entire database
mongodump \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --out=/backup/mongodb \
  --gzip

# Export specific collection
mongodump \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=users \
  --out=/backup/mongodb \
  --gzip

# Export with query filter (incremental)
mongodump \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=orders \
  --query='{"createdAt": {"$gte": {"$date": "2025-01-01T00:00:00Z"}}}' \
  --out=/backup/mongodb

Method 2: mongoexport (JSON/CSV, Good for Transformation)

# Export to JSON (extended JSON format)
mongoexport \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=users \
  --out=users.json \
  --jsonArray

# Export to CSV (for relational import)
mongoexport \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=users \
  --type=csv \
  --fields=_id,username,email,profile.firstName,profile.lastName \
  --out=users.csv

# Export with query filter
mongoexport \
  --uri="mongodb://source-mongodb:27017/mydb" \
  --collection=orders \
  --query='{"status": "completed"}' \
  --out=completed_orders.json

5.3 Data Transformation and Import

Direct Import via mongorestore (Fastest)

# Import directly to HeliosDB (uses MongoDB wire protocol)
mongorestore \
  --uri="mongodb://heliosdb-server:27017/mydb" \
  --gzip \
  /backup/mongodb/mydb

# Import specific collection
mongorestore \
  --uri="mongodb://heliosdb-server:27017/mydb" \
  --collection=users \
  --gzip \
  /backup/mongodb/mydb/users.bson.gz

# Import with options
mongorestore \
  --uri="mongodb://heliosdb-server:27017/mydb" \
  --drop \
  --numInsertionWorkers=4 \
  --batchSize=1000 \
  /backup/mongodb/mydb

JSON Import via SQL COPY

# Convert MongoDB extended JSON array to one standard JSON document per line
jq -c '.[]' users.json > users_transformed.json

# Import into HeliosDB using COPY (text format: one JSON value per line;
# backslashes must be doubled to satisfy COPY's escape rules)
sed 's/\\/\\\\/g' users_transformed.json | \
  psql -h heliosdb-server -U admin -d mydb \
       -c "COPY users (data) FROM STDIN;"

Python Migration Script (Complex Transformations)

#!/usr/bin/env python3
"""
MongoDB to HeliosDB Migration Script
Handles complex data transformations during migration
"""

from pymongo import MongoClient
from datetime import datetime
import uuid
import json

# Source MongoDB
source_client = MongoClient("mongodb://source-mongodb:27017/")
source_db = source_client.mydb

# Target HeliosDB (via MongoDB protocol)
target_client = MongoClient("mongodb://heliosdb-server:27017/")
target_db = target_client.mydb

def convert_objectid(oid):
    """Convert MongoDB ObjectId to UUID format"""
    hex_str = str(oid)
    # Pad to 32 characters for UUID
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])

def transform_document(doc):
    """Transform document for HeliosDB compatibility"""
    transformed = {}

    for key, value in doc.items():
        if key == '_id':
            # Convert ObjectId to UUID string
            transformed['_id'] = str(convert_objectid(value))
        elif isinstance(value, datetime):
            # Convert datetime to ISO string
            transformed[key] = value.isoformat()
        elif isinstance(value, dict):
            # Recursively transform nested documents
            transformed[key] = transform_document(value)
        elif isinstance(value, list):
            # Transform array elements
            transformed[key] = [
                transform_document(v) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            transformed[key] = value

    return transformed

def migrate_collection(collection_name, batch_size=1000):
    """Migrate a single collection with batching"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]

    total = source_collection.count_documents({})
    migrated = 0

    print(f"Migrating {collection_name}: {total} documents")

    cursor = source_collection.find().batch_size(batch_size)
    batch = []

    for doc in cursor:
        transformed = transform_document(doc)
        batch.append(transformed)

        if len(batch) >= batch_size:
            target_collection.insert_many(batch, ordered=False)
            migrated += len(batch)
            print(f"  Progress: {migrated}/{total} ({100*migrated/total:.1f}%)")
            batch = []

    # Insert remaining documents
    if batch:
        target_collection.insert_many(batch, ordered=False)
        migrated += len(batch)

    print(f"  Completed: {migrated} documents migrated")
    return migrated

def migrate_indexes(collection_name):
    """Recreate indexes in target collection"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]

    indexes = source_collection.index_information()

    for name, info in indexes.items():
        if name == '_id_':
            continue  # Skip default _id index

        keys = info['key']
        options = {k: v for k, v in info.items()
                   if k not in ('key', 'v', 'ns')}

        try:
            target_collection.create_index(keys, name=name, **options)
            print(f"  Created index: {name}")
        except Exception as e:
            print(f"  Warning: Could not create index {name}: {e}")

def main():
    collections = source_db.list_collection_names()

    # Filter out system collections
    collections = [c for c in collections
                   if not c.startswith('system.')]

    print(f"Found {len(collections)} collections to migrate")

    for collection_name in collections:
        migrate_collection(collection_name)
        migrate_indexes(collection_name)

    print("\nMigration complete!")

if __name__ == "__main__":
    main()

5.4 Application Connection Changes

Python (PyMongo)

# Before: MongoDB
from pymongo import MongoClient
client = MongoClient("mongodb://mongodb-server:27017/")

# After: HeliosDB (same code, different hostname)
from pymongo import MongoClient
client = MongoClient("mongodb://heliosdb-server:27017/")

# With authentication
client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    username="myuser",
    password="mypassword",
    authSource="admin",
    authMechanism="SCRAM-SHA-256"
)

# With connection pooling and options
client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    maxPoolSize=50,
    minPoolSize=10,
    maxIdleTimeMS=30000,
    serverSelectionTimeoutMS=5000
)

Node.js (mongodb driver)

// Before: MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://mongodb-server:27017/');

// After: HeliosDB
const client = new MongoClient('mongodb://heliosdb-server:27017/', {
    auth: {
        username: 'myuser',
        password: 'mypassword'
    },
    authSource: 'admin',
    maxPoolSize: 50
});

Java (MongoDB Driver)

// Before: MongoDB
MongoClient client = MongoClients.create("mongodb://mongodb-server:27017/");

// After: HeliosDB
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString("mongodb://heliosdb-server:27017/"))
    .credential(MongoCredential.createScramSha256Credential(
        "myuser", "admin", "mypassword".toCharArray()))
    .build();
MongoClient client = MongoClients.create(settings);

Environment-Based Configuration

import os

# Use environment variable for easy switching
MONGODB_URI = os.environ.get(
    "MONGODB_URI",
    "mongodb://heliosdb-server:27017/"
)

client = MongoClient(MONGODB_URI)

6. Query Translation Examples

6.1 find() to SELECT

Basic Find

// MongoDB
db.users.find({ status: "active" })
-- HeliosDB SQL equivalent
SELECT * FROM users WHERE data->>'status' = 'active';

-- Or with hybrid schema
SELECT * FROM users WHERE status = 'active';

Find with Projection

// MongoDB
db.users.find(
    { status: "active" },
    { username: 1, email: 1, _id: 0 }
)
-- HeliosDB SQL
SELECT
    data->>'username' AS username,
    data->>'email' AS email
FROM users
WHERE data->>'status' = 'active';

Find with Operators

// MongoDB: comparison operators
db.orders.find({
    amount: { $gte: 100, $lte: 500 },
    status: { $in: ["pending", "processing"] }
})
-- HeliosDB SQL
SELECT * FROM orders
WHERE (data->>'amount')::numeric BETWEEN 100 AND 500
  AND data->>'status' IN ('pending', 'processing');

Find with Nested Fields

// MongoDB: nested document query
db.users.find({ "profile.age": { $gte: 21 } })
-- HeliosDB SQL
SELECT * FROM users
WHERE (data->'profile'->>'age')::integer >= 21;

-- Using jsonpath (more efficient)
SELECT * FROM users
WHERE data @@ '$.profile.age >= 21';

Find with Arrays

// MongoDB: array queries
db.users.find({ tags: "mongodb" })  // contains
db.users.find({ tags: { $all: ["mongodb", "developer"] } })  // contains all
db.users.find({ "items.price": { $gt: 100 } })  // array element match
-- HeliosDB SQL: array contains
SELECT * FROM users WHERE data->'tags' ? 'mongodb';

-- Contains all
SELECT * FROM users
WHERE data->'tags' ?& array['mongodb', 'developer'];

-- Array element match
SELECT * FROM users
WHERE EXISTS (
    SELECT 1 FROM jsonb_array_elements(data->'items') AS item
    WHERE (item->>'price')::numeric > 100
);

6.2 aggregate() to SQL with JSON Functions

Group and Count

// MongoDB aggregation
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $group: {
        _id: "$category",
        total: { $sum: "$amount" },
        count: { $sum: 1 },
        avgAmount: { $avg: "$amount" }
    }},
    { $sort: { total: -1 } },
    { $limit: 10 }
])
-- HeliosDB SQL
SELECT
    data->>'category' AS category,
    SUM((data->>'amount')::numeric) AS total,
    COUNT(*) AS count,
    AVG((data->>'amount')::numeric) AS avg_amount
FROM orders
WHERE data->>'status' = 'completed'
GROUP BY data->>'category'
ORDER BY total DESC
LIMIT 10;

Lookup (Join)

// MongoDB $lookup
db.orders.aggregate([
    { $lookup: {
        from: "customers",
        localField: "customer_id",
        foreignField: "_id",
        as: "customer"
    }},
    { $unwind: "$customer" }
])
-- HeliosDB SQL (much simpler with native JOINs)
SELECT
    o.*,
    c.data AS customer
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;

-- Or with JSONB aggregation to mimic $unwind result
SELECT
    o.data || jsonb_build_object('customer', c.data) AS document
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;

Facet (Multi-Faceted Results)

// MongoDB $facet
db.products.aggregate([
    { $facet: {
        byCategory: [
            { $group: { _id: "$category", count: { $sum: 1 } } }
        ],
        priceRange: [
            { $bucket: {
                groupBy: "$price",
                boundaries: [0, 50, 100, 500],
                default: "Other"
            }}
        ],
        totalCount: [
            { $count: "count" }
        ]
    }}
])
-- HeliosDB SQL: Use separate queries or WITH clause
WITH
by_category AS (
    SELECT data->>'category' AS category, COUNT(*) AS count
    FROM products
    GROUP BY data->>'category'
),
price_range AS (
    SELECT
        CASE
            WHEN (data->>'price')::numeric < 50 THEN '0-50'
            WHEN (data->>'price')::numeric < 100 THEN '50-100'
            WHEN (data->>'price')::numeric < 500 THEN '100-500'
            ELSE 'Other'
        END AS range,
        COUNT(*) AS count
    FROM products
    GROUP BY 1
),
total AS (
    SELECT COUNT(*) AS count FROM products
)
SELECT
    jsonb_build_object(
        'byCategory', (SELECT jsonb_agg(row_to_json(by_category)) FROM by_category),
        'priceRange', (SELECT jsonb_agg(row_to_json(price_range)) FROM price_range),
        'totalCount', (SELECT count FROM total)
    ) AS facets;

6.3 update() to UPDATE with JSONB Operators

Basic Update

// MongoDB
db.users.updateOne(
    { _id: ObjectId("...") },
    { $set: { status: "active" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(data, '{status}', '"active"')
WHERE _id = '...'::uuid;

-- With hybrid schema
UPDATE users SET status = 'active' WHERE _id = '...'::uuid;

Increment

// MongoDB
db.products.updateMany(
    { category: "electronics" },
    { $inc: { viewCount: 1, stock: -1 } }
)
-- HeliosDB SQL
UPDATE products
SET data = data || jsonb_build_object(
    'viewCount', COALESCE((data->>'viewCount')::integer, 0) + 1,
    'stock', COALESCE((data->>'stock')::integer, 0) - 1
)
WHERE data->>'category' = 'electronics';

Array Push

// MongoDB
db.users.updateOne(
    { _id: ObjectId("...") },
    { $push: { tags: "premium" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
    data,
    '{tags}',
    COALESCE(data->'tags', '[]'::jsonb) || '"premium"'::jsonb
)
WHERE _id = '...'::uuid;

Array Pull

// MongoDB
db.users.updateOne(
    { _id: ObjectId("...") },
    { $pull: { tags: "temporary" } }
)
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
    data,
    '{tags}',
    (SELECT COALESCE(jsonb_agg(elem), '[]'::jsonb)
     FROM jsonb_array_elements(data->'tags') AS elem
     WHERE elem <> '"temporary"'::jsonb)
)
WHERE _id = '...'::uuid;

Upsert

// MongoDB
db.users.updateOne(
    { email: "new@example.com" },
    { $set: { name: "New User", createdAt: new Date() } },
    { upsert: true }
)
-- HeliosDB SQL (requires a unique index on the conflict target, e.g.
-- CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email')))
INSERT INTO users (data)
VALUES ('{"email": "new@example.com", "name": "New User", "createdAt": "..."}'::jsonb)
ON CONFLICT ((data->>'email'))
DO UPDATE SET data = users.data || EXCLUDED.data;

7. Feature Mapping

7.1 Aggregation Pipeline Equivalents

MongoDB Stage HeliosDB SQL Equivalent
$match WHERE clause
$project SELECT with JSON functions
$group GROUP BY with aggregates
$sort ORDER BY
$limit LIMIT
$skip OFFSET
$unwind LATERAL jsonb_array_elements()
$lookup JOIN
$graphLookup WITH RECURSIVE
$bucket CASE WHEN in GROUP BY
$facet Multiple CTEs
$setWindowFields Window functions
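
For simple pipelines this mapping can be applied mechanically. The sketch below is our own illustration: it handles only equality $match, $sort, $skip, and $limit against the single-JSONB-column layout from Section 4.1, and does no quoting, casting, or parameter binding.

```python
# Illustrative pipeline-to-SQL translation for the stage mapping above.
# Not production code: values are interpolated unescaped.

def pipeline_to_sql(table, pipeline):
    where, order, limit, offset = [], [], None, None
    for stage in pipeline:
        (op, spec), = stage.items()          # each stage is a one-key dict
        if op == "$match":                   # -> WHERE
            for field, value in spec.items():
                where.append(f"data->>'{field}' = '{value}'")
        elif op == "$sort":                  # -> ORDER BY
            for field, direction in spec.items():
                order.append(
                    f"data->>'{field}' {'ASC' if direction == 1 else 'DESC'}")
        elif op == "$limit":                 # -> LIMIT
            limit = spec
        elif op == "$skip":                  # -> OFFSET
            offset = spec
        else:
            raise NotImplementedError(op)
    sql = f"SELECT * FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order:
        sql += " ORDER BY " + ", ".join(order)
    if limit is not None:
        sql += f" LIMIT {limit}"
    if offset is not None:
        sql += f" OFFSET {offset}"
    return sql
```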

7.2 Change Streams vs HeliosDB CDC

Feature MongoDB Change Streams HeliosDB CDC
Protocol Native MongoDB Native + Debezium-compatible
Latency ~100ms <10ms
Resume Token-based LSN-based (more reliable)
Filtering Pipeline stages SQL predicates
Output BSON JSON, Avro, Protobuf
Destinations Application code Kafka, Webhooks, S3, etc.

MongoDB Change Stream:

const pipeline = [
    { $match: { operationType: "insert" } }
];

const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
    console.log(change);
});

HeliosDB CDC (via MongoDB protocol - identical API):

// Same API works with HeliosDB
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
    console.log(change);
});

HeliosDB CDC (direct SQL subscription):

-- Create CDC subscription
CREATE PUBLICATION users_changes FOR TABLE users;

-- Subscribe via logical replication
CREATE SUBSCRIPTION users_sub
    CONNECTION 'host=heliosdb-server dbname=mydb'
    PUBLICATION users_changes;

HeliosDB Webhook CDC:

-- Configure webhook notification
SELECT helios_cdc.create_webhook(
    'users',
    'https://myapp.com/webhooks/user-changes',
    '{"events": ["INSERT", "UPDATE", "DELETE"]}'
);

7.3 Geospatial Queries

MongoDB HeliosDB
2dsphere index GiST index with geography
$near ST_DWithin + ORDER BY ST_Distance
$geoWithin ST_Within
$geoIntersects ST_Intersects
GeoJSON Native GeoJSON support

MongoDB Geospatial:

// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })

// Find near point
db.places.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
            $maxDistance: 1000
        }
    }
})

// Find within polygon
db.places.find({
    location: {
        $geoWithin: {
            $geometry: {
                type: "Polygon",
                coordinates: [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
            }
        }
    }
})

HeliosDB SQL Geospatial:

-- Create spatial index
CREATE INDEX idx_places_location ON places
    USING GIST (ST_GeomFromGeoJSON(data->>'location'));

-- Find near point (within 1000 meters)
SELECT * FROM places
WHERE ST_DWithin(
    ST_GeomFromGeoJSON(data->>'location')::geography,
    ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography,
    1000
)
ORDER BY ST_Distance(
    ST_GeomFromGeoJSON(data->>'location')::geography,
    ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography
);

-- Find within polygon
SELECT * FROM places
WHERE ST_Within(
    ST_GeomFromGeoJSON(data->>'location'),
    ST_GeomFromGeoJSON('{
        "type": "Polygon",
        "coordinates": [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
    }')
);

7.4 Full-Text Search

MongoDB HeliosDB
Text index GIN index with tsvector
$text search to_tsvector/to_tsquery
Score sorting ts_rank
Language support 30+ languages
Stemming Built-in

MongoDB Text Search:

// Create text index
db.articles.createIndex({ title: "text", content: "text" })

// Search
db.articles.find(
    { $text: { $search: "mongodb database" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

HeliosDB Full-Text Search:

-- Create full-text index
CREATE INDEX idx_articles_fts ON articles
    USING GIN (to_tsvector('english',
        COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')));

-- Search with ranking
SELECT
    *,
    ts_rank(
        to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')),
        plainto_tsquery('english', 'mongodb database')
    ) AS score
FROM articles
WHERE to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', ''))
    @@ plainto_tsquery('english', 'mongodb database')
ORDER BY score DESC;

HeliosDB GraphRAG Semantic Search:

-- Enable semantic search (AI-powered)
SELECT helios_graphrag.enable_semantic_search('articles', 'content');

-- Semantic search
SELECT * FROM helios_graphrag.semantic_search(
    'articles',
    'How do document databases handle transactions?',
    10  -- top 10 results
);


8. Post-Migration Validation

8.1 Data Integrity Checks

// Compare document counts (mongosh, connecting to both servers)
const source = connect("mongodb://source-mongodb:27017/mydb");
const target = connect("mongodb://heliosdb-server:27017/mydb");

const sourceCount = source.users.countDocuments({});
const targetCount = target.users.countDocuments({});
console.log(`Source count: ${sourceCount}`);
console.log(`Target count: ${targetCount}`);

// Verify counts match
if (sourceCount !== targetCount) {
    console.error(`Count mismatch! Source: ${sourceCount}, Target: ${targetCount}`);
}

Python validation script:

def validate_migration(source_db, target_db, collection_name, sample_size=1000):
    """Validate migrated data integrity"""
    source = source_db[collection_name]
    target = target_db[collection_name]

    # Count validation
    source_count = source.count_documents({})
    target_count = target.count_documents({})

    if source_count != target_count:
        print(f"ERROR: Count mismatch - Source: {source_count}, Target: {target_count}")
        return False

    # Sample validation
    sample = list(source.aggregate([{"$sample": {"size": sample_size}}]))

    for doc in sample:
        target_doc = target.find_one({"_id": doc["_id"]})
        if target_doc is None:
            print(f"ERROR: Document {doc['_id']} not found in target")
            return False

        # Compare key fields (adjust based on your schema)
        for key in ["username", "email", "status"]:
            if doc.get(key) != target_doc.get(key):
                print(f"ERROR: Field {key} mismatch for {doc['_id']}")
                return False

    print(f"Validation passed: {source_count} documents verified")
    return True

8.2 Query Validation

from datetime import datetime

def validate_queries(source_db, target_db, collection_name):
    """Run test queries on both databases and compare results"""

    test_queries = [
        # Basic find
        {"status": "active"},
        # Range query
        {"created_at": {"$gte": datetime(2025, 1, 1)}},
        # Nested field
        {"profile.age": {"$gte": 21}},
        # Array query
        {"tags": {"$in": ["premium", "vip"]}},
    ]

    source = source_db[collection_name]
    target = target_db[collection_name]

    for query in test_queries:
        source_results = list(source.find(query).limit(100))
        target_results = list(target.find(query).limit(100))

        if len(source_results) != len(target_results):
            print(f"Query result count mismatch for {query}")
            print(f"  Source: {len(source_results)}, Target: {len(target_results)}")
        else:
            print(f"Query {query}: OK ({len(source_results)} results)")

8.3 Index Validation

// Compare indexes
function compareIndexes(sourceDb, targetDb, collectionName) {
    const sourceIndexes = sourceDb[collectionName].getIndexes();
    const targetIndexes = targetDb[collectionName].getIndexes();

    const sourceNames = sourceIndexes.map(i => i.name).sort();
    const targetNames = targetIndexes.map(i => i.name).sort();

    console.log("Source indexes:", sourceNames);
    console.log("Target indexes:", targetNames);

    // Check for missing indexes
    for (const name of sourceNames) {
        if (!targetNames.includes(name)) {
            console.log(`WARNING: Index ${name} missing in target`);
        }
    }
}

8.4 Performance Validation

import time

def benchmark_queries(db, collection_name, iterations=100):
    """Benchmark common query patterns"""
    collection = db[collection_name]
    sample_id = collection.find_one()["_id"]  # pick a real _id for the lookup benchmark

    benchmarks = {
        "find_by_id": lambda: collection.find_one({"_id": sample_id}),
        "find_by_index": lambda: list(collection.find({"status": "active"}).limit(10)),
        "aggregation": lambda: list(collection.aggregate([
            {"$match": {"status": "active"}},
            {"$group": {"_id": "$category", "count": {"$sum": 1}}}
        ])),
        "text_search": lambda: list(collection.find(
            {"$text": {"$search": "search terms"}}
        ).limit(10)),
    }

    results = {}
    for name, query_func in benchmarks.items():
        times = []
        for _ in range(iterations):
            start = time.time()
            query_func()
            times.append(time.time() - start)

        results[name] = {
            "avg_ms": sum(times) / len(times) * 1000,
            "min_ms": min(times) * 1000,
            "max_ms": max(times) * 1000
        }
        print(f"{name}: avg={results[name]['avg_ms']:.2f}ms")

    return results
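Running `benchmark_queries` against both the MongoDB source and the HeliosDB target yields two result dicts; a hedged sketch of flagging regressions beyond a tolerance (the function name and the 1.5x threshold are arbitrary choices, not part of any tooling):

```python
def compare_benchmarks(source, target, max_slowdown=1.5):
    """Report queries whose average latency grew by more than max_slowdown x."""
    regressions = {}
    for name, src in source.items():
        tgt = target.get(name)
        if tgt and tgt["avg_ms"] > src["avg_ms"] * max_slowdown:
            regressions[name] = (src["avg_ms"], tgt["avg_ms"])
    return regressions

src = {"find_by_id": {"avg_ms": 1.0}}
tgt = {"find_by_id": {"avg_ms": 2.0}}
assert compare_benchmarks(src, tgt) == {"find_by_id": (1.0, 2.0)}
```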

8.5 Application Testing Checklist

  • [ ] Unit tests pass against HeliosDB
  • [ ] Integration tests pass with new connection strings
  • [ ] CRUD operations work correctly
  • [ ] Aggregation pipelines return expected results
  • [ ] Change streams receive events
  • [ ] Transactions commit and rollback properly
  • [ ] Authentication works with all user accounts
  • [ ] Connection pooling handles load correctly
  • [ ] Error handling catches expected exceptions

9. Performance Considerations

9.1 Index Optimization

-- Analyze query patterns
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';

-- Create expression indexes for frequent queries
CREATE INDEX idx_users_status ON users ((data->>'status'));
CREATE INDEX idx_users_email ON users ((data->>'email'));

-- Create GIN index for flexible JSON queries
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);

-- For range queries on numeric fields
CREATE INDEX idx_orders_amount ON orders (((data->>'amount')::numeric));

-- Composite index for multi-field queries
CREATE INDEX idx_orders_status_date ON orders (
    (data->>'status'),
    ((data->>'created_at')::timestamptz)
);

9.2 Query Optimization

-- Use JSONB containment for exact matches (uses GIN index)
SELECT * FROM users WHERE data @> '{"status": "active"}';

-- Use jsonpath for complex queries
SELECT * FROM users
WHERE data @@ '$.profile.age > 21 && $.tags[*] == "premium"';

-- Avoid full table scans with proper indexes
-- BAD: No index usage
SELECT * FROM users WHERE data->>'name' ILIKE '%john%';

-- GOOD: Use full-text search
SELECT * FROM users
WHERE to_tsvector('english', data->>'name') @@ to_tsquery('john');

9.3 Bulk Operations

# Batch inserts for better performance
def bulk_insert(collection, documents, batch_size=1000):
    """Insert documents in batches for optimal performance"""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        collection.insert_many(batch, ordered=False)
        print(f"Inserted batch {i//batch_size + 1}")

-- Use COPY for large data imports (SQL mode)
COPY users (data) FROM '/path/to/data.json' WITH (FORMAT json);

-- Disable indexes during bulk load, rebuild after
ALTER INDEX idx_users_data DISABLE;
-- ... bulk insert ...
REINDEX INDEX idx_users_data;
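The `bulk_insert` helper above requires the whole document list in memory; when streaming from a source cursor, batches can be formed lazily instead. A sketch using `itertools.islice` (helper name hypothetical, requires Python 3.8+):

```python
from itertools import islice

def batched(iterable, batch_size=1000):
    """Yield lists of up to batch_size items from any iterable, lazily."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

# Usage (sketch): stream from MongoDB, insert into HeliosDB batch by batch
# for batch in batched(source_collection.find(), 1000):
#     target_collection.insert_many(batch, ordered=False)
```

This keeps memory usage bounded by the batch size regardless of collection size.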

9.4 Connection Pooling

# Configure connection pooling
from pymongo import MongoClient

client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    maxPoolSize=100,      # Maximum connections
    minPoolSize=10,       # Minimum connections to maintain
    maxIdleTimeMS=30000,  # Close idle connections after 30s
    waitQueueTimeoutMS=5000,  # Timeout waiting for connection
    socketTimeoutMS=30000,    # Socket operation timeout
    serverSelectionTimeoutMS=5000  # Server selection timeout
)

9.5 Memory and Cache Configuration

-- Configure HeliosDB for document workloads
ALTER SYSTEM SET work_mem = '256MB';  -- For complex aggregations
ALTER SYSTEM SET shared_buffers = '8GB';  -- For caching
ALTER SYSTEM SET effective_cache_size = '24GB';  -- Query planner hint

-- Enable intelligent caching for hot collections
ALTER TABLE users ENABLE INTELLIGENT_CACHING;

10. Common Issues and Troubleshooting

10.1 Connection Issues

Problem: Connection refused

pymongo.errors.ServerSelectionTimeoutError: heliosdb-server:27017: [Errno 111] Connection refused

Solution:

# Check if HeliosDB is listening on MongoDB port
netstat -tlnp | grep 27017

# Verify MongoDB protocol is enabled
psql -c "SHOW helios.mongodb_port;"

# Check firewall rules
sudo firewall-cmd --list-ports | grep 27017

Problem: Authentication failed

pymongo.errors.OperationFailure: Authentication failed

Solution:

# Ensure correct auth mechanism
client = MongoClient(
    "mongodb://user:password@heliosdb-server:27017/",
    authMechanism="SCRAM-SHA-256",  # Ensure this matches server config
    authSource="admin"
)

10.2 Query Compatibility Issues

Problem: Unsupported operator

pymongo.errors.OperationFailure: Unknown operator: $customOp

Solution:

# Check compatibility matrix, use alternative
# Instead of custom operators, use $expr with aggregation expressions

# Instead of
db.collection.find({"$customOp": value})

# Use
db.collection.find({"$expr": {"$eq": ["$field", value]}})

Problem: Query returns different results

Solution:

# Check for type mismatches
# MongoDB may have stored numbers as strings

# Debug: Check actual stored type
doc = collection.find_one()
print(type(doc.get('amount')))  # Check if string vs number

# Fix: Ensure consistent types during migration
if isinstance(doc['amount'], str):
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"amount": float(doc['amount'])}}
    )

10.3 Performance Issues

Problem: Slow queries after migration

Solution:

-- Check if indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';

-- Analyze table statistics
ANALYZE users;

-- Check query plan
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';

-- Create missing indexes
CREATE INDEX CONCURRENTLY idx_users_status ON users ((data->>'status'));

Problem: High memory usage

Solution:

-- Check for large documents
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;

-- Find large documents
SELECT _id, pg_column_size(data) AS size
FROM users
ORDER BY size DESC
LIMIT 10;

-- Consider splitting large documents or using hybrid schema

10.4 Data Type Issues

Problem: ObjectId conversion errors

Solution:

from bson import ObjectId
import uuid

def objectid_to_uuid(oid):
    """Convert MongoDB ObjectId to UUID"""
    hex_str = str(oid)
    # Pad to 32 hex characters
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])
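Because the padding above is pure right-fill with zeros, the mapping from 24-character ObjectId hex strings to UUIDs is injective and reversible. A hedged sketch of both directions (function names hypothetical; it works on plain hex strings so `bson` is not required, and note that `uuid.UUID` accepts a 32-character hex string directly, with or without hyphens):

```python
import uuid

def hex_to_uuid(hex_str: str) -> uuid.UUID:
    """Right-pad a 24-char ObjectId hex string to 32 chars and parse as a UUID."""
    return uuid.UUID(hex_str.ljust(32, "0"))

def uuid_to_hex(u: uuid.UUID) -> str:
    """Invert the padding: drop the 8 zero-fill chars to recover the ObjectId hex."""
    return u.hex[:24]

oid_hex = "5f1d7f0e8b4b2a001c8d4e3a"  # example 24-char ObjectId hex
assert uuid_to_hex(hex_to_uuid(oid_hex)) == oid_hex
```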

Problem: Date/time precision loss

Solution:

# MongoDB stores dates as milliseconds since epoch
# Ensure proper conversion

from datetime import datetime

def convert_date(mongo_date):
    """Preserve date precision during migration"""
    if isinstance(mongo_date, datetime):
        return mongo_date.isoformat()
    return mongo_date
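If dates arrive as raw millisecond integers (for example from a JSON export rather than through the driver), they can be rebuilt into timezone-aware datetimes without losing the millisecond component. A minimal sketch using integer arithmetic to avoid float rounding (function name illustrative):

```python
from datetime import datetime, timezone

def ms_to_datetime(ms: int) -> datetime:
    """Convert milliseconds-since-epoch (MongoDB's date resolution) to an aware UTC datetime."""
    sec, msec = divmod(ms, 1000)
    return datetime.fromtimestamp(sec, tz=timezone.utc).replace(microsecond=msec * 1000)

assert ms_to_datetime(0).isoformat() == "1970-01-01T00:00:00+00:00"
```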

10.5 Change Stream Issues

Problem: Change stream not receiving events

Solution:

# Ensure collection has proper CDC enabled
# Check via SQL
# SELECT * FROM pg_publication_tables WHERE pubname = 'mongodb_cdc';

# Restart change stream with explicit options
change_stream = collection.watch(
    full_document='updateLookup',  # Include full document on updates
    max_await_time_ms=1000
)
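A restarted watcher silently misses events unless it resumes from the last processed token. pymongo exposes the token as the event's `_id` and accepts it via `watch(resume_after=...)`; a hedged sketch of a file-backed token store (the filename and helper names are illustrative, and it assumes the token is JSON-serializable, which holds for the usual `{"_data": "..."}` shape):

```python
import json
import os

TOKEN_FILE = "resume_token.json"  # hypothetical location

def save_token(token: dict) -> None:
    """Persist the last processed change-stream resume token."""
    with open(TOKEN_FILE, "w") as f:
        json.dump(token, f)

def load_token():
    """Return the stored resume token, or None on first run."""
    if os.path.exists(TOKEN_FILE):
        with open(TOKEN_FILE) as f:
            return json.load(f)
    return None

# Usage (sketch): resume where the previous process left off
# with collection.watch(resume_after=load_token()) as stream:
#     for event in stream:
#         handle(event)          # hypothetical application callback
#         save_token(event["_id"])
```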

10.6 Transaction Issues

Problem: Transaction aborted unexpectedly

Solution:

# Increase transaction timeout
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

with client.start_session() as session:
    with session.start_transaction(
        max_commit_time_ms=30000,  # 30 second timeout
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority")
    ):
        # Transaction operations
        pass
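Serializable isolation can abort more transactions under write contention than MongoDB's snapshot isolation, so transient aborts should be retried rather than surfaced. pymongo's `session.with_transaction(callback)` does this automatically for errors labeled transient; the same idea can be sketched generically (the exception class, attempt limit, and backoff are illustrative stand-ins, not driver API):

```python
import time

class TransientTransactionError(Exception):
    """Stand-in for a driver error labeled TransientTransactionError."""

def run_with_retry(txn_func, max_attempts=5, backoff_s=0.05):
    """Retry a transaction callback on transient aborts, with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_func()
        except TransientTransactionError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)
```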

10.7 Debugging Tools

# Enable MongoDB driver debugging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pymongo')
logger.setLevel(logging.DEBUG)

-- Enable HeliosDB query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 100;  -- Log queries > 100ms
SELECT pg_reload_conf();

-- View the slowest statements (aggregated statistics, not a log)
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;

Appendix A: Quick Reference Card

Connection String Mapping

| MongoDB | HeliosDB |
|---|---|
| mongodb://host:27017/ | mongodb://host:27017/ |
| mongodb+srv://cluster/ | mongodb://heliosdb:27017/ |
| authSource=admin | authSource=admin |
| replicaSet=rs0 | Not needed (built-in HA) |

Common Query Translation

| MongoDB | HeliosDB SQL |
|---|---|
| find({}) | SELECT * FROM collection |
| find({a: 1}) | WHERE data->>'a' = '1' |
| find({a: {$gt: 1}}) | WHERE (data->>'a')::int > 1 |
| find({a: {$in: [1,2]}}) | WHERE data->>'a' IN ('1','2') |
| updateOne({}, {$set: {a: 1}}) | UPDATE collection SET data = jsonb_set(data, '{a}', '1') |
| aggregate([{$group: ...}]) | SELECT ... GROUP BY ... |

Useful HeliosDB JSON Functions

| Function | Purpose |
|---|---|
| data->>'field' | Extract field as text |
| data->'field' | Extract field as JSONB |
| data @> '{}' | JSONB containment |
| data ? 'key' | Key exists |
| jsonb_set() | Set nested value |
| jsonb_array_elements() | Unwind array |

Appendix B: Migration Checklist Summary

Pre-Migration

  • [ ] Database inventory completed
  • [ ] Data volume estimated
  • [ ] GridFS usage identified and alternative planned
  • [ ] Backup created and verified
  • [ ] Test environment prepared

Migration

  • [ ] Schema designed (pure document, hybrid, or relational)
  • [ ] Data exported from MongoDB
  • [ ] Data imported to HeliosDB
  • [ ] Indexes recreated
  • [ ] Connection strings updated

Post-Migration

  • [ ] Document counts verified
  • [ ] Sample data validated
  • [ ] All queries tested
  • [ ] Performance benchmarked
  • [ ] Application tests passed
  • [ ] Change streams working
  • [ ] Monitoring configured


Last Updated: 2026-01-04 Migration Guide Version: 1.0 HeliosDB Compatibility: 7.0+