MongoDB to HeliosDB Migration Guide¶
Version: 1.0 Last Updated: 2026-01-04
Table of Contents¶
- Introduction
- Compatibility Overview
- Pre-Migration Checklist
- Conceptual Mapping
- Step-by-Step Migration Process
- Query Translation Examples
- Feature Mapping
- Post-Migration Validation
- Performance Considerations
- Common Issues and Troubleshooting
1. Introduction¶
1.1 Why Migrate from MongoDB to HeliosDB?¶
HeliosDB offers significant advantages over MongoDB while maintaining wire protocol compatibility:
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Data Model | Document-only | Unified multi-model (Document, Relational, Graph, Vector, Time-Series) |
| Query Language | MongoDB Query Language only | MQL, SQL, GraphQL, CQL, Redis commands |
| Transactions | Snapshot isolation | SERIALIZABLE isolation with zero-maintenance MVCC |
| Joins | $lookup only | Native SQL JOINs + $lookup |
| Analytics | Limited aggregation | Full OLAP + OLTP (HTAP) |
| Full-Text Search | External (Atlas Search) | Built-in with GraphRAG integration |
| Change Streams | Standard | CDC-backed with lower latency |
| Vector Search | Atlas Vector Search | Native HNSW with multi-modal embeddings |
| Licensing | SSPL | Enterprise-friendly |
| Sharding | Manual configuration | Transparent horizontal scaling |
1.2 Migration Benefits¶
- Unified Platform: Consolidate multiple database systems into one
- SQL Access: Query MongoDB-style documents with standard SQL
- Enhanced Analytics: Run complex analytical queries on document data
- Better Transactions: ACID guarantees with serializable isolation
- Cost Reduction: Single database license, reduced operational overhead
- GraphRAG Integration: AI-powered semantic search on document content
1.3 Wire Protocol Compatibility¶
HeliosDB implements the MongoDB wire protocol (OP_MSG), allowing existing MongoDB applications to connect without code changes:
# Existing MongoDB code works unchanged
from pymongo import MongoClient
# Simply change the hostname
client = MongoClient("mongodb://heliosdb-server:27017/")
db = client.my_database
collection = db.my_collection
# All operations work as expected
result = collection.find({"status": "active"})
2. Compatibility Overview¶
2.1 Wire Protocol Support¶
| Protocol Feature | Status | Notes |
|---|---|---|
| OP_MSG | 100% | Modern protocol (MongoDB 3.6+) |
| OP_QUERY | 100% | Legacy support |
| OP_INSERT/UPDATE/DELETE | 100% | Legacy operations |
| BSON Encoding | 100% | All standard BSON types |
| SCRAM-SHA-256 | 100% | Recommended authentication |
| TLS/SSL | 100% | Encrypted connections |
2.2 CRUD Operations¶
| Operation | Compatibility | Notes |
|---|---|---|
| insertOne/insertMany | 100% | Full support |
| find/findOne | 100% | All query operators |
| updateOne/updateMany | 100% | All update operators |
| deleteOne/deleteMany | 100% | Full support |
| replaceOne | 100% | Document replacement |
| bulkWrite | 100% | Ordered and unordered |
2.3 Query Operators¶
| Category | Coverage | Examples |
|---|---|---|
| Comparison | 100% | $eq, $gt, $lt, $in, $nin |
| Logical | 100% | $and, $or, $not, $nor |
| Element | 100% | $exists, $type |
| Evaluation | 100% | $regex, $text, $expr |
| Array | 100% | $all, $elemMatch, $size |
| Geospatial | 100% | $near, $geoWithin |
| Bitwise | 100% | $bitsAllSet, $bitsAnyClear |
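The operator mapping above is mechanical enough to sketch in code. The following is a minimal, illustrative translator for a few comparison and `$in` operators over a JSONB `data` column; the helper is not a HeliosDB utility, and it interpolates values directly into the string, so real code should use parameterized queries instead:

```python
# Illustrative sketch: translate a small subset of MongoDB filter
# operators into SQL predicates over a JSONB 'data' column.
OPS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}

def filter_to_sql(filter_doc):
    """Build a WHERE fragment for a flat filter over the 'data' column."""
    clauses = []
    for field, cond in filter_doc.items():
        path = f"data->>'{field}'"
        if not isinstance(cond, dict):
            # Bare value means equality, as in MongoDB
            clauses.append(f"{path} = '{cond}'")
            continue
        for op, value in cond.items():
            if op == "$in":
                vals = ", ".join(f"'{v}'" for v in value)
                clauses.append(f"{path} IN ({vals})")
            elif op in OPS and isinstance(value, (int, float)):
                # Numeric comparisons need a cast, since ->> yields text
                clauses.append(f"({path})::numeric {OPS[op]} {value}")
            elif op in OPS:
                clauses.append(f"{path} {OPS[op]} '{value}'")
    return " AND ".join(clauses)

print(filter_to_sql({"status": {"$in": ["pending", "processing"]},
                     "amount": {"$gte": 100}}))
# data->>'status' IN ('pending', 'processing') AND (data->>'amount')::numeric >= 100
```

The `::numeric` cast matters: `->>` always returns text, so without it a range comparison would sort lexically.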
2.4 Aggregation Pipeline¶
| Stage | Status | Notes |
|---|---|---|
| $match | Supported | Filter documents |
| $project | Supported | Reshape documents |
| $group | Supported | Group with accumulators |
| $sort | Supported | Multi-field sorting |
| $lookup | Supported | Left outer join |
| $graphLookup | Supported | Recursive lookup |
| $unwind | Supported | Deconstruct arrays |
| $facet | Supported | Multi-faceted results |
| $setWindowFields | Supported | Window functions |
| $merge | Supported | Write to collection |
2.5 Known Limitations¶
| Feature | Status | Alternative |
|---|---|---|
| GridFS | Not supported | Use blob storage or S3 integration |
| Capped collections | Not supported | Use TTL indexes |
| Server-side JavaScript | Not supported | Use SQL procedures |
| Map-reduce | Deprecated | Use aggregation pipeline |
3. Pre-Migration Checklist¶
3.1 Assessment Phase¶
- [ ] Inventory databases and collections
- [ ] Document data volume
- [ ] Identify GridFS usage
- [ ] Review index definitions
- [ ] Analyze query patterns
3.2 Technical Requirements¶
- [ ] HeliosDB version: 7.0 or later recommended
- [ ] Storage capacity: 1.5x current MongoDB storage (for import headroom)
- [ ] Network connectivity: MongoDB port 27017 accessible
- [ ] Authentication: Prepare HeliosDB credentials
- [ ] Backup verified: Recent MongoDB backup tested for restore
3.3 Application Assessment¶
- [ ] Driver versions: Document current MongoDB driver versions
- [ ] Connection strings: Inventory all application connection strings
- [ ] Custom code: Review any server-side JavaScript or map-reduce usage
- [ ] Change stream consumers: Document change stream applications
- [ ] Test environment: Prepare isolated test environment
3.4 Risk Assessment¶
| Risk | Mitigation |
|---|---|
| Data loss | Comprehensive backup before migration |
| Application downtime | Plan maintenance window or dual-write strategy |
| Query incompatibility | Pre-test all application queries |
| Performance regression | Benchmark before and after |
4. Conceptual Mapping¶
4.1 Collections to Tables¶
MongoDB collections map to HeliosDB tables with JSONB storage:
MongoDB Collection:
// MongoDB: users collection
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"username": "johndoe",
"email": "john@example.com",
"profile": {
"firstName": "John",
"lastName": "Doe",
"age": 30
},
"tags": ["developer", "mongodb"]
}
HeliosDB Table (Internal Representation):
-- Automatic table structure created by MongoDB protocol
CREATE TABLE users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
-- The document is stored in the 'data' JSONB column
-- _id is extracted and indexed for fast lookups
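To make the storage mapping concrete, here is a pure-Python sketch of how one document becomes one row in the pure-document table: the `_id` is lifted out as the key, and the whole document is the JSONB payload. The helper name is illustrative, not part of HeliosDB:

```python
# Sketch: one MongoDB document -> one (_id, data) row in the users table.
import json
import uuid

def doc_to_row(doc):
    """Return the (_id, data) pair that the pure-document table stores."""
    # _id becomes the primary-key column; generate one if the doc has none
    _id = str(doc.get("_id") or uuid.uuid4())
    data = dict(doc, _id=_id)  # full document kept, _id normalized to text
    return _id, json.dumps(data)

row_id, row_data = doc_to_row({"_id": "507f1f77-bcf8-6cd7-9943-901100000000",
                               "username": "johndoe"})
```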
Hybrid Schema (Optional Optimization):
-- For frequently queried fields, create hybrid schema
CREATE TABLE users (
_id UUID PRIMARY KEY,
username VARCHAR(100),
email VARCHAR(255),
profile JSONB,
tags TEXT[],
created_at TIMESTAMP DEFAULT NOW()
);
-- Create indexes on extracted fields
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tags ON users USING GIN(tags);
4.2 Documents to Rows with JSONB¶
| MongoDB Concept | HeliosDB Equivalent |
|---|---|
| Document | Row with JSONB column |
| Embedded document | Nested JSONB object |
| Array | JSONB array or PostgreSQL array |
| ObjectId | UUID (auto-converted) |
| ISODate | TIMESTAMP/TIMESTAMPTZ |
| NumberLong | BIGINT |
| NumberDecimal | NUMERIC |
| BinData | BYTEA |
Type Conversion Examples:
// MongoDB document with various types
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"price": NumberDecimal("19.99"),
"quantity": NumberLong(100),
"created": ISODate("2025-01-01T00:00:00Z"),
"metadata": BinData(0, "base64data=="),
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}
-- HeliosDB JSONB representation
{
"_id": "507f1f77-bcf8-6cd7-9943-9011507f1f77",
"price": 19.99,
"quantity": 100,
"created": "2025-01-01T00:00:00Z",
"metadata": "\\x626173653634646174613d3d",
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}
4.3 Indexes¶
| MongoDB Index | HeliosDB Equivalent |
|---|---|
| Single field | B-tree index on JSONB path |
| Compound | Composite B-tree index |
| Unique | Unique constraint/index |
| TTL | TTL extension on JSONB |
| Text | GIN index with full-text search |
| 2dsphere | GiST geospatial index |
| Hashed | Hash index |
| Wildcard | GIN index on JSONB |
Index Migration Examples:
// MongoDB indexes
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "profile.age": 1, "status": 1 })
db.users.createIndex({ "location": "2dsphere" })
db.users.createIndex({ "content": "text", "title": "text" })
db.users.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
-- HeliosDB equivalent indexes
CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
CREATE INDEX idx_users_age_status ON users ((data->'profile'->>'age'), (data->>'status'));
CREATE INDEX idx_users_location ON users USING GIST ((data->'location'));
CREATE INDEX idx_users_content ON users USING GIN (to_tsvector('english',
COALESCE(data->>'content', '') || ' ' || COALESCE(data->>'title', '')));
-- TTL implemented via background job or HeliosDB TTL extension
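The single-field translations above follow a mechanical pattern that can be scripted. The sketch below generates the expression-index DDL for simple MongoDB index specs, including dotted paths; the helper is illustrative, not a HeliosDB tool, and covers only single-field B-tree indexes:

```python
# Sketch: db.<table>.createIndex({field: 1}) -> expression-index DDL
# over the JSONB 'data' column, matching the examples above.

def index_ddl(table, field, unique=False):
    """Translate a single-field MongoDB index into CREATE INDEX DDL."""
    kind = "UNIQUE INDEX" if unique else "INDEX"
    name = f"idx_{table}_{field.replace('.', '_')}"
    # Dotted paths like "profile.age" become data->'profile'->>'age'
    parts = field.split(".")
    expr = "data" + "".join(f"->'{p}'" for p in parts[:-1]) + f"->>'{parts[-1]}'"
    return f"CREATE {kind} {name} ON {table} (({expr}));"

print(index_ddl("users", "email", unique=True))
# CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
print(index_ddl("users", "profile.age"))
# CREATE INDEX idx_users_profile_age ON users ((data->'profile'->>'age'));
```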
4.4 Database and Collection Namespaces¶
| MongoDB | HeliosDB |
|---|---|
| Database | Schema |
| Collection | Table |
| mydb.users | mydb.users (schema.table) |
| admin | heliosdb_admin (system schema) |
5. Step-by-Step Migration Process¶
5.1 Schema Design in HeliosDB¶
Option A: Pure Document Mode (Simplest)¶
-- HeliosDB automatically creates this structure
-- when accessed via MongoDB protocol
CREATE SCHEMA IF NOT EXISTS mydb;
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
CREATE INDEX idx_users_data ON mydb.users USING GIN (data jsonb_path_ops);
Option B: Hybrid Schema (Recommended for Performance)¶
-- Design schema with frequently queried fields extracted
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
profile JSONB,
tags TEXT[],
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for common query patterns
CREATE INDEX idx_users_status ON mydb.users(status);
CREATE INDEX idx_users_created ON mydb.users(created_at DESC);
CREATE INDEX idx_users_profile ON mydb.users USING GIN (profile);
CREATE INDEX idx_users_tags ON mydb.users USING GIN (tags);
Option C: Full Relational Conversion (Best for Analytics)¶
-- Normalize embedded documents into separate tables
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY,
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE mydb.user_profiles (
user_id UUID PRIMARY KEY REFERENCES mydb.users(_id),
first_name VARCHAR(100),
last_name VARCHAR(100),
age INTEGER,
bio TEXT
);
CREATE TABLE mydb.user_tags (
id SERIAL PRIMARY KEY,
user_id UUID REFERENCES mydb.users(_id),
tag VARCHAR(50)
);
CREATE INDEX idx_user_tags_user ON mydb.user_tags(user_id);
CREATE INDEX idx_user_tags_tag ON mydb.user_tags(tag);
5.2 Data Export Using mongodump/mongoexport¶
Method 1: mongodump (Binary, Recommended for Large Datasets)¶
# Export entire database
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--out=/backup/mongodb \
--gzip
# Export specific collection
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=/backup/mongodb \
--gzip
# Export with query filter (incremental)
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"createdAt": {"$gte": {"$date": "2025-01-01T00:00:00Z"}}}' \
--out=/backup/mongodb
Method 2: mongoexport (JSON/CSV, Good for Transformation)¶
# Export to JSON (extended JSON format)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=users.json \
--jsonArray
# Export to CSV (for relational import)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--type=csv \
--fields=_id,username,email,profile.firstName,profile.lastName \
--out=users.csv
# Export with query filter
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"status": "completed"}' \
--out=completed_orders.json
5.3 Data Transformation and Import¶
Direct Import via mongorestore (Fastest)¶
# Import directly to HeliosDB (uses MongoDB wire protocol)
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--gzip \
/backup/mongodb/mydb
# Import specific collection
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--collection=users \
--gzip \
/backup/mongodb/mydb/users.bson.gz
# Import with options
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--drop \
--numInsertionWorkers=4 \
--batchSize=1000 \
/backup/mongodb/mydb
JSON Import via SQL COPY¶
# Convert MongoDB extended JSON to standard JSON
cat users.json | jq -c '.[] | {_id: ._id."$oid", data: .}' > users_transformed.json
# Import into HeliosDB using COPY
psql -h heliosdb-server -U admin -d mydb -c "
COPY users (data) FROM STDIN WITH (FORMAT json);
" < users_transformed.json
Python Migration Script (Complex Transformations)¶
#!/usr/bin/env python3
"""
MongoDB to HeliosDB Migration Script
Handles complex data transformations during migration
"""
from pymongo import MongoClient
from datetime import datetime
import uuid
import json

# Source MongoDB
source_client = MongoClient("mongodb://source-mongodb:27017/")
source_db = source_client.mydb

# Target HeliosDB (via MongoDB protocol)
target_client = MongoClient("mongodb://heliosdb-server:27017/")
target_db = target_client.mydb

def convert_objectid(oid):
    """Convert MongoDB ObjectId to UUID format"""
    hex_str = str(oid)
    # Pad to 32 characters for UUID
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])

def transform_document(doc):
    """Transform document for HeliosDB compatibility"""
    transformed = {}
    for key, value in doc.items():
        if key == '_id':
            # Convert ObjectId to UUID string
            transformed['_id'] = str(convert_objectid(value))
        elif isinstance(value, datetime):
            # Convert datetime to ISO string
            transformed[key] = value.isoformat()
        elif isinstance(value, dict):
            # Recursively transform nested documents
            transformed[key] = transform_document(value)
        elif isinstance(value, list):
            # Transform array elements
            transformed[key] = [
                transform_document(v) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            transformed[key] = value
    return transformed

def migrate_collection(collection_name, batch_size=1000):
    """Migrate a single collection with batching"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    total = source_collection.count_documents({})
    migrated = 0
    print(f"Migrating {collection_name}: {total} documents")
    cursor = source_collection.find().batch_size(batch_size)
    batch = []
    for doc in cursor:
        transformed = transform_document(doc)
        batch.append(transformed)
        if len(batch) >= batch_size:
            target_collection.insert_many(batch, ordered=False)
            migrated += len(batch)
            print(f" Progress: {migrated}/{total} ({100*migrated/total:.1f}%)")
            batch = []
    # Insert remaining documents
    if batch:
        target_collection.insert_many(batch, ordered=False)
        migrated += len(batch)
    print(f" Completed: {migrated} documents migrated")
    return migrated

def migrate_indexes(collection_name):
    """Recreate indexes in target collection"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    indexes = source_collection.index_information()
    for name, info in indexes.items():
        if name == '_id_':
            continue  # Skip default _id index
        keys = info['key']
        options = {k: v for k, v in info.items()
                   if k not in ('key', 'v', 'ns')}
        try:
            target_collection.create_index(keys, name=name, **options)
            print(f" Created index: {name}")
        except Exception as e:
            print(f" Warning: Could not create index {name}: {e}")

def main():
    collections = source_db.list_collection_names()
    # Filter out system collections
    collections = [c for c in collections
                   if not c.startswith('system.')]
    print(f"Found {len(collections)} collections to migrate")
    for collection_name in collections:
        migrate_collection(collection_name)
        migrate_indexes(collection_name)
    print("\nMigration complete!")

if __name__ == "__main__":
    main()
5.4 Application Connection Changes¶
Python (PyMongo)¶
# Before: MongoDB
from pymongo import MongoClient
client = MongoClient("mongodb://mongodb-server:27017/")
# After: HeliosDB (same code, different hostname)
from pymongo import MongoClient
client = MongoClient("mongodb://heliosdb-server:27017/")
# With authentication
client = MongoClient(
"mongodb://heliosdb-server:27017/",
username="myuser",
password="mypassword",
authSource="admin",
authMechanism="SCRAM-SHA-256"
)
# With connection pooling and options
client = MongoClient(
"mongodb://heliosdb-server:27017/",
maxPoolSize=50,
minPoolSize=10,
maxIdleTimeMS=30000,
serverSelectionTimeoutMS=5000
)
Node.js (mongodb driver)¶
// Before: MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://mongodb-server:27017/');
// After: HeliosDB
const client = new MongoClient('mongodb://heliosdb-server:27017/', {
auth: {
username: 'myuser',
password: 'mypassword'
},
authSource: 'admin',
maxPoolSize: 50
});
Java (MongoDB Driver)¶
// Before: MongoDB
MongoClient client = MongoClients.create("mongodb://mongodb-server:27017/");
// After: HeliosDB
MongoClientSettings settings = MongoClientSettings.builder()
.applyConnectionString(new ConnectionString("mongodb://heliosdb-server:27017/"))
.credential(MongoCredential.createScramSha256Credential(
"myuser", "admin", "mypassword".toCharArray()))
.build();
MongoClient client = MongoClients.create(settings);
Environment-Based Configuration¶
import os
# Use environment variable for easy switching
MONGODB_URI = os.environ.get(
"MONGODB_URI",
"mongodb://heliosdb-server:27017/"
)
client = MongoClient(MONGODB_URI)
6. Query Translation Examples¶
6.1 find() to SELECT¶
Basic Find¶
// MongoDB
db.users.find({ status: "active" })
-- HeliosDB SQL equivalent
SELECT * FROM users WHERE data->>'status' = 'active';
-- Or with hybrid schema
SELECT * FROM users WHERE status = 'active';
Find with Projection¶
// MongoDB
db.users.find({ status: "active" }, { username: 1, email: 1, _id: 0 })
-- HeliosDB SQL
SELECT
data->>'username' AS username,
data->>'email' AS email
FROM users
WHERE data->>'status' = 'active';
Find with Operators¶
// MongoDB: comparison operators
db.orders.find({
amount: { $gte: 100, $lte: 500 },
status: { $in: ["pending", "processing"] }
})
-- HeliosDB SQL
SELECT * FROM orders
WHERE (data->>'amount')::numeric BETWEEN 100 AND 500
AND data->>'status' IN ('pending', 'processing');
Find with Nested Fields¶
// MongoDB
db.users.find({ "profile.age": { $gte: 21 } })
-- HeliosDB SQL
SELECT * FROM users
WHERE (data->'profile'->>'age')::integer >= 21;
-- Using jsonpath (more efficient)
SELECT * FROM users
WHERE data @@ '$.profile.age >= 21';
Find with Arrays¶
// MongoDB: array queries
db.users.find({ tags: "mongodb" }) // contains
db.users.find({ tags: { $all: ["mongodb", "developer"] } }) // contains all
db.users.find({ "items.price": { $gt: 100 } }) // array element match
-- HeliosDB SQL: array contains
SELECT * FROM users WHERE data->'tags' ? 'mongodb';
-- Contains all
SELECT * FROM users
WHERE data->'tags' ?& array['mongodb', 'developer'];
-- Array element match
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(data->'items') AS item
WHERE (item->>'price')::numeric > 100
);
6.2 aggregate() to SQL with JSON Functions¶
Group and Count¶
// MongoDB aggregation
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$category",
total: { $sum: "$amount" },
count: { $sum: 1 },
avgAmount: { $avg: "$amount" }
}},
{ $sort: { total: -1 } },
{ $limit: 10 }
])
-- HeliosDB SQL
SELECT
data->>'category' AS category,
SUM((data->>'amount')::numeric) AS total,
COUNT(*) AS count,
AVG((data->>'amount')::numeric) AS avg_amount
FROM orders
WHERE data->>'status' = 'completed'
GROUP BY data->>'category'
ORDER BY total DESC
LIMIT 10;
Lookup (Join)¶
// MongoDB $lookup
db.orders.aggregate([
{ $lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer"
}},
{ $unwind: "$customer" }
])
-- HeliosDB SQL (much simpler with native JOINs)
SELECT
o.*,
c.data AS customer
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
-- Or with JSONB aggregation to mimic $unwind result
SELECT
o.data || jsonb_build_object('customer', c.data) AS document
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
Faceted Search¶
// MongoDB $facet
db.products.aggregate([
{ $facet: {
byCategory: [
{ $group: { _id: "$category", count: { $sum: 1 } } }
],
priceRange: [
{ $bucket: {
groupBy: "$price",
boundaries: [0, 50, 100, 500],
default: "Other"
}}
],
totalCount: [
{ $count: "count" }
]
}}
])
-- HeliosDB SQL: Use separate queries or WITH clause
WITH
by_category AS (
SELECT data->>'category' AS category, COUNT(*) AS count
FROM products
GROUP BY data->>'category'
),
price_range AS (
SELECT
CASE
WHEN (data->>'price')::numeric < 50 THEN '0-50'
WHEN (data->>'price')::numeric < 100 THEN '50-100'
WHEN (data->>'price')::numeric < 500 THEN '100-500'
ELSE 'Other'
END AS range,
COUNT(*) AS count
FROM products
GROUP BY 1
),
total AS (
SELECT COUNT(*) AS count FROM products
)
SELECT
jsonb_build_object(
'byCategory', (SELECT jsonb_agg(row_to_json(by_category)) FROM by_category),
'priceRange', (SELECT jsonb_agg(row_to_json(price_range)) FROM price_range),
'totalCount', (SELECT count FROM total)
) AS facets;
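The `$bucket` stage and the SQL `CASE` branches above implement the same half-open intervals. A short Python rendering of that bucketing logic (illustrative only) makes the boundary semantics explicit:

```python
# $bucket semantics: each boundary pair defines a [lower, upper) bucket;
# values outside the boundaries fall into the default bucket.
from bisect import bisect_right

def bucket_label(price, boundaries=(0, 50, 100, 500), default="Other"):
    """Assign a value to its [lower, upper) bucket, like $bucket."""
    if price < boundaries[0] or price >= boundaries[-1]:
        return default
    i = bisect_right(boundaries, price) - 1
    return f"{boundaries[i]}-{boundaries[i+1]}"

print(bucket_label(75))   # 50-100
print(bucket_label(500))  # Other (boundaries are exclusive on the right)
```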
6.3 update() to UPDATE with JSONB Operators¶
Basic Update¶
// MongoDB
db.users.updateOne({ _id: ObjectId("...") }, { $set: { status: "active" } })
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(data, '{status}', '"active"')
WHERE _id = '...'::uuid;
-- With hybrid schema
UPDATE users SET status = 'active' WHERE _id = '...'::uuid;
Increment¶
// MongoDB
db.products.updateMany(
{ category: "electronics" },
{ $inc: { viewCount: 1, stock: -1 } }
)
-- HeliosDB SQL
UPDATE products
SET data = data || jsonb_build_object(
'viewCount', COALESCE((data->>'viewCount')::integer, 0) + 1,
'stock', COALESCE((data->>'stock')::integer, 0) - 1
)
WHERE data->>'category' = 'electronics';
Array Push¶
// MongoDB
db.users.updateOne({ _id: ObjectId("...") }, { $push: { tags: "premium" } })
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
COALESCE(data->'tags', '[]'::jsonb) || '"premium"'::jsonb
)
WHERE _id = '...'::uuid;
Array Pull¶
// MongoDB
db.users.updateOne({ _id: ObjectId("...") }, { $pull: { tags: "temporary" } })
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
(SELECT jsonb_agg(elem)
FROM jsonb_array_elements(data->'tags') AS elem
WHERE elem != '"temporary"')
)
WHERE _id = '...'::uuid;
Upsert¶
// MongoDB
db.users.updateOne(
{ email: "new@example.com" },
{ $set: { name: "New User", createdAt: new Date() } },
{ upsert: true }
)
-- HeliosDB SQL
INSERT INTO users (data)
VALUES ('{"email": "new@example.com", "name": "New User", "createdAt": "..."}'::jsonb)
ON CONFLICT ((data->>'email'))
DO UPDATE SET data = users.data || EXCLUDED.data;
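The `ON CONFLICT ... DO UPDATE` above merges the stored and incoming documents with the JSONB `||` operator, which is a shallow merge where incoming keys win. In Python terms:

```python
# Shallow merge with right-hand keys winning, the same behavior as
# JSONB's || operator in the upsert above. Nested objects are replaced
# wholesale, not merged recursively.

def upsert_merge(existing, incoming):
    """Mimic users.data || EXCLUDED.data."""
    return {**existing, **incoming}

merged = upsert_merge(
    {"email": "new@example.com", "plan": "free"},
    {"email": "new@example.com", "name": "New User"},
)
# {'email': 'new@example.com', 'plan': 'free', 'name': 'New User'}
```

Note that because the merge is shallow, an incoming nested object like `profile` replaces the stored one entirely; use `jsonb_set` on specific paths when only part of a nested document should change.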
7. Feature Mapping¶
7.1 Aggregation Pipeline Equivalents¶
| MongoDB Stage | HeliosDB SQL Equivalent |
|---|---|
| $match | WHERE clause |
| $project | SELECT with JSON functions |
| $group | GROUP BY with aggregates |
| $sort | ORDER BY |
| $limit | LIMIT |
| $skip | OFFSET |
| $unwind | LATERAL jsonb_array_elements() |
| $lookup | JOIN |
| $graphLookup | WITH RECURSIVE |
| $bucket | CASE WHEN in GROUP BY |
| $facet | Multiple CTEs |
| $setWindowFields | Window functions |
7.2 Change Streams vs HeliosDB CDC¶
| Feature | MongoDB Change Streams | HeliosDB CDC |
|---|---|---|
| Protocol | Native MongoDB | Native + Debezium-compatible |
| Latency | ~100ms | <10ms |
| Resume | Token-based | LSN-based (more reliable) |
| Filtering | Pipeline stages | SQL predicates |
| Output | BSON | JSON, Avro, Protobuf |
| Destinations | Application code | Kafka, Webhooks, S3, etc. |
MongoDB Change Stream:
const pipeline = [
{ $match: { operationType: "insert" } }
];
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});
HeliosDB CDC (via MongoDB protocol - identical API):
// Same API works with HeliosDB
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});
HeliosDB CDC (direct SQL subscription):
-- Create CDC subscription
CREATE PUBLICATION users_changes FOR TABLE users;
-- Subscribe via logical replication
CREATE SUBSCRIPTION users_sub
CONNECTION 'host=heliosdb-server dbname=mydb'
PUBLICATION users_changes;
HeliosDB Webhook CDC:
-- Configure webhook notification
SELECT helios_cdc.create_webhook(
'users',
'https://myapp.com/webhooks/user-changes',
'{"events": ["INSERT", "UPDATE", "DELETE"]}'
);
7.3 Geospatial Queries¶
| MongoDB | HeliosDB |
|---|---|
| 2dsphere index | GiST index with geography |
| $near | ST_DWithin + ORDER BY ST_Distance |
| $geoWithin | ST_Within |
| $geoIntersects | ST_Intersects |
| GeoJSON | Native GeoJSON support |
MongoDB Geospatial:
// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })
// Find near point
db.places.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [-73.97, 40.77] },
$maxDistance: 1000
}
}
})
// Find within polygon
db.places.find({
location: {
$geoWithin: {
$geometry: {
type: "Polygon",
coordinates: [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}
}
}
})
HeliosDB SQL Geospatial:
-- Create spatial index
CREATE INDEX idx_places_location ON places
USING GIST (ST_GeomFromGeoJSON(data->>'location'));
-- Find near point (within 1000 meters)
SELECT * FROM places
WHERE ST_DWithin(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography,
1000
)
ORDER BY ST_Distance(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography
);
-- Find within polygon
SELECT * FROM places
WHERE ST_Within(
ST_GeomFromGeoJSON(data->>'location'),
ST_GeomFromGeoJSON('{
"type": "Polygon",
"coordinates": [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}')
);
7.4 Full-Text Search¶
| MongoDB | HeliosDB |
|---|---|
| Text index | GIN index with tsvector |
| $text search | to_tsvector/to_tsquery |
| Score sorting | ts_rank |
| Language support | 30+ languages |
| Stemming | Built-in |
MongoDB Text Search:
// Create text index
db.articles.createIndex({ title: "text", content: "text" })
// Search
db.articles.find(
{ $text: { $search: "mongodb database" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
HeliosDB Full-Text Search:
-- Create full-text index
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english',
COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')));
-- Search with ranking
SELECT
*,
ts_rank(
to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')),
plainto_tsquery('english', 'mongodb database')
) AS score
FROM articles
WHERE to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', ''))
@@ plainto_tsquery('english', 'mongodb database')
ORDER BY score DESC;
HeliosDB GraphRAG Semantic Search:
-- Enable semantic search (AI-powered)
SELECT helios_graphrag.enable_semantic_search('articles', 'content');
-- Semantic search
SELECT * FROM helios_graphrag.semantic_search(
'articles',
'How do document databases handle transactions?',
10 -- top 10 results
);
8. Post-Migration Validation¶
8.1 Data Integrity Checks¶
// Compare document counts
// Source MongoDB
const sourceCount = db.users.countDocuments({});
console.log(`Source count: ${sourceCount}`);
// Target HeliosDB
const targetCount = db.users.countDocuments({});
console.log(`Target count: ${targetCount}`);
// Verify counts match
if (sourceCount !== targetCount) {
console.error(`Count mismatch! Source: ${sourceCount}, Target: ${targetCount}`);
}
def validate_migration(source_db, target_db, collection_name, sample_size=1000):
    """Validate migrated data integrity"""
    source = source_db[collection_name]
    target = target_db[collection_name]
    # Count validation
    source_count = source.count_documents({})
    target_count = target.count_documents({})
    if source_count != target_count:
        print(f"ERROR: Count mismatch - Source: {source_count}, Target: {target_count}")
        return False
    # Sample validation
    sample = list(source.aggregate([{"$sample": {"size": sample_size}}]))
    for doc in sample:
        target_doc = target.find_one({"_id": doc["_id"]})
        if target_doc is None:
            print(f"ERROR: Document {doc['_id']} not found in target")
            return False
        # Compare key fields (adjust based on your schema)
        for key in ["username", "email", "status"]:
            if doc.get(key) != target_doc.get(key):
                print(f"ERROR: Field {key} mismatch for {doc['_id']}")
                return False
    print(f"Validation passed: {source_count} documents verified")
    return True
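Beyond comparing a few hand-picked fields, a checksum over a canonical JSON rendering catches drift anywhere in the document. A small sketch (the helper name is illustrative):

```python
# Stable per-document checksum: canonicalize key order, then hash.
# Comparing checksums between source and target catches any field drift,
# not just the fields a validator happens to check.
import hashlib
import json

def doc_checksum(doc):
    """Stable checksum of a document, independent of key order."""
    canonical = json.dumps(doc, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Documents with the same content but different key order hash identically
a = {"username": "johndoe", "age": 30}
b = {"age": 30, "username": "johndoe"}
assert doc_checksum(a) == doc_checksum(b)
```

One caveat: `default=str` makes non-JSON types (ObjectId, datetime) hashable, so compare checksums only after both sides have gone through the same type transformation.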
8.2 Query Validation¶
from datetime import datetime

def validate_queries(source_db, target_db, collection_name):
    """Run test queries on both databases and compare results"""
    test_queries = [
        # Basic find
        {"status": "active"},
        # Range query
        {"created_at": {"$gte": datetime(2025, 1, 1)}},
        # Nested field
        {"profile.age": {"$gte": 21}},
        # Array query
        {"tags": {"$in": ["premium", "vip"]}},
    ]
    source = source_db[collection_name]
    target = target_db[collection_name]
    for query in test_queries:
        source_results = list(source.find(query).limit(100))
        target_results = list(target.find(query).limit(100))
        if len(source_results) != len(target_results):
            print(f"Query result count mismatch for {query}")
            print(f" Source: {len(source_results)}, Target: {len(target_results)}")
        else:
            print(f"Query {query}: OK ({len(source_results)} results)")
8.3 Index Validation¶
// Compare indexes
function compareIndexes(sourceDb, targetDb, collectionName) {
const sourceIndexes = sourceDb[collectionName].getIndexes();
const targetIndexes = targetDb[collectionName].getIndexes();
const sourceNames = sourceIndexes.map(i => i.name).sort();
const targetNames = targetIndexes.map(i => i.name).sort();
console.log("Source indexes:", sourceNames);
console.log("Target indexes:", targetNames);
// Check for missing indexes
for (const name of sourceNames) {
if (!targetNames.includes(name)) {
console.log(`WARNING: Index ${name} missing in target`);
}
}
}
8.4 Performance Validation¶
import time

def benchmark_queries(db, collection_name, iterations=100):
    """Benchmark common query patterns"""
    collection = db[collection_name]
    # Use a real _id so the point-lookup benchmark hits an existing document
    sample_id = collection.find_one()["_id"]
    benchmarks = {
        "find_by_id": lambda: collection.find_one({"_id": sample_id}),
        "find_by_index": lambda: list(collection.find({"status": "active"}).limit(10)),
        "aggregation": lambda: list(collection.aggregate([
            {"$match": {"status": "active"}},
            {"$group": {"_id": "$category", "count": {"$sum": 1}}}
        ])),
        "text_search": lambda: list(collection.find(
            {"$text": {"$search": "search terms"}}
        ).limit(10)),
    }
    results = {}
    for name, query_func in benchmarks.items():
        times = []
        for _ in range(iterations):
            start = time.time()
            query_func()
            times.append(time.time() - start)
        results[name] = {
            "avg_ms": sum(times) / len(times) * 1000,
            "min_ms": min(times) * 1000,
            "max_ms": max(times) * 1000
        }
        print(f"{name}: avg={results[name]['avg_ms']:.2f}ms")
    return results
8.5 Application Testing Checklist¶
- [ ] Unit tests pass against HeliosDB
- [ ] Integration tests pass with new connection strings
- [ ] CRUD operations work correctly
- [ ] Aggregation pipelines return expected results
- [ ] Change streams receive events
- [ ] Transactions commit and rollback properly
- [ ] Authentication works with all user accounts
- [ ] Connection pooling handles load correctly
- [ ] Error handling catches expected exceptions
9. Performance Considerations¶
9.1 Index Optimization¶
-- Analyze query patterns
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create expression indexes for frequent queries
CREATE INDEX idx_users_status ON users ((data->>'status'));
CREATE INDEX idx_users_email ON users ((data->>'email'));
-- Create GIN index for flexible JSON queries
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);
-- For range queries on numeric fields
CREATE INDEX idx_orders_amount ON orders (((data->>'amount')::numeric));
-- Composite index for multi-field queries
CREATE INDEX idx_orders_status_date ON orders (
(data->>'status'),
((data->>'created_at')::timestamptz)
);
9.2 Query Optimization¶
-- Use JSONB containment for exact matches (uses GIN index)
SELECT * FROM users WHERE data @> '{"status": "active"}';
-- Use jsonpath for complex queries
SELECT * FROM users
WHERE data @@ '$.profile.age > 21 && $.tags[*] == "premium"';
-- Avoid full table scans with proper indexes
-- BAD: No index usage
SELECT * FROM users WHERE data->>'name' ILIKE '%john%';
-- GOOD: Use full-text search
SELECT * FROM users
WHERE to_tsvector('english', data->>'name') @@ to_tsquery('john');
9.3 Bulk Operations¶
# Batch inserts for better performance
def bulk_insert(collection, documents, batch_size=1000):
    """Insert documents in batches for optimal performance"""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        collection.insert_many(batch, ordered=False)
        print(f"Inserted batch {i // batch_size + 1}")
-- Use COPY for large data imports (SQL mode)
COPY users (data) FROM '/path/to/data.json' WITH (FORMAT json);
-- Drop indexes during bulk load, recreate after
DROP INDEX idx_users_data;
-- ... bulk insert ...
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);
9.4 Connection Pooling¶
# Configure connection pooling
from pymongo import MongoClient
client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    maxPoolSize=100,                # Maximum connections
    minPoolSize=10,                 # Minimum connections to maintain
    maxIdleTimeMS=30000,            # Close idle connections after 30s
    waitQueueTimeoutMS=5000,        # Timeout waiting for a connection
    socketTimeoutMS=30000,          # Socket operation timeout
    serverSelectionTimeoutMS=5000   # Server selection timeout
)
9.5 Memory and Cache Configuration¶
-- Configure HeliosDB for document workloads
ALTER SYSTEM SET work_mem = '256MB'; -- For complex aggregations
ALTER SYSTEM SET shared_buffers = '8GB'; -- For caching (requires restart)
ALTER SYSTEM SET effective_cache_size = '24GB'; -- Query planner hint
SELECT pg_reload_conf(); -- Apply reloadable settings
-- Enable intelligent caching for hot collections
ALTER TABLE users ENABLE INTELLIGENT_CACHING;
10. Common Issues and Troubleshooting¶
10.1 Connection Issues¶
Problem: Connection refused
Solution:
# Check if HeliosDB is listening on MongoDB port
netstat -tlnp | grep 27017
# Verify MongoDB protocol is enabled
psql -c "SHOW helios.mongodb_port;"
# Check firewall rules
sudo firewall-cmd --list-ports | grep 27017
Problem: Authentication failed
Solution:
# Ensure correct auth mechanism
client = MongoClient(
    "mongodb://user:password@heliosdb-server:27017/",
    authMechanism="SCRAM-SHA-256",  # Must match server configuration
    authSource="admin"
)
10.2 Query Compatibility Issues¶
Problem: Unsupported operator
Solution:
# Check compatibility matrix, use alternative
# Instead of custom operators, use $expr with aggregation expressions
# Instead of
db.collection.find({"$customOp": value})
# Use
db.collection.find({"$expr": {"$eq": ["$field", value]}})
Problem: Query returns different results
Solution:
# Check for type mismatches
# MongoDB may have stored numbers as strings
# Debug: Check actual stored type
doc = collection.find_one()
print(type(doc.get('amount'))) # Check if string vs number
# Fix: Ensure consistent types during migration
if isinstance(doc['amount'], str):
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"amount": float(doc['amount'])}}
    )
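Fixing one document at a time does not scale to a whole collection. A hedged sketch of a batch approach: the pure function below builds the `(filter, update)` pairs for every document whose `amount` was stored as a string; in practice each pair would be wrapped in a pymongo `UpdateOne` and sent in one round trip via `collection.bulk_write()`.

```python
def build_amount_fixes(docs):
    """Return (filter, update) pairs for documents with a string 'amount'.

    Each pair is ready to be wrapped in pymongo's UpdateOne and sent
    via collection.bulk_write(); documents already numeric are skipped.
    """
    fixes = []
    for doc in docs:
        amount = doc.get("amount")
        if isinstance(amount, str):
            fixes.append((
                {"_id": doc["_id"]},
                {"$set": {"amount": float(amount)}},
            ))
    return fixes
```

Feeding it `collection.find({}, {"amount": 1})` and bulk-writing the result normalizes the whole collection in batches instead of one update per document.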
10.3 Performance Issues¶
Problem: Slow queries after migration
Solution:
-- Check if indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Analyze table statistics
ANALYZE users;
-- Check query plan
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create missing indexes
CREATE INDEX CONCURRENTLY idx_users_status ON users ((data->>'status'));
Problem: High memory usage
Solution:
-- Check for large documents
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;
-- Find the largest documents
SELECT _id, pg_column_size(data) AS size
FROM users
ORDER BY size DESC
LIMIT 10;
-- Consider splitting large documents or using hybrid schema
10.4 Data Type Issues¶
Problem: ObjectId conversion errors
Solution:
from bson import ObjectId
import uuid
def objectid_to_uuid(oid):
    """Convert MongoDB ObjectId to UUID"""
    hex_str = str(oid)
    # ObjectId hex is 24 characters; pad to the 32 a UUID requires
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(hex=padded)
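Because the padding is deterministic, the conversion is reversible, which is handy when validating migrated `_id` values against the source. A self-contained sketch using only the standard library (no `bson` dependency); the function names are illustrative:

```python
import uuid

def objectid_hex_to_uuid(hex_str):
    """Pad a 24-char ObjectId hex string to 32 chars and build a UUID."""
    return uuid.UUID(hex=hex_str.ljust(32, "0"))

def uuid_to_objectid_hex(u):
    """Recover the original 24-char ObjectId hex from a padded UUID."""
    return u.hex[:24]
```

For example, `objectid_hex_to_uuid("5f3a7c9e8b1d2a4c6e0f1234")` yields `UUID('5f3a7c9e-8b1d-2a4c-6e0f-123400000000')`, and `uuid_to_objectid_hex()` on that value returns the original 24 hex characters.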
Problem: Date/time precision loss
Solution:
# MongoDB stores dates as milliseconds since epoch
# Ensure proper conversion
from datetime import datetime
def convert_date(mongo_date):
    """Preserve date precision during migration"""
    if isinstance(mongo_date, datetime):
        return mongo_date.isoformat()
    return mongo_date
10.5 Change Stream Issues¶
Problem: Change stream not receiving events
Solution:
# Ensure collection has proper CDC enabled
# Check via SQL
# SELECT * FROM pg_publication_tables WHERE pubname = 'mongodb_cdc';
# Restart change stream with explicit options
change_stream = collection.watch(
    full_document='updateLookup',  # Include full document on updates
    max_await_time_ms=1000
)
10.6 Transaction Issues¶
Problem: Transaction aborted unexpectedly
Solution:
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Increase transaction timeout
with client.start_session() as session:
    with session.start_transaction(
        max_commit_time_ms=30000,  # 30-second commit timeout
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority")
    ):
        # Transaction operations
        pass
10.7 Debugging Tools¶
# Enable MongoDB driver debugging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pymongo')
logger.setLevel(logging.DEBUG)
-- Enable HeliosDB query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 100; -- Log queries > 100ms
SELECT pg_reload_conf();
-- View the most expensive statements (requires pg_stat_statements)
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;
Appendix A: Quick Reference Card¶
Connection String Mapping¶
| MongoDB | HeliosDB |
|---|---|
| `mongodb://host:27017/` | `mongodb://host:27017/` |
| `mongodb+srv://cluster/` | `mongodb://heliosdb:27017/` |
| `authSource=admin` | `authSource=admin` |
| `replicaSet=rs0` | Not needed (built-in HA) |
Common Query Translation¶
| MongoDB | HeliosDB SQL |
|---|---|
| `find({})` | `SELECT * FROM collection` |
| `find({a: 1})` | `WHERE data->>'a' = '1'` |
| `find({a: {$gt: 1}})` | `WHERE (data->>'a')::int > 1` |
| `find({a: {$in: [1,2]}})` | `WHERE data->>'a' IN ('1','2')` |
| `updateOne({}, {$set: {a: 1}})` | `UPDATE SET data = jsonb_set(data, '{a}', '1')` |
| `aggregate([{$group: ...}])` | `SELECT ... GROUP BY ...` |
Useful HeliosDB JSON Functions¶
| Function | Purpose |
|---|---|
| `data->>'field'` | Extract field as text |
| `data->'field'` | Extract field as JSONB |
| `data @> '{}'` | JSONB containment |
| `data ? 'key'` | Key exists |
| `jsonb_set()` | Set nested value |
| `jsonb_array_elements()` | Unwind array |
Appendix B: Migration Checklist Summary¶
Pre-Migration¶
- [ ] Database inventory completed
- [ ] Data volume estimated
- [ ] GridFS usage identified and alternative planned
- [ ] Backup created and verified
- [ ] Test environment prepared
Migration¶
- [ ] Schema designed (pure document, hybrid, or relational)
- [ ] Data exported from MongoDB
- [ ] Data imported to HeliosDB
- [ ] Indexes recreated
- [ ] Connection strings updated
Post-Migration¶
- [ ] Document counts verified
- [ ] Sample data validated
- [ ] All queries tested
- [ ] Performance benchmarked
- [ ] Application tests passed
- [ ] Change streams working
- [ ] Monitoring configured
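The "document counts verified" item above can be scripted. A hedged sketch: the pure function below diffs per-collection counts gathered from each side; the count dictionaries themselves would be built by calling `db[name].count_documents({})` against the MongoDB source and the HeliosDB target in turn.

```python
def find_count_mismatches(mongo_counts, helios_counts):
    """Return {collection: (mongo_count, helios_count)} for any count that differs.

    Collections present on only one side show up with a count of 0 on
    the other, so missing collections are reported as mismatches too.
    """
    mismatches = {}
    for name in set(mongo_counts) | set(helios_counts):
        m = mongo_counts.get(name, 0)
        h = helios_counts.get(name, 0)
        if m != h:
            mismatches[name] = (m, h)
    return mismatches
```

An empty result means every collection migrated with matching counts; anything else pinpoints exactly which collections need a closer look before cutover.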
Related Documentation¶
- MongoDB Protocol README - Protocol overview
- MongoDB Compatibility Matrix - Feature support details
- MongoDB Examples - Code examples
- General Migration Guide - Cross-protocol migration
- MySQL Migration Guide - SQL migration reference
Last Updated: 2026-01-04 Migration Guide Version: 1.0 HeliosDB Compatibility: 7.0+