MongoDB to HeliosDB Migration Guide¶
Version: 1.0 Last Updated: 2026-01-04
Table of Contents¶
- Introduction
- Compatibility Overview
- Pre-Migration Checklist
- Conceptual Mapping
- Step-by-Step Migration Process
- Query Translation Examples
- Feature Mapping
- Post-Migration Validation
- Performance Considerations
- Common Issues and Troubleshooting
1. Introduction¶
1.1 Why Migrate from MongoDB to HeliosDB?¶
HeliosDB offers significant advantages over MongoDB while maintaining wire protocol compatibility:
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Data Model | Document-only | Unified multi-model (Document, Relational, Graph, Vector, Time-Series) |
| Query Language | MongoDB Query Language only | MQL, SQL, GraphQL, CQL, Redis commands |
| Transactions | Snapshot isolation | SERIALIZABLE isolation with zero-maintenance MVCC |
| Joins | $lookup only | Native SQL JOINs + $lookup |
| Analytics | Limited aggregation | Full OLAP + OLTP (HTAP) |
| Full-Text Search | External (Atlas Search) | Built-in with GraphRAG integration |
| Change Streams | Standard | CDC-backed with lower latency |
| Vector Search | Atlas Vector Search | Native HNSW with multi-modal embeddings |
| Licensing | SSPL | Enterprise-friendly |
| Sharding | Manual configuration | Transparent horizontal scaling |
1.2 Migration Benefits¶
- Unified Platform: Consolidate multiple database systems into one
- SQL Access: Query MongoDB-style documents with standard SQL
- Enhanced Analytics: Run complex analytical queries on document data
- Better Transactions: ACID guarantees with serializable isolation
- Cost Reduction: Single database license, reduced operational overhead
- GraphRAG Integration: AI-powered semantic search on document content
1.3 Wire Protocol Compatibility¶
HeliosDB implements the MongoDB wire protocol (OP_MSG), allowing existing MongoDB applications to connect without code changes:
# Existing MongoDB code works unchanged
from pymongo import MongoClient
# Simply change the hostname
client = MongoClient("mongodb://heliosdb-server:27017/")
db = client.my_database
collection = db.my_collection
# All operations work as expected
result = collection.find({"status": "active"})
2. Compatibility Overview¶
2.1 Wire Protocol Support¶
| Protocol Feature | Status | Notes |
|---|---|---|
| OP_MSG | 100% | Modern protocol (MongoDB 3.6+) |
| OP_QUERY | 100% | Legacy support |
| OP_INSERT/UPDATE/DELETE | 100% | Legacy operations |
| BSON Encoding | 100% | All standard BSON types |
| SCRAM-SHA-256 | 100% | Recommended authentication |
| TLS/SSL | 100% | Encrypted connections |
2.2 CRUD Operations¶
| Operation | Compatibility | Notes |
|---|---|---|
| insertOne/insertMany | 100% | Full support |
| find/findOne | 100% | All query operators |
| updateOne/updateMany | 100% | All update operators |
| deleteOne/deleteMany | 100% | Full support |
| replaceOne | 100% | Document replacement |
| bulkWrite | 100% | Ordered and unordered |
2.3 Query Operators¶
| Category | Coverage | Examples |
|---|---|---|
| Comparison | 100% | $eq, $gt, $lt, $in, $nin |
| Logical | 100% | $and, $or, $not, $nor |
| Element | 100% | $exists, $type |
| Evaluation | 100% | $regex, $text, $expr |
| Array | 100% | $all, $elemMatch, $size |
| Geospatial | 100% | $near, $geoWithin |
| Bitwise | 100% | $bitsAllSet, $bitsAnyClear |
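The operator mapping above is mechanical enough to sketch in code. The following is a minimal, illustrative translator for a few comparison and `$in` operators over a JSONB `data` column; the helper is not a HeliosDB utility, and it interpolates values directly into the string, so real code should use parameterized queries instead:

```python
# Illustrative sketch: translate a small subset of MongoDB filter
# operators into SQL predicates over a JSONB 'data' column.
OPS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}

def filter_to_sql(filter_doc):
    """Build a WHERE fragment for a flat filter over the 'data' column."""
    clauses = []
    for field, cond in filter_doc.items():
        path = f"data->>'{field}'"
        if not isinstance(cond, dict):
            # Bare value means equality, as in MongoDB
            clauses.append(f"{path} = '{cond}'")
            continue
        for op, value in cond.items():
            if op == "$in":
                vals = ", ".join(f"'{v}'" for v in value)
                clauses.append(f"{path} IN ({vals})")
            elif op in OPS and isinstance(value, (int, float)):
                # Numeric comparisons need a cast, since ->> yields text
                clauses.append(f"({path})::numeric {OPS[op]} {value}")
            elif op in OPS:
                clauses.append(f"{path} {OPS[op]} '{value}'")
    return " AND ".join(clauses)

print(filter_to_sql({"status": {"$in": ["pending", "processing"]},
                     "amount": {"$gte": 100}}))
# data->>'status' IN ('pending', 'processing') AND (data->>'amount')::numeric >= 100
```

The `::numeric` cast matters: `->>` always returns text, so without it a range comparison would sort lexically.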
2.4 Aggregation Pipeline¶
| Stage | Status | Notes |
|---|---|---|
| $match | Supported | Filter documents |
| $project | Supported | Reshape documents |
| $group | Supported | Group with accumulators |
| $sort | Supported | Multi-field sorting |
| $lookup | Supported | Left outer join |
| $graphLookup | Supported | Recursive lookup |
| $unwind | Supported | Deconstruct arrays |
| $facet | Supported | Multi-faceted results |
| $setWindowFields | Supported | Window functions |
| $merge | Supported | Write to collection |
2.5 Known Limitations¶
| Feature | Status | Alternative |
|---|---|---|
| GridFS | Not supported | Use blob storage or S3 integration |
| Capped collections | Not supported | Use TTL indexes |
| Server-side JavaScript | Not supported | Use SQL procedures |
| Map-reduce | Deprecated | Use aggregation pipeline |
3. Pre-Migration Checklist¶
3.1 Assessment Phase¶
- [ ] Inventory databases and collections
- [ ] Document data volume
- [ ] Identify GridFS usage
- [ ] Review index definitions
- [ ] Analyze query patterns
3.2 Technical Requirements¶
- [ ] HeliosDB version: 7.0 or later recommended
- [ ] Storage capacity: 1.5x current MongoDB storage (for import headroom)
- [ ] Network connectivity: MongoDB port 27017 accessible
- [ ] Authentication: Prepare HeliosDB credentials
- [ ] Backup verified: Recent MongoDB backup tested for restore
3.3 Application Assessment¶
- [ ] Driver versions: Document current MongoDB driver versions
- [ ] Connection strings: Inventory all application connection strings
- [ ] Custom code: Review any server-side JavaScript or map-reduce usage
- [ ] Change stream consumers: Document change stream applications
- [ ] Test environment: Prepare isolated test environment
3.4 Risk Assessment¶
| Risk | Mitigation |
|---|---|
| Data loss | Comprehensive backup before migration |
| Application downtime | Plan maintenance window or dual-write strategy |
| Query incompatibility | Pre-test all application queries |
| Performance regression | Benchmark before and after |
4. Conceptual Mapping¶
4.1 Collections to Tables¶
MongoDB collections map to HeliosDB tables with JSONB storage:
MongoDB Collection:
// MongoDB: users collection
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"username": "johndoe",
"email": "john@example.com",
"profile": {
"firstName": "John",
"lastName": "Doe",
"age": 30
},
"tags": ["developer", "mongodb"]
}
HeliosDB Table (Internal Representation):
-- Automatic table structure created by MongoDB protocol
CREATE TABLE users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
-- The document is stored in the 'data' JSONB column
-- _id is extracted and indexed for fast lookups
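To make the storage mapping concrete, here is a pure-Python sketch of how one document becomes one row in the pure-document table: the `_id` is lifted out as the key, and the whole document is the JSONB payload. The helper name is illustrative, not part of HeliosDB:

```python
# Sketch: one MongoDB document -> one (_id, data) row in the users table.
import json
import uuid

def doc_to_row(doc):
    """Return the (_id, data) pair that the pure-document table stores."""
    # _id becomes the primary-key column; generate one if the doc has none
    _id = str(doc.get("_id") or uuid.uuid4())
    data = dict(doc, _id=_id)  # full document kept, _id normalized to text
    return _id, json.dumps(data)

row_id, row_data = doc_to_row({"_id": "507f1f77-bcf8-6cd7-9943-901100000000",
                               "username": "johndoe"})
```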
Hybrid Schema (Optional Optimization):
-- For frequently queried fields, create hybrid schema
CREATE TABLE users (
_id UUID PRIMARY KEY,
username VARCHAR(100),
email VARCHAR(255),
profile JSONB,
tags TEXT[],
created_at TIMESTAMP DEFAULT NOW()
);
-- Create indexes on extracted fields
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_tags ON users USING GIN(tags);
4.2 Documents to Rows with JSONB¶
| MongoDB Concept | HeliosDB Equivalent |
|---|---|
| Document | Row with JSONB column |
| Embedded document | Nested JSONB object |
| Array | JSONB array or PostgreSQL array |
| ObjectId | UUID (auto-converted) |
| ISODate | TIMESTAMP/TIMESTAMPTZ |
| NumberLong | BIGINT |
| NumberDecimal | NUMERIC |
| BinData | BYTEA |
Type Conversion Examples:
// MongoDB document with various types
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"price": NumberDecimal("19.99"),
"quantity": NumberLong(100),
"created": ISODate("2025-01-01T00:00:00Z"),
"metadata": BinData(0, "base64data=="),
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}
-- HeliosDB JSONB representation
{
"_id": "507f1f77-bcf8-6cd7-9943-9011507f1f77",
"price": 19.99,
"quantity": 100,
"created": "2025-01-01T00:00:00Z",
"metadata": "\\x626173653634646174613d3d",
"location": {
"type": "Point",
"coordinates": [-73.97, 40.77]
}
}
4.3 Indexes¶
| MongoDB Index | HeliosDB Equivalent |
|---|---|
| Single field | B-tree index on JSONB path |
| Compound | Composite B-tree index |
| Unique | Unique constraint/index |
| TTL | TTL extension on JSONB |
| Text | GIN index with full-text search |
| 2dsphere | GiST geospatial index |
| Hashed | Hash index |
| Wildcard | GIN index on JSONB |
Index Migration Examples:
// MongoDB indexes
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "profile.age": 1, "status": 1 })
db.users.createIndex({ "location": "2dsphere" })
db.users.createIndex({ "content": "text", "title": "text" })
db.users.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
-- HeliosDB equivalent indexes
CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
CREATE INDEX idx_users_age_status ON users ((data->'profile'->>'age'), (data->>'status'));
CREATE INDEX idx_users_location ON users USING GIST ((data->'location'));
CREATE INDEX idx_users_content ON users USING GIN (to_tsvector('english',
COALESCE(data->>'content', '') || ' ' || COALESCE(data->>'title', '')));
-- TTL implemented via background job or HeliosDB TTL extension
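The single-field translations above follow a mechanical pattern that can be scripted. The sketch below generates the expression-index DDL for simple MongoDB index specs, including dotted paths; the helper is illustrative, not a HeliosDB tool, and covers only single-field B-tree indexes:

```python
# Sketch: db.<table>.createIndex({field: 1}) -> expression-index DDL
# over the JSONB 'data' column, matching the examples above.

def index_ddl(table, field, unique=False):
    """Translate a single-field MongoDB index into CREATE INDEX DDL."""
    kind = "UNIQUE INDEX" if unique else "INDEX"
    name = f"idx_{table}_{field.replace('.', '_')}"
    # Dotted paths like "profile.age" become data->'profile'->>'age'
    parts = field.split(".")
    expr = "data" + "".join(f"->'{p}'" for p in parts[:-1]) + f"->>'{parts[-1]}'"
    return f"CREATE {kind} {name} ON {table} (({expr}));"

print(index_ddl("users", "email", unique=True))
# CREATE UNIQUE INDEX idx_users_email ON users ((data->>'email'));
print(index_ddl("users", "profile.age"))
# CREATE INDEX idx_users_profile_age ON users ((data->'profile'->>'age'));
```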
4.4 Database and Collection Namespaces¶
| MongoDB | HeliosDB |
|---|---|
| Database | Schema |
| Collection | Table |
| mydb.users | mydb.users (schema.table) |
| admin | heliosdb_admin (system schema) |
5. Step-by-Step Migration Process¶
5.1 Schema Design in HeliosDB¶
Option A: Pure Document Mode (Simplest)¶
-- HeliosDB automatically creates this structure
-- when accessed via MongoDB protocol
CREATE SCHEMA IF NOT EXISTS mydb;
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
data JSONB NOT NULL
);
CREATE INDEX idx_users_data ON mydb.users USING GIN (data jsonb_path_ops);
Option B: Hybrid Schema (Recommended for Performance)¶
-- Design schema with frequently queried fields extracted
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
profile JSONB,
tags TEXT[],
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for common query patterns
CREATE INDEX idx_users_status ON mydb.users(status);
CREATE INDEX idx_users_created ON mydb.users(created_at DESC);
CREATE INDEX idx_users_profile ON mydb.users USING GIN (profile);
CREATE INDEX idx_users_tags ON mydb.users USING GIN (tags);
Option C: Full Relational Conversion (Best for Analytics)¶
-- Normalize embedded documents into separate tables
CREATE TABLE mydb.users (
_id UUID PRIMARY KEY,
username VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE mydb.user_profiles (
user_id UUID PRIMARY KEY REFERENCES mydb.users(_id),
first_name VARCHAR(100),
last_name VARCHAR(100),
age INTEGER,
bio TEXT
);
CREATE TABLE mydb.user_tags (
id SERIAL PRIMARY KEY,
user_id UUID REFERENCES mydb.users(_id),
tag VARCHAR(50)
);
CREATE INDEX idx_user_tags_user ON mydb.user_tags(user_id);
CREATE INDEX idx_user_tags_tag ON mydb.user_tags(tag);
5.2 Data Export Using mongodump/mongoexport¶
Method 1: mongodump (Binary, Recommended for Large Datasets)¶
# Export entire database
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--out=/backup/mongodb \
--gzip
# Export specific collection
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=/backup/mongodb \
--gzip
# Export with query filter (incremental)
mongodump \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"createdAt": {"$gte": {"$date": "2025-01-01T00:00:00Z"}}}' \
--out=/backup/mongodb
Method 2: mongoexport (JSON/CSV, Good for Transformation)¶
# Export to JSON (extended JSON format)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--out=users.json \
--jsonArray
# Export to CSV (for relational import)
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=users \
--type=csv \
--fields=_id,username,email,profile.firstName,profile.lastName \
--out=users.csv
# Export with query filter
mongoexport \
--uri="mongodb://source-mongodb:27017/mydb" \
--collection=orders \
--query='{"status": "completed"}' \
--out=completed_orders.json
5.3 Data Transformation and Import¶
Direct Import via mongorestore (Fastest)¶
# Import directly to HeliosDB (uses MongoDB wire protocol)
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--gzip \
/backup/mongodb/mydb
# Import specific collection
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--collection=users \
--gzip \
/backup/mongodb/mydb/users.bson.gz
# Import with options
mongorestore \
--uri="mongodb://heliosdb-server:27017/mydb" \
--drop \
--numInsertionWorkers=4 \
--batchSize=1000 \
/backup/mongodb/mydb
JSON Import via SQL COPY¶
# Convert MongoDB extended JSON to standard JSON
cat users.json | jq -c '.[] | {_id: ._id."$oid", data: .}' > users_transformed.json
# Import into HeliosDB using COPY
psql -h heliosdb-server -U admin -d mydb -c "
COPY users (data) FROM STDIN WITH (FORMAT json);
" < users_transformed.json
Python Migration Script (Complex Transformations)¶
#!/usr/bin/env python3
"""
MongoDB to HeliosDB Migration Script
Handles complex data transformations during migration
"""
from pymongo import MongoClient
from datetime import datetime
import uuid
import json

# Source MongoDB
source_client = MongoClient("mongodb://source-mongodb:27017/")
source_db = source_client.mydb

# Target HeliosDB (via MongoDB protocol)
target_client = MongoClient("mongodb://heliosdb-server:27017/")
target_db = target_client.mydb

def convert_objectid(oid):
    """Convert MongoDB ObjectId to UUID format"""
    hex_str = str(oid)
    # Pad to 32 characters for UUID
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(padded[:8] + '-' + padded[8:12] + '-' +
                     padded[12:16] + '-' + padded[16:20] + '-' + padded[20:32])

def transform_document(doc):
    """Transform document for HeliosDB compatibility"""
    transformed = {}
    for key, value in doc.items():
        if key == '_id':
            # Convert ObjectId to UUID string
            transformed['_id'] = str(convert_objectid(value))
        elif isinstance(value, datetime):
            # Convert datetime to ISO string
            transformed[key] = value.isoformat()
        elif isinstance(value, dict):
            # Recursively transform nested documents
            transformed[key] = transform_document(value)
        elif isinstance(value, list):
            # Transform array elements
            transformed[key] = [
                transform_document(v) if isinstance(v, dict) else v
                for v in value
            ]
        else:
            transformed[key] = value
    return transformed

def migrate_collection(collection_name, batch_size=1000):
    """Migrate a single collection with batching"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    total = source_collection.count_documents({})
    migrated = 0
    print(f"Migrating {collection_name}: {total} documents")
    cursor = source_collection.find().batch_size(batch_size)
    batch = []
    for doc in cursor:
        transformed = transform_document(doc)
        batch.append(transformed)
        if len(batch) >= batch_size:
            target_collection.insert_many(batch, ordered=False)
            migrated += len(batch)
            print(f" Progress: {migrated}/{total} ({100*migrated/total:.1f}%)")
            batch = []
    # Insert remaining documents
    if batch:
        target_collection.insert_many(batch, ordered=False)
        migrated += len(batch)
    print(f" Completed: {migrated} documents migrated")
    return migrated

def migrate_indexes(collection_name):
    """Recreate indexes in target collection"""
    source_collection = source_db[collection_name]
    target_collection = target_db[collection_name]
    indexes = source_collection.index_information()
    for name, info in indexes.items():
        if name == '_id_':
            continue  # Skip default _id index
        keys = info['key']
        options = {k: v for k, v in info.items()
                   if k not in ('key', 'v', 'ns')}
        try:
            target_collection.create_index(keys, name=name, **options)
            print(f" Created index: {name}")
        except Exception as e:
            print(f" Warning: Could not create index {name}: {e}")

def main():
    collections = source_db.list_collection_names()
    # Filter out system collections
    collections = [c for c in collections
                   if not c.startswith('system.')]
    print(f"Found {len(collections)} collections to migrate")
    for collection_name in collections:
        migrate_collection(collection_name)
        migrate_indexes(collection_name)
    print("\nMigration complete!")

if __name__ == "__main__":
    main()
5.4 Application Connection Changes¶
Python (PyMongo)¶
# Before: MongoDB
from pymongo import MongoClient
client = MongoClient("mongodb://mongodb-server:27017/")
# After: HeliosDB (same code, different hostname)
from pymongo import MongoClient
client = MongoClient("mongodb://heliosdb-server:27017/")
# With authentication
client = MongoClient(
"mongodb://heliosdb-server:27017/",
username="myuser",
password="mypassword",
authSource="admin",
authMechanism="SCRAM-SHA-256"
)
# With connection pooling and options
client = MongoClient(
"mongodb://heliosdb-server:27017/",
maxPoolSize=50,
minPoolSize=10,
maxIdleTimeMS=30000,
serverSelectionTimeoutMS=5000
)
Node.js (mongodb driver)¶
// Before: MongoDB
const { MongoClient } = require('mongodb');
const client = new MongoClient('mongodb://mongodb-server:27017/');
// After: HeliosDB
const client = new MongoClient('mongodb://heliosdb-server:27017/', {
auth: {
username: 'myuser',
password: 'mypassword'
},
authSource: 'admin',
maxPoolSize: 50
});
Java (MongoDB Driver)¶
// Before: MongoDB
MongoClient client = MongoClients.create("mongodb://mongodb-server:27017/");
// After: HeliosDB
MongoClientSettings settings = MongoClientSettings.builder()
.applyConnectionString(new ConnectionString("mongodb://heliosdb-server:27017/"))
.credential(MongoCredential.createScramSha256Credential(
"myuser", "admin", "mypassword".toCharArray()))
.build();
MongoClient client = MongoClients.create(settings);
Environment-Based Configuration¶
import os
# Use environment variable for easy switching
MONGODB_URI = os.environ.get(
"MONGODB_URI",
"mongodb://heliosdb-server:27017/"
)
client = MongoClient(MONGODB_URI)
6. Query Translation Examples¶
6.1 find() to SELECT¶
Basic Find¶
// MongoDB
db.users.find({ status: "active" })
-- HeliosDB SQL equivalent
SELECT * FROM users WHERE data->>'status' = 'active';
-- Or with hybrid schema
SELECT * FROM users WHERE status = 'active';
Find with Projection¶
// MongoDB
db.users.find({ status: "active" }, { username: 1, email: 1, _id: 0 })
-- HeliosDB SQL
SELECT
data->>'username' AS username,
data->>'email' AS email
FROM users
WHERE data->>'status' = 'active';
Find with Operators¶
// MongoDB: comparison operators
db.orders.find({
amount: { $gte: 100, $lte: 500 },
status: { $in: ["pending", "processing"] }
})
-- HeliosDB SQL
SELECT * FROM orders
WHERE (data->>'amount')::numeric BETWEEN 100 AND 500
AND data->>'status' IN ('pending', 'processing');
Find with Nested Fields¶
// MongoDB
db.users.find({ "profile.age": { $gte: 21 } })
-- HeliosDB SQL
SELECT * FROM users
WHERE (data->'profile'->>'age')::integer >= 21;
-- Using jsonpath (more efficient)
SELECT * FROM users
WHERE data @@ '$.profile.age >= 21';
Find with Arrays¶
// MongoDB: array queries
db.users.find({ tags: "mongodb" }) // contains
db.users.find({ tags: { $all: ["mongodb", "developer"] } }) // contains all
db.users.find({ "items.price": { $gt: 100 } }) // array element match
-- HeliosDB SQL: array contains
SELECT * FROM users WHERE data->'tags' ? 'mongodb';
-- Contains all
SELECT * FROM users
WHERE data->'tags' ?& array['mongodb', 'developer'];
-- Array element match
SELECT * FROM users
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(data->'items') AS item
WHERE (item->>'price')::numeric > 100
);
6.2 aggregate() to SQL with JSON Functions¶
Group and Count¶
// MongoDB aggregation
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$category",
total: { $sum: "$amount" },
count: { $sum: 1 },
avgAmount: { $avg: "$amount" }
}},
{ $sort: { total: -1 } },
{ $limit: 10 }
])
-- HeliosDB SQL
SELECT
data->>'category' AS category,
SUM((data->>'amount')::numeric) AS total,
COUNT(*) AS count,
AVG((data->>'amount')::numeric) AS avg_amount
FROM orders
WHERE data->>'status' = 'completed'
GROUP BY data->>'category'
ORDER BY total DESC
LIMIT 10;
Lookup (Join)¶
// MongoDB $lookup
db.orders.aggregate([
{ $lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer"
}},
{ $unwind: "$customer" }
])
-- HeliosDB SQL (much simpler with native JOINs)
SELECT
o.*,
c.data AS customer
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
-- Or with JSONB aggregation to mimic $unwind result
SELECT
o.data || jsonb_build_object('customer', c.data) AS document
FROM orders o
JOIN customers c ON o.data->>'customer_id' = c._id::text;
Faceted Search¶
// MongoDB $facet
db.products.aggregate([
{ $facet: {
byCategory: [
{ $group: { _id: "$category", count: { $sum: 1 } } }
],
priceRange: [
{ $bucket: {
groupBy: "$price",
boundaries: [0, 50, 100, 500],
default: "Other"
}}
],
totalCount: [
{ $count: "count" }
]
}}
])
-- HeliosDB SQL: Use separate queries or WITH clause
WITH
by_category AS (
SELECT data->>'category' AS category, COUNT(*) AS count
FROM products
GROUP BY data->>'category'
),
price_range AS (
SELECT
CASE
WHEN (data->>'price')::numeric < 50 THEN '0-50'
WHEN (data->>'price')::numeric < 100 THEN '50-100'
WHEN (data->>'price')::numeric < 500 THEN '100-500'
ELSE 'Other'
END AS range,
COUNT(*) AS count
FROM products
GROUP BY 1
),
total AS (
SELECT COUNT(*) AS count FROM products
)
SELECT
jsonb_build_object(
'byCategory', (SELECT jsonb_agg(row_to_json(by_category)) FROM by_category),
'priceRange', (SELECT jsonb_agg(row_to_json(price_range)) FROM price_range),
'totalCount', (SELECT count FROM total)
) AS facets;
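The `$bucket` stage and the SQL `CASE` branches above implement the same half-open intervals. A short Python rendering of that bucketing logic (illustrative only) makes the boundary semantics explicit:

```python
# $bucket semantics: each boundary pair defines a [lower, upper) bucket;
# values outside the boundaries fall into the default bucket.
from bisect import bisect_right

def bucket_label(price, boundaries=(0, 50, 100, 500), default="Other"):
    """Assign a value to its [lower, upper) bucket, like $bucket."""
    if price < boundaries[0] or price >= boundaries[-1]:
        return default
    i = bisect_right(boundaries, price) - 1
    return f"{boundaries[i]}-{boundaries[i+1]}"

print(bucket_label(75))   # 50-100
print(bucket_label(500))  # Other (boundaries are exclusive on the right)
```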
6.3 update() to UPDATE with JSONB Operators¶
Basic Update¶
// MongoDB
db.users.updateOne({ _id: ObjectId("...") }, { $set: { status: "active" } })
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(data, '{status}', '"active"')
WHERE _id = '...'::uuid;
-- With hybrid schema
UPDATE users SET status = 'active' WHERE _id = '...'::uuid;
Increment¶
// MongoDB
db.products.updateMany(
{ category: "electronics" },
{ $inc: { viewCount: 1, stock: -1 } }
)
-- HeliosDB SQL
UPDATE products
SET data = data || jsonb_build_object(
'viewCount', COALESCE((data->>'viewCount')::integer, 0) + 1,
'stock', COALESCE((data->>'stock')::integer, 0) - 1
)
WHERE data->>'category' = 'electronics';
Array Push¶
// MongoDB
db.users.updateOne({ _id: ObjectId("...") }, { $push: { tags: "premium" } })
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
COALESCE(data->'tags', '[]'::jsonb) || '"premium"'::jsonb
)
WHERE _id = '...'::uuid;
Array Pull¶
// MongoDB
db.users.updateOne({ _id: ObjectId("...") }, { $pull: { tags: "temporary" } })
-- HeliosDB SQL
UPDATE users
SET data = jsonb_set(
data,
'{tags}',
(SELECT jsonb_agg(elem)
FROM jsonb_array_elements(data->'tags') AS elem
WHERE elem != '"temporary"')
)
WHERE _id = '...'::uuid;
Upsert¶
// MongoDB
db.users.updateOne(
{ email: "new@example.com" },
{ $set: { name: "New User", createdAt: new Date() } },
{ upsert: true }
)
-- HeliosDB SQL
INSERT INTO users (data)
VALUES ('{"email": "new@example.com", "name": "New User", "createdAt": "..."}'::jsonb)
ON CONFLICT ((data->>'email'))
DO UPDATE SET data = users.data || EXCLUDED.data;
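The `ON CONFLICT ... DO UPDATE` above merges the stored and incoming documents with the JSONB `||` operator, which is a shallow merge where incoming keys win. In Python terms:

```python
# Shallow merge with right-hand keys winning, the same behavior as
# JSONB's || operator in the upsert above. Nested objects are replaced
# wholesale, not merged recursively.

def upsert_merge(existing, incoming):
    """Mimic users.data || EXCLUDED.data."""
    return {**existing, **incoming}

merged = upsert_merge(
    {"email": "new@example.com", "plan": "free"},
    {"email": "new@example.com", "name": "New User"},
)
# {'email': 'new@example.com', 'plan': 'free', 'name': 'New User'}
```

Note that because the merge is shallow, an incoming nested object like `profile` replaces the stored one entirely; use `jsonb_set` on specific paths when only part of a nested document should change.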
7. Feature Mapping¶
7.1 Aggregation Pipeline Equivalents¶
| MongoDB Stage | HeliosDB SQL Equivalent |
|---|---|
| $match | WHERE clause |
| $project | SELECT with JSON functions |
| $group | GROUP BY with aggregates |
| $sort | ORDER BY |
| $limit | LIMIT |
| $skip | OFFSET |
| $unwind | LATERAL jsonb_array_elements() |
| $lookup | JOIN |
| $graphLookup | WITH RECURSIVE |
| $bucket | CASE WHEN in GROUP BY |
| $facet | Multiple CTEs |
| $setWindowFields | Window functions |
7.2 Change Streams vs HeliosDB CDC¶
| Feature | MongoDB Change Streams | HeliosDB CDC |
|---|---|---|
| Protocol | Native MongoDB | Native + Debezium-compatible |
| Latency | ~100ms | <10ms |
| Resume | Token-based | LSN-based (more reliable) |
| Filtering | Pipeline stages | SQL predicates |
| Output | BSON | JSON, Avro, Protobuf |
| Destinations | Application code | Kafka, Webhooks, S3, etc. |
MongoDB Change Stream:
const pipeline = [
{ $match: { operationType: "insert" } }
];
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});
HeliosDB CDC (via MongoDB protocol - identical API):
// Same API works with HeliosDB
const changeStream = collection.watch(pipeline);
changeStream.on("change", (change) => {
console.log(change);
});
HeliosDB CDC (direct SQL subscription):
-- Create CDC subscription
CREATE PUBLICATION users_changes FOR TABLE users;
-- Subscribe via logical replication
CREATE SUBSCRIPTION users_sub
CONNECTION 'host=heliosdb-server dbname=mydb'
PUBLICATION users_changes;
HeliosDB Webhook CDC:
-- Configure webhook notification
SELECT helios_cdc.create_webhook(
'users',
'https://myapp.com/webhooks/user-changes',
'{"events": ["INSERT", "UPDATE", "DELETE"]}'
);
7.3 Geospatial Queries¶
| MongoDB | HeliosDB |
|---|---|
| 2dsphere index | GiST index with geography |
| $near | ST_DWithin + ORDER BY ST_Distance |
| $geoWithin | ST_Within |
| $geoIntersects | ST_Intersects |
| GeoJSON | Native GeoJSON support |
MongoDB Geospatial:
// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })
// Find near point
db.places.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [-73.97, 40.77] },
$maxDistance: 1000
}
}
})
// Find within polygon
db.places.find({
location: {
$geoWithin: {
$geometry: {
type: "Polygon",
coordinates: [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}
}
}
})
HeliosDB SQL Geospatial:
-- Create spatial index
CREATE INDEX idx_places_location ON places
USING GIST (ST_GeomFromGeoJSON(data->>'location'));
-- Find near point (within 1000 meters)
SELECT * FROM places
WHERE ST_DWithin(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography,
1000
)
ORDER BY ST_Distance(
ST_GeomFromGeoJSON(data->>'location')::geography,
ST_SetSRID(ST_MakePoint(-73.97, 40.77), 4326)::geography
);
-- Find within polygon
SELECT * FROM places
WHERE ST_Within(
ST_GeomFromGeoJSON(data->>'location'),
ST_GeomFromGeoJSON('{
"type": "Polygon",
"coordinates": [[[-73.98, 40.76], [-73.96, 40.76], [-73.96, 40.78], [-73.98, 40.78], [-73.98, 40.76]]]
}')
);
7.4 Full-Text Search¶
| MongoDB | HeliosDB |
|---|---|
| Text index | GIN index with tsvector |
| $text search | to_tsvector/to_tsquery |
| Score sorting | ts_rank |
| Language support | 30+ languages |
| Stemming | Built-in |
MongoDB Text Search:
// Create text index
db.articles.createIndex({ title: "text", content: "text" })
// Search
db.articles.find(
{ $text: { $search: "mongodb database" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
HeliosDB Full-Text Search:
-- Create full-text index
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english',
COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')));
-- Search with ranking
SELECT
*,
ts_rank(
to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', '')),
plainto_tsquery('english', 'mongodb database')
) AS score
FROM articles
WHERE to_tsvector('english', COALESCE(data->>'title', '') || ' ' || COALESCE(data->>'content', ''))
@@ plainto_tsquery('english', 'mongodb database')
ORDER BY score DESC;
HeliosDB GraphRAG Semantic Search:
-- Enable semantic search (AI-powered)
SELECT helios_graphrag.enable_semantic_search('articles', 'content');
-- Semantic search
SELECT * FROM helios_graphrag.semantic_search(
'articles',
'How do document databases handle transactions?',
10 -- top 10 results
);
8. Post-Migration Validation¶
8.1 Data Integrity Checks¶
// Compare document counts
// Source MongoDB
const sourceCount = db.users.countDocuments({});
console.log(`Source count: ${sourceCount}`);
// Target HeliosDB
const targetCount = db.users.countDocuments({});
console.log(`Target count: ${targetCount}`);
// Verify counts match
if (sourceCount !== targetCount) {
console.error(`Count mismatch! Source: ${sourceCount}, Target: ${targetCount}`);
}
def validate_migration(source_db, target_db, collection_name, sample_size=1000):
    """Validate migrated data integrity"""
    source = source_db[collection_name]
    target = target_db[collection_name]
    # Count validation
    source_count = source.count_documents({})
    target_count = target.count_documents({})
    if source_count != target_count:
        print(f"ERROR: Count mismatch - Source: {source_count}, Target: {target_count}")
        return False
    # Sample validation
    sample = list(source.aggregate([{"$sample": {"size": sample_size}}]))
    for doc in sample:
        target_doc = target.find_one({"_id": doc["_id"]})
        if target_doc is None:
            print(f"ERROR: Document {doc['_id']} not found in target")
            return False
        # Compare key fields (adjust based on your schema)
        for key in ["username", "email", "status"]:
            if doc.get(key) != target_doc.get(key):
                print(f"ERROR: Field {key} mismatch for {doc['_id']}")
                return False
    print(f"Validation passed: {source_count} documents verified")
    return True
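Beyond comparing a few hand-picked fields, a checksum over a canonical JSON rendering catches drift anywhere in the document. A small sketch (the helper name is illustrative):

```python
# Stable per-document checksum: canonicalize key order, then hash.
# Comparing checksums between source and target catches any field drift,
# not just the fields a validator happens to check.
import hashlib
import json

def doc_checksum(doc):
    """Stable checksum of a document, independent of key order."""
    canonical = json.dumps(doc, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Documents with the same content but different key order hash identically
a = {"username": "johndoe", "age": 30}
b = {"age": 30, "username": "johndoe"}
assert doc_checksum(a) == doc_checksum(b)
```

One caveat: `default=str` makes non-JSON types (ObjectId, datetime) hashable, so compare checksums only after both sides have gone through the same type transformation.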
8.2 Query Validation¶
from datetime import datetime

def validate_queries(source_db, target_db, collection_name):
    """Run test queries on both databases and compare results"""
    test_queries = [
        # Basic find
        {"status": "active"},
        # Range query
        {"created_at": {"$gte": datetime(2025, 1, 1)}},
        # Nested field
        {"profile.age": {"$gte": 21}},
        # Array query
        {"tags": {"$in": ["premium", "vip"]}},
    ]
    source = source_db[collection_name]
    target = target_db[collection_name]
    for query in test_queries:
        source_results = list(source.find(query).limit(100))
        target_results = list(target.find(query).limit(100))
        if len(source_results) != len(target_results):
            print(f"Query result count mismatch for {query}")
            print(f" Source: {len(source_results)}, Target: {len(target_results)}")
        else:
            print(f"Query {query}: OK ({len(source_results)} results)")
8.3 Index Validation¶
// Compare indexes
function compareIndexes(sourceDb, targetDb, collectionName) {
const sourceIndexes = sourceDb[collectionName].getIndexes();
const targetIndexes = targetDb[collectionName].getIndexes();
const sourceNames = sourceIndexes.map(i => i.name).sort();
const targetNames = targetIndexes.map(i => i.name).sort();
console.log("Source indexes:", sourceNames);
console.log("Target indexes:", targetNames);
// Check for missing indexes
for (const name of sourceNames) {
if (!targetNames.includes(name)) {
console.log(`WARNING: Index ${name} missing in target`);
}
}
}
8.4 Performance Validation¶
import time

def benchmark_queries(db, collection_name, iterations=100):
    """Benchmark common query patterns"""
    collection = db[collection_name]
    # Use a real _id so the point-lookup benchmark hits an existing document
    sample_id = collection.find_one()["_id"]
    benchmarks = {
        "find_by_id": lambda: collection.find_one({"_id": sample_id}),
        "find_by_index": lambda: list(collection.find({"status": "active"}).limit(10)),
        "aggregation": lambda: list(collection.aggregate([
            {"$match": {"status": "active"}},
            {"$group": {"_id": "$category", "count": {"$sum": 1}}}
        ])),
        "text_search": lambda: list(collection.find(
            {"$text": {"$search": "search terms"}}
        ).limit(10)),
    }
    results = {}
    for name, query_func in benchmarks.items():
        times = []
        for _ in range(iterations):
            start = time.time()
            query_func()
            times.append(time.time() - start)
        results[name] = {
            "avg_ms": sum(times) / len(times) * 1000,
            "min_ms": min(times) * 1000,
            "max_ms": max(times) * 1000
        }
        print(f"{name}: avg={results[name]['avg_ms']:.2f}ms")
    return results
8.5 Application Testing Checklist¶
- [ ] Unit tests pass against HeliosDB
- [ ] Integration tests pass with new connection strings
- [ ] CRUD operations work correctly
- [ ] Aggregation pipelines return expected results
- [ ] Change streams receive events
- [ ] Transactions commit and rollback properly
- [ ] Authentication works with all user accounts
- [ ] Connection pooling handles load correctly
- [ ] Error handling catches expected exceptions
9. Performance Considerations¶
9.1 Index Optimization¶
-- Analyze query patterns
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create expression indexes for frequent queries
CREATE INDEX idx_users_status ON users ((data->>'status'));
CREATE INDEX idx_users_email ON users ((data->>'email'));
-- Create GIN index for flexible JSON queries
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);
-- For range queries on numeric fields
CREATE INDEX idx_orders_amount ON orders (((data->>'amount')::numeric));
-- Composite index for multi-field queries
CREATE INDEX idx_orders_status_date ON orders (
(data->>'status'),
((data->>'created_at')::timestamptz)
);
9.2 Query Optimization¶
-- Use JSONB containment for exact matches (uses GIN index)
SELECT * FROM users WHERE data @> '{"status": "active"}';
-- Use jsonpath for complex queries
SELECT * FROM users
WHERE data @@ '$.profile.age > 21 && $.tags[*] == "premium"';
-- Avoid full table scans with proper indexes
-- BAD: No index usage
SELECT * FROM users WHERE data->>'name' ILIKE '%john%';
-- GOOD: Use full-text search
SELECT * FROM users
WHERE to_tsvector('english', data->>'name') @@ to_tsquery('john');
9.3 Bulk Operations¶
# Batch inserts for better performance
def bulk_insert(collection, documents, batch_size=1000):
    """Insert documents in batches for optimal performance"""
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        collection.insert_many(batch, ordered=False)
        print(f"Inserted batch {i // batch_size + 1}")
-- Use COPY for large data imports (SQL mode)
COPY users (data) FROM '/path/to/data.json' WITH (FORMAT json);
-- Drop indexes during bulk load, recreate after
DROP INDEX idx_users_data;
-- ... bulk insert ...
CREATE INDEX idx_users_data ON users USING GIN (data jsonb_path_ops);
9.4 Connection Pooling¶
# Configure connection pooling
from pymongo import MongoClient
client = MongoClient(
    "mongodb://heliosdb-server:27017/",
    maxPoolSize=100,                # Maximum connections
    minPoolSize=10,                 # Minimum connections to maintain
    maxIdleTimeMS=30000,            # Close idle connections after 30s
    waitQueueTimeoutMS=5000,        # Timeout waiting for a connection
    socketTimeoutMS=30000,          # Socket operation timeout
    serverSelectionTimeoutMS=5000   # Server selection timeout
)
9.5 Memory and Cache Configuration¶
-- Configure HeliosDB for document workloads
ALTER SYSTEM SET work_mem = '256MB'; -- For complex aggregations
ALTER SYSTEM SET shared_buffers = '8GB'; -- For caching (requires restart)
ALTER SYSTEM SET effective_cache_size = '24GB'; -- Query planner hint
SELECT pg_reload_conf(); -- Apply reloadable settings
-- Enable intelligent caching for hot collections
ALTER TABLE users ENABLE INTELLIGENT_CACHING;
10. Common Issues and Troubleshooting¶
10.1 Connection Issues¶
Problem: Connection refused
Solution:
# Check if HeliosDB is listening on MongoDB port
netstat -tlnp | grep 27017
# Verify MongoDB protocol is enabled
psql -c "SHOW helios.mongodb_port;"
# Check firewall rules
sudo firewall-cmd --list-ports | grep 27017
Problem: Authentication failed
Solution:
# Ensure correct auth mechanism
client = MongoClient(
    "mongodb://user:password@heliosdb-server:27017/",
    authMechanism="SCRAM-SHA-256",  # Must match server configuration
    authSource="admin"
)
10.2 Query Compatibility Issues¶
Problem: Unsupported operator
Solution:
# Check compatibility matrix, use alternative
# Instead of custom operators, use $expr with aggregation expressions
# Instead of
db.collection.find({"$customOp": value})
# Use
db.collection.find({"$expr": {"$eq": ["$field", value]}})
Problem: Query returns different results
Solution:
# Check for type mismatches
# MongoDB may have stored numbers as strings
# Debug: Check actual stored type
doc = collection.find_one()
print(type(doc.get('amount'))) # Check if string vs number
# Fix: Ensure consistent types during migration
if isinstance(doc['amount'], str):
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"amount": float(doc['amount'])}}
    )
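Fixing one document at a time does not scale to a whole collection. A hedged sketch of a batch approach: the pure function below builds the `(filter, update)` pairs for every document whose `amount` was stored as a string; in practice each pair would be wrapped in a pymongo `UpdateOne` and sent in one round trip via `collection.bulk_write()`.

```python
def build_amount_fixes(docs):
    """Return (filter, update) pairs for documents with a string 'amount'.

    Each pair is ready to be wrapped in pymongo's UpdateOne and sent
    via collection.bulk_write(); documents already numeric are skipped.
    """
    fixes = []
    for doc in docs:
        amount = doc.get("amount")
        if isinstance(amount, str):
            fixes.append((
                {"_id": doc["_id"]},
                {"$set": {"amount": float(amount)}},
            ))
    return fixes
```

Feeding it `collection.find({}, {"amount": 1})` and bulk-writing the result normalizes the whole collection in batches instead of one update per document.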
10.3 Performance Issues¶
Problem: Slow queries after migration
Solution:
-- Check if indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Analyze table statistics
ANALYZE users;
-- Check query plan
EXPLAIN ANALYZE SELECT * FROM users WHERE data->>'status' = 'active';
-- Create missing indexes
CREATE INDEX CONCURRENTLY idx_users_status ON users ((data->>'status'));
Problem: High memory usage
Solution:
-- Check for large documents
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size;
-- Find the largest documents
SELECT _id, pg_column_size(data) AS size
FROM users
ORDER BY size DESC
LIMIT 10;
-- Consider splitting large documents or using hybrid schema
10.4 Data Type Issues¶
Problem: ObjectId conversion errors
Solution:
from bson import ObjectId
import uuid
def objectid_to_uuid(oid):
    """Convert MongoDB ObjectId to UUID"""
    hex_str = str(oid)
    # ObjectId hex is 24 characters; pad to the 32 a UUID requires
    padded = hex_str.ljust(32, '0')
    return uuid.UUID(hex=padded)
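Because the padding is deterministic, the conversion is reversible, which is handy when validating migrated `_id` values against the source. A self-contained sketch using only the standard library (no `bson` dependency); the function names are illustrative:

```python
import uuid

def objectid_hex_to_uuid(hex_str):
    """Pad a 24-char ObjectId hex string to 32 chars and build a UUID."""
    return uuid.UUID(hex=hex_str.ljust(32, "0"))

def uuid_to_objectid_hex(u):
    """Recover the original 24-char ObjectId hex from a padded UUID."""
    return u.hex[:24]
```

For example, `objectid_hex_to_uuid("5f3a7c9e8b1d2a4c6e0f1234")` yields `UUID('5f3a7c9e-8b1d-2a4c-6e0f-123400000000')`, and `uuid_to_objectid_hex()` on that value returns the original 24 hex characters.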
Problem: Date/time precision loss
Solution:
# MongoDB stores dates as milliseconds since epoch
# Ensure proper conversion
from datetime import datetime
def convert_date(mongo_date):
    """Preserve date precision during migration"""
    if isinstance(mongo_date, datetime):
        return mongo_date.isoformat()
    return mongo_date
10.5 Change Stream Issues¶
Problem: Change stream not receiving events
Solution:
# Ensure collection has proper CDC enabled
# Check via SQL
# SELECT * FROM pg_publication_tables WHERE pubname = 'mongodb_cdc';
# Restart change stream with explicit options
change_stream = collection.watch(
    full_document='updateLookup',  # Include full document on updates
    max_await_time_ms=1000
)
10.6 Transaction Issues¶
Problem: Transaction aborted unexpectedly
Solution:
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Increase transaction timeout
with client.start_session() as session:
    with session.start_transaction(
        max_commit_time_ms=30000,  # 30-second commit timeout
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern("majority")
    ):
        # Transaction operations
        pass
10.7 Debugging Tools¶
# Enable MongoDB driver debugging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('pymongo')
logger.setLevel(logging.DEBUG)
-- Enable HeliosDB query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 100; -- Log queries > 100ms
SELECT pg_reload_conf();
-- View the most expensive statements (requires pg_stat_statements)
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;
Appendix A: Quick Reference Card¶
Connection String Mapping¶
| MongoDB | HeliosDB |
|---|---|
| `mongodb://host:27017/` | `mongodb://host:27017/` |
| `mongodb+srv://cluster/` | `mongodb://heliosdb:27017/` |
| `authSource=admin` | `authSource=admin` |
| `replicaSet=rs0` | Not needed (built-in HA) |
Common Query Translation¶
| MongoDB | HeliosDB SQL |
|---|---|
| `find({})` | `SELECT * FROM collection` |
| `find({a: 1})` | `WHERE data->>'a' = '1'` |
| `find({a: {$gt: 1}})` | `WHERE (data->>'a')::int > 1` |
| `find({a: {$in: [1,2]}})` | `WHERE data->>'a' IN ('1','2')` |
| `updateOne({}, {$set: {a: 1}})` | `UPDATE SET data = jsonb_set(data, '{a}', '1')` |
| `aggregate([{$group: ...}])` | `SELECT ... GROUP BY ...` |
Useful HeliosDB JSON Functions¶
| Function | Purpose |
|---|---|
| `data->>'field'` | Extract field as text |
| `data->'field'` | Extract field as JSONB |
| `data @> '{}'` | JSONB containment |
| `data ? 'key'` | Key exists |
| `jsonb_set()` | Set nested value |
| `jsonb_array_elements()` | Unwind array |
Appendix B: Migration Checklist Summary¶
Pre-Migration¶
- [ ] Database inventory completed
- [ ] Data volume estimated
- [ ] GridFS usage identified and alternative planned
- [ ] Backup created and verified
- [ ] Test environment prepared
Migration¶
- [ ] Schema designed (pure document, hybrid, or relational)
- [ ] Data exported from MongoDB
- [ ] Data imported to HeliosDB
- [ ] Indexes recreated
- [ ] Connection strings updated
Post-Migration¶
- [ ] Document counts verified
- [ ] Sample data validated
- [ ] All queries tested
- [ ] Performance benchmarked
- [ ] Application tests passed
- [ ] Change streams working
- [ ] Monitoring configured
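The "document counts verified" item above can be scripted. A hedged sketch: the pure function below diffs per-collection counts gathered from each side; the count dictionaries themselves would be built by calling `db[name].count_documents({})` against the MongoDB source and the HeliosDB target in turn.

```python
def find_count_mismatches(mongo_counts, helios_counts):
    """Return {collection: (mongo_count, helios_count)} for any count that differs.

    Collections present on only one side show up with a count of 0 on
    the other, so missing collections are reported as mismatches too.
    """
    mismatches = {}
    for name in set(mongo_counts) | set(helios_counts):
        m = mongo_counts.get(name, 0)
        h = helios_counts.get(name, 0)
        if m != h:
            mismatches[name] = (m, h)
    return mismatches
```

An empty result means every collection migrated with matching counts; anything else pinpoints exactly which collections need a closer look before cutover.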
Related Documentation¶
- MongoDB Protocol README - Protocol overview
- MongoDB Compatibility Matrix - Feature support details
- MongoDB Examples - Code examples
- General Migration Guide - Cross-protocol migration
- MySQL Migration Guide - SQL migration reference
Last Updated: 2026-01-04 Migration Guide Version: 1.0 HeliosDB Compatibility: 7.0+