Export with mongoexport, design a relational or hybrid (JSONB) schema in PostgreSQL, transform nested documents into rows or JSONB columns, and load with \copy or a migration script — use AWS DMS or CDC pipelines for zero-downtime migrations.

mongoexport --db mydb --collection users --type=json --out users.json

- Preserve the original MongoDB _id (e.g., in a mongo_id column) during migration — you will need it for cross-referencing, debugging, and rollback. [src6]
- Use the COPY protocol (not individual INSERTs) for bulk loading — COPY is 5–10x faster and avoids transaction overhead on large datasets. [src7]
- Disable triggers and FK checks during bulk load (ALTER TABLE ... DISABLE TRIGGER ALL;) and re-enable + validate after — FK violations during load are the #1 cause of failed migrations. [src3]
- Decimal128 must map to PostgreSQL NUMERIC, not FLOAT — using FLOAT silently loses precision on financial data. [src1]
- AWS DMS initially lands each document in a single _doc JSONB column — this requires a second normalization pass and is not a complete migration on its own. [src3]

| MongoDB Concept | PostgreSQL Equivalent | Example |
|---|---|---|
| Collection | Table | CREATE TABLE users (...) |
| Document (BSON) | Row (or JSONB column) | INSERT INTO users (data) VALUES ('{"name":"Alice"}'::jsonb) |
| _id (ObjectId) | UUID or BIGSERIAL primary key | id UUID DEFAULT gen_random_uuid() |
| Embedded subdocument | JSONB column or separate table (FK) | metadata JSONB or CREATE TABLE addresses (user_id UUID REFERENCES users) |
| Array field | JSONB array or junction table | tags JSONB or CREATE TABLE user_tags (user_id, tag_id) |
| $lookup (aggregation) | JOIN | SELECT * FROM orders JOIN users ON orders.user_id = users.id |
| find({status: "active"}) | WHERE clause | SELECT * FROM users WHERE status = 'active' |
| find({"addr.city": "NYC"}) | JSONB path operators -> / ->> | SELECT * FROM users WHERE data->'addr'->>'city' = 'NYC' |
| Compound index | B-tree composite index | CREATE INDEX idx ON users (status, created_at) |
| Text index | tsvector + GIN index | CREATE INDEX idx ON docs USING GIN (to_tsvector('english', body)) |
| $regex query | ~ or LIKE/ILIKE | SELECT * FROM users WHERE name ~ '^Al' |
| Schema-less fields | JSONB with GIN index | CREATE INDEX idx ON users USING GIN (metadata) |
| db.collection.aggregate() | SQL GROUP BY / window functions | SELECT status, COUNT(*) FROM orders GROUP BY status |
| Replica set (read scaling) | Read replicas / streaming replication | pg_basebackup + streaming replication |
| Decimal128 | NUMERIC | price NUMERIC(12,2) |
| ISODate() | TIMESTAMPTZ | created_at TIMESTAMPTZ DEFAULT NOW() |
| JSON_TABLE() (PG 17+) | Explode JSON arrays to rows | SELECT * FROM JSON_TABLE(doc, '$.items[*]' COLUMNS ...) |
| UUID v7 (PG 18+) | Time-ordered UUIDs | id UUID DEFAULT uuidv7() |
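One practical consequence of the _id mapping above: an ObjectId embeds its creation time in its first four bytes, which can backfill a created_at column when documents lack an explicit timestamp. A minimal stdlib-only sketch (the helper name is illustrative, not part of any library):

```python
from datetime import datetime, timezone

def objectid_to_datetime(oid_hex: str) -> datetime:
    """Recover the creation timestamp embedded in a MongoDB ObjectId.

    The first 4 bytes (8 hex chars) of an ObjectId are seconds since
    the Unix epoch; handy for backfilling created_at columns."""
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# ObjectId whose leading bytes encode 2024-01-01T00:00:00Z
print(objectid_to_datetime("6592008013fb5b0c9d2c1f00"))
```

PyMongo exposes the same value as `ObjectId.generation_time`; the manual parse is useful when you only have the hex string from a JSON export.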
START
├── Is downtime acceptable during migration?
│ ├── YES → Use mongoexport + ETL script + COPY (simplest approach)
│ └── NO ↓
├── Need real-time sync during cutover?
│ ├── YES → Use AWS DMS with CDC or Debezium + Kafka pipeline
│ └── NO ↓
├── Data volume > 100 GB?
│ ├── YES → Use parallel export (mongodump --numParallelCollections) + batch COPY
│ └── NO ↓
├── Documents have highly variable schemas?
│ ├── YES → Hybrid approach: relational columns for common fields + JSONB for variable
│ └── NO ↓
├── Deep nesting (>3 levels)?
│ ├── YES → Store as JSONB, add GIN indexes for query paths
│ └── NO ↓
└── DEFAULT → Fully normalize into relational tables with foreign keys
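When migrating many collections it can help to encode the decision tree above as a tiny helper so the strategy choice is scriptable and auditable. A sketch (function name, thresholds, and label strings mirror the tree; purely illustrative):

```python
def choose_strategy(downtime_ok: bool, needs_realtime_sync: bool,
                    volume_gb: float, variable_schema: bool,
                    max_nesting: int) -> str:
    """Walk the migration decision tree top-down and return a strategy label."""
    if downtime_ok:
        return "mongoexport + ETL + COPY"
    if needs_realtime_sync:
        return "AWS DMS with CDC or Debezium + Kafka"
    if volume_gb > 100:
        return "parallel mongodump + batch COPY"
    if variable_schema:
        return "hybrid: relational columns + JSONB"
    if max_nesting > 3:
        return "JSONB with GIN indexes"
    return "fully normalized relational tables"

print(choose_strategy(False, False, 20, False, 2))
# → fully normalized relational tables
```

Feeding each collection's audit results through this function yields a per-collection migration plan you can review before writing any ETL code.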
Analyze every collection's document structure, field types, nesting depth, and access patterns. This determines which fields to normalize into columns vs. store as JSONB. [src4, src5]
# List all collections and their document counts
mongosh --eval 'db.getCollectionNames().forEach(c => print(c, db[c].countDocuments()))'
# Sample a collection's schema
mongosh --eval 'db.users.aggregate([{$sample: {size: 100}}]).forEach(d => printjson(Object.keys(d)))'
# Detect field type variance (critical for schema design)
mongosh --eval '
db.users.aggregate([
{ $sample: { size: 5000 } },
{ $project: { _id: 0 } },
{ $addFields: { _keys: { $objectToArray: "$$ROOT" } } },
{ $unwind: "$_keys" },
{ $group: { _id: { field: "$_keys.k", type: { $type: "$_keys.v" } }, count: { $sum: 1 } } },
{ $sort: { "_id.field": 1 } }
])'
Verify: You have a list of all collections with field names, types, and nesting depth. Create an ER diagram for the target PostgreSQL schema before writing any migration code.
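Nesting depth is one of the audit outputs the decision tree keys on. A small sketch that measures it over sampled documents, assuming the docs are already decoded to Python dicts (via pymongo or json.loads; the helper name is illustrative):

```python
def max_depth(value, depth: int = 0) -> int:
    """Maximum nesting depth of a decoded document.

    Scalars at the top level count as depth 0; each dict or list
    level adds 1. Run over a $sample from each collection to decide
    JSONB storage vs. full normalization."""
    if isinstance(value, dict):
        return max((max_depth(v, depth + 1) for v in value.values()),
                   default=depth + 1)
    if isinstance(value, list):
        return max((max_depth(v, depth + 1) for v in value),
                   default=depth + 1)
    return depth

doc = {"name": "Alice", "addr": {"geo": {"lat": 40.7}}, "tags": ["vip"]}
print(max_depth(doc))  # → 3
```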
Design tables based on your audit. Use a hybrid approach: relational columns for frequently queried fields, JSONB for variable or deeply nested data. [src1, src2]
-- PostgreSQL 18+: uuidv7() available for time-ordered UUIDs
-- PostgreSQL 13+: gen_random_uuid() is built-in
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
mongo_id VARCHAR(24),
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
status TEXT DEFAULT 'active',
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ,
metadata JSONB DEFAULT '{}'
);
CREATE TABLE addresses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
street TEXT, city TEXT, state TEXT, zip TEXT,
country TEXT DEFAULT 'US',
is_primary BOOLEAN DEFAULT false
);
-- no separate index on email: the UNIQUE constraint already creates one
CREATE INDEX idx_users_status ON users (status);
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
CREATE INDEX idx_users_created ON users (created_at DESC);
CREATE INDEX idx_addresses_user ON addresses (user_id);
Verify: \dt shows all tables. \d users shows correct column types and constraints.
Use mongoexport for small-to-medium datasets or mongodump for large binary exports. JSON format preserves nested structure better than CSV. [src4]
# Export as JSON (one document per line)
mongoexport --db=myapp --collection=users --type=json --out=users.json
# For large collections, export in parallel
mongodump --db=myapp --numParallelCollections=4 --out=./dump/
# Verify export
wc -l users.json
Verify: Line count in exported JSON matches document count in MongoDB.
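Note that mongoexport emits MongoDB extended JSON: ObjectIds, dates, and decimals arrive wrapped in objects like {"$oid": ...} rather than plain strings. A sketch of a recursive unwrapper that flattens these wrappers before loading (stdlib only; the function name is an assumption, and only three wrapper types are handled here):

```python
from decimal import Decimal

def unwrap_extended_json(value):
    """Recursively convert extended-JSON wrappers ({"$oid": ...},
    {"$date": ...}, {"$numberDecimal": ...}) into plain Python values
    suitable as psycopg2 query parameters."""
    if isinstance(value, dict):
        if "$oid" in value:
            return value["$oid"]
        if "$date" in value:
            return value["$date"]  # ISO-8601 string; TIMESTAMPTZ accepts it
        if "$numberDecimal" in value:
            return Decimal(value["$numberDecimal"])  # maps to NUMERIC
        return {k: unwrap_extended_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_extended_json(v) for v in value]
    return value

doc = {"_id": {"$oid": "65a1"}, "price": {"$numberDecimal": "19.99"},
       "createdAt": {"$date": "2026-01-01T00:00:00Z"}}
print(unwrap_extended_json(doc))
```

Run every exported document through such an unwrapper before building INSERT batches, so Decimal128 values reach NUMERIC columns without a detour through float.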
Write a script that reads MongoDB JSON, transforms documents to match the PostgreSQL schema, and inserts using batch operations. [src4, src5]
import json
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="myapp", user="postgres")
cur = conn.cursor()
BATCH_SIZE = 1000
INSERT_SQL = """
    INSERT INTO users (mongo_id, email, name, status, created_at, metadata)
    VALUES (%s, %s, %s, %s, %s, %s::jsonb)
    ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name, metadata = EXCLUDED.metadata
"""
batch = []
with open("users.json", "r") as f:
    for i, line in enumerate(f, 1):
        doc = json.loads(line)
        # mongoexport extended JSON wraps ObjectIds and dates: {"$oid": ...}, {"$date": ...}
        mongo_id = doc["_id"]["$oid"] if isinstance(doc["_id"], dict) else str(doc["_id"])
        created = doc.get("createdAt")
        if isinstance(created, dict):
            created = created.get("$date")
        metadata = {k: v for k, v in doc.items()
                    if k not in ("_id", "email", "name", "status", "createdAt")}
        batch.append((mongo_id, doc.get("email", ""), doc.get("name", ""),
                      doc.get("status", "active"), created, json.dumps(metadata)))
        if i % BATCH_SIZE == 0:
            cur.executemany(INSERT_SQL, batch)
            batch.clear()
            conn.commit()
if batch:  # flush the final partial batch with the same upsert logic
    cur.executemany(INSERT_SQL, batch)
    conn.commit()
cur.close()
conn.close()
Verify: SELECT COUNT(*) FROM users; matches the MongoDB document count.
Recreate MongoDB indexes as PostgreSQL equivalents. Test every critical query path against the new schema. [src1, src2]
-- Full-text search (replaces MongoDB text index)
ALTER TABLE products ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,''))) STORED;
CREATE INDEX idx_products_search ON products USING GIN (search_vector);
-- GIN index on JSONB (replaces MongoDB flexible queries)
CREATE INDEX idx_events_metadata ON events USING GIN (metadata);
-- PostgreSQL 17+: Use JSON_TABLE for analytics on JSONB arrays
SELECT jt.sku, SUM(jt.qty) AS total
FROM orders, JSON_TABLE(items, '$[*]' COLUMNS (sku TEXT PATH '$.sku', qty INT PATH '$.qty')) AS jt
GROUP BY jt.sku;
Verify: EXPLAIN ANALYZE on your most common queries confirms index usage.
For production systems, implement a dual-write pattern: writes go to both databases, reads shift gradually to PostgreSQL. [src3, src4, src8]
// Node.js dual-write middleware
const { MongoClient } = require('mongodb');
const { Pool } = require('pg');

const pg = new Pool({ connectionString: 'postgresql://postgres:secret@localhost/myapp' });
const mongoClient = new MongoClient('mongodb://localhost:27017');
const mongoDb = mongoClient.db('myapp');

async function dualWrite(collection, pgTable, document, pgTransform) {
  const pgRow = pgTransform(document);
  // PostgreSQL is the source of truth — its failure aborts the write
  const pgResult = await pg.query(
    `INSERT INTO ${pgTable} (${Object.keys(pgRow).join(', ')})
     VALUES (${Object.keys(pgRow).map((_, i) => `$${i + 1}`).join(', ')})
     RETURNING id`,
    Object.values(pgRow)
  );
  try {
    // Shadow write to MongoDB; log but don't fail the request
    await mongoDb.collection(collection).insertOne({
      ...document, _pg_id: pgResult.rows[0].id
    });
  } catch (err) {
    console.error(`MongoDB shadow write failed: ${err.message}`);
  }
  return pgResult.rows[0];
}
Verify: Spot-check 100 records — data in both databases should match.
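The spot-check can be automated by diffing field values between the two stores. A minimal sketch, assuming you have already fetched the MongoDB document and the PostgreSQL row as plain dicts (the helper name and field_map shape are illustrative):

```python
def diff_records(mongo_doc: dict, pg_row: dict, field_map: dict) -> dict:
    """Return {field: (mongo_value, pg_value)} for every mapped field
    whose values differ between the source document and migrated row.

    field_map maps MongoDB field names to PostgreSQL column names."""
    return {f: (mongo_doc.get(f), pg_row.get(col))
            for f, col in field_map.items()
            if mongo_doc.get(f) != pg_row.get(col)}

mismatches = diff_records({"email": "[email protected]", "name": "Alice"},
                          {"email": "[email protected]", "name": "Alicia"},
                          {"email": "email", "name": "name"})
print(mismatches)  # → {'name': ('Alice', 'Alicia')}
```

Run this over a random sample of 100 _id values fetched from both sides; any non-empty result flags a record to investigate before cutover.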
Run full data validation before decommissioning MongoDB. Compare record counts, checksums on critical fields, and test every query path. [src5, src8]
-- Count comparison
SELECT 'users' AS table_name, COUNT(*) FROM users
UNION ALL SELECT 'orders', COUNT(*) FROM orders;
-- Checksum critical fields
SELECT MD5(STRING_AGG(email || name || status, '' ORDER BY mongo_id)) AS checksum FROM users;
-- Verify no FK orphans
SELECT COUNT(*) FROM addresses a
LEFT JOIN users u ON a.user_id = u.id
WHERE u.id IS NULL; -- Should return 0
Verify: All counts match. FK orphan check returns 0. Application test suite passes against PostgreSQL.
# Input: MongoDB connection string + collection name
# Output: Data migrated to PostgreSQL table with JSONB hybrid schema
import json, time, psycopg2
from pymongo import MongoClient  # pymongo==4.8.0

mongo = MongoClient("mongodb://localhost:27017")
pg_conn = psycopg2.connect("dbname=myapp user=postgres host=localhost")
pg_cur = pg_conn.cursor()
BATCH_SIZE = 5000

def migrate_collection(db_name, coll_name, pg_table, field_map):
    coll = mongo[db_name][coll_name]
    total = coll.estimated_document_count()
    start = time.time()
    batch = []
    columns = [pg_col for pg_col, _ in field_map.values()] + ["metadata"]
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {pg_table} ({', '.join(columns)}) VALUES ({placeholders}) ON CONFLICT DO NOTHING"
    for i, doc in enumerate(coll.find().batch_size(BATCH_SIZE), 1):
        row = [transform(doc.get(field)) for field, (_, transform) in field_map.items()]
        extra = {k: v for k, v in doc.items() if k not in set(field_map) | {"_id"}}
        row.append(json.dumps(extra, default=str))
        batch.append(tuple(row))
        if len(batch) >= BATCH_SIZE:
            pg_cur.executemany(sql, batch)
            pg_conn.commit()
            print(f"  {i}/{total} ({i*100//total}%) — {i/(time.time()-start):.0f} docs/sec")
            batch.clear()
    if batch:
        pg_cur.executemany(sql, batch)
        pg_conn.commit()
    print(f"Done: {total} docs in {time.time()-start:.1f}s")

migrate_collection("myapp", "users", "users", {
    "email": ("email", lambda v: v),
    "name": ("name", lambda v: v or ""),
    "status": ("status", lambda v: v or "active"),
    "createdAt": ("created_at", lambda v: v.isoformat() if v else None),
})
// Input: MongoDB collection with millions of documents
// Output: Data streamed into PostgreSQL using COPY protocol
const { MongoClient } = require('mongodb'); // [email protected]
const { Pool } = require('pg'); // [email protected]
const { pipeline } = require('stream/promises');
const { Transform } = require('stream');
const copyFrom = require('pg-copy-streams').from; // [email protected]

async function streamMigrate(dbName, collName, pgTable, transformDoc) {
  const mongo = new MongoClient('mongodb://localhost:27017');
  const pg = new Pool({ connectionString: 'postgresql://postgres:secret@localhost/myapp' });
  await mongo.connect();
  const collection = mongo.db(dbName).collection(collName);
  let count = 0;
  const pgClient = await pg.connect();
  const copyStream = pgClient.query(
    copyFrom(`COPY ${pgTable} FROM STDIN WITH (FORMAT csv, NULL '\\N')`)
  );
  const mongoStream = collection.find().stream();
  const transformer = new Transform({
    objectMode: true,
    transform(doc, _enc, cb) {
      count++;
      const row = transformDoc(doc);
      // Quote fields containing commas, quotes, or newlines; \N marks NULL
      const csv = row.map(f => {
        if (f == null) return '\\N';
        const s = String(f);
        return s.includes(',') || s.includes('"') || s.includes('\n')
          ? `"${s.replace(/"/g, '""')}"` : s;
      }).join(',') + '\n';
      cb(null, csv);
    }
  });
  await pipeline(mongoStream, transformer, copyStream);
  console.log(`Migrated ${count} rows into ${pgTable}`);
  pgClient.release(); await pg.end(); await mongo.close();
}

streamMigrate('myapp', 'orders', 'orders', (doc) => [
  doc._id.toString(), doc.userId?.toString(), doc.total || 0,
  doc.status || 'pending', doc.createdAt?.toISOString(), JSON.stringify(doc.items)
]);
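The same CSV quoting rules used by the Node.js streaming transform can be expressed in Python, if you prefer to drive COPY ... FROM STDIN WITH (FORMAT csv, NULL '\N') from a Python loader. A sketch with illustrative helper names:

```python
def csv_field(value) -> str:
    """Escape one value for PostgreSQL COPY CSV input; None becomes \\N."""
    if value is None:
        return r"\N"
    s = str(value)
    # Quote fields containing a delimiter, quote, or newline; double inner quotes
    if any(c in s for c in ',"\n'):
        return '"' + s.replace('"', '""') + '"'
    return s

def to_copy_line(row) -> str:
    """Serialize one row as a line of COPY-compatible CSV."""
    return ",".join(csv_field(v) for v in row) + "\n"

print(to_copy_line(["65a1", None, 'say "hi"', 42]), end="")
# → 65a1,\N,"say ""hi""",42
```

With psycopg2, feed these lines to `cursor.copy_expert("COPY orders FROM STDIN WITH (FORMAT csv, NULL '\\N')", file_like)` where `file_like` yields the serialized rows.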
-- MongoDB: db.orders.find({status: "shipped", total: {$gte: 100}}).sort({createdAt: -1}).limit(20)
SELECT * FROM orders WHERE status = 'shipped' AND total >= 100 ORDER BY created_at DESC LIMIT 20;
-- MongoDB: db.orders.aggregate([{$group: {_id: "$status", count: {$sum: 1}, avg: {$avg: "$total"}}}])
SELECT status, COUNT(*) AS count, AVG(total) AS avg_total FROM orders GROUP BY status;
-- MongoDB: db.users.find({"address.city": "New York"})
-- Normalized:
SELECT u.* FROM users u JOIN addresses a ON a.user_id = u.id WHERE a.city = 'New York';
-- JSONB:
SELECT * FROM users WHERE metadata->>'city' = 'New York';
-- MongoDB: db.users.find({tags: {$in: ["premium", "vip"]}})
-- JSONB array:
SELECT * FROM users WHERE metadata->'tags' ?| ARRAY['premium', 'vip'];
-- MongoDB: $unwind + $group (average of nested array)
SELECT id, AVG((review->>'rating')::numeric) AS avg_rating
FROM products, jsonb_array_elements(reviews) AS review GROUP BY id;
-- ❌ BAD — stores entire MongoDB document as one JSONB blob
CREATE TABLE users (id SERIAL PRIMARY KEY, doc JSONB);
INSERT INTO users (doc) VALUES ('{"email":"[email protected]","name":"Alice","status":"active"}'::jsonb);
-- No constraints, no type safety, every query needs JSONB operators
-- ✅ GOOD — frequently queried fields are real columns with types and constraints
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
status TEXT DEFAULT 'active' CHECK (status IN ('active','inactive','suspended')),
created_at TIMESTAMPTZ NOT NULL,
metadata JSONB DEFAULT '{}' -- only variable fields here
);
-- ❌ BAD — ObjectIds make poor PostgreSQL primary keys
CREATE TABLE users (id VARCHAR(24) PRIMARY KEY, email TEXT);
-- 24-char string comparisons are slower than UUID, no built-in generation
-- ✅ GOOD — native UUID primary key + reference column for traceability
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
mongo_id VARCHAR(24), -- for cross-referencing during migration
email TEXT NOT NULL UNIQUE
);
CREATE INDEX idx_users_mongo_id ON users (mongo_id); -- drop post-migration
-- ❌ BAD — over-normalization of rarely-queried nested data
CREATE TABLE user_preferences (id UUID, user_id UUID, key TEXT, value TEXT);
CREATE TABLE user_notification_settings (id UUID, user_id UUID, channel TEXT, enabled BOOLEAN);
CREATE TABLE user_ui_settings (id UUID, user_id UUID, theme TEXT, language TEXT);
-- 4+ JOINs to reconstruct a single user profile
-- ✅ GOOD — one JSONB column for settings, no extra tables
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL UNIQUE,
preferences JSONB DEFAULT '{"notifications":{"email":true},"ui":{"theme":"light"}}'
);
-- Still queryable: SELECT * FROM users WHERE preferences->'ui'->>'theme' = 'dark';
# ❌ BAD — CSV strings lose structure and queryability
tags_str = ",".join(doc.get("tags", []))
cur.execute("INSERT INTO users (tags) VALUES (%s)", (tags_str,))
# ✅ GOOD — preserve array structure in JSONB
import json
tags = doc.get("tags", [])
cur.execute("INSERT INTO users (tags) VALUES (%s::jsonb)", (json.dumps(tags),))
# Query: SELECT * FROM users WHERE tags ? 'premium'
- Dates: mongoexport outputs dates as {"$date": "2026-01-01T00:00:00Z"} — not a plain ISO string. Fix: Parse with val["$date"] before inserting into TIMESTAMPTZ columns. [src4]
- ObjectIds: exported as {"$oid": "65a..."} in extended JSON, not a plain string. Fix: Extract with doc["_id"]["$oid"] or use str(doc["_id"]) with PyMongo. [src6]
- Type variance: the same field can hold different BSON types across documents. Fix: Run a $type aggregation to detect type variance per field. [src5]
- Missing fields: keys absent from a document become null. Fix: Use .get(field, default) in migration scripts and set DEFAULT values in PostgreSQL. [src7]
- FK violations during bulk load: Fix: Disable triggers for the load (ALTER TABLE addresses DISABLE TRIGGER ALL;), re-enable after. [src3]
- Slow bulk inserts: Fix: Use the COPY protocol for 10x throughput. [src7]
- Forgetting the _id index: Every MongoDB collection has an index on _id. Fix: Ensure your PostgreSQL PRIMARY KEY is in place — it creates a B-tree index automatically. [src1]
- Incomplete DMS migrations: AWS DMS lands documents in a single _doc JSONB by default, giving a false sense of completion. Fix: Always plan a second normalization pass after DMS full-load completes. [src3]

# Check MongoDB collection stats before migration
mongosh --eval 'db.users.stats()' | grep -E 'count|size|storageSize|nindexes'
# Export MongoDB collection to JSON
mongoexport --db=myapp --collection=users --type=json --out=users.json
# Verify export line count
wc -l users.json
# Check PostgreSQL table after migration
psql -c "SELECT COUNT(*) AS row_count FROM users;"
psql -c "SELECT pg_size_pretty(pg_total_relation_size('users')) AS table_size;"
# Verify JSONB data is valid
psql -c "SELECT COUNT(*) FROM users WHERE metadata IS NOT NULL AND jsonb_typeof(metadata) = 'object';"
# Check index usage
psql -c "SELECT indexrelname, idx_scan, idx_tup_read FROM pg_stat_user_indexes WHERE schemaname = 'public' ORDER BY idx_scan DESC;"
# Compare record counts
mongosh --eval 'db.users.countDocuments()'
psql -c "SELECT COUNT(*) FROM users;"
# Detect orphaned foreign keys
psql -c "SELECT COUNT(*) FROM addresses a LEFT JOIN users u ON a.user_id = u.id WHERE u.id IS NULL;"
# Check for NULL in required fields post-migration
psql -c "SELECT COUNT(*) FROM users WHERE email IS NULL OR name IS NULL;"
| Component | Status | Support | Migration Notes |
|---|---|---|---|
| MongoDB 8.0 | Current | AWS DMS 3.5.4+ | Full mongoexport support; queryable encryption not supported by DMS |
| MongoDB 7.0 | Supported | AWS DMS 3.5.4+ | Standard migration path |
| MongoDB 6.0 | Supported | AWS DMS 3.5.2+ | Timeseries collections not supported by DMS |
| MongoDB 5.0 | Supported | AWS DMS 3.5.1+ | Live resharding not supported by DMS |
| MongoDB 4.x | Legacy | AWS DMS 3.4.5+ | Most documented migration path |
| PostgreSQL 18 | Current (Sep 2025) | All features | Async I/O (2–3x perf), UUID v7, enhanced RETURNING clause |
| PostgreSQL 17 | Supported | All features | JSON_TABLE() for analytics, improved VACUUM |
| PostgreSQL 16 | Supported | All features | Improved JSONB performance |
| PostgreSQL 14–15 | Supported | JSONB subscripting (PG 14+) | Use jsonb_set() for older versions |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Need ACID transactions across related documents | Schema is truly dynamic with no common fields | Keep MongoDB or use DynamoDB |
| Want SQL JOINs for complex reporting/analytics | Sub-millisecond reads on simple key-value lookups | Redis or MongoDB with read replicas |
| Regulatory requirements demand strong data integrity | Horizontal sharding across 10+ nodes is required | MongoDB Atlas, CockroachDB, or Citus |
| Team knows SQL better than MongoDB aggregation | Data is primarily geospatial with GeoJSON | MongoDB or PostGIS |
| PostgreSQL JSONB covers your schema flexibility needs | Write volume exceeds 100K inserts/sec sustained | MongoDB or TimescaleDB |
| Want a single database for relational + document data | Application is stable on MongoDB with no pain points | Don't migrate for migration's sake |
- Field types can vary across documents in the same collection (e.g., age stored as string in some docs, number in others). Always audit field types across a large sample before designing PostgreSQL columns.
- AWS DMS lands each document in a single _doc JSONB column — useful for initial migration but requires a second pass to normalize into relational columns.
- JSONB alone enforces no schema; add CHECK constraints or application-level validation for fields you rely on.
- The COPY protocol is 5–10x faster than individual INSERT statements for bulk loading. Always use COPY for large migrations.
- MongoDB Decimal128 maps to PostgreSQL NUMERIC — verify precision is preserved, especially for financial data.
- PostgreSQL 17's JSON_TABLE() function significantly simplifies analytics queries on migrated JSONB data — consider using it for reporting pipelines.