How to Migrate from MongoDB to PostgreSQL

Type: Software Reference | Confidence: 0.89 | Sources: 8 | Verified: 2026-02-23 | Freshness: quarterly

TL;DR

Constraints

Quick Reference

| MongoDB Concept | PostgreSQL Equivalent | Example |
|---|---|---|
| Collection | Table | CREATE TABLE users (...) |
| Document (BSON) | Row (or JSONB column) | INSERT INTO users (data) VALUES ('{"name":"Alice"}'::jsonb) |
| _id (ObjectId) | UUID or BIGSERIAL primary key | id UUID DEFAULT gen_random_uuid() |
| Embedded subdocument | JSONB column or separate table (FK) | metadata JSONB or CREATE TABLE addresses (user_id UUID REFERENCES users) |
| Array field | JSONB array or junction table | tags JSONB or CREATE TABLE user_tags (user_id, tag_id) |
| $lookup (aggregation) | JOIN | SELECT * FROM orders JOIN users ON orders.user_id = users.id |
| find({status: "active"}) | WHERE clause | SELECT * FROM users WHERE status = 'active' |
| find({"addr.city": "NYC"}) | JSONB operators -> / ->> | SELECT * FROM users WHERE data->'addr'->>'city' = 'NYC' |
| Compound index | B-tree composite index | CREATE INDEX idx ON users (status, created_at) |
| Text index | tsvector + GIN index | CREATE INDEX idx ON docs USING GIN (to_tsvector('english', body)) |
| $regex query | ~ or LIKE/ILIKE | SELECT * FROM users WHERE name ~ '^Al' |
| Schema-less fields | JSONB with GIN index | CREATE INDEX idx ON users USING GIN (metadata) |
| db.collection.aggregate() | SQL GROUP BY / window functions | SELECT status, COUNT(*) FROM orders GROUP BY status |
| Replica set (read scaling) | Read replicas / streaming replication | pg_basebackup + streaming replication |
| Decimal128 | NUMERIC | price NUMERIC(12,2) |
| ISODate() | TIMESTAMPTZ | created_at TIMESTAMPTZ DEFAULT NOW() |
| $unwind (explode arrays to documents) | JSON_TABLE() (PG 17+) | SELECT * FROM JSON_TABLE(doc, '$.items[*]' COLUMNS ...) |
| ObjectId (time-ordered ids) | UUID v7 (PG 18+) | id UUID DEFAULT uuidv7() |
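One practical corollary of the _id row above: the first four bytes of every ObjectId encode its creation time, so created_at can be backfilled even when documents lack an explicit timestamp field. A minimal stdlib-only sketch (the function name is illustrative, not part of any driver API):

```python
from datetime import datetime, timezone

def objectid_to_datetime(oid_hex: str) -> datetime:
    # The first 4 bytes (8 hex characters) of an ObjectId are a
    # big-endian Unix timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Example: a commonly cited sample ObjectId (created October 2012)
print(objectid_to_datetime("507f1f77bcf86cd799439011"))
```

This is useful both for populating created_at during the transform step and for sanity-checking that migrated timestamps fall in a plausible range.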

Decision Tree

START
├── Is downtime acceptable during migration?
│   ├── YES → Use mongoexport + ETL script + COPY (simplest approach)
│   └── NO ↓
├── Need real-time sync during cutover?
│   ├── YES → Use AWS DMS with CDC or Debezium + Kafka pipeline
│   └── NO ↓
├── Data volume > 100 GB?
│   ├── YES → Use parallel export (mongodump --numParallelCollections) + batch COPY
│   └── NO ↓
├── Documents have highly variable schemas?
│   ├── YES → Hybrid approach: relational columns for common fields + JSONB for variable
│   └── NO ↓
├── Deep nesting (>3 levels)?
│   ├── YES → Store as JSONB, add GIN indexes for query paths
│   └── NO ↓
└── DEFAULT → Fully normalize into relational tables with foreign keys
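The tree above can be encoded as a first-match-wins helper, which is handy for documenting the chosen strategy in a migration runbook. A sketch assuming each question is answered with a boolean or a size in GB (the function and return strings are illustrative):

```python
def choose_strategy(downtime_ok: bool, needs_realtime_sync: bool,
                    data_gb: float, variable_schema: bool,
                    deep_nesting: bool) -> str:
    # Mirrors the decision tree top to bottom; first match wins.
    if downtime_ok:
        return "mongoexport + ETL + COPY"
    if needs_realtime_sync:
        return "CDC: AWS DMS or Debezium + Kafka"
    if data_gb > 100:
        return "parallel mongodump + batched COPY"
    if variable_schema:
        return "hybrid: relational columns + JSONB"
    if deep_nesting:
        return "JSONB with GIN indexes"
    return "fully normalized relational schema"
```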

Step-by-Step Guide

1. Audit MongoDB collections and plan the schema

Analyze every collection's document structure, field types, nesting depth, and access patterns. This determines which fields to normalize into columns vs. store as JSONB. [src4, src5]

# List all collections and their document counts
mongosh --eval 'db.getCollectionNames().forEach(c => print(c, db[c].countDocuments()))'

# Sample a collection's schema
mongosh --eval 'db.users.aggregate([{$sample: {size: 100}}]).forEach(d => printjson(Object.keys(d)))'

# Detect field type variance (critical for schema design)
mongosh --eval '
  db.users.aggregate([
    { $sample: { size: 5000 } },
    { $project: { _id: 0 } },
    { $addFields: { _keys: { $objectToArray: "$$ROOT" } } },
    { $unwind: "$_keys" },
    { $group: { _id: { field: "$_keys.k", type: { $type: "$_keys.v" } }, count: { $sum: 1 } } },
    { $sort: { "_id.field": 1 } }
  ])'

Verify: You have a list of all collections with field names, types, and nesting depth. Create an ER diagram for the target PostgreSQL schema before writing any migration code.
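The same field-type variance check can be run offline against an exported JSON-lines file, which avoids re-querying production. A stdlib-only sketch (field_type_profile is a hypothetical helper name):

```python
import json
from collections import Counter

def field_type_profile(json_lines):
    # Tally (field name, Python type) pairs across exported documents.
    # A field that appears under more than one type needs either
    # normalization in the ETL step or a JSONB home in PostgreSQL.
    counts = Counter()
    for line in json_lines:
        doc = json.loads(line)
        for key, value in doc.items():
            counts[(key, type(value).__name__)] += 1
    return counts
```

Feeding it the output of mongoexport and sorting by field name surfaces mixed-type fields immediately.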

2. Create the PostgreSQL schema

Design tables based on your audit. Use a hybrid approach: relational columns for frequently queried fields, JSONB for variable or deeply nested data. [src1, src2]

-- PostgreSQL 18+: uuidv7() available for time-ordered UUIDs
-- PostgreSQL 13+: gen_random_uuid() is built-in
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    mongo_id VARCHAR(24),
    email TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL,
    status TEXT DEFAULT 'active',
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ,
    metadata JSONB DEFAULT '{}'
);

CREATE TABLE addresses (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    street TEXT, city TEXT, state TEXT, zip TEXT,
    country TEXT DEFAULT 'US',
    is_primary BOOLEAN DEFAULT false
);

CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_status ON users (status);
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
CREATE INDEX idx_users_created ON users (created_at DESC);
CREATE INDEX idx_addresses_user ON addresses (user_id);

Verify: \dt shows all tables. \d users shows correct column types and constraints.

3. Export data from MongoDB

Use mongoexport for small-to-medium datasets or mongodump for large binary exports. JSON format preserves nested structure better than CSV. [src4]

# Export as JSON (one document per line)
mongoexport --db=myapp --collection=users --type=json --out=users.json

# For large collections, export in parallel
mongodump --db=myapp --numParallelCollections=4 --out=./dump/

# Verify export
wc -l users.json

Verify: Line count in exported JSON matches document count in MongoDB.

4. Transform and load data with a migration script

Write a script that reads MongoDB JSON, transforms documents to match the PostgreSQL schema, and inserts using batch operations. [src4, src5]

import json, psycopg2
conn = psycopg2.connect(host="localhost", dbname="myapp", user="postgres")
cur = conn.cursor()
BATCH_SIZE = 1000
batch = []

with open("users.json", "r") as f:
    for i, line in enumerate(f, 1):
        doc = json.loads(line)
        mongo_id = doc["_id"]["$oid"] if isinstance(doc["_id"], dict) else str(doc["_id"])
        created = doc.get("createdAt", {}).get("$date") if isinstance(doc.get("createdAt"), dict) else doc.get("createdAt")
        metadata = {k: v for k, v in doc.items() if k not in ("_id","email","name","status","createdAt")}
        batch.append((mongo_id, doc.get("email",""), doc.get("name",""),
                       doc.get("status","active"), created, json.dumps(metadata)))
        if i % BATCH_SIZE == 0:
            cur.executemany("""
                INSERT INTO users (mongo_id, email, name, status, created_at, metadata)
                VALUES (%s, %s, %s, %s, %s, %s::jsonb)
                ON CONFLICT (email) DO UPDATE SET name=EXCLUDED.name, metadata=EXCLUDED.metadata
            """, batch)
            batch.clear()
            conn.commit()
if batch:
    cur.executemany("""
        INSERT INTO users (mongo_id, email, name, status, created_at, metadata)
        VALUES (%s, %s, %s, %s, %s, %s::jsonb)
        ON CONFLICT (email) DO UPDATE SET name=EXCLUDED.name, metadata=EXCLUDED.metadata
    """, batch)
    conn.commit()
cur.close(); conn.close()

Verify: SELECT COUNT(*) FROM users; matches the MongoDB document count.
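Note that mongoexport emits dates in Extended JSON, which takes two shapes depending on export mode, and the inline check in the script above only handles one of them. A hedged helper covering both forms (assuming standard mongoexport canonical or relaxed output; the function name is illustrative):

```python
from datetime import datetime, timezone

def parse_extended_date(value):
    # Canonical mode: {"$date": {"$numberLong": "<milliseconds>"}}
    # Relaxed mode:   {"$date": "2024-01-15T10:00:00Z"}
    if value is None:
        return None
    if isinstance(value, dict):
        inner = value.get("$date")
        if isinstance(inner, dict):
            return datetime.fromtimestamp(int(inner["$numberLong"]) / 1000,
                                          tz=timezone.utc)
        if isinstance(inner, str):
            return datetime.fromisoformat(inner.replace("Z", "+00:00"))
    if isinstance(value, str):
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value
```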

5. Migrate indexes and validate queries

Recreate MongoDB indexes as PostgreSQL equivalents. Test every critical query path against the new schema. [src1, src2]

-- Full-text search (replaces MongoDB text index)
ALTER TABLE products ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,''))) STORED;
CREATE INDEX idx_products_search ON products USING GIN (search_vector);

-- GIN index on JSONB (replaces MongoDB flexible queries)
CREATE INDEX idx_events_metadata ON events USING GIN (metadata);

-- PostgreSQL 17+: Use JSON_TABLE for analytics on JSONB arrays
SELECT jt.sku, SUM(jt.qty) AS total
FROM orders, JSON_TABLE(items, '$[*]' COLUMNS (sku TEXT PATH '$.sku', qty INT PATH '$.qty')) AS jt
GROUP BY jt.sku;

Verify: EXPLAIN ANALYZE on your most common queries confirms index usage.

6. Set up dual-write for zero-downtime cutover (optional)

For production systems, implement a dual-write pattern: writes go to both databases, reads shift gradually to PostgreSQL. [src3, src4, src8]

// Node.js dual-write middleware
const { MongoClient } = require('mongodb');
const { Pool } = require('pg');

const pg = new Pool({ connectionString: 'postgresql://postgres:secret@localhost/myapp' });
const mongoClient = new MongoClient('mongodb://localhost:27017');
const mongoDb = mongoClient.db('myapp');  // call `await mongoClient.connect()` at startup

async function dualWrite(collection, pgTable, document, pgTransform) {
    const pgRow = pgTransform(document);
    // PostgreSQL is source of truth
    const pgResult = await pg.query(
        `INSERT INTO ${pgTable} (${Object.keys(pgRow).join(', ')})
         VALUES (${Object.keys(pgRow).map((_, i) => `$${i+1}`).join(', ')})
         RETURNING id`,
        Object.values(pgRow)
    );
    try {
        await mongoDb.collection(collection).insertOne({
            ...document, _pg_id: pgResult.rows[0].id
        });
    } catch (err) {
        console.error(`MongoDB shadow write failed: ${err.message}`);
    }
    return pgResult.rows[0];
}

Verify: Spot-check 100 records — data in both databases should match.
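The spot-check itself can be automated with a field-by-field diff. A sketch of the comparison logic in Python, run as a separate audit script against records fetched from both databases (diff_records and the field map are illustrative names):

```python
def diff_records(mongo_doc, pg_row, field_map):
    # field_map: {mongo_field: pg_column}
    # Returns {mongo_field: (mongo_value, pg_value)} for disagreements;
    # an empty dict means the record pair matches.
    mismatches = {}
    for mongo_field, pg_col in field_map.items():
        m_val, p_val = mongo_doc.get(mongo_field), pg_row.get(pg_col)
        if m_val != p_val:
            mismatches[mongo_field] = (m_val, p_val)
    return mismatches
```

Sampling 100 ids, fetching each from both stores, and asserting the diff is empty gives a repeatable check instead of a manual eyeball pass.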

7. Validate data integrity and cut over

Run full data validation before decommissioning MongoDB. Compare record counts, checksums on critical fields, and test every query path. [src5, src8]

-- Count comparison
SELECT 'users' AS table_name, COUNT(*) FROM users
UNION ALL SELECT 'orders', COUNT(*) FROM orders;

-- Checksum critical fields
SELECT MD5(STRING_AGG(email || name || COALESCE(status, ''), '' ORDER BY mongo_id)) AS checksum FROM users;

-- Verify no FK orphans
SELECT COUNT(*) FROM addresses a
LEFT JOIN users u ON a.user_id = u.id
WHERE u.id IS NULL;  -- Should return 0

Verify: All counts match. FK orphan check returns 0. Application test suite passes against PostgreSQL.

Code Examples

Python: Full collection migration with progress tracking

# Input:  MongoDB connection string + collection name
# Output: Data migrated to PostgreSQL table with JSONB hybrid schema

import json, time, psycopg2
from pymongo import MongoClient    # pymongo==4.8.0

mongo = MongoClient("mongodb://localhost:27017")
pg_conn = psycopg2.connect("dbname=myapp user=postgres host=localhost")
pg_cur = pg_conn.cursor()
BATCH_SIZE = 5000

def migrate_collection(db_name, coll_name, pg_table, field_map):
    coll = mongo[db_name][coll_name]
    total = coll.estimated_document_count()
    start = time.time()
    batch = []
    columns = [pg_col for pg_col, _ in field_map.values()] + ["metadata"]
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {pg_table} ({', '.join(columns)}) VALUES ({placeholders}) ON CONFLICT DO NOTHING"

    for i, doc in enumerate(coll.find().batch_size(BATCH_SIZE), 1):
        row = [transform(doc.get(field)) for field, (_, transform) in field_map.items()]
        extra = {k: v for k, v in doc.items() if k not in set(field_map) | {"_id"}}
        row.append(json.dumps(extra, default=str))
        batch.append(tuple(row))
        if len(batch) >= BATCH_SIZE:
            pg_cur.executemany(sql, batch)
            pg_conn.commit()
            print(f"  {i}/{total} ({i*100//total}%) — {i/(time.time()-start):.0f} docs/sec")
            batch.clear()
    if batch:
        pg_cur.executemany(sql, batch)
        pg_conn.commit()
    print(f"Done: {total} docs in {time.time()-start:.1f}s")

migrate_collection("myapp", "users", "users", {
    "email":     ("email",      lambda v: v),
    "name":      ("name",       lambda v: v or ""),
    "status":    ("status",     lambda v: v or "active"),
    "createdAt": ("created_at", lambda v: v.isoformat() if v else None),
})

Node.js: Streaming migration for large collections

// Input:  MongoDB collection with millions of documents
// Output: Data streamed into PostgreSQL using COPY protocol

const { MongoClient } = require('mongodb');     // [email protected]
const { Pool } = require('pg');                  // [email protected]
const { pipeline } = require('stream/promises');
const { Transform } = require('stream');
const copyFrom = require('pg-copy-streams').from; // [email protected]

async function streamMigrate(dbName, collName, pgTable, transformDoc) {
    const mongo = new MongoClient('mongodb://localhost:27017');
    const pg = new Pool({ connectionString: 'postgresql://postgres:secret@localhost/myapp' });
    await mongo.connect();
    const collection = mongo.db(dbName).collection(collName);
    let count = 0;
    const pgClient = await pg.connect();
    const copyStream = pgClient.query(
        copyFrom(`COPY ${pgTable} FROM STDIN WITH (FORMAT csv, NULL '\\N')`)
    );
    const mongoStream = collection.find().stream();
    const transformer = new Transform({
        objectMode: true,
        transform(doc, _enc, cb) {
            count++;
            const row = transformDoc(doc);
            const csv = row.map(f => {
                if (f == null) return '\\N';
                const s = String(f);
                return s.includes(',') || s.includes('"') || s.includes('\n')
                    ? `"${s.replace(/"/g, '""')}"` : s;
            }).join(',') + '\n';
            cb(null, csv);
        }
    });
    await pipeline(mongoStream, transformer, copyStream);
    console.log(`Migrated ${count} rows into ${pgTable}`);
    pgClient.release(); await pg.end(); await mongo.close();
}

streamMigrate('myapp', 'orders', 'orders', (doc) => [
    doc._id.toString(), doc.userId?.toString(), doc.total || 0,
    doc.status || 'pending', doc.createdAt?.toISOString(), JSON.stringify(doc.items)
]);

SQL: Query translation examples (MongoDB shell to PostgreSQL)

-- MongoDB: db.orders.find({status: "shipped", total: {$gte: 100}}).sort({createdAt: -1}).limit(20)
SELECT * FROM orders WHERE status = 'shipped' AND total >= 100 ORDER BY created_at DESC LIMIT 20;

-- MongoDB: db.orders.aggregate([{$group: {_id: "$status", count: {$sum: 1}, avg: {$avg: "$total"}}}])
SELECT status, COUNT(*) AS count, AVG(total) AS avg_total FROM orders GROUP BY status;

-- MongoDB: db.users.find({"address.city": "New York"})
-- Normalized:
SELECT u.* FROM users u JOIN addresses a ON a.user_id = u.id WHERE a.city = 'New York';
-- JSONB:
SELECT * FROM users WHERE metadata->>'city' = 'New York';

-- MongoDB: db.users.find({tags: {$in: ["premium", "vip"]}})
-- JSONB array:
SELECT * FROM users WHERE metadata->'tags' ?| ARRAY['premium', 'vip'];

-- MongoDB: $unwind + $group (average of nested array)
SELECT id, AVG((review->>'rating')::numeric) AS avg_rating
FROM products, jsonb_array_elements(reviews) AS review GROUP BY id;

Anti-Patterns

Wrong: Dumping entire documents into a single JSONB column

-- ❌ BAD — stores entire MongoDB document as one JSONB blob
CREATE TABLE users (id SERIAL PRIMARY KEY, doc JSONB);
INSERT INTO users (doc) VALUES ('{"email":"[email protected]","name":"Alice","status":"active"}'::jsonb);
-- No constraints, no type safety, every query needs JSONB operators

Correct: Hybrid schema with relational columns for common fields

-- ✅ GOOD — frequently queried fields are real columns with types and constraints
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL,
    status TEXT DEFAULT 'active' CHECK (status IN ('active','inactive','suspended')),
    created_at TIMESTAMPTZ NOT NULL,
    metadata JSONB DEFAULT '{}'  -- only variable fields here
);

Wrong: Using VARCHAR(24) to store ObjectIds as primary keys

-- ❌ BAD — ObjectIds make poor PostgreSQL primary keys
CREATE TABLE users (id VARCHAR(24) PRIMARY KEY, email TEXT);
-- 24-char string comparisons are slower than UUID, no built-in generation

Correct: Use UUID with a mongo_id reference column

-- ✅ GOOD — native UUID primary key + reference column for traceability
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    mongo_id VARCHAR(24),  -- for cross-referencing during migration
    email TEXT NOT NULL UNIQUE
);
CREATE INDEX idx_users_mongo_id ON users (mongo_id);  -- drop post-migration

Wrong: Normalizing every nested field into separate tables

-- ❌ BAD — over-normalization of rarely-queried nested data
CREATE TABLE user_preferences (id UUID, user_id UUID, key TEXT, value TEXT);
CREATE TABLE user_notification_settings (id UUID, user_id UUID, channel TEXT, enabled BOOLEAN);
CREATE TABLE user_ui_settings (id UUID, user_id UUID, theme TEXT, language TEXT);
-- 4+ JOINs to reconstruct a single user profile

Correct: Use JSONB for infrequently queried nested data

-- ✅ GOOD — one JSONB column for settings, no extra tables
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email TEXT NOT NULL UNIQUE,
    preferences JSONB DEFAULT '{"notifications":{"email":true},"ui":{"theme":"light"}}'
);
-- Still queryable: SELECT * FROM users WHERE preferences->'ui'->>'theme' = 'dark';

Wrong: Inserting MongoDB arrays as comma-separated strings

# ❌ BAD — CSV strings lose structure and queryability
tags_str = ",".join(doc.get("tags", []))
cur.execute("INSERT INTO users (tags) VALUES (%s)", (tags_str,))

Correct: Use JSONB arrays or a proper junction table

# ✅ GOOD — preserve array structure in JSONB
import json
tags = doc.get("tags", [])
cur.execute("INSERT INTO users (tags) VALUES (%s::jsonb)", (json.dumps(tags),))
# Query: SELECT * FROM users WHERE tags ? 'premium'

Common Pitfalls

Diagnostic Commands

# Check MongoDB collection stats before migration
mongosh --eval 'db.users.stats()' | grep -E 'count|size|storageSize|nindexes'

# Export MongoDB collection to JSON
mongoexport --db=myapp --collection=users --type=json --out=users.json

# Verify export line count
wc -l users.json

# Check PostgreSQL table after migration
psql -c "SELECT COUNT(*) AS row_count FROM users;"
psql -c "SELECT pg_size_pretty(pg_total_relation_size('users')) AS table_size;"

# Verify JSONB data is valid
psql -c "SELECT COUNT(*) FROM users WHERE metadata IS NOT NULL AND jsonb_typeof(metadata) = 'object';"

# Check index usage
psql -c "SELECT indexrelname, idx_scan, idx_tup_read FROM pg_stat_user_indexes WHERE schemaname = 'public' ORDER BY idx_scan DESC;"

# Compare record counts
mongosh --eval 'db.users.countDocuments()'
psql -c "SELECT COUNT(*) FROM users;"

# Detect orphaned foreign keys
psql -c "SELECT COUNT(*) FROM addresses a LEFT JOIN users u ON a.user_id = u.id WHERE u.id IS NULL;"

# Check for NULL in required fields post-migration
psql -c "SELECT COUNT(*) FROM users WHERE email IS NULL OR name IS NULL;"
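The count comparisons above are easy to script once both sides are collected into dicts keyed by collection/table name. A stdlib-only sketch (reconcile_counts is a hypothetical name; feed it the parsed outputs of the mongosh and psql commands):

```python
def reconcile_counts(mongo_counts, pg_counts):
    # Both inputs: {collection_or_table_name: row_count}
    # Flags any name present on only one side or with differing counts.
    report = {}
    for name in sorted(set(mongo_counts) | set(pg_counts)):
        m, p = mongo_counts.get(name, 0), pg_counts.get(name, 0)
        report[name] = {"mongo": m, "postgres": p, "ok": m == p}
    return report
```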

Version History & Compatibility

| Component | Status | Tooling / Features | Migration Notes |
|---|---|---|---|
| MongoDB 8.0 | Current | AWS DMS 3.5.4+ | Full mongoexport support; queryable encryption not supported by DMS |
| MongoDB 7.0 | Supported | AWS DMS 3.5.4+ | Standard migration path |
| MongoDB 6.0 | Supported | AWS DMS 3.5.2+ | Timeseries collections not supported by DMS |
| MongoDB 5.0 | Supported | AWS DMS 3.5.1+ | Live resharding not supported by DMS |
| MongoDB 4.x | Legacy | AWS DMS 3.4.5+ | Most documented migration path |
| PostgreSQL 18 | Current (Sep 2025) | All features | Async I/O (2–3x on some I/O-bound workloads), UUID v7, enhanced RETURNING clause |
| PostgreSQL 17 | Supported | All features | JSON_TABLE() for analytics, improved VACUUM |
| PostgreSQL 16 | Supported | All features | Improved JSONB performance |
| PostgreSQL 14–15 | Supported | JSONB subscripting (PG 14+) | Use jsonb_set() on older versions |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Need ACID transactions across related documents | Schema is truly dynamic with no common fields | Keep MongoDB or use DynamoDB |
| Want SQL JOINs for complex reporting/analytics | Sub-millisecond reads on simple key-value lookups | Redis or MongoDB with read replicas |
| Regulatory requirements demand strong data integrity | Horizontal sharding across 10+ nodes is required | MongoDB Atlas, CockroachDB, or Citus |
| Team knows SQL better than MongoDB aggregation | Data is primarily geospatial with GeoJSON | MongoDB, or PostgreSQL with PostGIS |
| PostgreSQL JSONB covers your schema flexibility needs | Write volume exceeds 100K inserts/sec sustained | MongoDB or TimescaleDB |
| Want a single database for relational + document data | Application is stable on MongoDB with no pain points | Don't migrate for migration's sake |

Important Caveats

Related Units