How to do zero-downtime database migrations

- Bottom line: Use the expand-contract pattern for zero-downtime schema migrations; pair with versioned migration files and CI validation for safety and reproducibility.

Database schema change best practices

- Bottom line: Use the expand-contract pattern for zero-downtime schema migrations; pair with versioned migration files and CI validation for safety and reproducibility.

Expand and contract migration pattern

- Bottom line: Use the expand-contract pattern for zero-downtime schema migrations; pair with versioned migration files and CI validation for safety and reproducibility.

How to safely alter production database tables

- Bottom line: Use the expand-contract pattern for zero-downtime schema migrations; pair with versioned migration files and CI validation for safety and reproducibility.

Database Migration Strategies

What are the best database migration strategies?

TL;DR

Bottom line: Use the expand-contract pattern for zero-downtime schema migrations; pair with versioned migration files and CI validation for safety and reproducibility.
Key tool/command: flyway migrate / alembic upgrade head / knex migrate:latest
Watch out for: Long-running ALTER TABLE locks on large tables -- use online schema change tools (gh-ost, pt-online-schema-change) for tables over 10M rows.
Works with: PostgreSQL, MySQL, SQL Server, SQLite, and all major ORMs; patterns are engine-agnostic.

Constraints

Never run DDL migrations without a tested rollback script or compensating migration
Always test migrations against production-sized datasets before deploying -- a migration that takes 2 seconds on dev can lock a table for 30 minutes in production
Never mix schema changes (DDL) and data changes (DML) in a single migration file -- they have different rollback semantics
Never drop columns or tables that are still referenced by running application code -- deploy application changes first
For MySQL tables >10M rows, always use gh-ost or pt-online-schema-change instead of direct ALTER TABLE
In PostgreSQL, always use CREATE INDEX CONCURRENTLY instead of CREATE INDEX to avoid blocking writes

Quick Reference

Strategy	Downtime	Reversibility	Complexity	Best For
Versioned migrations (Flyway/Alembic)	Depends on DDL	Forward-only (compensating migration)	Low	Most projects; sequential, auditable changes
State-based (Redgate, SchemaSync)	Depends on diff	Snapshot rollback	Medium	Teams preferring declarative schema definitions
Expand-contract	Zero	Full (old schema remains until contract)	Medium-High	Renaming columns, changing types, splitting tables
Dual-write	Zero	Switch reads back to old table	High	Cross-database migrations, service extractions
Blue-green database	Near-zero	Switch back to blue environment	High	Major version upgrades, engine migrations
Shadow tables	Zero	Drop shadow; keep original	High	Validating new schema under real traffic before cutover
Online schema change (gh-ost)	Near-zero (brief metadata lock)	Drop ghost table	Medium	Large MySQL tables (>10M rows)
pg_repack / pgroll	Zero	Revert migration	Medium	Large PostgreSQL tables needing VACUUM FULL or schema changes

Decision Tree

START
|-- Can you tolerate any downtime?
|   |-- YES (maintenance window available)
|   |   |-- Simple column add/drop?
|   |   |   |-- YES -> Direct ALTER TABLE in a versioned migration
|   |   |   +-- NO -> State-based diff tool or multi-step versioned migrations
|   +-- NO (zero-downtime required) v
|-- Is the table large (>10M rows)?
|   |-- YES
|   |   |-- MySQL?
|   |   |   |-- YES -> gh-ost or pt-online-schema-change
|   |   |   +-- NO (PostgreSQL) -> pg_repack, pgroll, or CREATE INDEX CONCURRENTLY
|   +-- NO (<10M rows) v
|-- What type of schema change?
|   |-- Adding a nullable column -> Direct ALTER TABLE (fast, no lock in PostgreSQL)
|   |-- Renaming a column -> Expand-contract pattern (3-phase)
|   |-- Changing column type -> Expand-contract pattern
|   |-- Dropping a column -> Deploy app changes first, then drop in separate migration
|   |-- Splitting/merging tables -> Dual-write pattern
|   +-- Full engine migration -> Blue-green database deployment
+-- DEFAULT -> Versioned migration file with CI validation

Step-by-Step Guide

1. Set up versioned migration infrastructure

Choose a migration tool that matches your stack and initialize it. Every migration gets a monotonically increasing version number and lives in version control alongside application code. [src5]

# Flyway (Java/.NET/any)
flyway init

# Alembic (Python/SQLAlchemy)
alembic init migrations
alembic revision --autogenerate -m "baseline"

# Knex (Node.js)
npx knex init
npx knex migrate:make baseline

Verify: flyway info / alembic history / npx knex migrate:status -- shows the baseline migration as pending.

2. Write forward-only migration files

Each migration file contains the SQL or code to move the schema forward. Never edit a migration that has already been applied to any shared environment. [src7]

-- V2__add_email_verified_column.sql (Flyway naming convention)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
CREATE INDEX CONCURRENTLY idx_users_email_verified ON users (email_verified);

Verify: Apply to a staging environment with production-sized data and measure execution time.

3. Implement expand-contract for breaking changes

When renaming a column or changing a type, use three separate deployments to avoid downtime. [src1]

-- Migration 1: EXPAND -- add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- Migration 2: MIGRATE -- backfill (run in batches for large tables)
UPDATE users SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL
LIMIT 10000;  -- repeat in batches

-- Migration 3: CONTRACT -- drop old columns (only after all app code updated)
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;

Verify: Between each phase, confirm SELECT COUNT(*) FROM users WHERE full_name IS NULL returns 0 before proceeding to CONTRACT.

4. Use online schema change for large tables

For MySQL tables with millions of rows, direct ALTER TABLE acquires locks that block reads and writes. Use gh-ost or pt-online-schema-change instead. [src3] [src4]

# gh-ost (GitHub's online schema change -- triggerless, uses binlog)
gh-ost \
  --host=db-primary.example.com \
  --database=myapp \
  --table=orders \
  --alter="ADD COLUMN shipping_status VARCHAR(50) DEFAULT 'pending'" \
  --execute

# pt-online-schema-change (Percona -- trigger-based)
pt-online-schema-change \
  --alter="ADD COLUMN shipping_status VARCHAR(50) DEFAULT 'pending'" \
  D=myapp,t=orders \
  --execute

Verify: SHOW CREATE TABLE orders; -- confirm the new column exists and no locks are held.

5. Validate migrations in CI before production

Add migration validation to your CI pipeline to catch errors before they reach production. [src7]

# In CI pipeline: spin up a fresh DB, apply all migrations, verify
docker run -d --name test-db -e POSTGRES_PASSWORD=test -p 5432:5432 postgres:16
sleep 5
flyway -url=jdbc:postgresql://localhost:5432/postgres -user=postgres -password=test migrate
flyway -url=jdbc:postgresql://localhost:5432/postgres -user=postgres -password=test validate

Verify: CI exits with code 0 and flyway validate reports no pending or failed migrations.

Code Examples

SQL: Zero-downtime column rename (expand-contract)

-- Input:  Table 'products' with column 'price' that needs renaming to 'unit_price'
-- Output: Column renamed with zero application downtime

-- Step 1: EXPAND (Migration V10)
ALTER TABLE products ADD COLUMN unit_price NUMERIC(10,2);

-- Step 2: Dual-write trigger (temporary, remove at CONTRACT)
CREATE OR REPLACE FUNCTION sync_price_columns()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
    NEW.unit_price := COALESCE(NEW.unit_price, NEW.price);
    NEW.price := COALESCE(NEW.price, NEW.unit_price);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_price
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION sync_price_columns();

-- Step 3: MIGRATE -- backfill existing rows
UPDATE products SET unit_price = price WHERE unit_price IS NULL;

-- Step 4: CONTRACT (Migration V12, after app code is updated)
DROP TRIGGER trg_sync_price ON products;
DROP FUNCTION sync_price_columns();
ALTER TABLE products DROP COLUMN price;

Node.js (Knex): Migration file with rollback

// Input:  Knex migration system configured
// Output: Creates 'audit_logs' table with indexes

// migrations/20260224_create_audit_logs.js
exports.up = function(knex) {
  return knex.schema.createTable('audit_logs', (table) => {
    table.bigIncrements('id').primary();
    table.uuid('user_id').notNullable().references('id').inTable('users');
    table.string('action', 100).notNullable();
    table.jsonb('payload').defaultTo('{}');
    table.timestamp('created_at').defaultTo(knex.fn.now());
    table.index(['user_id', 'created_at'], 'idx_audit_user_time');
    table.index(['action'], 'idx_audit_action');
  });
};

exports.down = function(knex) {
  return knex.schema.dropTableIfExists('audit_logs');
};

Python (Alembic): Batched data migration

# Input:  Alembic environment configured with SQLAlchemy
# Output: Safely backfills 'full_name' from 'first_name' + 'last_name'

"""backfill full_name column"""
revision = 'a1b2c3d4e5f6'
down_revision = '9z8y7x6w5v4u'

from alembic import op
import sqlalchemy as sa

BATCH_SIZE = 5000

def upgrade():
    conn = op.get_bind()
    while True:
        result = conn.execute(sa.text(
            "UPDATE users SET full_name = first_name || ' ' || last_name "
            "WHERE full_name IS NULL LIMIT :batch"
        ), {"batch": BATCH_SIZE})
        if result.rowcount == 0:
            break

def downgrade():
    op.execute("UPDATE users SET full_name = NULL")

Go (goose): Migration with transaction control

// Input:  goose migration framework configured
// Output: Adds 'status' enum column to orders table

// migrations/20260224_add_order_status.go
package migrations

import (
    "database/sql"
    "github.com/pressly/goose/v3"
)

func init() {
    goose.AddMigration(upAddOrderStatus, downAddOrderStatus)
}

func upAddOrderStatus(tx *sql.Tx) error {
    _, err := tx.Exec(`
        CREATE TYPE order_status AS ENUM ('pending','processing','shipped','delivered');
        ALTER TABLE orders ADD COLUMN status order_status DEFAULT 'pending';
    `)
    return err
}

func downAddOrderStatus(tx *sql.Tx) error {
    _, err := tx.Exec(`
        ALTER TABLE orders DROP COLUMN status;
        DROP TYPE order_status;
    `)
    return err
}

Anti-Patterns

Wrong: Running manual DDL directly in production

-- BAD -- ad-hoc DDL with no version tracking, no rollback, no audit trail
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- "I'll just run this real quick in the production console..."

Correct: Use versioned migration files

-- GOOD -- tracked in VCS, reviewed in PR, tested in CI, applied by migration tool
-- V14__add_phone_to_users.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Applied via: flyway migrate (with CI validation gate)

Wrong: Big-bang migration (everything in one file)

-- BAD -- 500-line migration that renames, restructures, backfills, and drops
-- If it fails halfway, you're in an inconsistent state
ALTER TABLE orders RENAME COLUMN price TO unit_price;
ALTER TABLE orders ADD COLUMN currency VARCHAR(3);
UPDATE orders SET currency = 'USD';  -- locks table for 45 minutes on 50M rows
ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;
DROP TABLE legacy_orders;

Correct: Break into small, independent migrations

-- GOOD -- each step is a separate, independently reversible migration

-- V20__add_currency_column.sql
ALTER TABLE orders ADD COLUMN currency VARCHAR(3) DEFAULT 'USD';

-- V21__backfill_currency.sql (run in batches)
-- UPDATE orders SET currency = 'USD' WHERE currency IS NULL LIMIT 10000;

-- V22__set_currency_not_null.sql (only after backfill complete)
ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;

-- V23__drop_legacy_orders.sql (separate deployment)
DROP TABLE IF EXISTS legacy_orders;

Wrong: No rollback plan

# BAD -- forward-only with no escape hatch
def upgrade():
    op.drop_column('users', 'legacy_role')  # data is gone forever

Correct: Always preserve data until verified

# GOOD -- rename instead of drop; keep data recoverable
def upgrade():
    op.alter_column('users', 'legacy_role',
                    new_column_name='_legacy_role_deprecated')
    # Drop in a future migration after 2 release cycles

def downgrade():
    op.alter_column('users', '_legacy_role_deprecated',
                    new_column_name='legacy_role')

Common Pitfalls

Table locking on large tables: ALTER TABLE ... ADD COLUMN ... DEFAULT value acquires an ACCESS EXCLUSIVE lock in PostgreSQL <11 and MySQL. Fix: In PostgreSQL 11+, ADD COLUMN ... DEFAULT is instant. In MySQL, use gh-ost or pt-online-schema-change. [src3]
Data backfill ordering with foreign keys: Backfilling a column that references another table can fail if the referenced rows don't exist yet. Fix: Disable foreign key checks during backfill or backfill parent tables first. [src2]
Foreign key constraints blocking online schema changes: Both gh-ost and pt-osc have limited support for tables with foreign keys. Fix: Drop FK constraints before running online schema change, then re-add after. [src4]
Migration order conflicts in team environments: Two developers create migrations with the same version number. Fix: Use timestamp-based naming (20260224143000_add_column.sql) instead of sequential integers. [src5]
Running migrations during peak traffic: Even "fast" DDL can cause metadata lock waits if long-running queries are active. Fix: Schedule migrations during low-traffic windows; kill long-running queries before applying DDL. [src7]
Not testing on production-sized data: A migration that runs in 50ms on a dev database with 100 rows can take 40 minutes on a production table with 50M rows. Fix: Maintain a staging environment with production-scale data. [src7]
Mixing DDL and DML in transactions: In MySQL, DDL statements cause an implicit COMMIT of any open transaction, breaking atomicity. Fix: Separate schema changes and data changes into distinct migration files. [src6]

Diagnostic Commands

# Check Flyway migration status
flyway info

# Check Alembic migration history
alembic history --verbose
alembic current

# Check Knex migration status
npx knex migrate:status

# Check for active locks (PostgreSQL)
SELECT pid, relation::regclass, mode, granted
FROM pg_locks WHERE NOT granted;

# Check for long-running queries that may block DDL (PostgreSQL)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '30 seconds';

# Check table size before deciding migration strategy (PostgreSQL)
SELECT pg_size_pretty(pg_total_relation_size('orders'));

# Verify MySQL table structure after migration
SHOW CREATE TABLE orders;

# Check for pending migrations in Flyway schema history
SELECT * FROM flyway_schema_history WHERE success = false;

Version History & Compatibility

Tool	Current Version	Key Feature	Migration Notes
Flyway	10.x	Baseline on migrate, undo (Teams)	Free tier: no undo; use compensating migrations
Alembic	1.13.x	Autogenerate, branching	Pair with SQLAlchemy 2.0+ for async support
Knex	3.x	Promise-based, multi-dialect	Knex 3.x dropped callback API; use async/await
goose	3.x	Go + SQL migrations, no-transaction mode	v3 dropped GOOSE_DRIVER env; use --driver flag
gh-ost	1.1.x	Triggerless, throttle-aware	Requires binlog access (ROW format)
pt-osc	3.6.x	Trigger-based, mature	Conflicts with tables that already have triggers
pgroll	0.x	PostgreSQL-native, versioned rollback	Still pre-1.0; API may change
Liquibase	4.x	Multi-format (XML/YAML/SQL), rollback	Free tier: limited rollback; Pro for full undo

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Evolving schema in a running application	One-time data import/export	ETL tools (dbt, Airbyte)
Team collaboration requiring auditable changes	Prototyping with throwaway databases	Direct DDL in dev environment
Zero-downtime deployments with expand-contract	Migrating between different database engines	AWS DMS, pgloader
Large table alterations (>10M rows)	Small tables in low-traffic applications	Direct ALTER TABLE
CI/CD pipeline integration	Schema managed by ORM auto-sync in dev	ORM auto-migrate (dev only)

Important Caveats

PostgreSQL 11+ handles ADD COLUMN ... DEFAULT without a full table rewrite, but older versions lock the entire table -- always check your PostgreSQL version before assuming instant DDL
MySQL's InnoDB online DDL supports some ALTER TABLE operations without locking, but not all -- CHANGE COLUMN type changes still require a table copy
In MySQL, DDL statements cause an implicit COMMIT of any open transaction -- never place DDL inside an explicit transaction expecting rollback
gh-ost requires binlog in ROW format and SUPER or REPLICATION SLAVE privileges -- verify prerequisites before planning a migration window
CREATE INDEX CONCURRENTLY in PostgreSQL cannot run inside a transaction block -- migration tools must be configured to run this statement outside of a transaction