Schema Migrations on a Billion-Row Table
Altering a table with a billion rows is not like altering a table with a million rows, only slower. The failure modes are qualitatively different. A table lock that would be annoying at a million rows will take down your application for hours at a billion. A migration that sails through staging, where you have 50,000 rows, can take production down without warning.
The mistake most teams make is treating schema migrations as a deployment concern handled by their ORM's migration runner. At scale, they are an infrastructure operation that requires planning, tooling, monitoring, and an abort plan.
Why Naive ALTER TABLE Fails
Most databases acquire a table-level lock for schema changes. In Postgres, ADD COLUMN with no default (or DEFAULT NULL) has always been a cheap metadata-only change, and since version 11 ADD COLUMN DEFAULT <constant> is metadata-only too. But ADD COLUMN with a volatile default, and most ALTER COLUMN TYPE changes, still force a full table rewrite that holds an exclusive lock for the duration. In MySQL, online DDL (ALGORITHM=INPLACE since 5.6, ALGORITHM=INSTANT for some operations in 8.0+) covers many changes, but any statement that falls back to ALGORITHM=COPY locks out writes for the entire rewrite.
At a billion rows, a table rewrite can take 4–12 hours depending on row width and disk speed. A 4-hour table lock is an outage.
Even "online" DDL has gotchas:
- Postgres concurrent index builds (CREATE INDEX CONCURRENTLY) do not block writes, but if interrupted they leave an invalid index behind.
- MySQL online DDL must take a metadata lock at the start and end of the operation. A long-running transaction holding that lock stalls the DDL, and every query queued behind it, until the lock wait times out.
- Both engines can saturate disk I/O during rewrites, degrading query performance even without explicit locks.
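On the Postgres side, one common mitigation is worth sketching here: set a lock_timeout before any DDL so that a blocked ALTER fails fast instead of queueing every subsequent query behind its lock request. The column name below is illustrative.

-- If the ALTER cannot acquire its lock within 5 seconds (e.g. a long
-- transaction holds the table), it fails cleanly instead of stalling
-- all new queries behind it; retry later or in a loop
SET lock_timeout = '5s';
ALTER TABLE orders ADD COLUMN region_code TEXT;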
The Tool Landscape
For MySQL, two tools dominate:
pt-online-schema-change (pt-osc) — from Percona Toolkit. Creates a shadow table, copies rows in batches, uses triggers to mirror live writes, then atomically swaps tables. Mature, well-understood, production-proven. The catch: triggers add write overhead. On very high write-rate tables (>50K writes/sec), trigger overhead can be significant.
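To make the trigger overhead concrete, here is a simplified sketch of the kind of mirroring trigger such a tool installs. pt-osc generates the real insert/update/delete triggers itself; the table and column names here are illustrative.

-- Simplified: mirror updates on the source table into the shadow table
CREATE TRIGGER pt_osc_orders_upd
AFTER UPDATE ON orders FOR EACH ROW
REPLACE INTO _orders_new (id, status, warehouse_id)
VALUES (NEW.id, NEW.status, NEW.warehouse_id);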
gh-ost — from GitHub. Replicates changes from the MySQL binlog instead of using triggers. Lower write overhead, finer throttling control, and the ability to pause mid-migration. GitHub open-sourced it after using it on their own production tables. For high-write-rate tables, gh-ost is the better choice.
For Postgres, the story is different. PostgreSQL's DDL capabilities have improved significantly:
- ADD COLUMN with no default, or DEFAULT NULL: a metadata-only change, instant in every modern version.
- ADD COLUMN DEFAULT <constant>: a full table rewrite in versions before 11; in Postgres 11+ the constant is stored in the catalog and the change is instant.
- ALTER COLUMN TYPE: almost always a full table rewrite (widening a varchar is a notable exception). No native online solution.
- Index creation: CREATE INDEX CONCURRENTLY avoids blocking writes but takes longer than a plain build.
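The rewrite distinction is easy to verify on a scratch copy of the table. A sketch, with illustrative column names:

-- Metadata-only in Postgres 11+: the constant default is stored in the
-- catalog and applied lazily when existing rows are read
ALTER TABLE orders ADD COLUMN region_code TEXT DEFAULT 'unknown';

-- Full table rewrite: clock_timestamp() is volatile, so Postgres must
-- materialize a distinct value into every existing row
ALTER TABLE orders ADD COLUMN snapshot_at TIMESTAMPTZ DEFAULT clock_timestamp();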
For complex Postgres migrations, the standard patterns are expand-contract and batched backfill scripts, supported by tools like pgslice for table partitioning.
gh-ost on MySQL: The Mechanics
The critical insight: gh-ost reads binlog events and applies them to the shadow table, so the shadow table stays nearly in sync with the source during the copy phase. The final lock window is typically under 2 seconds — just long enough to catch up the final binlog events and rename.
# gh-ost invocation for adding a column to a billion-row orders table
gh-ost \
--host=mysql-primary.internal \
--port=3306 \
--user=ghost_user \
--password="${GHOST_PASSWORD}" \
--database=production \
--table=orders \
--alter="ADD COLUMN fulfillment_center_id BIGINT UNSIGNED DEFAULT NULL AFTER warehouse_id, ADD INDEX idx_fulfillment_center (fulfillment_center_id)" \
--chunk-size=1000 \
--max-load="Threads_running=50" \
--critical-load="Threads_running=100" \
--critical-load-interval-millis=2000 \
--throttle-control-replicas="replica1.internal,replica2.internal" \
--max-lag-millis=5000 \
--initially-drop-ghost-table \
--initially-drop-old-table \
--ok-to-drop-table \
  --allow-on-master \
  --execute

Key flags to understand:
- --max-load: pause copying if Threads_running exceeds 50. This is your primary throttle.
- --max-lag-millis: pause if replication lag exceeds 5 seconds on any monitored replica. Critical for protecting replica read workloads.
- --chunk-size: rows copied per batch. 1000 is conservative; you can go to 2000–5000 on fast disks with low contention.
- --allow-on-master: gh-ost connects to a replica by default; this flag is required when pointing it directly at the primary, as this invocation does.
- Without --execute, gh-ost runs in dry-run mode. Verify the dry run before running for real.
Postgres: The Expand-Contract Pattern
Postgres has no gh-ost equivalent. The standard approach for complex schema changes is expand-contract (also called parallel change):
Phase 1: Expand. Add the new column (or table) without removing the old. Both old and new exist simultaneously.
-- Step 1: Add column with no default (instant in Postgres 11+)
ALTER TABLE orders ADD COLUMN fulfillment_center_id BIGINT;
-- Step 2: Backfill in batches (see backfill strategy below)
-- Step 3: Add NOT NULL constraint only after backfill is complete
-- Use NOT VALID to skip scanning existing rows; new and updated rows are checked immediately
ALTER TABLE orders ADD CONSTRAINT orders_fulfillment_center_id_not_null
CHECK (fulfillment_center_id IS NOT NULL) NOT VALID;
-- Step 4: Validate the constraint (separate transaction, no lock on writes)
ALTER TABLE orders VALIDATE CONSTRAINT orders_fulfillment_center_id_not_null;

The NOT VALID + VALIDATE CONSTRAINT split is the key Postgres technique. NOT VALID adds the constraint instantly, with no full-table scan. VALIDATE CONSTRAINT scans existing rows to confirm compliance; it takes time but holds only a SHARE UPDATE EXCLUSIVE lock, which blocks neither reads nor writes.
Phase 2: Migrate. Update application code to write to both old and new columns. Backfill old rows.
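If you cannot ship application-level dual writes immediately, a temporary sync trigger is a common stopgap. A sketch, assuming the warehouse_to_center() SQL function used in the backfill below exists; note it adds write overhead, and it must be dropped in the contract phase:

-- Temporary: populate the new column on every insert/update until
-- application code writes it directly
CREATE OR REPLACE FUNCTION sync_fulfillment_center() RETURNS trigger AS $$
BEGIN
    NEW.fulfillment_center_id := warehouse_to_center(NEW.warehouse_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_sync_fulfillment
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW
WHEN (NEW.fulfillment_center_id IS NULL)
EXECUTE FUNCTION sync_fulfillment_center();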
Phase 3: Contract. Once all rows are migrated and application code no longer reads the old column, drop the old column. ALTER TABLE DROP COLUMN in Postgres marks the column as dropped in the catalog and is instant — the space is reclaimed lazily.
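In Postgres 12+, the validated CHECK constraint from Phase 1 also lets you promote the column to a real NOT NULL without a table scan. A sketch of the contract steps, assuming the names from above and that nothing reads warehouse_id anymore:

-- Postgres 12+ proves NOT NULL from the validated CHECK constraint,
-- so this is a catalog-only change (no table scan)
ALTER TABLE orders ALTER COLUMN fulfillment_center_id SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_fulfillment_center_id_not_null;

-- Instant: marks the column dropped in the catalog; space is reclaimed lazily
ALTER TABLE orders DROP COLUMN warehouse_id;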
Backfill Strategies
Never backfill a billion rows in a single transaction. It will lock the table, consume enormous undo/WAL space, and likely fail partway through.
The correct approach is chunked backfills with cursor-based pagination:
import psycopg2
import time
def backfill_fulfillment_center(conn_str: str, batch_size: int = 5000):
conn = psycopg2.connect(conn_str)
conn.autocommit = False
with conn.cursor() as cur:
# Get total count for progress tracking
cur.execute("SELECT COUNT(*) FROM orders WHERE fulfillment_center_id IS NULL")
total = cur.fetchone()[0]
print(f"Rows to backfill: {total:,}")
processed = 0
last_id = 0
while True:
with conn: # transaction per batch
cur.execute("""
UPDATE orders
SET fulfillment_center_id = warehouse_to_center(warehouse_id)
WHERE id > %s
AND fulfillment_center_id IS NULL
ORDER BY id
LIMIT %s
RETURNING id
""", (last_id, batch_size))
rows = cur.fetchall()
if not rows:
break
last_id = rows[-1][0]
processed += len(rows)
print(f"Progress: {processed:,} / {total:,} ({100*processed/total:.1f}%)")
# Throttle: sleep between batches to avoid I/O saturation
time.sleep(0.1)
conn.close()
print("Backfill complete.")Throttle between batches. The 100ms sleep is not optional. Without throttling, the backfill will saturate disk I/O and degrade application performance. Adjust based on your replication lag: if replicas start falling behind, increase the sleep.
Monitor replication lag during backfill. On Postgres, watch pg_stat_replication.write_lag and replay_lag. On MySQL, watch Seconds_Behind_Master. If lag spikes above your threshold, pause the backfill.
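On the Postgres side, the check can be as simple as the following query against the primary; pause the backfill when either lag column exceeds your threshold:

-- Lag is reported per connected standby; the columns are NULL
-- for standbys that are not currently streaming
SELECT application_name, write_lag, replay_lag
FROM pg_stat_replication;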
Use the primary key as cursor. ID-based cursors are more reliable than OFFSET-based pagination. OFFSET scans from the beginning of the table for each batch — ID-based cursors use the index.
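One optional refinement, assuming the schema above: a partial index over the not-yet-backfilled rows keeps each batch's id scan cheap late in the backfill, when populated regions dominate the table. Drop it once the backfill completes.

-- Only indexes rows still awaiting backfill; shrinks as the backfill runs
CREATE INDEX CONCURRENTLY idx_orders_backfill_pending
ON orders (id)
WHERE fulfillment_center_id IS NULL;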
Index Migrations
Adding an index to a billion-row table is a frequent need and a common source of incidents.
In Postgres:
-- Always use CONCURRENTLY for large tables
CREATE INDEX CONCURRENTLY idx_orders_fulfillment_center
ON orders (fulfillment_center_id)
WHERE fulfillment_center_id IS NOT NULL; -- partial index if appropriate

Risks of concurrent index builds:
- If the build is interrupted (process killed, connection dropped), it leaves an invalid index behind. Check pg_index.indisvalid for the index (see the query below); drop and rebuild any invalid index.
- Long-running transactions can block the final phase of the concurrent build. Check pg_stat_activity for long-running queries before starting.
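A concrete check-and-repair sequence for the example index (the ::regclass cast resolves the index name to its pg_index row):

-- false means a concurrent build was interrupted and left the index invalid
SELECT indisvalid
FROM pg_index
WHERE indexrelid = 'idx_orders_fulfillment_center'::regclass;

-- Recovery: drop the invalid index and rebuild it concurrently
DROP INDEX CONCURRENTLY idx_orders_fulfillment_center;
CREATE INDEX CONCURRENTLY idx_orders_fulfillment_center
ON orders (fulfillment_center_id)
WHERE fulfillment_center_id IS NOT NULL;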
In MySQL with gh-ost, include index creation in the --alter clause. gh-ost creates the index on the shadow table before the swap, so the primary table never takes a lock for index creation.
The Abort Plan
Every migration needs a tested abort plan. "We can always roll back" is not a plan. Write it down:
- gh-ost abort. Kill the gh-ost process. The shadow table (_orders_gho) is left behind and the original table is untouched. Clean up: DROP TABLE IF EXISTS _orders_gho; DROP TABLE IF EXISTS _orders_del; (the _del table exists only if the cut-over already happened).
- Backfill abort. Kill the backfill script. The column has partial data (some rows populated, some NULL). Application code must handle NULL for this column until the backfill is restarted.
- Constraint abort. If VALIDATE CONSTRAINT is running and causing load, you can kill it. The constraint remains NOT VALID. You can drop it without penalty and retry validation during a quieter period.
- Application abort. If the new schema causes application errors post-migration, the fastest rollback is a feature flag disabling the code path that uses the new column, not a database rollback. Database rollbacks are often slower than the original migration.
Document these steps in your runbook before you start the migration. Run a fire drill in staging.
Key Takeaways
- Naive ALTER TABLE at billion-row scale causes hours-long table locks; treat schema migrations as infrastructure operations, not ORM tasks.
- For MySQL, gh-ost is the preferred tool for high-write-rate tables because it reads the binlog instead of using triggers, keeping write overhead low.
- For Postgres, the expand-contract pattern with ADD CONSTRAINT ... NOT VALID followed by VALIDATE CONSTRAINT lets you add constraints incrementally without blocking reads or writes.
- Always backfill in batches using a primary-key cursor with inter-batch throttling; monitor replication lag and pause if it spikes above your threshold.
- Create large indexes with CREATE INDEX CONCURRENTLY in Postgres and check for invalid indexes after any interrupted build.
- Write and test your abort plan before starting the migration; the fastest rollback in most cases is a feature flag, not a database operation.