CDC Patterns in 2026
Change Data Capture should be a solved problem by now. The tooling has matured, Postgres logical replication is stable, and Debezium has been production-hardened for years. Yet teams still fall into the same traps: dual-write bugs that corrupt state, schema changes that silently break pipelines, and the "exactly-once" promise that turns out to be at-least-once in disguise.
This post is a practitioner's guide to CDC in 2026 — what works, what doesn't, and how to build pipelines that survive schema evolution and production chaos.
The Three Patterns, Honestly Ranked
There are three common patterns for propagating database changes to downstream systems. They're not equally good.
Dual-write is the pattern everyone starts with and regrets. Write to the database, then write to Kafka (or Redis, or an API). If the second write fails, your systems diverge and you have no recovery path that doesn't involve manual intervention or accepting data loss. Under load, partial failures are not rare events.
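The failure mode is easy to demonstrate. Below is a minimal sketch with in-memory stand-ins for the database and the broker; `FlakyBroker` and the write ordering are illustrative, not a real client:

```python
class FlakyBroker:
    """Stand-in for Kafka: fails on demand to simulate a publish error."""
    def __init__(self):
        self.messages = []
        self.fail_next = False

    def publish(self, msg):
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("broker unavailable")
        self.messages.append(msg)

db = []            # stand-in for the database table
broker = FlakyBroker()

def create_order_dual_write(order):
    db.append(order)       # write 1 succeeds...
    broker.publish(order)  # ...write 2 can still fail independently

create_order_dual_write({"id": 1})
broker.fail_next = True
try:
    create_order_dual_write({"id": 2})
except ConnectionError:
    pass  # order 2 is in the database but was never published

# The two stores have now diverged, and nothing in the system knows it.
assert len(db) == 2 and len(broker.messages) == 1
```

Retrying the publish doesn't fix this either: if the process crashes between the two writes, the intent to publish is simply lost.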
The outbox pattern fixes dual-write by using the database as the source of truth for events. The application writes its main record and an outbox row in a single transaction. CDC reads the outbox table and publishes to the message queue. Atomicity is guaranteed by the database; delivery is guaranteed by CDC.
Native CDC on the main table is simpler but requires more discipline. Every change to your data model is now a public contract. It works well when you control both producer and consumer, and poorly when the data model evolves faster than consumers.
The right choice in most cases: outbox pattern with Debezium.
Postgres Logical Replication: What's Actually Happening
Debezium's Postgres connector uses logical replication, specifically the pgoutput output plugin (older wal2json-based setups are deprecated; Debezium 2.0 dropped wal2json support). Understanding this mechanism prevents a whole category of operational surprises.
Postgres WAL (Write-Ahead Log) records every change as a physical operation. Logical decoding transforms these into a logical stream of row-level changes (INSERT, UPDATE, DELETE), which is what Debezium consumes.
-- Create a replication slot (Debezium does this for you, but knowing the mechanics helps)
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');
-- Check replication slot lag (critical to monitor)
SELECT
slot_name,
plugin,
database,
active,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes,
confirmed_flush_lsn
FROM pg_replication_slots;

The replication slot is the most important operational concern. It holds a reference to the WAL position that the consumer has confirmed. If Debezium falls behind — due to downtime, consumer lag, or slow processing — WAL files accumulate and disk space fills up. Postgres cannot delete WAL segments ahead of the replication slot's confirmed LSN.
Monitor replication slot lag. Alert when it exceeds a threshold that would cause your disk to fill before your team wakes up. This is not optional.
-- Alert rule: lag > 5 GB warrants immediate attention
SELECT
slot_name,
active,
round(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) / 1e9::numeric, 2) AS lag_gb
FROM pg_replication_slots;
-- Note: don't filter on active = true; an inactive slot (connector down)
-- is exactly the case where lag accumulates fastest.

Debezium Configuration That Actually Matters
Most Debezium tutorials show the minimal configuration. Here's the configuration that matters for production:
{
"name": "orders-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres-primary",
"database.port": "5432",
"database.user": "debezium_user",
"database.password": "${file:/secrets/db.properties:password}",
"database.dbname": "orders_db",
"database.server.name": "orders",
"plugin.name": "pgoutput",
"slot.name": "debezium_orders",
"publication.name": "debezium_publication",
"table.include.list": "public.orders,public.outbox_events",
"heartbeat.interval.ms": "10000",
"heartbeat.action.query": "UPDATE debezium_heartbeat SET last_heartbeat = NOW() WHERE id = 1",
"snapshot.mode": "initial",
"snapshot.locking.mode": "none",
"transforms": "unwrap,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.route.type": "io.debezium.transforms.outbox.EventRouter",
"tombstones.on.delete": "true",
"decimal.handling.mode": "string",
"time.precision.mode": "connect"
}
}

Key settings explained:
- heartbeat.interval.ms + heartbeat.action.query: Without heartbeats, an idle table means no WAL activity, which means Debezium doesn't advance its confirmed LSN, which means slot lag grows even when there's nothing to process. The heartbeat query touches a dedicated table to generate WAL activity.
- snapshot.locking.mode: none: Avoids table locks during the initial snapshot. Required for production databases.
- tombstones.on.delete: true: Kafka compaction requires tombstone records (null-value messages) for deletes.
- decimal.handling.mode: string: Avoids floating-point precision loss for NUMERIC/DECIMAL columns.
The Outbox Pattern: Implementation
The outbox pattern requires careful schema design. The outbox table is a contract — its structure affects your event schema.
-- Outbox table schema
CREATE TABLE outbox_events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type TEXT NOT NULL, -- 'order', 'user', etc.
aggregate_id TEXT NOT NULL, -- the entity's ID
event_type TEXT NOT NULL, -- 'OrderPlaced', 'OrderShipped'
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
processed_at TIMESTAMPTZ -- set by cleanup job, nullable
);
CREATE INDEX outbox_events_created_at ON outbox_events (created_at)
WHERE processed_at IS NULL;

Application code writes to both tables in a single transaction:
import json
import uuid
from datetime import datetime, timezone

def place_order(conn, order_data: dict) -> str:
    """conn is a psycopg2 (or compatible) connection."""
    order_id = str(uuid.uuid4())
    try:
        with conn.cursor() as cur:
            # Both writes in a single transaction — atomicity guaranteed
            cur.execute(
                """INSERT INTO orders (id, customer_id, total, status, created_at)
                   VALUES (%s, %s, %s, 'pending', NOW())""",
                (order_id, order_data['customer_id'], order_data['total'])
            )
            cur.execute(
                """INSERT INTO outbox_events
                   (aggregate_type, aggregate_id, event_type, payload)
                   VALUES ('order', %s, 'OrderPlaced', %s)""",
                (order_id, json.dumps({
                    'order_id': order_id,
                    'customer_id': order_data['customer_id'],
                    'total': str(order_data['total']),  # string for decimal safety
                    'timestamp': datetime.now(timezone.utc).isoformat(),
                }))
            )
        conn.commit()
    except Exception:
        conn.rollback()  # never leave a half-written transaction open
        raise
    return order_id

Debezium's EventRouter transform reads the outbox table and routes messages to Kafka topics based on aggregate_type. A row with aggregate_type = 'order' goes to the outbox.event.order topic.
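The routing behavior can be sketched as a plain function. This mirrors EventRouter's defaults as I understand them (topic derived from the aggregate type, message key taken from the aggregate ID so all events for one entity stay in order within a partition); the exact field names EventRouter reads are configurable:

```python
def route_outbox_row(row: dict) -> tuple[str, str, str]:
    """Sketch of EventRouter-style routing: derive the destination topic
    from aggregate_type and key the message by aggregate_id so events for
    one aggregate land in the same partition, in order."""
    topic = f"outbox.event.{row['aggregate_type']}"
    return topic, row['aggregate_id'], row['payload']

topic, key, value = route_outbox_row({
    'aggregate_type': 'order',
    'aggregate_id': 'o-123',
    'payload': '{"order_id": "o-123"}',
})
assert topic == 'outbox.event.order' and key == 'o-123'
```

Keying by aggregate ID matters: Kafka only guarantees ordering within a partition, so all events for a given order must hash to the same one.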
Schema Evolution: The Exactly-Once Illusion
Schema changes are where CDC pipelines quietly break. A column rename or type change on the producer side can cause consumer deserialization failures — and depending on your error handling, this can mean silent data loss or a consumer that's stuck forever.
The rules for safe schema evolution with CDC:
SAFE (backward compatible):
+ Add a nullable column
+ Add a column with a default value
+ Widen a type (int → bigint)
DANGEROUS (breaking):
- Rename a column
- Remove a column consumers depend on
- Narrow a type (bigint → int)
- Change semantics of an existing column

The recommended approach: use a schema registry (Confluent Schema Registry or Apicurio) with Avro or Protobuf. This enforces compatibility at publish time, not at consumer failure time.
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka import SerializingProducer
schema_str = """
{
"type": "record",
"name": "OrderPlaced",
"namespace": "com.example.orders",
"fields": [
{"name": "order_id", "type": "string"},
{"name": "customer_id", "type": "string"},
{"name": "total", "type": "string"},
{"name": "currency", "type": ["null", "string"], "default": null}
]
}
"""
schema_registry_client = SchemaRegistryClient({'url': 'http://schema-registry:8081'})
avro_serializer = AvroSerializer(schema_registry_client, schema_str)
producer = SerializingProducer({
'bootstrap.servers': 'kafka:9092',
'value.serializer': avro_serializer,
})

The exactly-once illusion: Debezium guarantees at-least-once delivery. If the connector restarts after publishing a message but before committing the offset, the message will be published again. Your consumers must be idempotent — processing the same event twice must produce the same result as processing it once.
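Idempotent consumption can be demonstrated end to end with SQLite standing in for Postgres (SQLite's INSERT OR IGNORE plays the role of ON CONFLICT DO NOTHING; the table and event shapes are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, total INTEGER)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")

def handle(event):
    """Apply an event exactly once, even if it is delivered twice."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event["id"],),
    )
    if cur.rowcount == 0:
        return  # duplicate delivery: already processed, skip
    conn.execute(
        "UPDATE balances SET total = total + ? WHERE account = ?",
        (event["amount"], event["account"]),
    )
    conn.commit()

event = {"id": "evt-1", "account": "acct-1", "amount": 50}
handle(event)
handle(event)  # redelivery: applied once, not twice
total = conn.execute("SELECT total FROM balances").fetchone()[0]
# total is 50, not 100
```

The critical detail is that the dedup insert and the state change commit in the same transaction; checking the dedup table separately reintroduces the dual-write race.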
The common idempotency approach: store the event ID and reject duplicates.
-- Idempotent consumer: track processed event IDs
CREATE TABLE processed_events (
event_id UUID PRIMARY KEY,
processed_at TIMESTAMPTZ DEFAULT NOW()
);
-- In your consumer:
-- BEGIN
-- INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT DO NOTHING
-- RETURNING event_id
-- If no row returned: event already processed, skip
-- Otherwise: apply the event
-- COMMIT

Operational Runbook: What to Monitor
Critical metrics to instrument from day one:
- Replication slot lag in bytes (the pg_replication_slots query above) — disk accumulation rate
- Debezium's milliseconds.behind.master metric (renamed to MilliSecondsBehindSource in newer versions)
- Kafka consumer group lag on outbox topics
- Debezium connector status (RUNNING vs PAUSED vs FAILED) — poll every 60s via the Kafka Connect REST API
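The /status response from Kafka Connect carries a connector state plus a list of per-task states; a small helper can decide which task IDs need a restart. The payload below is a representative example I've constructed, not captured from a live cluster:

```python
def tasks_needing_restart(status: dict) -> list[int]:
    """Return the IDs of FAILED tasks from a Kafka Connect
    /connectors/<name>/status response."""
    return [t["id"] for t in status.get("tasks", []) if t["state"] == "FAILED"]

status = {
    "name": "orders-cdc",
    "connector": {"state": "RUNNING", "worker_id": "connect-1:8083"},
    "tasks": [
        {"id": 0, "state": "FAILED", "worker_id": "connect-1:8083"},
        {"id": 1, "state": "RUNNING", "worker_id": "connect-2:8083"},
    ],
}
assert tasks_needing_restart(status) == [0]
```

Note that the connector itself can report RUNNING while individual tasks are FAILED, so checking only the top-level state misses real outages.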
# Check connector status via Kafka Connect REST API
curl -s http://kafka-connect:8083/connectors/orders-cdc/status | jq .
# Force restart a failed task
curl -X POST http://kafka-connect:8083/connectors/orders-cdc/tasks/0/restart

Key Takeaways
- Dual-write is not a CDC pattern — it's a race condition waiting to manifest; the outbox pattern is the correct replacement because it uses the database's atomicity guarantee instead of fighting it.
- Replication slot lag is the most dangerous operational metric in a Debezium deployment — alert on it aggressively, because unchecked lag will fill your disk and cause Postgres downtime.
- Debezium heartbeats are mandatory for tables with low write volume; without them, the slot's confirmed LSN cannot advance during idle periods and slot lag grows silently.
- Schema evolution must be governed by a schema registry with compatibility enforcement — consumer failures at deserialization time are the worst time to discover a breaking schema change.
- Exactly-once is a misleading label; Debezium delivers at-least-once, and your consumers must be idempotent to handle duplicate events correctly.
- Native CDC on the main table is fine when the data model is stable; use the outbox pattern when you need to decouple the internal data model from the public event schema.