Event Sourcing: The Four Ways It Goes Wrong
Event sourcing is the only pattern I know that makes teams dramatically overconfident at the start and dramatically regretful six months later. The initial implementation feels clean: everything is an event, events are immutable, you can replay history. Then production teaches you about schema drift at month 3, a GDPR deletion request at month 5, and a 14-hour projection rebuild at month 6.
None of these problems are unsolvable. But each one is easier to mitigate if you design for it before the event store has a billion rows. Here are the four failure modes and what to do about each.
Failure Mode 1: Schema Drift
Events accumulate in an append-only store. Unlike a relational schema where you run a migration and the old shape disappears, events live forever in their original shape. Six months after launch, your OrderCreated event has evolved through seven incompatible versions, and your projection code contains a switch statement that nobody fully understands.
# What schema drift looks like in practice
def apply_order_created(self, event: dict):
    version = event.get("schema_version", 1)
    if version == 1:
        # Original shape: no currency field
        amount = event["amount_cents"]
        currency = "USD"  # assumed
    elif version == 2:
        # Added currency in v2
        amount = event["amount_cents"]
        currency = event["currency"]
    elif version == 3:
        # Renamed amount_cents to total_cents in v3
        amount = event["total_cents"]
        currency = event["currency"]
    elif version == 4:
        # Split into subtotal + tax in v4
        amount = event["subtotal_cents"] + event["tax_cents"]
        currency = event["currency"]
    else:
        raise UnknownEventVersion(version)

This code is a maintenance trap. The right approach is event upcasting: transform old event shapes into the current shape at read time, so projection code only handles one shape.
from typing import Callable

EventUpcaster = Callable[[dict], dict]

# Current schema version per event type, kept alongside the event definitions
CURRENT_SCHEMA_VERSION = {"OrderCreated": 4}

class MissingUpcaster(Exception):
    pass

UPCASTERS: dict[tuple[str, int], EventUpcaster] = {
    ("OrderCreated", 1): lambda e: {**e, "currency": "USD", "schema_version": 2},
    # v2 -> v3 renamed amount_cents to total_cents, so drop the old field
    ("OrderCreated", 2): lambda e: {
        **{k: v for k, v in e.items() if k != "amount_cents"},
        "total_cents": e["amount_cents"],
        "schema_version": 3,
    },
    ("OrderCreated", 3): lambda e: {
        **e,
        "subtotal_cents": e["total_cents"],
        "tax_cents": 0,  # historical events had no tax breakdown
        "schema_version": 4,
    },
}

def upcast_event(event: dict) -> dict:
    """Apply all upcasters in sequence until current version."""
    current_version = CURRENT_SCHEMA_VERSION[event["type"]]
    e = dict(event)
    while e.get("schema_version", 1) < current_version:
        key = (e["type"], e.get("schema_version", 1))
        upcaster = UPCASTERS.get(key)
        if upcaster is None:
            raise MissingUpcaster(key)
        e = upcaster(e)
    return e

Rules for managing schema evolution:
- Never change the meaning of an existing field. Add fields; do not rename or reuse them.
- Increment schema_version on every structural change. Make it explicit in the event payload, not inferred.
- Write an upcaster for every version transition before deploying a schema change.
- Test upcasters against real historical events from production — do not test only against synthetic fixtures.
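A minimal sketch of that kind of test, assuming a sample of raw production events has been exported to a JSON-lines fixture (the file path and the required-fields table are illustrative):

import json

REQUIRED_FIELDS = {"OrderCreated": {"subtotal_cents", "tax_cents", "currency"}}

def test_upcasters_against_production_sample():
    # events_sample.jsonl: raw events sampled from the production store
    with open("events_sample.jsonl") as f:
        for line in f:
            raw = json.loads(line)
            event = upcast_event(raw)
            assert event["schema_version"] == CURRENT_SCHEMA_VERSION[event["type"]]
            assert REQUIRED_FIELDS[event["type"]] <= event.keys()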
Failure Mode 2: Projection Rebuild Storms
A projection is derived from replaying all relevant events. When you change the projection logic — a new field, a different aggregation, a bug fix — you must rebuild it by replaying the entire event stream. For a small store, this takes seconds. For a large store, it takes hours.
The failure mode is not slow rebuilds per se. It is a rebuild that competes with the live system for the same database resources, causing production degradation while the rebuild runs. I have seen a rebuild of 50 million events take 14 hours and raise p99 query latency by 40x on the shared database.
Mitigation: rebuild into a shadow store, not in-place.
import time

class ValidationError(Exception):
    pass

class ProjectionRebuilder:
    """
    Rebuilds a projection into a shadow table, then swaps atomically.
    Live read traffic is unaffected during rebuild.
    """
    def __init__(self, event_store, read_db):
        self.event_store = event_store
        self.read_db = read_db

    def rebuild(self, projector_class, projection_name: str, batch_size: int = 5000):
        shadow = f"{projection_name}_{int(time.time())}"
        self.read_db.create_table_like(shadow, projection_name)
        projector = projector_class(target_table=shadow)
        total = 0
        for batch in self.event_store.stream_all(batch_size=batch_size):
            for raw_event in batch:
                event = upcast_event(raw_event)
                if projector.handles(event["type"]):
                    projector.apply(event)
            total += len(batch)
            if total % 100_000 == 0:
                print(f"Rebuilt {total} events...")
        # Validate before swap
        self._validate_shadow(shadow, projection_name)
        # Atomic rename
        self.read_db.rename_table(projection_name, f"{projection_name}_old")
        self.read_db.rename_table(shadow, projection_name)
        print(f"Swap complete. Old table retained as {projection_name}_old for 24h.")

    def _validate_shadow(self, shadow: str, live: str):
        shadow_count = self.read_db.count(shadow)
        live_count = self.read_db.count(live)
        if abs(shadow_count - live_count) / max(live_count, 1) > 0.05:
            raise ValidationError(
                f"Shadow count {shadow_count} diverges from live {live_count} by >5%"
            )

Additional mitigations:
- Snapshotting. Store periodic aggregate snapshots so replay starts from the latest snapshot, not event zero. A snapshot every 1,000 events caps replay at 1,000 events per aggregate, which for long-lived aggregates cuts replay time by orders of magnitude.
- Throttled replay. Rate-limit event reads during rebuild to avoid saturating the event store. 5,000 events/second is usually safe; 500,000/second will starve live traffic.
- Catch-up mode. After the shadow rebuild finishes, enter a catch-up mode where you apply events that arrived during the rebuild before swapping. Otherwise, the shadow is stale by hours.
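A minimal sketch of catch-up mode, assuming the event store exposes a monotonically increasing global position plus a way to read events after a given position (read_after, head_position, and the global_position field are illustrative names, not a specific store's API):

import time

def catch_up(projector, event_store, rebuild_end_position: int, lag_threshold: int = 100):
    """Apply events that arrived during the rebuild; return once nearly caught up."""
    position = rebuild_end_position
    while True:
        batch = event_store.read_after(position, limit=1_000)  # assumed event-store API
        for raw_event in batch:
            event = upcast_event(raw_event)
            if projector.handles(event["type"]):
                projector.apply(event)
            position = raw_event["global_position"]
        lag = event_store.head_position() - position  # assumed event-store API
        if lag <= lag_threshold:
            return position  # close enough to do the atomic table swap now
        time.sleep(0.1)  # light throttle so catch-up does not starve live traffic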
Failure Mode 3: GDPR and the Right to Erasure
GDPR Article 17 requires you to erase personal data on request. An append-only, immutable event store is structurally hostile to erasure. This is not a theoretical concern — regulators have issued fines for "we cannot delete because our architecture is append-only."
There are three approaches, in order of preference:
Approach 1: Crypto-Shredding
Encrypt PII fields in events using a per-subject key. On erasure request, delete the key. The events remain; the PII becomes unreadable ciphertext.
from datetime import datetime, timezone
from cryptography.fernet import Fernet

class CryptoShredder:
    def __init__(self, key_store, audit_log):
        self.key_store = key_store  # e.g., AWS KMS, HashiCorp Vault
        self.audit_log = audit_log

    def encrypt_pii(self, subject_id: str, data: dict, pii_fields: list[str]) -> dict:
        key = self.key_store.get_or_create_key(subject_id)
        f = Fernet(key)
        result = dict(data)
        for field in pii_fields:
            if field in result:
                result[field] = f.encrypt(str(result[field]).encode()).decode()
                result[f"{field}_encrypted"] = True
        return result

    def erase_subject(self, subject_id: str):
        """
        Deletes the encryption key. All PII for this subject
        in all historical events becomes permanently unreadable.
        """
        self.key_store.delete_key(subject_id)
        # Audit the erasure
        self.audit_log.write({
            "type": "GDPR_ERASURE",
            "subject_id": subject_id,
            "erased_at": datetime.now(timezone.utc).isoformat()
        })

Crypto-shredding preserves event integrity (the event exists, the audit trail is intact) while making PII irrecoverable. This satisfies GDPR in most interpretations, though consult your legal team on jurisdiction-specific requirements.
Approach 2: PII Externalization
Do not store PII in events at all. Store a reference (user ID, hashed identifier) in the event, and maintain a separate PII store. On erasure, delete from the PII store.
# Event contains no PII:
{
    "type": "OrderCreated",
    "order_id": "ord_123",
    "customer_ref": "cust_456",  # opaque reference
    "total_cents": 9999
}

# PII store (separate, erasable):
# customer_ref → {name, email, address, phone}

Projections that need to display customer name join against the PII store at query time. After erasure, those fields return null or a placeholder.
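A minimal sketch of that query-time join, assuming a pii_store keyed by customer_ref whose lookups return None after erasure (the helper and placeholder values are illustrative):

REDACTED = {"name": "[erased]", "email": None, "address": None, "phone": None}

def enrich_order_row(order_row: dict, pii_store) -> dict:
    """Attach customer fields at query time; erased customers get placeholders."""
    pii = pii_store.get(order_row["customer_ref"])  # None once the subject is erased
    return {**order_row, **(pii if pii is not None else REDACTED)}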
Approach 3: Event Tombstoning (Last Resort)
Write a tombstone event that signals erasure, and filter PII from event replay after the tombstone:
# Tombstone event
{
    "type": "SubjectDataErased",
    "subject_id": "cust_456",
    "erased_fields": ["email", "name", "address"],
    "erased_at": "2025-06-29T00:00:00Z"
}

Projections check for tombstones and redact fields. This is operationally fragile — every projection must implement redaction — and does not actually remove data from the event store. Use only as a last resort when the other approaches are infeasible.
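A minimal sketch of tombstone handling inside a projection, assuming the read store exposes an update method that can blank columns by key (read_db.update and the column names here are illustrative):

class TombstoneAwareProjection:
    """Illustrative projection that honours SubjectDataErased tombstones."""
    def __init__(self, read_db, table: str):
        self.read_db = read_db
        self.table = table

    def apply(self, event: dict):
        if event["type"] == "SubjectDataErased":
            # Retroactively blank the erased fields in rows projected before the tombstone
            self.read_db.update(  # assumed read-store API
                self.table,
                where={"customer_ref": event["subject_id"]},
                values={field: None for field in event["erased_fields"]},
            )
            return
        # ... normal event handling for this projection ...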
Failure Mode 4: Replay Storms
A replay storm occurs when multiple projections simultaneously rebuild from the event store, overwhelming it with read traffic. This happens predictably after: a major schema change, a large bug fix requiring multiple projection rebuilds, or a new team member who decides to "just rebuild all projections to be safe."
The result is predictable: the event store's disk I/O saturates, live queries start timing out, and every one of the rebuilding projections sees slow responses or errors from the store.
Mitigations:
Serialize rebuilds. Only one projection rebuilds at a time. Use a distributed lock or a rebuild queue.
class RebuildQueue:
    def __init__(self, redis_client):
        self.redis = redis_client

    def enqueue_rebuild(self, projection_name: str, priority: int = 5):
        # Lower score = higher priority; bzpopmin pops the lowest score first
        self.redis.zadd("rebuild_queue", {projection_name: priority})

    def process_next(self):
        # Blocking pop from queue — one worker, one rebuild at a time
        item = self.redis.bzpopmin("rebuild_queue", timeout=30)
        if item:
            _, projection_name, _ = item  # (queue key, member, score)
            self.run_rebuild(projection_name.decode())  # delegates to ProjectionRebuilder

Event store read replicas. Route projection rebuilds to a read replica or a dedicated follower node. Live event writes and current-position reads go to the primary.
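A minimal sketch of that routing, assuming primary and replica DSNs are configured separately and a driver connect function is injected (all names here are illustrative):

class EventStoreRouter:
    """Send heavy replay reads to a replica; writes and position reads to the primary."""
    def __init__(self, primary_dsn: str, replica_dsn: str, connect):
        self.primary_dsn = primary_dsn
        self.replica_dsn = replica_dsn
        self.connect = connect  # e.g. a DB driver's connect() function

    def for_writes(self):
        return self.connect(self.primary_dsn)

    def for_rebuild(self):
        # Rebuilds tolerate replica lag; they catch up against the primary before swapping
        return self.connect(self.replica_dsn)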
Materialized snapshots. For aggregates replayed frequently, store snapshots at regular intervals in a separate snapshot store. Replay starts from the latest snapshot, not event 0.
from dataclasses import dataclass

@dataclass
class AggregateSnapshot:
    aggregate_id: str
    aggregate_type: str
    version: int
    state: dict
    snapshot_at: str

def load_aggregate(event_store, snapshot_store, aggregate_id: str):
    snapshot = snapshot_store.latest(aggregate_id)
    if snapshot:
        # Replay only events after the snapshot version
        events = event_store.load_from_version(
            aggregate_id,
            after_version=snapshot.version
        )
        return replay_from_snapshot(snapshot.state, events)
    else:
        events = event_store.load_all(aggregate_id)
        return replay_from_start(events)

When Event Sourcing Is the Wrong Choice
Event sourcing adds permanent complexity. The benefits — complete audit trail, temporal queries, event-driven projections — are real but narrow. It is the wrong choice when:
- You do not need the audit trail. If you only care about current state and nobody will ever ask "what was the state at time T," you are paying for a feature you do not use.
- Your domain has no meaningful events. A CRUD service for static reference data (country codes, tax rates) has no business events worth sourcing.
- Your team lacks the operational experience. Projection rebuilds, schema upcasting, and GDPR compliance in an event store require genuine expertise. The learning curve is steep and the production failures are painful.
- You need strong read consistency. Async projections produce eventual consistency. If your reads must reflect writes immediately (financial balances, inventory counts), you will fight the architecture constantly.
The honest answer: most services do not need event sourcing. A relational database with an audit log table (append-only, written by triggers or application code) gives you 80% of the audit benefit with 10% of the complexity.
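A minimal sketch of that alternative, assuming a psycopg2-style DB-API connection and an append-only order_audit_log table written by application code in the same transaction as the state change (table and column names are illustrative):

import json

def update_order_status(conn, order_id: str, new_status: str, actor: str):
    """Update current state and append an audit row in one transaction."""
    with conn:  # commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE orders SET status = %s WHERE id = %s",
                (new_status, order_id),
            )
            cur.execute(
                "INSERT INTO order_audit_log (order_id, changed_by, change) "
                "VALUES (%s, %s, %s)",
                (order_id, actor, json.dumps({"status": new_status})),
            )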
Key Takeaways
- Schema drift is inevitable in any long-lived event store — design upcasters from day one and version every event schema explicitly.
- Projection rebuilds must write to a shadow table and swap atomically; in-place rebuilds degrade production.
- GDPR compliance requires crypto-shredding or PII externalization before you have an event store with sensitive data — retrofitting is painful.
- Replay storms are caused by concurrent rebuilds; serialize them behind a queue and route to read replicas.
- Snapshotting aggregates at regular version intervals reduces replay time from O(all events) to O(events since snapshot).
- Event sourcing is the right tool for auditable, event-rich domains — not a default architecture for every service.