Strangler Fig in Practice: Replacing a Monolith Without a Big-Bang Rewrite
Why Most Rewrites Fail Before They Ship
I have watched three full rewrites crater in my career. Two were declared victories that quietly reverted within eighteen months. One was cancelled at 70% complete after burning two years of engineering effort. The pattern is so common it has a name: the second-system effect. You escape one set of problems only to import a fresh set you did not anticipate.
The Strangler Fig pattern is the antidote. Named after the strangler fig tree — which grows around a host tree, gradually replacing it while the host continues to stand — it gives you a disciplined, incremental path out of a monolith. Traffic routes through a proxy. New functionality lands in new services. Old functionality migrates piece by piece. Eventually the monolith is empty and you decommission it. No big bang. No parallel system maintained forever. No rewrite death march.
This post is a practitioner's guide. I will cover the proxy layer, extraction sequencing, data migration, testing strategy, and the signals that tell you a migration is complete.
The Core Mechanic
The pattern has three moving parts working together at all times.
The Facade Proxy is the linchpin. It intercepts every request and routes it either to the monolith or to a new service. Routing rules are configuration — you flip a flag to redirect a path, not redeploy the monolith. The proxy is your control plane for the migration.
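To make that concrete, here is a minimal sketch of the routing-table idea in Python, with hypothetical upstream names; a real deployment would express the same thing as gateway configuration, as in the Traefik example below.

MONOLITH = "http://monolith.internal"

# Hypothetical routing table the facade proxy consults on every request.
# Redirecting a path is a one-line configuration change, not a monolith deploy.
ROUTES = {
    "/api/orders": MONOLITH,                                # not yet extracted
    "/api/notifications": "http://notifications.internal",  # extracted to a new service
}

def upstream_for(path: str) -> str:
    # Longest-prefix match, defaulting to the monolith for anything unmapped
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return MONOLITH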
The New Services are built fresh, in your target stack, with proper domain boundaries. They are production services from day one, not prototypes. They receive real traffic as soon as a route is redirected.
The Monolith keeps running. It serves unextracted routes. It processes unextracted background jobs. It shrinks as extraction proceeds, but it never goes cold until you are ready.
Phase 1 — Installing the Proxy
The first step has nothing to do with rewriting anything. You put a proxy in front of the monolith and pass all traffic straight through. The monolith sees no change. Consumers see no change. But now you own the routing layer.
For most teams the right proxy is an API gateway (Kong, AWS API Gateway, Traefik) or a thin Backend-for-Frontend (BFF) you own. Avoid adding business logic here — the proxy should route, authenticate, and observe. Nothing else.
What to instrument from day one
Before you migrate a single route, you need observability on the proxy. You need to know:
- Which routes exist and their traffic volume (you will be surprised)
- Latency and error rates per route (your baseline)
- Which clients call which routes (consumer mapping for breaking-change risk assessment)
# Traefik label example — route /api/orders to legacy
traefik.http.routers.orders-legacy.rule: PathPrefix(`/api/orders`)
traefik.http.routers.orders-legacy.service: monolith
traefik.http.middlewares.orders-legacy.headers.customrequestheaders.X-Route-Version: legacy
# attach the version header middleware to the router
traefik.http.routers.orders-legacy.middlewares: orders-legacy

Phase 1 is complete when every request flows through the proxy and you have dashboards for every route. This typically takes one to two weeks.
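One way to keep yourself honest about the route inventory is a quick pass over the proxy's access logs. A rough sketch, assuming the proxy writes one JSON object per request; the path, status, and client field names are stand-ins for whatever your gateway actually emits:

# Build a first-pass route inventory from the proxy's JSON access logs.
# The "path", "status" and "client" field names are assumptions.
import json
from collections import Counter, defaultdict

volume = Counter()
errors = Counter()
clients = defaultdict(set)

with open("access.log") as log:
    for line in log:
        entry = json.loads(line)
        route = entry["path"]
        volume[route] += 1
        if entry["status"] >= 500:
            errors[route] += 1
        clients[route].add(entry["client"])

for route, count in volume.most_common():
    print(f"{route}: {count} requests, {errors[route]} errors, "
          f"{len(clients[route])} distinct clients")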
Phase 2 — Extraction Sequencing
Now you choose what to extract first. This is where most teams make their biggest mistake: they pick the most exciting domain (payments, usually) rather than the safest one.
Extract leaf domains first. A leaf domain is one with minimal inbound calls from other parts of the monolith. Notifications, PDF generation, search indexing — these are leaves. Payments, orders, and inventory are deeply entangled cores. Extract the leaves and you build extraction muscle memory. Extract the core first and you spend six months debugging distributed transaction failures.
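A cheap way to find the leaves is to count inbound references between the monolith's modules. The sketch below is a grep-level heuristic rather than real dependency analysis, and the domain names are hypothetical:

# Rank candidate domains by how often the rest of the monolith references them.
# Fewer inbound references means closer to a leaf. Module names are hypothetical.
import re
from collections import Counter
from pathlib import Path

DOMAINS = ["notifications", "pdf_export", "search", "orders", "payments", "inventory"]
inbound = Counter()

for source_file in Path("monolith").rglob("*.py"):
    text = source_file.read_text()
    for domain in DOMAINS:
        if f"/{domain}/" in source_file.as_posix():
            continue  # ignore references from the domain's own package
        inbound[domain] += len(re.findall(rf"\b(?:from|import) {domain}\b", text))

for domain, count in sorted(inbound.items(), key=lambda kv: kv[1]):
    print(f"{domain}: {count} inbound references")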
The extraction checklist for each domain
Before redirecting a route to a new service, the service must:
- Pass all unit and integration tests
- Serve real traffic in shadow mode (requests duplicated to both monolith and new service, responses compared) for at least one week
- Have an independent deployment pipeline
- Have its own database schema (not sharing the monolith DB)
- Have circuit breakers and retries configured (a minimal sketch follows this list)
- Have alerts wired to the on-call rotation
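The circuit-breaker item deserves a concrete shape. A minimal hand-rolled sketch follows; most teams will reach for a library instead, but the behaviour is what matters: after a run of failures, fail fast for a cooldown period instead of hammering a struggling new service.

# Minimal circuit breaker: after a run of consecutive failures, fail fast for a
# cooldown period instead of hammering a struggling new service.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open, failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result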
Shadow mode is non-negotiable. It surfaces mismatches in business logic before users see them.
# Shadow mode comparison — simplified pseudocode
# call_monolith, call_new_service, responses_match, log_divergence and
# emit_metric are the proxy's own helpers, not shown here.
import asyncio

async def shadow_request(path: str, payload: dict) -> Response:
    # Fire both calls concurrently so shadowing adds no user-visible latency
    monolith_task = asyncio.create_task(call_monolith(path, payload))
    new_service_task = asyncio.create_task(call_new_service(path, payload))
    monolith_response, new_service_response = await asyncio.gather(
        monolith_task, new_service_task
    )
    if not responses_match(monolith_response, new_service_response):
        log_divergence(path, monolith_response, new_service_response)
        emit_metric("shadow.divergence", tags={"path": path})
    return monolith_response  # Always serve monolith during shadow phase

Phase 3 — Data Migration Strategy
The hardest part of any extraction is the data. The monolith's database is a big shared schema with implicit ownership. Domain A writes to tables that Domain B reads. Foreign keys span what should be service boundaries. You cannot simply hand a table to a new service and call it done.
The strangler data model
Run both schemas in parallel during the transition. The monolith writes to its schema. The new service has its own schema. A synchronisation layer keeps them consistent.
Change Data Capture (Debezium is the standard choice) tails the monolith's write-ahead log and publishes events to Kafka. The new service consumes those events and maintains its own projection. This is eventually consistent, which means you need to design your new service to tolerate a small replication lag.
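A sketch of the consuming side, assuming Debezium's default JSON envelope (a payload carrying before, after, and op fields), the confluent-kafka client, and hypothetical topic, group, and helper names:

# Consume Debezium change events from Kafka and maintain the new service's own
# projection. Topic, group and helper names are hypothetical; the envelope layout
# follows Debezium's default JSON converter (payload with "before", "after", "op").
import json
from confluent_kafka import Consumer

def upsert_order(row: dict) -> None:
    ...  # write the row into the new service's own schema (stub)

def delete_order(order_id) -> None:
    ...  # remove the row from the projection (stub)

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "orders-projection",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["monolith.public.orders"])  # CDC topic for the orders table

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue  # nothing new, a transport error, or a tombstone record
    event = json.loads(msg.value())
    payload = event.get("payload", event)  # works with schemas enabled or disabled
    op = payload["op"]
    if op in ("c", "r", "u"):        # create, snapshot read, or update
        upsert_order(payload["after"])
    elif op == "d":                  # delete: only "before" carries the row
        delete_order(payload["before"]["id"])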
Three-phase data cutover
| Phase | Monolith DB | New Service DB | Writes go to |
|---|---|---|---|
| Sync | Primary | CDC replica | Monolith |
| Dual-write | Primary | Live | Both (with compare) |
| Cutover | Frozen | Primary | New service only |
During dual-write you write to both stores and compare the results. Any mismatch is logged and investigated. Once the mismatch rate has held at zero for 48 hours, you cut over: the monolith DB becomes read-only for that domain's tables and is eventually dropped.
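The dual-write step itself is mostly plumbing plus paranoia: the monolith write stays authoritative and the comparison is purely observational. A sketch, with hypothetical repository objects for the two stores:

# Dual-write with compare: the monolith write stays authoritative; any mismatch
# between the two stores is logged for investigation, never surfaced to the user.
import logging

logger = logging.getLogger("dual_write")

def save_order(order: dict, monolith_repo, new_service_repo) -> dict:
    primary = monolith_repo.save(order)           # authoritative write
    try:
        shadow = new_service_repo.save(order)     # secondary write to the new schema
    except Exception:
        logger.exception("dual-write to new service failed for order %s", order.get("id"))
        return primary
    if primary != shadow:
        logger.warning("dual-write mismatch for order %s: %r vs %r",
                       order.get("id"), primary, shadow)
    return primary                                # the caller always sees the monolith's result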
Phase 4 — Retiring the Monolith
You know the migration is complete when:
- Every route in the proxy routing table points to a new service
- No background jobs run inside the monolith process
- The monolith database has no tables with active writes
- The monolith codebase receives no new commits
Before you pull the plug, run a final dark-launch window: keep the monolith process deployed and running, but route zero traffic to it for two weeks. If nothing breaks in the new services during that window, decommission.
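The first completion signal is easy to automate against the proxy's routing configuration. A tiny sketch, reusing the dict-style routing table from earlier:

# Fail fast (for example in CI) if any proxy route still targets the monolith.
MONOLITH = "http://monolith.internal"

def assert_monolith_unrouted(routes: dict) -> None:
    stragglers = [path for path, upstream in routes.items() if upstream == MONOLITH]
    assert not stragglers, f"routes still served by the monolith: {stragglers}"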
Common Failure Modes
The Proxy Becomes a God Object
Teams add business logic to the proxy because it is easier than building a proper service. After six months the proxy has authentication, transformation, and orchestration logic baked in. It is now a second monolith. Keep the proxy thin. Route and observe only.
Shared Database Not Addressed
Extracting a service while it still talks to the monolith database is not extraction — it is process separation with shared state. The database is the monolith. If you do not address the data layer, you will never be able to deploy the services independently.
Extracting Too Many Things Simultaneously
Three teams extracting three domains in parallel sounds efficient. It is a coordination nightmare. Each extraction creates integration work for every other team. Extract serially within a domain boundary. Parallelise only across fully independent leaf domains.
Skipping Shadow Mode
Every team that skips shadow mode discovers a behavioural difference between the monolith and the new service in production. Shadow mode is two weeks of discipline that saves you from a midnight incident.
Measuring Migration Progress
Track these metrics in your weekly migration review:
Migration Health Dashboard
───────────────────────────────────────────
Routes migrated: 23 / 41 (56%)
Routes in shadow mode: 6 / 41 (15%)
Routes remaining: 12 / 41 (29%)
───────────────────────────────────────────
Monolith DB tables active: 18 → 12 → 7
Monolith CPU (weekly avg): 68% → 41% → 22%
New service p99 latency: 82ms (target: <100ms)
Shadow divergence rate: 0.003% (target: 0%)
───────────────────────────────────────────

CPU and database activity are your leading indicators. As they drop, you know extraction is real and the monolith is actually shrinking — not just running alongside new services.
The Strangler Fig Is a Discipline, Not a Pattern
I want to be direct: the strangler fig is easy to do badly. Teams proxy traffic but never redirect it. Teams extract services but leave shared databases. Teams declare victory at 80% and live with a zombie monolith for years.
The pattern works when you treat migration velocity as a first-class engineering metric. Measure it. Review it weekly. Have a public dashboard. Make it embarrassing for a domain to stay in the monolith longer than planned.
The fig tree does not ask permission to grow. It just grows, persistently, around the host, until the host is gone. Your migration should work the same way.
Key Takeaways
- Install the proxy first. Extract nothing until every route flows through it and you have observability.
- Extract leaf domains before core domains. Build muscle memory on low-risk paths.
- Shadow mode is mandatory. Divergences found in shadow cost nothing. Divergences found in production cost everything.
- Address the database. Service extraction without data extraction is theatre.
- Measure migration velocity weekly. If the monolith is not shrinking, the migration is stalled.
- Decommission completely. A "mostly retired" monolith is still a monolith.