Strangler Fig in Practice: Replacing a Monolith Without a Big-Bang Rewrite
Why Most Rewrites Fail Before They Ship
I have watched three full rewrites crater in my career. Two were declared victories that quietly reverted within eighteen months. One was cancelled at 70% complete after burning two years of engineering effort. The pattern is so common it has a name: the second-system effect. You escape one set of problems only to import a fresh set you did not anticipate.
The Strangler Fig pattern is the antidote. Named after the strangler fig tree — which grows around a host tree, gradually replacing it while the host continues to stand — it gives you a disciplined, incremental path out of a monolith. Traffic routes through a proxy. New functionality lands in new services. Old functionality migrates piece by piece. Eventually the monolith is empty and you decommission it. No big bang. No parallel system maintained forever. No rewrite death march.
This post is a practitioner's guide. I will cover the proxy layer, extraction sequencing, data migration, testing strategy, and the signals that tell you a migration is complete.
The Core Mechanic
The pattern has three moving parts working together at all times.
The Facade Proxy is the linchpin. It intercepts every request and routes it either to the monolith or to a new service. Routing rules are configuration — you flip a flag to redirect a path, not redeploy the monolith. The proxy is your control plane for the migration.
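To make that concrete, here is a minimal sketch of the routing-table idea in Python, with hypothetical upstream names; a real deployment would express the same thing as gateway configuration, as in the Traefik example below.

MONOLITH = "http://monolith.internal"

# Hypothetical routing table the facade proxy consults on every request.
# Redirecting a path is a one-line configuration change, not a monolith deploy.
ROUTES = {
    "/api/orders": MONOLITH,                                # not yet extracted
    "/api/notifications": "http://notifications.internal",  # extracted to a new service
}

def upstream_for(path: str) -> str:
    # Longest-prefix match, defaulting to the monolith for anything unmapped
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return MONOLITH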
The New Services are built fresh, in your target stack, with proper domain boundaries. They are production services from day one, not prototypes. They receive real traffic as soon as a route is redirected.
The Monolith keeps running. It serves unextracted routes. It processes unextracted background jobs. It shrinks as extraction proceeds, but it never goes cold until you are ready.
Phase 1 — Installing the Proxy
The first step has nothing to do with rewriting anything. You put a proxy in front of the monolith and pass all traffic straight through. The monolith sees no change. Consumers see no change. But now you own the routing layer.
For most teams the right proxy is an API gateway (Kong, AWS API Gateway, Traefik) or a thin Backend-for-Frontend (BFF) you own. Avoid adding business logic here — the proxy should route, authenticate, and observe. Nothing else.
What to instrument from day one
Before you migrate a single route, you need observability on the proxy. You need to know:
- Which routes exist and their traffic volume (you will be surprised)
- Latency and error rates per route (your baseline)
- Which clients call which routes (consumer mapping for breaking-change risk assessment)
# Traefik label example — route /api/orders to legacy
traefik.http.routers.orders-legacy.rule: PathPrefix(`/api/orders`)
traefik.http.routers.orders-legacy.service: monolith
traefik.http.middlewares.orders-legacy.headers.customrequestheaders.X-Route-Version: legacy
# attach the version header middleware to the router
traefik.http.routers.orders-legacy.middlewares: orders-legacy

Phase 1 is complete when every request flows through the proxy and you have dashboards for every route. This typically takes one to two weeks.
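One way to keep yourself honest about the route inventory is a quick pass over the proxy's access logs. A rough sketch, assuming the proxy writes one JSON object per request; the path, status, and client field names are stand-ins for whatever your gateway actually emits:

# Build a first-pass route inventory from the proxy's JSON access logs.
# The "path", "status" and "client" field names are assumptions.
import json
from collections import Counter, defaultdict

volume = Counter()
errors = Counter()
clients = defaultdict(set)

with open("access.log") as log:
    for line in log:
        entry = json.loads(line)
        route = entry["path"]
        volume[route] += 1
        if entry["status"] >= 500:
            errors[route] += 1
        clients[route].add(entry["client"])

for route, count in volume.most_common():
    print(f"{route}: {count} requests, {errors[route]} errors, "
          f"{len(clients[route])} distinct clients")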
Phase 2 — Extraction Sequencing
Now you choose what to extract first. This is where most teams make their biggest mistake: they pick the most exciting domain (payments, usually) rather than the safest one.
Extract leaf domains first. A leaf domain is one with minimal inbound calls from other parts of the monolith. Notifications, PDF generation, search indexing — these are leaves. Payments, orders, and inventory are deeply entangled cores. Extract the leaves and you build extraction muscle memory. Extract the core first and you spend six months debugging distributed transaction failures.
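A cheap way to find the leaves is to count inbound references between the monolith's modules. The sketch below is a grep-level heuristic rather than real dependency analysis, and the domain names are hypothetical:

# Rank candidate domains by how often the rest of the monolith references them.
# Fewer inbound references means closer to a leaf. Module names are hypothetical.
import re
from collections import Counter
from pathlib import Path

DOMAINS = ["notifications", "pdf_export", "search", "orders", "payments", "inventory"]
inbound = Counter()

for source_file in Path("monolith").rglob("*.py"):
    text = source_file.read_text()
    for domain in DOMAINS:
        if f"/{domain}/" in source_file.as_posix():
            continue  # ignore references from the domain's own package
        inbound[domain] += len(re.findall(rf"\b(?:from|import) {domain}\b", text))

for domain, count in sorted(inbound.items(), key=lambda kv: kv[1]):
    print(f"{domain}: {count} inbound references")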
The extraction checklist for each domain
Before redirecting a route to a new service, the service must:
- Pass all unit and integration tests
- Serve real traffic in shadow mode (requests duplicated to both monolith and new service, responses compared) for at least one week
- Have an independent deployment pipeline
- Have its own database schema (not sharing the monolith DB)
- Have circuit breakers and retries configured (a minimal sketch follows this list)
- Have alerts wired to the on-call rotation
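The circuit-breaker item deserves a concrete shape. A minimal hand-rolled sketch follows; most teams will reach for a library instead, but the behaviour is what matters: after a run of failures, fail fast for a cooldown period instead of hammering a struggling new service.

# Minimal circuit breaker: after a run of consecutive failures, fail fast for a
# cooldown period instead of hammering a struggling new service.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open, failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result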
Shadow mode is non-negotiable. It surfaces mismatches in business logic before users see them.
# Shadow mode comparison — simplified pseudocode
# call_monolith, call_new_service, responses_match, log_divergence and
# emit_metric are the proxy's own helpers, not shown here.
import asyncio

async def shadow_request(path: str, payload: dict) -> Response:
    # Fire both calls concurrently so shadowing adds no user-visible latency
    monolith_task = asyncio.create_task(call_monolith(path, payload))
    new_service_task = asyncio.create_task(call_new_service(path, payload))
    monolith_response, new_service_response = await asyncio.gather(
        monolith_task, new_service_task
    )
    if not responses_match(monolith_response, new_service_response):
        log_divergence(path, monolith_response, new_service_response)
        emit_metric("shadow.divergence", tags={"path": path})
    return monolith_response  # Always serve monolith during shadow phase

Phase 3 — Data Migration Strategy
The hardest part of any extraction is the data. The monolith's database is a big shared schema with implicit ownership. Domain A writes to tables that Domain B reads. Foreign keys span what should be service boundaries. You cannot simply hand a table to a new service and call it done.
The strangler data model
Run both schemas in parallel during the transition. The monolith writes to its schema. The new service has its own schema. A synchronisation layer keeps them consistent.
Change Data Capture (Debezium is the standard choice) tails the monolith's write-ahead log and publishes events to Kafka. The new service consumes those events and maintains its own projection. This is eventually consistent, which means you need to design your new service to tolerate a small replication lag.
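A sketch of the consuming side, assuming Debezium's default JSON envelope (a payload carrying before, after, and op fields), the confluent-kafka client, and hypothetical topic, group, and helper names:

# Consume Debezium change events from Kafka and maintain the new service's own
# projection. Topic, group and helper names are hypothetical; the envelope layout
# follows Debezium's default JSON converter (payload with "before", "after", "op").
import json
from confluent_kafka import Consumer

def upsert_order(row: dict) -> None:
    ...  # write the row into the new service's own schema (stub)

def delete_order(order_id) -> None:
    ...  # remove the row from the projection (stub)

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "orders-projection",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["monolith.public.orders"])  # CDC topic for the orders table

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue  # nothing new, a transport error, or a tombstone record
    event = json.loads(msg.value())
    payload = event.get("payload", event)  # works with schemas enabled or disabled
    op = payload["op"]
    if op in ("c", "r", "u"):        # create, snapshot read, or update
        upsert_order(payload["after"])
    elif op == "d":                  # delete: only "before" carries the row
        delete_order(payload["before"]["id"])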
Three-phase data cutover
| Phase | Monolith DB | New Service DB | Writes go to |
|---|---|---|---|
| Sync | Primary | CDC replica | Monolith |
| Dual-write | Primary | Live | Both (with compare) |
| Cutover | Frozen | Primary | New service only |
During dual-write you write to both stores and compare the results. Any mismatch is logged and investigated. Once the mismatch rate has held at zero for 48 hours, you cut over: the monolith DB becomes read-only for that domain's tables and is eventually dropped.
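The dual-write step itself is mostly plumbing plus paranoia: the monolith write stays authoritative and the comparison is purely observational. A sketch, with hypothetical repository objects for the two stores:

# Dual-write with compare: the monolith write stays authoritative; any mismatch
# between the two stores is logged for investigation, never surfaced to the user.
import logging

logger = logging.getLogger("dual_write")

def save_order(order: dict, monolith_repo, new_service_repo) -> dict:
    primary = monolith_repo.save(order)           # authoritative write
    try:
        shadow = new_service_repo.save(order)     # secondary write to the new schema
    except Exception:
        logger.exception("dual-write to new service failed for order %s", order.get("id"))
        return primary
    if primary != shadow:
        logger.warning("dual-write mismatch for order %s: %r vs %r",
                       order.get("id"), primary, shadow)
    return primary                                # the caller always sees the monolith's result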
Phase 4 — Retiring the Monolith
You know the migration is complete when:
- Every route in the proxy routing table points to a new service
- No background jobs run inside the monolith process
- The monolith database has no tables with active writes
- The monolith codebase receives no new commits
Before you pull the plug, run a final dark-launch window: keep the monolith process deployed and running, but route zero traffic to it for two weeks. If nothing breaks in the new services during that window, decommission.
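The first completion signal is easy to automate against the proxy's routing configuration. A tiny sketch, reusing the dict-style routing table from earlier:

# Fail fast (for example in CI) if any proxy route still targets the monolith.
MONOLITH = "http://monolith.internal"

def assert_monolith_unrouted(routes: dict) -> None:
    stragglers = [path for path, upstream in routes.items() if upstream == MONOLITH]
    assert not stragglers, f"routes still served by the monolith: {stragglers}"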
Common Failure Modes
The Proxy Becomes a God Object
Teams add business logic to the proxy because it is easier than building a proper service. After six months the proxy has authentication, transformation, and orchestration logic baked in. It is now a second monolith. Keep the proxy thin. Route and observe only.
Shared Database Not Addressed
Extracting a service while it still talks to the monolith database is not extraction — it is process separation with shared state. The database is the monolith. If you do not address the data layer, you will never be able to deploy the services independently.
Extracting Too Many Things Simultaneously
Three teams extracting three domains in parallel sounds efficient. It is a coordination nightmare. Each extraction creates integration work for every other team. Extract serially within a domain boundary. Parallelise only across fully independent leaf domains.
Skipping Shadow Mode
Every team that skips shadow mode discovers a behavioural difference between the monolith and the new service in production. Shadow mode is two weeks of discipline that saves you from a midnight incident.
Measuring Migration Progress
Track these metrics in your weekly migration review:
Migration Health Dashboard
───────────────────────────────────────────
Routes migrated: 23 / 41 (56%)
Routes in shadow mode: 6 / 41 (15%)
Routes remaining: 12 / 41 (29%)
───────────────────────────────────────────
Monolith DB tables active: 18 → 12 → 7
Monolith CPU (weekly avg): 68% → 41% → 22%
New service p99 latency: 82ms (target: <100ms)
Shadow divergence rate: 0.003% (target: 0%)
───────────────────────────────────────────

CPU and database activity are your leading indicators. As they drop, you know extraction is real and the monolith is actually shrinking — not just running alongside new services.
The Strangler Fig Is a Discipline, Not a Pattern
I want to be direct: the strangler fig is easy to do badly. Teams proxy traffic but never redirect it. Teams extract services but leave shared databases. Teams declare victory at 80% and live with a zombie monolith for years.
The pattern works when you treat migration velocity as a first-class engineering metric. Measure it. Review it weekly. Have a public dashboard. Make it embarrassing for a domain to stay in the monolith longer than planned.
The fig tree does not ask permission to grow. It just grows, persistently, around the host, until the host is gone. Your migration should work the same way.
Key Takeaways
- Install the proxy first. Extract nothing until every route flows through it and you have observability.
- Extract leaf domains before core domains. Build muscle memory on low-risk paths.
- Shadow mode is mandatory. Divergences found in shadow cost nothing. Divergences found in production cost everything.
- Address the database. Service extraction without data extraction is theatre.
- Measure migration velocity weekly. If the monolith is not shrinking, the migration is stalled.
- Decommission completely. A "mostly retired" monolith is still a monolith.