Feature Flags as Architecture, Not Toggles
We counted them once. Across four services, we had 214 active feature flags. Of those, fewer than 30 were being actively toggled by anyone. The rest were ghosts: flags that had been "enabled for everyone" for months, flags whose owner nobody could name, flags whose enabling conditions referenced A/B experiments that had ended two product cycles ago. We had a flag called use_new_checkout_v2 that was enabled in production, even though the "old checkout v1" it was supposed to replace had been deleted from the codebase a year prior.
That is feature flag debt. It accumulates silently, it compounds your cognitive load, and it occasionally kills you. In 2021, a major cloud provider suffered an hours-long outage caused in part by a configuration flag interaction nobody had fully traced. Flag debt is not a theoretical problem.
The way out is to stop treating feature flags as light switches and start treating them as architecture: first-class components with owners, lifecycle states, runtime costs, and retirement deadlines.
The Four Kinds of Flags
Not all flags are the same, and conflating them is the root cause of most flag debt. Before we talk about lifecycle and ownership, get clear on which kind of flag you're creating.
Release flags control whether a feature is visible to users during a gradual rollout. They have a short, known lifespan. Once the rollout completes and the feature is stable, the flag should be deleted — not flipped to true everywhere and left.
Experiment flags gate A/B and multivariate tests. They have an experiment ID, a start date, a planned end date, and a winner. When the experiment concludes, the flag gets deleted and the winning code path becomes the default.
Ops flags are kill switches and circuit breakers. These are intentionally permanent: disable_expensive_recommendation_engine, use_fallback_payment_processor. They live indefinitely and should be treated with the same rigor as infrastructure configuration.
Permission flags control access based on user entitlement: plan tier, beta access, enterprise features. These are also permanent and belong in your entitlement system, not your feature flag system. If your feature flag tool is doing double duty as an entitlement system, fix that first.
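One way to keep the distinction from eroding is to encode it in code rather than in tribal knowledge. A minimal Python sketch of the taxonomy (the names `FlagType` and `EXPECTED_LIFESPAN_DAYS` are illustrative, not from any particular SDK): temporary types get a hard lifespan ceiling, permanent types get none.

```python
from enum import Enum

class FlagType(Enum):
    RELEASE = "release"        # temporary: delete after the rollout completes
    EXPERIMENT = "experiment"  # temporary: delete when the test concludes
    OPS = "ops"                # permanent: kill switches, circuit breakers
    PERMISSION = "permission"  # permanent: belongs in the entitlement system

# Maximum expected lifespan in days; None means intentionally permanent.
EXPECTED_LIFESPAN_DAYS = {
    FlagType.RELEASE: 60,
    FlagType.EXPERIMENT: 90,
    FlagType.OPS: None,
    FlagType.PERMISSION: None,
}

def is_temporary(flag_type: FlagType) -> bool:
    """Temporary flags must carry a retirement date; permanent ones must not."""
    return EXPECTED_LIFESPAN_DAYS[flag_type] is not None
```

A check like `is_temporary` is what lets later tooling insist that release and experiment flags have a `planned_retirement` date while exempting ops and permission flags.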
Naming and Ownership: The Contract That Prevents Orphans
The use_new_checkout_v2 flag I mentioned earlier had no owner. Nobody knew who created it. Nobody knew what the "old checkout" it referenced looked like. That happens when naming is casual and ownership is implicit.
Enforce a naming convention that encodes the flag type and scope:
```
{type}/{team}/{feature-name}
```

Examples:

- release/payments/3ds-authentication
- experiment/growth/onboarding-step-reduction
- ops/infra/disable-async-indexer
- permission/enterprise/bulk-export
The type prefix tells you the expected lifecycle. The team prefix tells you who to blame. The feature name tells you what it does.
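A convention only holds if something checks it. A minimal validator, assuming the four types above (the regex and function name are illustrative):

```python
import re

# {type}/{team}/{feature-name}: known type prefix, then two
# lowercase hyphen-separated segments.
FLAG_NAME_RE = re.compile(
    r"^(release|experiment|ops|permission)/[a-z0-9-]+/[a-z0-9-]+$"
)

def validate_flag_name(name: str) -> bool:
    """Return True if the flag name follows the naming convention."""
    return FLAG_NAME_RE.match(name) is not None
```

Run this in CI over every flag definition: `release/payments/3ds-authentication` passes, while a bare `use_new_checkout_v2` fails and never makes it into production.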
Ownership must be explicit and machine-readable. Store it in the flag definition itself:
```yaml
# flags/release/payments/3ds-authentication.yaml
name: release/payments/3ds-authentication
type: release
owner: payments-team
created: 2025-09-12
planned_retirement: 2025-11-01
jira_ticket: PAY-4821
description: >
  Gates the new 3DS v2 authentication flow for card payments.
  Replaces the legacy 3DS v1 redirect flow.
rollout:
  strategy: percentage
  current_percentage: 100
  segments: []
```

Codify the retirement date at creation time. Not "when we're done", but a specific calendar date. If the date passes and the flag still exists, your CI pipeline should fail.
The Flag Lifecycle Enforced by Code
Lifecycle management only works if it's automated. Humans don't reliably retire flags under deadline pressure. Machines do.
The automation hooks you need:
At CI time: Parse all flag definitions. For any flag past its planned_retirement date, fail the build with a message naming the owner team.
```python
# scripts/check_flag_retirement.py
import sys
from datetime import date, datetime
from pathlib import Path

import yaml

overdue = []
for flag_file in Path("flags/").rglob("*.yaml"):
    with open(flag_file) as f:
        flag = yaml.safe_load(f)
    retirement = flag.get("planned_retirement")
    if retirement and datetime.strptime(str(retirement), "%Y-%m-%d").date() < date.today():
        overdue.append((flag["name"], flag["owner"], retirement))

if overdue:
    print("ERROR: The following flags are past their retirement date:")
    for name, owner, ret_date in overdue:
        print(f"  {name} (owner: {owner}, was due: {ret_date})")
    sys.exit(1)
```

At runtime: Emit an evaluation event for every flag check. Feed this into your metrics system.
```go
func (f *FlagClient) IsEnabled(ctx context.Context, flagName string, user User) bool {
	result := f.evaluate(ctx, flagName, user)
	f.metrics.Inc("feature_flag.evaluation",
		"flag", flagName,
		"result", strconv.FormatBool(result),
		"flag_type", f.registry.TypeOf(flagName),
	)
	return result
}
```

On a schedule: Query your metrics system for flags with zero evaluations in the last 30 days and open a ticket against the owning team.
```python
# Pseudocode for a scheduled staleness check
stale_flags = metrics_client.query("""
    SELECT flag_name, owner
    FROM feature_flag_evaluations
    WHERE flag_type NOT IN ('ops', 'permission')
    GROUP BY flag_name, owner
    HAVING max(timestamp) < now() - interval '30 days'
""")
for flag in stale_flags:
    jira.create_ticket(
        project=flag.owner,
        title=f"Stale feature flag: {flag.flag_name}",
        description="This flag has had no evaluations in 30+ days. "
                    "Please retire it or confirm it is still needed.",
    )
```

The Runtime Cost You're Ignoring
Every flag evaluation is a network call, a cache lookup, or a file read. At low volume this is invisible. At high volume — inside a hot path, inside a loop — it compounds.
Profile your flag evaluation paths. If you're calling isEnabled() inside a database result iteration, you're making N flag evaluations where N is your result set size.
```javascript
// Problematic: flag evaluated per-item
const orders = await db.getOrders(userId);
const enriched = await Promise.all(
  orders.map(async (order) => {
    if (await flags.isEnabled('release/orders/enriched-metadata', user)) {
      return enrichOrder(order);
    }
    return order;
  })
);
```

```javascript
// Better: evaluate once, apply to all
const useEnrichedMetadata = await flags.isEnabled(
  'release/orders/enriched-metadata',
  user
);
const enriched = useEnrichedMetadata
  ? await Promise.all(orders.map(enrichOrder))
  : orders;
```

Beyond per-call cost, permanent flags with complex targeting rules add latency to every request they're evaluated in. Audit the p99 latency of your flag evaluation SDK: if it's above 5ms for local cache hits, something is wrong with your caching strategy.
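Keeping cache hits cheap mostly means keeping evaluations in-process. A rough sketch of the usual pattern, a TTL'd in-memory cache in front of the flag backend (the class, method names, and rule shape are hypothetical, not any specific SDK):

```python
import time

class CachedFlagClient:
    """Serve flag evaluations from an in-process cache; refresh on TTL expiry."""

    def __init__(self, backend, ttl_seconds: float = 30.0):
        self.backend = backend  # anything with fetch_all() -> {name: rule}
        self.ttl = ttl_seconds
        self._rules = {}
        self._fetched_at = float("-inf")

    def _refresh_if_stale(self):
        # One network call per TTL window, not one per evaluation.
        if time.monotonic() - self._fetched_at > self.ttl:
            self._rules = self.backend.fetch_all()
            self._fetched_at = time.monotonic()

    def is_enabled(self, flag_name: str, user_id: str) -> bool:
        self._refresh_if_stale()
        rule = self._rules.get(flag_name)
        if rule is None:
            return False  # unknown flags default to off
        # Percentage rollout: stable per-user bucketing, no network call.
        bucket = hash((flag_name, user_id)) % 100
        return bucket < rule.get("percentage", 0)
```

With this shape, the per-evaluation cost is a dict lookup and a hash, so even the per-item anti-pattern above degrades gracefully instead of multiplying network round trips.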
Measuring Flag Debt: The Dashboard You Should Have
Flag debt is invisible without explicit measurement. Build a dashboard with these metrics:
- Total active flags by type: Ops and permission flags growing steadily is expected; they're permanent. Release flags accumulating means you have a retirement problem.
- Flags past retirement date: Should be 0 at all times. Non-zero means your CI check isn't enforced.
- Flags with no evaluations in 30 days: Should trigger automated tickets.
- Flags with no listed owner: Should be 0. Your naming convention enforces this.
- Average flag age by type: Release flags averaging > 60 days is a red flag.
A healthy age distribution shows a steep drop-off after 30 days. If you have a long tail of flags past 60 days, your retirement process is broken.
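Most of these metrics fall straight out of the flag definition files; no metrics backend required. A sketch that computes them from parsed flag dicts shaped like the YAML above (the function name is illustrative):

```python
from collections import Counter
from datetime import date, datetime

def flag_debt_metrics(flags: list[dict], today: date) -> dict:
    """Compute flag-debt dashboard metrics from parsed flag definitions."""
    def parse(d):
        return datetime.strptime(str(d), "%Y-%m-%d").date()

    by_type = Counter(f["type"] for f in flags)
    past_retirement = [
        f["name"] for f in flags
        if f.get("planned_retirement") and parse(f["planned_retirement"]) < today
    ]
    unowned = [f["name"] for f in flags if not f.get("owner")]
    release_ages = [
        (today - parse(f["created"])).days
        for f in flags if f["type"] == "release"
    ]
    return {
        "active_by_type": dict(by_type),
        "past_retirement": past_retirement,
        "unowned": unowned,
        "avg_release_age_days": (
            sum(release_ages) / len(release_ages) if release_ages else 0
        ),
    }
```

The only metric this can't compute is the zero-evaluations count, which has to come from the runtime evaluation events described earlier.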
Key Takeaways
- Treat flags as four distinct types — release, experiment, ops, permission — each with different expected lifespans. Conflating them is the root cause of debt accumulation.
- Encode the owner and retirement date in the flag definition at creation time, not after. If the retirement date passes with the flag still active, CI should fail.
- Automate staleness detection: emit evaluation metrics, query for zero-evaluation flags, and open tickets automatically. Humans will not reliably retire flags under deadline pressure.
- Evaluate flag runtime cost explicitly. Flags inside hot paths or loops multiply network overhead; evaluate once per request scope, not once per item.
- Build a flag debt dashboard with four metrics: total active flags by type, flags past retirement date, flags with no evaluations, and average flag age by type. Make it visible to engineering leadership.
- Permanent flags (ops, permission) deserve the same rigor as infrastructure configuration: they should be reviewed, versioned, and documented — not left to drift.