
Idempotency Keys Done Right

Ravinder · 8 min read

A payment API that charges a customer twice is not a billing bug — it is a trust-destroying event. Yet most idempotency implementations I have audited will do exactly that under one of three conditions: the client retries before the first request finishes, the server crashes after charge but before storing the response, or the key expires while the client is still retrying.

The fix is not hard, but it requires thinking through each failure mode precisely. Let us do that.

What an Idempotency Key Actually Promises

An idempotency key says: for a given key, the server will execute the side-effectful operation at most once and will return the same response on every subsequent call carrying that key.

That is two promises, not one:

  1. At-most-once execution — the mutation fires once.
  2. Stable response replay — retries get the same response body and status code.

Most teams implement promise 1 and forget promise 2. The client then cannot distinguish "retry saw cached 200" from "retry triggered a second charge that also succeeded."

Key Design

The key must be:

  • Client-generated — never server-generated. The client must be able to produce the same key across process restarts and across network failures before the request lands.
  • Request-scoped, not user-scoped — user_id is a terrible key. Use a UUID v4 or a deterministic hash of the operation parameters.
  • Coupled to request body — if the body changes, the key must change. Accepting the same key for a different body is either an error or a security vulnerability.
import uuid
import hashlib
import json
 
def generate_idempotency_key(operation: str, params: dict) -> str:
    """
    Deterministic key: survives process restarts, safe for retries.
    Use this when the client must regenerate the key without storage.
    """
    canonical = json.dumps({"op": operation, **params}, sort_keys=True)
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return f"{operation}:{digest[:32]}"
 
def generate_random_key() -> str:
    """
    Random key: simpler, requires client-side persistence across retries.
    Prefer this for payment SDKs where the client can store the key.
    """
    return str(uuid.uuid4())
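A quick sanity check of the deterministic variant: because the canonical JSON is built with sorted keys, parameter order does not affect the key, while any change to the body does (the parameter values here are illustrative):

```python
import hashlib
import json

def generate_idempotency_key(operation: str, params: dict) -> str:
    # Same logic as above: canonical JSON, sorted keys, truncated SHA-256.
    canonical = json.dumps({"op": operation, **params}, sort_keys=True)
    return f"{operation}:{hashlib.sha256(canonical.encode()).hexdigest()[:32]}"

# Same logical request in a different parameter order yields the same key,
# so a crashed client can regenerate it and retry safely.
a = generate_idempotency_key("create_payment", {"amount": 500, "currency": "USD"})
b = generate_idempotency_key("create_payment", {"currency": "USD", "amount": 500})

# A different body yields a different key.
c = generate_idempotency_key("create_payment", {"amount": 501, "currency": "USD"})
```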

Storage Schema

The idempotency record needs to capture more than you think.

CREATE TABLE idempotency_keys (
    key             TEXT        PRIMARY KEY,
    request_hash    TEXT        NOT NULL,        -- SHA-256 of request body
    status          TEXT        NOT NULL DEFAULT 'in_flight',
    -- in_flight | completed | failed
    response_status INT,
    response_body   JSONB,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at    TIMESTAMPTZ,
    expires_at      TIMESTAMPTZ NOT NULL,
    lock_token      TEXT        -- used for concurrent request dedup
);
 
CREATE INDEX ON idempotency_keys (expires_at);  -- for TTL cleanup job

status is the underappreciated column. Without it you cannot distinguish "first request is still running" from "key does not exist."

The State Machine

stateDiagram-v2
    [*] --> in_flight : INSERT key (atomic)
    in_flight --> completed : operation succeeds, response cached
    in_flight --> failed : operation fails, error cached
    completed --> [*] : replay response on retry
    failed --> [*] : replay error on retry
    in_flight --> in_flight : concurrent retry — 409 Conflict
    completed --> [*] : key expires — DELETE
    failed --> [*] : key expires — DELETE

The in_flight state is critical. A naive implementation only checks for existence and writes on completion. Under that design, a concurrent retry will execute the operation again before the first response is stored.

Handling Concurrent Retries

Insert the key before executing the operation. Use a unique constraint to detect races.

import hashlib
import json
import psycopg2  # conn below is a psycopg2 connection
from contextlib import contextmanager

# fetch_key, mark_failed, ConflictError, and BadRequestError are
# application-level helpers defined elsewhere.

def sha256(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

@contextmanager
def idempotency_guard(conn, key: str, request_body: dict):
    body_hash = sha256(request_body)
    try:
        with conn.cursor() as cur:
            cur.execute("""
                INSERT INTO idempotency_keys
                    (key, request_hash, status, expires_at)
                VALUES
                    (%s, %s, 'in_flight', NOW() + INTERVAL '24 hours')
                ON CONFLICT (key) DO NOTHING
                RETURNING key
            """, (key, body_hash))
            inserted = cur.fetchone()

        if inserted is None:
            # Key exists — check state
            row = fetch_key(conn, key)
            if row["status"] == "in_flight":
                raise ConflictError("Request already in flight. Retry after 2s.")
            if row["request_hash"] != body_hash:
                raise BadRequestError("Key reuse with different body.")
            # completed or failed — replay the cached response
            yield {"replay": True, "row": row}
            return

        conn.commit()  # make the in_flight record visible to concurrent retries
        yield {"replay": False}

    except (ConflictError, BadRequestError):
        raise  # client errors — do not mark the key failed
    except Exception as e:
        mark_failed(conn, key, str(e))
        raise
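To see why insert-before-execute gives at-most-once semantics, here is a minimal in-memory simulation of the same state machine. A dict stands in for the idempotency table; all names are illustrative, not production code:

```python
store = {}    # stand-in for the idempotency_keys table
charges = []  # business side effects that actually executed

def handle(key: str, amount: int) -> dict:
    record = store.get(key)
    if record is None:
        store[key] = {"status": "in_flight"}   # insert BEFORE executing
    elif record["status"] == "in_flight":
        return {"status_code": 409}            # concurrent retry: conflict
    else:
        return record["response"]              # completed/failed: replay
    charges.append(amount)                     # the side effect fires here, once
    store[key] = {"status": "completed",
                  "response": {"status_code": 201, "charged": amount}}
    return store[key]["response"]

first = handle("abc", 500)   # executes the charge
retry = handle("abc", 500)   # replays the cached response
```

A naive version that only writes the record after completion would execute `charges.append` again on the retry; inserting the `in_flight` record first is what closes that window.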

Response Caching Strategy

Cache the exact HTTP response: status code, headers relevant to the client, and body. Do not cache derived data or re-serialize from a database entity — you will hit serialization drift after schema changes.

def complete_idempotency_key(conn, key: str, status_code: int, body: dict):
    with conn.cursor() as cur:
        cur.execute("""
            UPDATE idempotency_keys SET
                status          = 'completed',
                response_status = %s,
                response_body   = %s,
                completed_at    = NOW()
            WHERE key = %s AND status = 'in_flight'
        """, (status_code, json.dumps(body), key))
        if cur.rowcount == 0:
            # Lost a race — log and return, do not double-write
            logger.warning("idempotency key %s already completed by concurrent request", key)
    conn.commit()
 
def replay_response(row: dict) -> HTTPResponse:
    return HTTPResponse(
        status=row["response_status"],
        body=row["response_body"],
        headers={"X-Idempotency-Replayed": "true"}
    )

The X-Idempotency-Replayed header is not cosmetic — it lets the client distinguish a fresh response from a cached one without parsing the body.
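On the client side, that header plus the status code is enough to classify any response from an idempotent endpoint. A sketch (the function name and labels are hypothetical):

```python
def interpret_response(status: int, headers: dict) -> str:
    """Classify a response from an idempotent endpoint.
    Header name matches the server-side replay code above."""
    if headers.get("X-Idempotency-Replayed") == "true":
        return "replayed"    # cached result of an earlier attempt
    if status == 409:
        return "in_flight"   # first attempt still running; back off and retry
    return "fresh"           # this call executed the operation

assert interpret_response(200, {"X-Idempotency-Replayed": "true"}) == "replayed"
assert interpret_response(409, {}) == "in_flight"
assert interpret_response(201, {}) == "fresh"
```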

TTL: Longer Than You Think

Most teams set a 1-hour TTL. That is wrong for payment operations.

Consider the retry schedule of a well-behaved client: exponential backoff starting at 1s, capped at 60s, at most 10 attempts — including per-attempt timeouts, the total window is roughly 10 minutes. But the client may also retry after a process restart, or a mobile app may come back online hours later. Set the TTL to 24–72 hours for financial operations, 1 hour for idempotent reads, and 5 minutes for very short-lived operations like OTP validation.
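Back-of-envelope for that window, assuming roughly 30 seconds spent per attempt on connection and request timeouts (the 30-second figure is an assumption, not from any spec):

```python
# Waits between 10 attempts: 1, 2, 4, ..., capped at 60 seconds (9 waits).
waits = [min(2 ** i, 60) for i in range(9)]
backoff_seconds = sum(waits)                # seconds spent waiting alone
with_timeouts = backoff_seconds + 10 * 30   # plus ~30 s per attempt in flight
print(waits, backoff_seconds, with_timeouts)
```

The waits alone sum to about four minutes; adding per-attempt timeouts brings the window near ten. A 1-hour TTL covers that schedule, but not a phone that reconnects the next morning — hence 24–72 hours for payments.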

Clean up with a background job, not on read:

-- Run every 15 minutes
DELETE FROM idempotency_keys
WHERE expires_at < NOW()
  AND status IN ('completed', 'failed');
-- Do NOT delete in_flight keys; they may indicate a hung worker.

The Partial Failure Problem

The hardest case: the database write for the business operation succeeds, then the server crashes before writing the idempotency response. On retry:

  • The key status is still in_flight.
  • The operation already happened.
sequenceDiagram
    participant C as Client
    participant S as Server
    participant DB as Business DB
    participant IK as Idempotency Store
    C->>S: POST /payments (key=abc)
    S->>IK: INSERT key=abc, status=in_flight
    S->>DB: INSERT payment row
    Note over S: CRASH — never reaches IK update
    C->>S: POST /payments (key=abc) [retry]
    S->>IK: SELECT key=abc → status=in_flight
    S-->>C: 409 Conflict (request still processing?)

You have two options:

Option A — Fencing token + idempotent business write. The business operation itself is idempotent keyed on the same key. On retry, the business DB upsert is a no-op. Promote the key to completed and replay.

Option B — Stuck in_flight detection. If status = 'in_flight' and NOW() - created_at > threshold, treat it as a failed operation and return an error the client can retry with a new key.

Option A is correct. Option B is pragmatic when you cannot make the business write idempotent (e.g., calling a third-party payment processor).
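A minimal sketch of Option A. It uses SQLite's upsert so the example is self-contained; production would use the PostgreSQL equivalent, and the payments table and column names are assumptions:

```python
import sqlite3

def execute_payment_idempotently(conn, key: str, amount_cents: int) -> bool:
    """Business write keyed on the idempotency key itself.
    Returns True if the charge row was created, False on a no-op retry."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO payments (idempotency_key, amount_cents) VALUES (?, ?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (key, amount_cents),
    )
    conn.commit()
    return conn.total_changes > before  # False means the row already existed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (idempotency_key TEXT PRIMARY KEY, amount_cents INT)")
first = execute_payment_idempotently(conn, "abc", 500)   # charge row created
retry = execute_payment_idempotently(conn, "abc", 500)   # crash-recovery retry: no-op
```

Because the unique constraint lives on the business row itself, a retry that finds the key stuck in `in_flight` can simply re-run the write, observe the no-op, promote the key to `completed`, and replay.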

Conflict Handling: Wrong Body, Same Key

If a client sends the same key with a different request body, reject it with 422:

def validate_request_hash(row: dict, incoming_hash: str):
    if row["request_hash"] != incoming_hash:
        raise UnprocessableEntityError(
            "Idempotency key reuse with different request body. "
            "Generate a new key for a new request."
        )

Do not silently accept it. A key collision on different bodies is almost always a client bug — surface it loudly.

Edge Cases Worth Handling

Clock skew. If your idempotency store is distributed, NOW() comparisons for stuck in-flight detection may be unreliable. Use a monotonic sequence or a dedicated timeout service rather than wall-clock comparisons.

Key rotation under A/B deployment. If you deploy a new version that changes the request body schema, old keys may collide with new request shapes. Version the key namespace: v2:payment:<uuid>.
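One way to namespace (the constant and function are illustrative):

```python
KEY_SCHEMA_VERSION = "v2"  # bump whenever the request body schema changes

def versioned_key(operation: str, raw_key: str) -> str:
    # Keys minted by the old deployment live in the v1 namespace,
    # so they can never collide with v2 request shapes.
    return f"{KEY_SCHEMA_VERSION}:{operation}:{raw_key}"
```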

Distributed idempotency store. Redis is popular here. Use SET key value NX EX ttl for atomic insert-if-absent. But Redis is not durable by default — use appendfsync always or accept the risk that a crash loses in-flight records.

# Redis atomic idempotency insert
SET "idem:abc123" '{"status":"in_flight","hash":"..."}' NX EX 86400
# Returns OK if inserted, nil if key exists

Key Takeaways

  • Insert the idempotency record in in_flight state before executing the operation — not after.
  • Cache the raw HTTP response (status + body), not a derived entity, to survive schema changes.
  • Set TTL to match your longest realistic retry window — 24 hours for payments, not 1 hour.
  • Return 409 Conflict for concurrent in-flight retries; return the cached response for completed/failed replays.
  • Validate request body hash on every call — key reuse with a different body is a bug, not a feature.
  • For partial failures, prefer making the business write idempotent over stuck-detection heuristics.