Schema-on-Read vs Schema-on-Write at the API Edge
The Validation Decision Nobody Makes Explicitly
Every API has a schema. The question is not whether you have one — it is whether you enforce it at write time, at read time, both, or neither. Most teams stumble into an answer by accident: they add a JSON Schema validator to the intake endpoint and call it done, or they skip runtime validation entirely and trust their TypeScript types. Neither is obviously wrong. Both carry hidden costs that compound over years.
Schema-on-write means: reject bad data before it enters the system. Schema-on-read means: accept whatever arrives, interpret it at consumption time. These two stances produce radically different systems. Getting this decision wrong shows up in outages, migration pain, and the kind of technical debt that takes quarters to unpick.
This post is opinionated. I will tell you when each approach wins, when Protobuf changes the calculus entirely, and why the real answer for most production APIs is a layered strategy that uses both — in the right places.
What Schema-on-Write Actually Buys You
Schema-on-write rejects invalid payloads at the point of ingestion. The producer learns immediately that their data is wrong. Nothing malformed reaches your storage, your queues, or your downstream consumers.
```typescript
// JSON Schema validation at the API edge — write time
import Ajv from "ajv";
import addFormats from "ajv-formats";

const ajv = new Ajv({ allErrors: true, coerceTypes: false });
addFormats(ajv);

const orderSchema = {
  type: "object",
  required: ["orderId", "customerId", "items", "createdAt"],
  additionalProperties: false,
  properties: {
    orderId: { type: "string", format: "uuid" },
    customerId: { type: "string", minLength: 1 },
    items: {
      type: "array",
      minItems: 1,
      items: {
        type: "object",
        required: ["sku", "quantity", "unitPriceCents"],
        properties: {
          sku: { type: "string" },
          quantity: { type: "integer", minimum: 1 },
          unitPriceCents: { type: "integer", minimum: 0 }
        }
      }
    },
    createdAt: { type: "string", format: "date-time" }
  }
};

const validate = ajv.compile(orderSchema);

// ValidationError, Order, and storeOrder are application-level
// definitions assumed to exist elsewhere.
function ingestOrder(payload: unknown) {
  if (!validate(payload)) {
    throw new ValidationError(validate.errors!);
  }
  return storeOrder(payload as Order);
}
```

The advantage is containment. Your storage layer only ever sees well-formed data. Your downstream consumers can skip defensive null-checks on fields they know the schema enforces. Debugging is fast: the error surface is at the boundary, not somewhere deep in a pipeline six hours later.
The cost is rigidity. Every schema change that tightens constraints becomes a coordinated deployment. Add a required field? Every producer must update before you deploy. Make a field more restrictive? You may silently break producers sending previously-valid values.
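To make the breakage concrete, here is a minimal sketch (plain Python, no schema library; the `customerId` format rule is hypothetical) showing how a payload that was valid under the old rules fails once a constraint is tightened, with no change on the producer side:

```python
# Hypothetical tightening: v2 narrows what counts as a valid customerId.

def valid_v1(payload: dict) -> bool:
    # v1: customerId only needs to be a non-empty string
    cid = payload.get("customerId")
    return isinstance(cid, str) and len(cid) >= 1

def valid_v2(payload: dict) -> bool:
    # v2: customerId must now look like "CUST-" followed by digits
    cid = payload.get("customerId")
    return isinstance(cid, str) and cid.startswith("CUST-") and cid[5:].isdigit()

legacy_payload = {"customerId": "42"}  # a real producer still sends this shape
print(valid_v1(legacy_payload))  # True  — accepted before the change
print(valid_v2(legacy_payload))  # False — rejected after the tightening deploy
```

The producer did nothing wrong; the contract moved underneath them. That is the coordination cost in miniature.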
Schema-on-write works best when:
- You own both producer and consumer (internal APIs, platform APIs)
- Data enters from a small, controllable set of clients
- Downstream processing is expensive and bad data would cascade
When Schema-on-Read Makes More Sense
Schema-on-read stores data as-is and applies interpretation at consumption time. It is the model that powers data lakes, event streams, and most analytics pipelines. It also appears at API edges more than people admit — any API that stores raw JSON blobs and projects views on top is doing schema-on-read.
```python
# Schema-on-read: store raw, validate at projection time
import json
from typing import Optional

from pydantic import BaseModel, ValidationError

class OrderV1(BaseModel):
    order_id: str
    customer_id: str
    total_cents: int

class OrderV2(BaseModel):
    order_id: str
    customer_id: str
    subtotal_cents: int
    tax_cents: int
    total_cents: int
    shipping_address: Optional[str] = None

def project_order(raw_blob: str, schema_version: str):
    data = json.loads(raw_blob)
    if schema_version == "v1":
        try:
            return OrderV1(**data)
        except ValidationError:
            return None  # or emit to dead-letter
    elif schema_version == "v2":
        try:
            return OrderV2(**data)
        except ValidationError:
            return None
    raise ValueError(f"Unknown schema version: {schema_version}")
```

The strength here is evolvability. You can add new fields without touching stored data. Old readers keep working because they project what they understand and ignore the rest. New readers can consume older records with defaults for fields that did not exist yet.
The weakness is that bad data survives. A producer sending garbage will have that garbage stored durably. You only discover the problem when a consumer tries to project it — which might be days later, in a report, in a payment processor, in a downstream system that assumed cleanliness.
Schema-on-read wins when:
- Producers are external, heterogeneous, or beyond your control
- Schema evolves faster than deployment cycles allow
- Data has a long shelf life and needs to survive multiple schema versions
- You are building an ingestion pipeline where throughput matters more than immediate validation
How Protobuf Changes the Calculus
Protobuf is neither schema-on-write nor schema-on-read in the classical sense. The wire format is binary and not self-describing — decoding requires the .proto definition — and it is designed for controlled evolution. Understanding how it handles change determines how you design for it.
```protobuf
// orders.proto — version 1
syntax = "proto3";
package orders.v1;

message Order {
  string order_id = 1;
  string customer_id = 2;
  int64 total_cents = 3;
}
```

```protobuf
// orders.proto — version 2, backward-compatible additions
syntax = "proto3";
package orders.v1;  // same package — additive changes keep old readers working

message Order {
  string order_id = 1;
  string customer_id = 2;
  int64 total_cents = 3;
  int64 subtotal_cents = 4;  // new — old consumers ignore it
  int64 tax_cents = 5;       // new — old consumers ignore it
  string shipping_tag = 6;   // new — old consumers ignore it
}
```

Protobuf enforces structure at serialization time. Unknown fields are preserved by default in proto3 (since protobuf 3.5), so a consumer compiled against v1 that receives a v2 message will not crash: the new fields are invisible to the generated v1 accessors, but they are retained and round-tripped on reserialization rather than lost. This is the schema evolution contract Protobuf provides: additive changes under fresh field numbers are safe; removing fields, reusing tags, or changing types breaks the contract.
The runtime cost of Protobuf validation is lower than JSON Schema because the schema is embedded in the generated code; there is no schema compilation or interpretation step at request time. For high-throughput APIs, where JSON Schema validation can add a millisecond or more per request at the tail, this matters.
The Layered Validation Architecture
In practice, the answer is not either/or. A robust API edge uses validation at multiple points with different responsibilities at each layer.
Each layer has a clear job:
Gateway: Fast, cheap checks. Is the content type correct? Is the body parseable? Is the payload under size limits? The gateway rejects malformed HTTP, not malformed business data.
Application edge: Full schema validation against the current version. Business rules (is the order total consistent with the line items?). This is where JSON Schema or Protobuf validation runs. Errors here return 4xx to the caller immediately.
Storage: Accepts only validated, well-formed records. Never write blind.
Consumer/read path: Projection, version mapping, and graceful handling of data that predates the current schema. This is your schema-on-read layer — intentional, bounded, and controlled.
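The gateway layer's job is the easiest to get wrong by overreaching. A minimal sketch of what "fast, cheap checks" means in practice — content type, size limit, parseability, and nothing else (`MAX_BODY_BYTES` is an illustrative value, not a recommendation):

```python
import json

MAX_BODY_BYTES = 64 * 1024  # illustrative size limit

def gateway_check(content_type: str, body: bytes) -> tuple[bool, str]:
    """Structural checks only -- no business rules at this layer."""
    if content_type.split(";")[0].strip() != "application/json":
        return False, "unsupported content type"
    if len(body) > MAX_BODY_BYTES:
        return False, "payload too large"
    try:
        json.loads(body)
    except ValueError:
        return False, "body is not parseable JSON"
    return True, "ok"
```

Anything this function rejects never reaches the application edge, so full schema validation only runs on requests that are at least structurally plausible.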
Schema Evolution Without Coordination Pain
The biggest real-world problem with schema-on-write is the coordination cost of schema changes. You want to add a required field. Every producer must update. Every producer must deploy. You must coordinate or you break the API.
There are three patterns that reduce this pain significantly.
1. Make new required fields optional for one release cycle. Ship the schema change as optional. Communicate to producers they have one release cycle to add the field. After that cycle, promote it to required. This gives producers time without indefinite looseness.
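The two-phase promotion can be sketched as a pair of schema revisions that differ only in their `required` list — shown here as plain dicts with a stdlib-only required-field check standing in for a real JSON Schema validator (the `shippingAddress` field is a hypothetical example):

```python
# Release N: shippingAddress is declared but not yet required.
schema_release_n = {
    "required": ["orderId", "customerId"],
    "properties": {"orderId": {}, "customerId": {}, "shippingAddress": {}},
}

# Release N+1: after one cycle, the field is promoted to required.
schema_release_n1 = {
    "required": ["orderId", "customerId", "shippingAddress"],
    "properties": {"orderId": {}, "customerId": {}, "shippingAddress": {}},
}

def missing_required(schema: dict, payload: dict) -> list:
    """Stand-in for full schema validation: report absent required fields."""
    return [f for f in schema["required"] if f not in payload]

old_producer_payload = {"orderId": "o-1", "customerId": "c-1"}
print(missing_required(schema_release_n, old_producer_payload))   # []
print(missing_required(schema_release_n1, old_producer_payload))  # ['shippingAddress']
```

During release N, producers that have not caught up still pass; at N+1 the same payload is rejected, but by then the deadline was communicated in advance rather than sprung on them.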
2. Version at the resource level, not the API level.
Instead of /v2/orders, evolve the Order resource using content negotiation or an explicit schema version field. This lets you evolve individual resources without forking the entire API surface.
```http
POST /orders
Content-Type: application/json
X-Schema-Version: 2025-09-03

{ "orderId": "...", "subtotalCents": 4500, "taxCents": 450 }
```

3. Use an explicit schema registry for event-driven APIs. For Kafka-based or async APIs, a schema registry (Confluent Schema Registry, AWS Glue Schema Registry, Buf Schema Registry) makes backward-compatibility enforcement automatic. Producers that break the schema fail at publish time, not at consume time.
```python
# Confluent Schema Registry — producer-side validation
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

schema_registry_client = SchemaRegistryClient(
    {"url": "https://schema-registry.internal"}
)

# order_schema_str holds the Avro schema definition (loaded elsewhere)
avro_serializer = AvroSerializer(
    schema_registry_client,
    order_schema_str,
    to_dict=lambda obj, ctx: obj.__dict__,
)

producer = Producer({"bootstrap.servers": "kafka:9092"})

def publish_order(order: Order):
    producer.produce(
        topic="orders",
        value=avro_serializer(
            order, SerializationContext("orders", MessageField.VALUE)
        ),
    )
    producer.flush()
```

Runtime Cost: What the Numbers Actually Look Like
Runtime validation is not free. Here are representative numbers from production systems:
| Approach | Median latency add | P99 latency add | Notes |
|---|---|---|---|
| No validation | 0ms | 0ms | Baseline — bad idea |
| JSON Schema (ajv) | 0.8ms | 3.2ms | Depends on schema complexity |
| Zod (TypeScript) | 0.6ms | 2.1ms | Slightly faster than ajv for simple schemas |
| Protobuf decode | 0.1ms | 0.4ms | Binary — much faster |
| Pydantic (Python) | 1.2ms | 4.8ms | Slower per-request but cached model compilation |
For most APIs handling < 1,000 req/s, JSON Schema validation at the edge is invisible. At 10,000+ req/s on latency-sensitive endpoints, the difference between JSON Schema and Protobuf is worth measuring. Do not optimize prematurely, but do not ignore it if you are already hitting latency budgets.
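Measuring this for your own schemas is cheap. A stdlib-only sketch using `timeit`, with a few hand-rolled structural checks standing in for a real schema validator (the absolute numbers will differ from the table above; the point is the methodology):

```python
import json
import timeit

payload = json.dumps({
    "orderId": "o-1",
    "customerId": "c-1",
    "items": [{"sku": "s", "quantity": 1, "unitPriceCents": 100}],
})

def parse_only():
    json.loads(payload)

def parse_and_check():
    data = json.loads(payload)
    # Stand-in for real schema validation: a handful of structural checks.
    assert isinstance(data["items"], list) and data["items"]
    assert all(item["quantity"] >= 1 for item in data["items"])

N = 10_000
baseline = timeit.timeit(parse_only, number=N)
validated = timeit.timeit(parse_and_check, number=N)
# total seconds / N iterations * 1e6 = microseconds per request
print(f"parse only: {baseline / N * 1e6:.1f} µs/req, "
      f"parse+check: {validated / N * 1e6:.1f} µs/req")
```

Swap `parse_and_check` for your actual validator (ajv via a Node harness, Pydantic, etc.) and run it against production-shaped payloads before deciding the overhead is a problem.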
Type Safety vs Flexibility: Where the Real Tension Lives
Type safety (schema-on-write, Protobuf, strict JSON Schema) optimises for correctness in a world where you control the contract. Flexibility (schema-on-read, permissive ingestion, late binding) optimises for evolvability in a world where you do not.
The mistake I see most often is applying the wrong model to the wrong side of the API. Teams use strict schema-on-write validation on public webhook receivers — then wonder why integrations keep breaking when third-party producers add new fields. They use schema-on-read for internal service-to-service APIs — then spend hours debugging why a consumer failed silently on malformed data from a colleague's service.
The rule of thumb: strict schema-on-write for APIs you produce, permissive schema-on-read with explicit projection for APIs you consume. Postel's law is still right — be conservative in what you send, liberal in what you accept. Just make "liberal" mean "project what you understand" rather than "silently accept garbage."
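"Project what you understand" can be made concrete with a small sketch for the consuming side — the field names and payload shape are hypothetical, standing in for a third-party webhook:

```python
from typing import Optional

def project_known_fields(raw: dict) -> Optional[dict]:
    """Liberal acceptance: extract the fields we understand, ignore the rest.

    Returns None (for dead-lettering) only when the fields we actually
    need are missing or unusable -- never because of unknown extras.
    """
    known = {
        "event_id": raw.get("id"),
        "event_type": raw.get("type"),
        "amount_cents": raw.get("amount_cents"),
    }
    if not isinstance(known["event_id"], str):
        return None  # unusable in a way we cannot interpret
    if not isinstance(known["amount_cents"], int):
        return None
    return known

# A third party adding new fields does not break us:
payload = {"id": "evt_1", "type": "charge", "amount_cents": 500, "new_field": "x"}
print(project_known_fields(payload))
```

Unknown fields flow through harmlessly; genuinely malformed payloads are routed somewhere visible instead of being silently stored.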
Key Takeaways
- Schema-on-write rejects bad data at the boundary, producing clean storage and fast error feedback — at the cost of tight coupling between schema changes and producer deployments.
- Schema-on-read stores data raw and interprets it at consumption time, enabling schema evolution without coordination — at the cost of delayed error discovery and potential for silent data corruption.
- Protobuf occupies a middle ground: schema enforced at serialization, backward-compatible evolution via field tagging, and significantly lower runtime overhead than JSON Schema.
- A layered validation strategy — structural checks at the gateway, full schema validation at the application edge, version-aware projection at the consumer — gives you the best of both approaches.
- Schema registries are non-optional for event-driven systems at scale; they make backward compatibility enforcement automatic and auditable.
- Match the model to the trust boundary: strict validation on APIs you control, liberal acceptance with explicit projection on APIs you consume from external parties.