Schema-on-Read vs Schema-on-Write at the API Edge
The Validation Decision Nobody Makes Explicitly
Every API has a schema. The question is not whether you have one — it is whether you enforce it at write time, at read time, both, or neither. Most teams stumble into an answer by accident: they add a JSON Schema validator to the intake endpoint and call it done, or they skip runtime validation entirely and trust their TypeScript types. Neither is obviously wrong. Both carry hidden costs that compound over years.
Schema-on-write means: reject bad data before it enters the system. Schema-on-read means: accept whatever arrives, interpret it at consumption time. These two stances produce radically different systems. Getting this decision wrong shows up in outages, migration pain, and the kind of technical debt that takes quarters to unpick.
This post is opinionated. I will tell you when each approach wins, when Protobuf changes the calculus entirely, and why the real answer for most production APIs is a layered strategy that uses both — in the right places.
What Schema-on-Write Actually Buys You
Schema-on-write rejects invalid payloads at the point of ingestion. The producer learns immediately that their data is wrong. Nothing malformed reaches your storage, your queues, or your downstream consumers.
```typescript
// JSON Schema validation at the API edge — write time
import Ajv from "ajv";
import addFormats from "ajv-formats";

const ajv = new Ajv({ allErrors: true, coerceTypes: false });
addFormats(ajv);

const orderSchema = {
  type: "object",
  required: ["orderId", "customerId", "items", "createdAt"],
  additionalProperties: false,
  properties: {
    orderId: { type: "string", format: "uuid" },
    customerId: { type: "string", minLength: 1 },
    items: {
      type: "array",
      minItems: 1,
      items: {
        type: "object",
        required: ["sku", "quantity", "unitPriceCents"],
        properties: {
          sku: { type: "string" },
          quantity: { type: "integer", minimum: 1 },
          unitPriceCents: { type: "integer", minimum: 0 }
        }
      }
    },
    createdAt: { type: "string", format: "date-time" }
  }
};

const validate = ajv.compile(orderSchema);

// ValidationError, Order, and storeOrder are application-level
// definitions assumed to exist elsewhere.
function ingestOrder(payload: unknown) {
  if (!validate(payload)) {
    throw new ValidationError(validate.errors!);
  }
  return storeOrder(payload as Order);
}
```

The advantage is containment. Your storage layer only ever sees well-formed data. Your downstream consumers can skip defensive null-checks on fields they know the schema enforces. Debugging is fast: the error surface is at the boundary, not somewhere deep in a pipeline six hours later.
The cost is rigidity. Every schema change that tightens constraints becomes a coordinated deployment. Add a required field? Every producer must update before you deploy. Make a field more restrictive? You may silently break producers sending previously-valid values.
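To make the breakage concrete, here is a minimal sketch (plain Python, no schema library; the `customerId` format rule is hypothetical) showing how a payload that was valid under the old rules fails once a constraint is tightened, with no change on the producer side:

```python
# Hypothetical tightening: v2 narrows what counts as a valid customerId.

def valid_v1(payload: dict) -> bool:
    # v1: customerId only needs to be a non-empty string
    cid = payload.get("customerId")
    return isinstance(cid, str) and len(cid) >= 1

def valid_v2(payload: dict) -> bool:
    # v2: customerId must now look like "CUST-" followed by digits
    cid = payload.get("customerId")
    return isinstance(cid, str) and cid.startswith("CUST-") and cid[5:].isdigit()

legacy_payload = {"customerId": "42"}  # a real producer still sends this shape
print(valid_v1(legacy_payload))  # True  — accepted before the change
print(valid_v2(legacy_payload))  # False — rejected after the tightening deploy
```

The producer did nothing wrong; the contract moved underneath them. That is the coordination cost in miniature.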
Schema-on-write works best when:
- You own both producer and consumer (internal APIs, platform APIs)
- Data enters from a small, controllable set of clients
- Downstream processing is expensive and bad data would cascade
When Schema-on-Read Makes More Sense
Schema-on-read stores data as-is and applies interpretation at consumption time. It is the model that powers data lakes, event streams, and most analytics pipelines. It also appears at API edges more than people admit — any API that stores raw JSON blobs and projects views on top is doing schema-on-read.
```python
# Schema-on-read: store raw, validate at projection time
import json
from typing import Optional

from pydantic import BaseModel, ValidationError

class OrderV1(BaseModel):
    order_id: str
    customer_id: str
    total_cents: int

class OrderV2(BaseModel):
    order_id: str
    customer_id: str
    subtotal_cents: int
    tax_cents: int
    total_cents: int
    shipping_address: Optional[str] = None

def project_order(raw_blob: str, schema_version: str):
    data = json.loads(raw_blob)
    if schema_version == "v1":
        try:
            return OrderV1(**data)
        except ValidationError:
            return None  # or emit to dead-letter
    elif schema_version == "v2":
        try:
            return OrderV2(**data)
        except ValidationError:
            return None
    raise ValueError(f"Unknown schema version: {schema_version}")
```

The strength here is evolvability. You can add new fields without touching stored data. Old readers keep working because they project what they understand and ignore the rest. New readers can consume older records with defaults for fields that did not exist yet.
The weakness is that bad data survives. A producer sending garbage will have that garbage stored durably. You only discover the problem when a consumer tries to project it — which might be days later, in a report, in a payment processor, in a downstream system that assumed cleanliness.
Schema-on-read wins when:
- Producers are external, heterogeneous, or beyond your control
- Schema evolves faster than deployment cycles allow
- Data has a long shelf life and needs to survive multiple schema versions
- You are building an ingestion pipeline where throughput matters more than immediate validation
How Protobuf Changes the Calculus
Protobuf is neither schema-on-write nor schema-on-read in the classical sense. The wire format is binary and not self-describing — decoding requires the .proto definition — and it is designed for controlled evolution. Understanding how it handles change determines how you design for it.
```protobuf
// orders.proto — version 1
syntax = "proto3";
package orders.v1;

message Order {
  string order_id = 1;
  string customer_id = 2;
  int64 total_cents = 3;
}
```

```protobuf
// orders.proto — version 2, backward-compatible additions
syntax = "proto3";
package orders.v1;  // same package — additive changes keep old readers working

message Order {
  string order_id = 1;
  string customer_id = 2;
  int64 total_cents = 3;
  int64 subtotal_cents = 4;  // new — old consumers ignore it
  int64 tax_cents = 5;       // new — old consumers ignore it
  string shipping_tag = 6;   // new — old consumers ignore it
}
```

Protobuf enforces structure at serialization time. Unknown fields are preserved by default in proto3 (since protobuf 3.5), so a consumer compiled against v1 that receives a v2 message will not crash: the new fields are invisible to the generated v1 accessors, but they are retained and round-tripped on reserialization rather than lost. This is the schema evolution contract Protobuf provides: additive changes under fresh field numbers are safe; removing fields, reusing tags, or changing types breaks the contract.
The runtime cost of Protobuf validation is lower than JSON Schema because the schema is embedded in the generated code; there is no schema compilation or interpretation step at request time. For high-throughput APIs, where JSON Schema validation can add a millisecond or more per request at the tail, this matters.
The Layered Validation Architecture
In practice, the answer is not either/or. A robust API edge uses validation at multiple points with different responsibilities at each layer.
Each layer has a clear job:
Gateway: Fast, cheap checks. Is the content type correct? Is the body parseable? Is the payload under size limits? The gateway rejects malformed HTTP, not malformed business data.
Application edge: Full schema validation against the current version. Business rules (is the order total consistent with the line items?). This is where JSON Schema or Protobuf validation runs. Errors here return 4xx to the caller immediately.
Storage: Accepts only validated, well-formed records. Never write blind.
Consumer/read path: Projection, version mapping, and graceful handling of data that predates the current schema. This is your schema-on-read layer — intentional, bounded, and controlled.
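The gateway layer's job is the easiest to get wrong by overreaching. A minimal sketch of what "fast, cheap checks" means in practice — content type, size limit, parseability, and nothing else (`MAX_BODY_BYTES` is an illustrative value, not a recommendation):

```python
import json

MAX_BODY_BYTES = 64 * 1024  # illustrative size limit

def gateway_check(content_type: str, body: bytes) -> tuple[bool, str]:
    """Structural checks only -- no business rules at this layer."""
    if content_type.split(";")[0].strip() != "application/json":
        return False, "unsupported content type"
    if len(body) > MAX_BODY_BYTES:
        return False, "payload too large"
    try:
        json.loads(body)
    except ValueError:
        return False, "body is not parseable JSON"
    return True, "ok"
```

Anything this function rejects never reaches the application edge, so full schema validation only runs on requests that are at least structurally plausible.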
Schema Evolution Without Coordination Pain
The biggest real-world problem with schema-on-write is the coordination cost of schema changes. You want to add a required field. Every producer must update. Every producer must deploy. You must coordinate or you break the API.
There are three patterns that reduce this pain significantly.
1. Make new required fields optional for one release cycle. Ship the schema change as optional. Communicate to producers they have one release cycle to add the field. After that cycle, promote it to required. This gives producers time without indefinite looseness.
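The two-phase promotion can be sketched as a pair of schema revisions that differ only in their `required` list — shown here as plain dicts with a stdlib-only required-field check standing in for a real JSON Schema validator (the `shippingAddress` field is a hypothetical example):

```python
# Release N: shippingAddress is declared but not yet required.
schema_release_n = {
    "required": ["orderId", "customerId"],
    "properties": {"orderId": {}, "customerId": {}, "shippingAddress": {}},
}

# Release N+1: after one cycle, the field is promoted to required.
schema_release_n1 = {
    "required": ["orderId", "customerId", "shippingAddress"],
    "properties": {"orderId": {}, "customerId": {}, "shippingAddress": {}},
}

def missing_required(schema: dict, payload: dict) -> list:
    """Stand-in for full schema validation: report absent required fields."""
    return [f for f in schema["required"] if f not in payload]

old_producer_payload = {"orderId": "o-1", "customerId": "c-1"}
print(missing_required(schema_release_n, old_producer_payload))   # []
print(missing_required(schema_release_n1, old_producer_payload))  # ['shippingAddress']
```

During release N, producers that have not caught up still pass; at N+1 the same payload is rejected, but by then the deadline was communicated in advance rather than sprung on them.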
2. Version at the resource level, not the API level.
Instead of /v2/orders, evolve the Order resource using content negotiation or an explicit schema version field. This lets you evolve individual resources without forking the entire API surface.
```http
POST /orders
Content-Type: application/json
X-Schema-Version: 2025-09-03

{ "orderId": "...", "subtotalCents": 4500, "taxCents": 450 }
```

3. Use an explicit schema registry for event-driven APIs. For Kafka-based or async APIs, a schema registry (Confluent Schema Registry, AWS Glue Schema Registry, Buf Schema Registry) makes backward-compatibility enforcement automatic. Producers that break the schema fail at publish time, not at consume time.
```python
# Confluent Schema Registry — producer-side validation
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

schema_registry_client = SchemaRegistryClient(
    {"url": "https://schema-registry.internal"}
)

# order_schema_str holds the Avro schema definition (loaded elsewhere)
avro_serializer = AvroSerializer(
    schema_registry_client,
    order_schema_str,
    to_dict=lambda obj, ctx: obj.__dict__,
)

producer = Producer({"bootstrap.servers": "kafka:9092"})

def publish_order(order: Order):
    producer.produce(
        topic="orders",
        value=avro_serializer(
            order, SerializationContext("orders", MessageField.VALUE)
        ),
    )
    producer.flush()
```

Runtime Cost: What the Numbers Actually Look Like
Runtime validation is not free. Here are representative numbers from production systems:
| Approach | Median latency add | P99 latency add | Notes |
|---|---|---|---|
| No validation | 0ms | 0ms | Baseline — bad idea |
| JSON Schema (ajv) | 0.8ms | 3.2ms | Depends on schema complexity |
| Zod (TypeScript) | 0.6ms | 2.1ms | Slightly faster than ajv for simple schemas |
| Protobuf decode | 0.1ms | 0.4ms | Binary — much faster |
| Pydantic (Python) | 1.2ms | 4.8ms | Slower per-request but cached model compilation |
For most APIs handling < 1,000 req/s, JSON Schema validation at the edge is invisible. At 10,000+ req/s on latency-sensitive endpoints, the difference between JSON Schema and Protobuf is worth measuring. Do not optimize prematurely, but do not ignore it if you are already hitting latency budgets.
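Measuring this for your own schemas is cheap. A stdlib-only sketch using `timeit`, with a few hand-rolled structural checks standing in for a real schema validator (the absolute numbers will differ from the table above; the point is the methodology):

```python
import json
import timeit

payload = json.dumps({
    "orderId": "o-1",
    "customerId": "c-1",
    "items": [{"sku": "s", "quantity": 1, "unitPriceCents": 100}],
})

def parse_only():
    json.loads(payload)

def parse_and_check():
    data = json.loads(payload)
    # Stand-in for real schema validation: a handful of structural checks.
    assert isinstance(data["items"], list) and data["items"]
    assert all(item["quantity"] >= 1 for item in data["items"])

N = 10_000
baseline = timeit.timeit(parse_only, number=N)
validated = timeit.timeit(parse_and_check, number=N)
# total seconds / N iterations * 1e6 = microseconds per request
print(f"parse only: {baseline / N * 1e6:.1f} µs/req, "
      f"parse+check: {validated / N * 1e6:.1f} µs/req")
```

Swap `parse_and_check` for your actual validator (ajv via a Node harness, Pydantic, etc.) and run it against production-shaped payloads before deciding the overhead is a problem.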
Type Safety vs Flexibility: Where the Real Tension Lives
Type safety (schema-on-write, Protobuf, strict JSON Schema) optimises for correctness in a world where you control the contract. Flexibility (schema-on-read, permissive ingestion, late binding) optimises for evolvability in a world where you do not.
The mistake I see most often is applying the wrong model to the wrong side of the API. Teams use strict schema-on-write validation on public webhook receivers — then wonder why integrations keep breaking when third-party producers add new fields. They use schema-on-read for internal service-to-service APIs — then spend hours debugging why a consumer failed silently on malformed data from a colleague's service.
The rule of thumb: strict schema-on-write for APIs you produce, permissive schema-on-read with explicit projection for APIs you consume. Postel's law is still right — be conservative in what you send, liberal in what you accept. Just make "liberal" mean "project what you understand" rather than "silently accept garbage."
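"Project what you understand" can be made concrete with a small sketch for the consuming side — the field names and payload shape are hypothetical, standing in for a third-party webhook:

```python
from typing import Optional

def project_known_fields(raw: dict) -> Optional[dict]:
    """Liberal acceptance: extract the fields we understand, ignore the rest.

    Returns None (for dead-lettering) only when the fields we actually
    need are missing or unusable -- never because of unknown extras.
    """
    known = {
        "event_id": raw.get("id"),
        "event_type": raw.get("type"),
        "amount_cents": raw.get("amount_cents"),
    }
    if not isinstance(known["event_id"], str):
        return None  # unusable in a way we cannot interpret
    if not isinstance(known["amount_cents"], int):
        return None
    return known

# A third party adding new fields does not break us:
payload = {"id": "evt_1", "type": "charge", "amount_cents": 500, "new_field": "x"}
print(project_known_fields(payload))
```

Unknown fields flow through harmlessly; genuinely malformed payloads are routed somewhere visible instead of being silently stored.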
Key Takeaways
- Schema-on-write rejects bad data at the boundary, producing clean storage and fast error feedback — at the cost of tight coupling between schema changes and producer deployments.
- Schema-on-read stores data raw and interprets it at consumption time, enabling schema evolution without coordination — at the cost of delayed error discovery and potential for silent data corruption.
- Protobuf occupies a middle ground: schema enforced at serialization, backward-compatible evolution via field tagging, and significantly lower runtime overhead than JSON Schema.
- A layered validation strategy — structural checks at the gateway, full schema validation at the application edge, version-aware projection at the consumer — gives you the best of both approaches.
- Schema registries are non-optional for event-driven systems at scale; they make backward compatibility enforcement automatic and auditable.
- Match the model to the trust boundary: strict validation on APIs you control, liberal acceptance with explicit projection on APIs you consume from external parties.