Skip to main content
Data

The Postgres Index Nobody Teaches

Ravinder··9 min read
DataPostgresIndexesPerformanceDatabase
Share:
The Postgres Index Nobody Teaches

Most engineers reach for CREATE INDEX and get a B-tree. That's fine — B-tree is the Swiss Army knife of indexes and it's right most of the time. But Postgres ships four other access methods that outperform B-tree for specific workloads by orders of magnitude, and the majority of teams never learn they exist until they're already in trouble.

This post covers BRIN, GIN, and GiST: what they are, how they work, and the exact query patterns where they win. These are not academic curiosities — they're production tools you should be reaching for regularly.

Why B-tree Falls Short

B-tree works by maintaining a sorted tree of key values. Equality lookups are O(log n). Range scans are efficient. Point lookups on high-cardinality columns are B-tree's sweet spot.

It starts to struggle when:

  • The column has very low cardinality relative to table size (timestamps on append-only tables)
  • The data type is complex (arrays, JSONB, geometric types, tsvector)
  • The index represents a containment or overlap relationship, not a simple ordering
  • The table is enormous and you're okay with approximate range queries

Let's look at what Postgres offers instead.

BRIN: Block Range INdexes

BRIN is the most misunderstood index in Postgres. It's tiny, it builds in seconds, and for the right workload it's dramatically more effective than B-tree.

How it works: BRIN divides the table into fixed-size block ranges (default 128 pages). For each range, it stores only the minimum and maximum value of the indexed column. A query against a BRIN index eliminates ranges where the query value falls outside the min/max, then does a sequential scan over remaining ranges.

This is only useful when there's a natural correlation between physical row order and column values — i.e., rows are inserted in roughly sorted order by that column. Time-series tables, append-only event logs, and order tables with sequential IDs all fit this pattern perfectly.

-- For a 500M-row event table where events are inserted in time order:
CREATE INDEX CONCURRENTLY events_created_brin
ON events USING brin (created_at)
WITH (pages_per_range = 64);
 
-- vs B-tree for comparison:
-- B-tree size: ~11 GB
-- BRIN size: ~48 KB
-- Yes, that's KB, not MB

The size difference is real. A BRIN index on a 500M-row table with timestamps can fit in a single Postgres page. A B-tree on the same column would be gigabytes.

graph LR subgraph Table Blocks B1["Blocks 1-64\nmin: Jan 1\nmax: Jan 3"] B2["Blocks 65-128\nmin: Jan 3\nmax: Jan 7"] B3["Blocks 129-192\nmin: Jan 7\nmax: Jan 12"] end Q["Query: WHERE created_at\nBETWEEN Jan 4 AND Jan 6"] --> B1 Q --> B2 Q --> B3 B1 -- "skip (max < Jan 4)" --> X1[Eliminated] B2 -- "scan (overlaps)" --> Y[Sequential Scan] B3 -- "skip (min > Jan 6)" --> X2[Eliminated]

When BRIN wins:

-- ✓ Append-only time-series: excellent
SELECT * FROM sensor_readings
WHERE recorded_at BETWEEN '2026-01-01' AND '2026-01-31';
 
-- ✓ Sequential ID ranges on insert-ordered tables
SELECT * FROM orders
WHERE order_id BETWEEN 10000000 AND 10500000;
 
-- ✗ Updated rows that get physically relocated: BRIN correlation breaks
-- ✗ Random-access lookups by email, UUID, etc.

Check your column's correlation before creating BRIN:

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events'
  AND attname = 'created_at';
-- correlation close to 1.0 or -1.0 = good BRIN candidate
-- correlation close to 0.0 = stick with B-tree

GIN: Generalized Inverted iNdex

GIN is built for composite values — columns where a single value contains multiple searchable elements. Arrays, JSONB, and full-text search (tsvector) are GIN's native territory.

How it works: GIN maintains a posting list: for each element that appears in any row, GIN records which rows contain it. Lookups ask "which rows contain element X?" and GIN answers in O(1) per element.

-- Full-text search: the canonical GIN use case
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;
 
CREATE INDEX CONCURRENTLY articles_search_gin
ON articles USING gin (search_vector);
 
-- Now this is fast:
SELECT id, title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgres & index')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'postgres & index')) DESC
LIMIT 20;

GIN for JSONB is where teams leave significant performance on the table by defaulting to a computed B-tree on a specific JSONB key. If your query patterns vary or you're searching across multiple keys, GIN on the whole JSONB column is far more flexible:

-- Instead of multiple partial B-tree indexes:
CREATE INDEX ON events ((payload->>'event_type'));
CREATE INDEX ON events ((payload->>'user_id'));
 
-- One GIN index handles arbitrary key containment:
CREATE INDEX CONCURRENTLY events_payload_gin
ON events USING gin (payload jsonb_path_ops);
 
-- Supports containment queries efficiently:
SELECT * FROM events
WHERE payload @> '{"event_type": "purchase", "user_id": "u_123"}';

jsonb_path_ops vs jsonb_ops: jsonb_path_ops is smaller and faster for @> containment queries. jsonb_ops supports ? (key exists) and ?| / ?& operators but builds a larger index. Prefer jsonb_path_ops unless you need key-existence checks.

GIN for arrays:

CREATE TABLE posts (
  id bigint PRIMARY KEY,
  tags text[]
);
 
CREATE INDEX CONCURRENTLY posts_tags_gin
ON posts USING gin (tags);
 
-- Find posts with any of these tags:
SELECT id FROM posts WHERE tags && ARRAY['postgres', 'performance'];
 
-- Find posts with ALL of these tags:
SELECT id FROM posts WHERE tags @> ARRAY['postgres', 'performance'];

GIN build time warning: GIN index builds are slower than B-tree. For large tables, always use CONCURRENTLY. Also, GIN indexes have a pending list (fastupdate) that gets merged periodically — this can cause occasional latency spikes on write-heavy tables. Tune gin_pending_list_limit if you see this.

-- Disable fastupdate if write latency spikes are unacceptable:
CREATE INDEX ON posts USING gin (tags) WITH (fastupdate = off);

GiST: Generalized Search Tree

GiST is a framework for building custom index types. Postgres ships GiST implementations for geometric types, range types, and full-text search. The key property: GiST supports overlap, containment, and nearest-neighbor queries that B-tree cannot express.

Range type indexing:

-- Schedule table with time ranges
CREATE TABLE bookings (
  id bigint PRIMARY KEY,
  room_id int,
  during tstzrange
);
 
CREATE INDEX CONCURRENTLY bookings_during_gist
ON bookings USING gist (during);
 
-- Find conflicting bookings (overlapping ranges):
SELECT * FROM bookings
WHERE during && tstzrange('2026-03-10 14:00', '2026-03-10 16:00', '[)');
 
-- Find bookings that contain a specific point:
SELECT * FROM bookings
WHERE during @> '2026-03-10 15:00:00+00'::timestamptz;

B-tree cannot index ranges as ranges — you'd need two separate B-tree indexes on start and end timestamps and combine them with application logic. GiST does it natively.

Geometric queries:

-- PostGIS uses GiST for spatial indexes
CREATE INDEX ON locations USING gist (geom);
 
-- Find all points within 10km of a location:
SELECT id, name
FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-122.4, 37.8)::geography, 10000);

GiST vs GIN for full-text search: GiST can also index tsvectors, but GIN is almost always faster for read-heavy workloads. Use GiST for tsvector only when the table is heavily updated (GiST updates in-place; GIN maintains a pending list).

flowchart TD Q[What are you indexing?] --> A{Data type?} A --> B["Simple scalar\n(int, text, timestamp)"] A --> C["Composite\n(array, JSONB, tsvector)"] A --> D["Geometric / Range\n(PostGIS, tstzrange)"] A --> E["Large append-only\ntime-ordered column"] B --> F[B-tree] C --> G[GIN] D --> H[GiST] E --> I{Correlation > 0.9?} I -- Yes --> J[BRIN] I -- No --> F

Real Workload Patterns

Here's how these indexes map to common application patterns:

Multi-tenant SaaS audit log:

-- events table: 2 billion rows, insert-only, ordered by created_at
-- tenant_id has ~50K distinct values
 
-- BAD: B-tree on created_at (11 GB index, slow builds)
-- GOOD: BRIN on created_at (48 KB), B-tree on tenant_id
CREATE INDEX CONCURRENTLY events_tenant_btree ON events (tenant_id);
CREATE INDEX CONCURRENTLY events_created_brin ON events USING brin (created_at);
 
-- Query hits tenant_id B-tree first (high selectivity), then BRIN filters time range
SELECT * FROM events
WHERE tenant_id = 1234
  AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC
LIMIT 100;

Product search with faceted filtering:

-- products table with JSONB attributes and text search
CREATE INDEX CONCURRENTLY products_search_gin ON products USING gin (search_vector);
CREATE INDEX CONCURRENTLY products_attrs_gin ON products USING gin (attributes jsonb_path_ops);
 
-- Full-text + attribute filter in one pass:
SELECT id, name, price
FROM products
WHERE search_vector @@ to_tsquery('english', 'wireless & headphones')
  AND attributes @> '{"brand": "sony", "noise_cancelling": true}'
ORDER BY ts_rank(search_vector, to_tsquery('english', 'wireless & headphones')) DESC;

Reservation system with conflict detection:

CREATE INDEX CONCURRENTLY reservations_during_gist ON reservations USING gist (during);
CREATE INDEX CONCURRENTLY reservations_resource_btree ON reservations (resource_id);
 
-- Check for conflicts before inserting:
SELECT EXISTS (
  SELECT 1 FROM reservations
  WHERE resource_id = $1
    AND during && $2::tstzrange
    AND status != 'cancelled'
);

Key Takeaways

  • BRIN is the right choice for large, append-only, time-ordered tables — it can be thousands of times smaller than a B-tree with comparable range query performance, but only when physical row order correlates with column values.
  • GIN excels at composite values: use it for full-text search (tsvector), JSONB containment queries, and array overlap/containment — a single GIN index often replaces multiple partial B-tree indexes.
  • GiST is the tool for overlap and containment semantics on ranges and geometric types — anything involving scheduling conflicts, spatial proximity, or range intersection should reach for GiST first.
  • Always check pg_stats.correlation before choosing BRIN; a correlation below 0.9 means BRIN won't help much.
  • GIN's fastupdate can cause latency spikes on write-heavy tables — disable it or tune gin_pending_list_limit if you see periodic write latency outliers.
  • Use CONCURRENTLY for all index builds on production tables — it's not optional when availability matters.