Postgres as a Vector Database, Honestly
Every new AI-adjacent project eventually hits the same question: do we need a dedicated vector database, or can Postgres handle it? The honest answer is: Postgres can handle it — until it can't. Knowing exactly where that line sits will save you a painful migration later.
This post is not a tutorial on installing pgvector. It's a frank assessment of what pgvector does well, where it degrades, and what signals should push you toward a purpose-built store.
What pgvector Actually Gives You
pgvector adds a vector column type and two index strategies to Postgres. You get:
- Exact nearest-neighbor search (no index, brute-force)
- Approximate nearest-neighbor via IVFFlat or HNSW
- Standard Postgres ACID guarantees, joins, and filtering
- One fewer system to operate
The value proposition is real. If you have fewer than a few million vectors and your embedding dimension is ≤1536, pgvector will work fine and you won't need to babysit another cluster.
Index Types: IVFFlat vs HNSW
These two indexes make very different tradeoffs.
```sql
-- IVFFlat: partition vectors into lists, search a subset
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- HNSW: hierarchical graph, better recall at the cost of build time and RAM
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

IVFFlat splits your vectors into `lists` Voronoi cells at build time. At query time you probe `probes` cells (default 1). More probes = better recall, more latency. The index build is fast, but recall degrades as your dataset grows unless you rebuild periodically (the centroids become stale).
HNSW builds a layered proximity graph. It has better recall than IVFFlat at equivalent latency, but the build is slower and the graph lives entirely in RAM. m controls graph connectivity (edges per node); ef_construction controls build-time search depth. Higher values improve recall but inflate memory.
A rough mental model for choosing:
| Situation | Pick |
|---|---|
| Dataset changes frequently | IVFFlat (fast rebuilds) |
| Dataset is relatively static | HNSW (better recall) |
| RAM is tight | IVFFlat |
| You need >95% recall | HNSW with high ef |
Recall vs Latency: The Uncomfortable Curve
Neither index gives you 100% recall by default. With IVFFlat at probes=1 on a 1M-vector dataset, you might see 70–80% recall. Bump probes to 10 and you recover to ~95%, but latency climbs proportionally.
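The probes/recall curve is easy to see with a toy simulation. The sketch below is pure Python on random data — nothing here touches pgvector, and all the names are ours. It partitions vectors into lists around randomly chosen centroids (a crude stand-in for IVFFlat training) and measures recall of a probed search against brute force.

```python
import random
import math

random.seed(7)

DIM, N, LISTS, K = 8, 2000, 20, 10

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

vectors = [[random.random() for _ in range(DIM)] for _ in range(N)]
# Crude stand-in for IVFFlat training: random vectors as centroids.
centroids = random.sample(vectors, LISTS)

# Assign each vector to its nearest centroid's list.
lists = [[] for _ in range(LISTS)]
for i, v in enumerate(vectors):
    lists[min(range(LISTS), key=lambda c: dist(v, centroids[c]))].append(i)

def search(query, probes):
    # Probe only the `probes` nearest lists, as IVFFlat does.
    order = sorted(range(LISTS), key=lambda c: dist(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return set(sorted(candidates, key=lambda i: dist(query, vectors[i]))[:K])

def recall(probes, trials=50):
    hits = 0
    for _ in range(trials):
        q = [random.random() for _ in range(DIM)]
        truth = set(sorted(range(N), key=lambda i: dist(q, vectors[i]))[:K])
        hits += len(search(q, probes) & truth)
    return hits / (trials * K)

# probes=LISTS probes every list, which is exhaustive search.
for p in (1, 5, 20):
    print(p, round(recall(p), 2))
```

The exact numbers depend on data distribution, but the shape holds: recall climbs with probes, and so does the amount of work per query.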
HNSW is more forgiving but still requires tuning ef_search at query time:
```sql
-- Set per-session for HNSW query-time search width
SET hnsw.ef_search = 100;

SELECT id, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
```

The problem is that ef_search is a session-level setting. In a connection pool, you have to set it on every connection or use a SET LOCAL inside a transaction. This is a friction point that purpose-built databases handle transparently.
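Under pooling, the reliable pattern is SET LOCAL inside the same transaction as the query, so the setting dies with the transaction no matter which client checks the connection out next. A minimal sketch of the statement sequence (the helper name and table are ours, not a pgvector API; in real code, run these via your driver on one connection):

```python
def ann_query_statements(query_vec, k=10, ef_search=100):
    """Statements to run, in order, on ONE pooled connection.

    SET LOCAL scopes ef_search to the enclosing transaction, so the
    connection goes back to the pool with no leftover session state.
    NOTE: literals are inlined here for readability; real code should
    use bind parameters, not string formatting.
    """
    vec = "[" + ",".join(str(x) for x in query_vec) + "]"
    return [
        "BEGIN",
        f"SET LOCAL hnsw.ef_search = {int(ef_search)}",
        f"SELECT id, embedding <=> '{vec}'::vector AS dist "
        f"FROM documents ORDER BY embedding <=> '{vec}'::vector "
        f"LIMIT {int(k)}",
        "COMMIT",
    ]

stmts = ann_query_statements([0.1, 0.2], k=5, ef_search=64)
```

Because COMMIT (or ROLLBACK) clears the SET LOCAL, there is nothing to reset before the connection is reused.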
Where Postgres Starts to Buckle
Be honest with yourself when you hit these signals:
1. Index build time becomes a deployment blocker
HNSW index builds on 10M+ vectors can take hours. CREATE INDEX CONCURRENTLY avoids locking out writes but takes even longer, and while new rows are inserted into the graph incrementally, heavy update and delete churn degrades graph quality until you REINDEX.
2. RAM pressure
The HNSW graph for 1M vectors at dimension 1536 can consume 6–10 GB of RAM depending on m. This competes directly with Postgres's shared_buffers and your regular relational workload. You're now tuning memory for two different access patterns on one machine.
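You can ballpark the footprint before building anything. The sketch below is a back-of-the-envelope estimate of our own making, not pgvector's actual accounting: float32 vectors at 4 bytes per dimension, plus roughly 2*m graph links per element at the base layer.

```python
def hnsw_index_gb(n_vectors, dim, m=16, bytes_per_link=8):
    """Rough HNSW footprint: float32 vectors + ~2*m links per element.

    Ignores upper graph layers, page overhead, and tuple headers,
    so treat the result as a floor, not a ceiling.
    """
    vector_bytes = n_vectors * dim * 4                  # float32 storage
    graph_bytes = n_vectors * 2 * m * bytes_per_link    # base-layer links
    return (vector_bytes + graph_bytes) / 1e9

# 1M vectors at dim 1536 with m=16: ~6.4 GB before overhead,
# consistent with the 6-10 GB range observed in practice.
print(round(hnsw_index_gb(1_000_000, 1536), 1))
```

The raw vectors alone (1M x 1536 x 4 bytes ≈ 6.1 GB) dominate; the graph adds a few hundred MB on top, and real-world overhead pushes the total higher.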
3. Filtered vector search kills index selectivity
This is the one that bites teams hardest. You want vectors for tenant_id = 42 that are similar to a query. Postgres will use the vector index and then filter, or the B-tree index and then scan — it rarely picks both optimally.
```sql
-- Postgres struggles to use both indexes efficiently
SELECT id, embedding <=> $1 AS dist
FROM documents
WHERE tenant_id = 42
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1
LIMIT 10;
```

Purpose-built vector DBs (Qdrant, Weaviate, Pinecone) solve this with native pre-filtering — they store metadata alongside vectors and filter before the ANN search. pgvector has no equivalent.
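The common workaround is to over-fetch from the ANN index and filter in the application: request many more candidates than you need, drop the ones that fail the predicate, and hope enough survive. A sketch of the client-side half (function and variable names are ours):

```python
def post_filter(candidates, predicate, k):
    """candidates: (id, dist) pairs sorted by distance, as returned
    by an over-fetched ANN query (e.g. LIMIT k * 20).
    Keeps the first k that pass the metadata predicate; may return
    fewer than k when the over-fetch wasn't generous enough.
    """
    out = []
    for cid, dist in candidates:
        if predicate(cid):
            out.append((cid, dist))
            if len(out) == k:
                break
    return out

# Simulated metadata lookup; a real app would use a dict cache or a join.
tenant_of = {1: 42, 2: 7, 3: 42, 4: 42, 5: 7}
cands = [(1, 0.10), (2, 0.12), (3, 0.15), (4, 0.21), (5, 0.30)]
hits = post_filter(cands, lambda cid: tenant_of[cid] == 42, k=2)
```

The failure mode is visible in the signature: if the predicate is very selective, even a 20x over-fetch can come back short, and you pay the full ANN cost either way.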
4. Dimension ceiling
pgvector's ANN indexes cap at 2000 dimensions (the vector column type can store more, but you can't build an ivfflat or hnsw index on it). OpenAI's text-embedding-3-large is 3072 dimensions. You'd have to truncate, which costs recall.
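If you do truncate, re-normalize afterwards: cosine distance assumes unit-length vectors, and a truncated embedding no longer has norm 1. A minimal sketch in pure Python (whether truncation is acceptable at all depends on the model):

```python
import math

def truncate_and_renormalize(embedding, target_dim):
    """Keep the first target_dim components and rescale to unit length.

    Tolerable for models trained with Matryoshka-style objectives,
    where leading dimensions carry most of the signal; for other
    models, expect a real recall hit.
    """
    head = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

v = truncate_and_renormalize([3.0, 4.0, 1.0, 2.0], target_dim=2)
# v ≈ [0.6, 0.8]: unit length again after dropping two dimensions.
```

Forgetting the re-normalization step silently skews every cosine similarity you compute afterwards, which looks like a recall problem but is really a preprocessing bug.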
5. Multi-tenancy at scale
Partitioning a vector table by tenant in Postgres works but adds operational weight. Each partition needs its own index. Queries across partitions don't benefit from ANN indexes. This is a known gap.
The Scale Ceiling in Practice
Based on real workloads, here's a rough ceiling chart:
| Vectors | Dimension | Workload | pgvector verdict |
|---|---|---|---|
| < 500K | ≤ 1536 | Low QPS (<50/s) | Excellent |
| 500K–5M | ≤ 1536 | Medium QPS | Workable with tuning |
| 5M–20M | ≤ 1536 | High QPS (>200/s) | Struggling |
| > 20M | Any | Any | Graduate |
| Any | > 2000 | Any | Not supported |

These aren't hard limits from a spec sheet — they're where teams actually start paging their on-call.
When to Graduate
Leave pgvector when any of these are true:
- You need filtered ANN with low latency (<50ms P99) on a large dataset
- Your embedding dimension exceeds the 2000-dimension index cap and you're losing recall from truncation
- HNSW index size is crowding out your working set
- You need real-time index updates without rebuild windows
- You're hitting multi-tenant isolation requirements that partition overhead can't meet
The migration path is not that painful. Your embeddings are just floats — export them, load them into Qdrant or Weaviate, update your query layer. Keep Postgres for the relational truth.
A Pragmatic Architecture
For most teams in 2026, this hybrid works well:
Run the ANN search in the vector store, get back IDs and scores, then JOIN against Postgres for the full record. This keeps your transactional data in Postgres (where it belongs) and offloads the ANN workload to a system designed for it.
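In code, the seam between the two systems is small: the vector store returns (id, score) pairs, and Postgres supplies the authoritative rows. A sketch with stand-in data (function and field names are ours; in production the lookup half is a `SELECT ... WHERE id = ANY($1)`):

```python
def hydrate(ann_hits, fetch_rows):
    """ann_hits: (id, score) pairs from the vector store, best first.
    fetch_rows: batch lookup of ids -> row dicts from Postgres.
    Returns full rows in ANN order, skipping ids Postgres no longer
    has -- the stores can drift, and the relational side wins.
    """
    rows = fetch_rows([cid for cid, _ in ann_hits])
    return [
        {**rows[cid], "score": score}
        for cid, score in ann_hits
        if cid in rows
    ]

# Stand-in for the Postgres batch lookup.
fake_db = {1: {"id": 1, "title": "intro"}, 3: {"id": 3, "title": "faq"}}
result = hydrate(
    [(3, 0.91), (2, 0.88), (1, 0.75)],
    lambda ids: {i: fake_db[i] for i in ids if i in fake_db},
)
```

Note the drift handling: id 2 exists in the vector store but not in Postgres, so it's dropped rather than surfaced as a half-formed result. Treating Postgres as the source of truth makes deletes forgiving — a stale vector just stops matching anything real.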
Key Takeaways
- pgvector is production-ready for datasets under ~5M vectors with moderate QPS — don't over-engineer from day one.
- HNSW beats IVFFlat on recall for static datasets; IVFFlat wins when you rebuild frequently or need fast index builds.
- Filtered vector search is pgvector's Achilles' heel — metadata pre-filtering is where purpose-built stores earn their keep.
- The 2000-dimension cap is a hard blocker for modern large embedding models; plan for it.
- RAM pressure from HNSW graphs competes with your relational workload on the same machine — monitor this actively.
- Graduate to a dedicated vector store when P99 latency, dimension limits, or filtered search requirements exceed what Postgres can deliver cleanly.