Collapsing Nodes: Why We’re Moving Hybrid Search to PostgreSQL
Running a bootstrapped dev lab means getting maximum output from existing resources. Reducing the infra footprint is a large part of this approach: every new node brought online needs attention (even when managed) and adds an extra network hop. The extra hop may not seem important on its own, but it adds to the list of latency concerns and failure modes.
Another way to look at it: spread-out infra is a liability, and reducing the number of nodes makes for a more resilient system.
Since I am always looking for ways to collapse nodes, the pg_textsearch release is quite exciting.
The lab’s projects require a solid search implementation across multiple document formats. The first iteration of the vector search implementation relied on neo4j and was later moved to pgvector, a decision that reduced latency, complexity and future costs.
While vector search worked fine contextually, the results were not consistent and erred in both directions: either failing to match clearly present documents or returning unrelated entries.
The accuracy problems could usually be traced to one of these causes:
- Typos in the search term. Because vector search matches on embeddings rather than exact text, a misspelled term often returns irrelevant noise.
- A non-scientific approach to the similarity cut-off. For some searches, returning chunks with a similarity over 0.85 generally produced good results, but for others the threshold had to be dialed up to 0.90.
- Exact matching of compound terms such as document IDs.
The first problem was partially solvable by using pg_trgm and comparing the two result sets to distinguish a “typo” from “a special term”. However, combining vector search with pg_trgm reduces typo sensitivity but barely improves accuracy, as trigrams lack corpus-aware ranking.
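To make the trigram idea concrete, here is a minimal Python sketch of the kind of similarity pg_trgm computes. It follows the behaviour described in the pg_trgm documentation (each word is padded with two leading spaces and one trailing space, and similarity is the share of trigrams the two strings have in common), but it is only an approximation, not the extension's code. The function names `trigrams` and `trigram_similarity` are made up for this example.

```python
import re


def trigrams(text):
    """Return the set of padded, lowercased trigrams for every word in text."""
    grams = set()
    # Simplification (assumption): a word is a run of ASCII letters or digits.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# A typo still shares most of its leading trigrams with the intended word.
typo_score = trigram_similarity("postgers", "postgres")
unrelated_score = trigram_similarity("postgers", "opensearch")
```

The score depends only on the two strings being compared. Nothing in it knows how rare or common a term is across the document set, which is the missing corpus-aware ranking.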
Addressing all of these issues required hybrid search with a BM25-based ranking algorithm. Adding OpenSearch provided significantly better results but, just like the neo4j cluster, came with added infra complexity, latency and an availability delay: newly added documents were not fully searchable until the data sync completed.
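For readers unfamiliar with BM25, the sketch below shows the textbook Okapi BM25 formula in plain Python. It is not how OpenSearch or pg_textsearch implement it. The whitespace tokenizer, the smoothed IDF variant and the `k1=1.2`, `b=0.75` defaults are assumptions chosen for illustration, and `bm25_scores` is a hypothetical helper name.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in docs against query with Okapi BM25."""
    if not docs:
        return []
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up chunks; the document ID is a single exact token.
chunks = [
    "invoice INV-2041 approved",
    "invoice pending review",
    "meeting notes for the quarterly review",
]
id_scores = bm25_scores("INV-2041", chunks)
```

In this toy corpus only the first chunk gets a non-zero score for the document ID, and the IDF term is what gives a rare token more weight than a common word like "invoice".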
In October 2025 Tiger Data announced pg_textsearch, a PostgreSQL extension supporting BM25-enabled text search. Even better, they decided to open source it. While this is a first release and not yet production-ready, the promise is quite exciting.
Migrating to pg_textsearch removes the dependency on OpenSearch, making for a single-target architecture. Instead of requiring a data sync, the same document index table can be used for hybrid search, with the vector side pulling results from the encoded chunk column and the BM25 index using the raw text chunk. Add an RRF (Reciprocal Rank Fusion) re-ranker and it’s the simplest search implementation that solves most of the issues on our list.
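The RRF step is small enough to sketch in a few lines of Python. This is a generic Reciprocal Rank Fusion, not the lab's actual code: it assumes each search side returns an ordered list of chunk ids, uses the commonly cited constant `k=60`, and the name `rrf_fuse` and the sample ids are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked lists of ids; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to keep output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for the vector and BM25 queries.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Because RRF works on ranks rather than raw scores, it avoids having to put cosine similarity and BM25 scores on a common scale, and a chunk found by both sides naturally rises above one found by only one.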
Truly a great addition to the PostgreSQL ecosystem and another step toward “all you need is PostgreSQL” simplification.