// the find
timescale/pg_textsearch
PostgreSQL extension for BM25 relevance-ranked full-text search. Licensed under the PostgreSQL License.
pg_textsearch is a Postgres C extension from Timescale that adds BM25 relevance-ranked full-text search via a custom index access method. It slots in as `CREATE INDEX ... USING bm25(...)` and queries with a `<@>` operator, so it feels like a first-class Postgres feature rather than a bolt-on. Aimed at teams already on Postgres who want better ranking than GIN/tsvector without standing up a separate search service.
Block-Max WAND (BMW) optimization for top-k queries is a real win: most FTS workloads are `LIMIT 10` queries, and BMW skips posting blocks whose maximum possible score can't reach the current top-k cutoff, which is the same trick Lucene uses. The on-disk memtable architecture is thoughtfully designed: WAL-logged via GenericXLog, no shared-memory state, no custom WAL resource manager, so streaming replication and crash recovery just work. Test coverage is unusually thorough for a C extension, with 70+ .sql test files plus separate scripts for replication, crash safety, concurrent build, and WAL audit. Parallel index builds are supported, and the README honestly documents the `maintenance_work_mem >= 64MB` prerequisite instead of hiding it.
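To make the block-skipping idea concrete, here is a minimal Python sketch of block-max pruning for a top-k query. It is a simplified illustration, not pg_textsearch's implementation: it uses fixed doc-ID-aligned blocks and leaves out WAND's pivot selection entirely. The names `build_index`, `top_k`, and `BLOCK_SIZE`, and the per-term scores, are all made up for the example.

```python
import heapq
from collections import defaultdict

BLOCK_SIZE = 4  # doc IDs per block; tiny so the example is easy to trace


def build_index(postings):
    """Group each term's postings into doc-ID-aligned blocks.

    postings: {term: {doc_id: score}}
    returns:  {term: {block_no: (max_score_in_block, {doc_id: score})}}
    """
    index = {}
    for term, docs in postings.items():
        blocks = defaultdict(dict)
        for doc_id, score in docs.items():
            blocks[doc_id // BLOCK_SIZE][doc_id] = score
        index[term] = {b: (max(d.values()), d) for b, d in blocks.items()}
    return index


def top_k(index, terms, k):
    """Return (top-k [(score, doc_id)], number of blocks actually scored)."""
    heap = []  # min-heap of (score, doc_id); heap[0] is the current cutoff
    blocks_scored = 0
    block_nos = sorted({b for t in terms for b in index.get(t, {})})
    for b in block_nos:
        entries = [index[t][b] for t in terms if t in index and b in index[t]]
        # Upper bound on any document's score in this block: sum of block maxima.
        upper_bound = sum(max_score for max_score, _ in entries)
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if upper_bound <= threshold:
            continue  # no document here can beat the cutoff, so skip the block
        blocks_scored += 1
        totals = defaultdict(float)
        for _, docs in entries:
            for doc_id, score in docs.items():
                totals[doc_id] += score
        for doc_id, total in totals.items():
            if len(heap) < k:
                heapq.heappush(heap, (total, doc_id))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, doc_id))
    return sorted(heap, reverse=True), blocks_scored


# Hypothetical per-term scores (stand-ins for BM25 term contributions).
postings = {
    "postgres": {0: 3.0, 1: 0.5, 5: 0.25, 9: 0.5, 13: 0.25},
    "search": {0: 2.0, 2: 1.0, 6: 0.5, 10: 0.25, 14: 0.5},
}
index = build_index(postings)
results, blocks_scored = top_k(index, ["postgres", "search"], k=2)
print(results, blocks_scored)
```

With this toy data, the first block fills the top-2 heap and raises the cutoff to 1.0. The remaining three blocks each have an upper bound of 0.75, so they are skipped without scoring a single document in them.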
Phrase queries are not supported. The workaround (BM25 + ILIKE post-filter) is a hack that can silently under-return results if the over-fetch LIMIT is too small. Cross-partition BM25 scores are not comparable, which is a correctness problem for anyone using range-partitioned tables and wanting to rank across partitions; the README mentions it, but the recommendation to 'query individual partitions' is often not practical. No background compaction yet, so write-heavy workloads see compaction latency inline during spills. Only PostgreSQL 17 and 18 are supported, which rules out anyone still on 15 or 16, including managed cloud deployments that haven't upgraded.
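The under-return risk in the phrase workaround is easy to see in a small Python emulation of the pattern "take the top N BM25 hits, then keep the ones containing the phrase". This is a sketch under stated assumptions: `phrase_search` is a hypothetical helper, and the ranked list is hand-written to stand in for BM25 output, not produced by pg_textsearch.

```python
def phrase_search(ranked_docs, phrase, k, overfetch):
    """Emulate BM25 over-fetch followed by a case-insensitive substring filter.

    ranked_docs: documents already ordered by descending BM25 score (assumed).
    overfetch:   the inner LIMIT applied before the phrase filter.
    """
    candidates = ranked_docs[:overfetch]  # inner LIMIT overfetch
    needle = phrase.lower()
    matches = [d for d in candidates if needle in d.lower()]  # ILIKE '%phrase%'
    return matches[:k]  # outer LIMIT k


# Hand-written ranking for the bag-of-words query "full text search".
# Documents that reuse the words without the exact phrase can outrank true matches.
ranked = [
    "search the full archive of text",
    "text search, in full",
    "full text of the search warrant",
    "a guide to full text search",
    "search tips: full, text, and more",
    "full text search in Postgres",
]

for overfetch in (3, 4, 6):
    hits = phrase_search(ranked, "full text search", k=2, overfetch=overfetch)
    print(overfetch, len(hits))
```

Two documents contain the phrase, but an over-fetch of 3 returns none and an over-fetch of 4 returns one, with no error or warning in either case. The caller cannot tell a short result from a complete one without raising the limit and retrying.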