// the find

ysys143/pg_cuvs

★ 2 · C · PostgreSQL · updated Jun 2026

⚡ Blazing-fast GPU vector search for PostgreSQL powered by NVIDIA cuVS.

pg_cuvs is a PostgreSQL extension that routes vector similarity searches to an NVIDIA GPU through a sidecar daemon, using RAPIDS cuVS (the CAGRA algorithm) as the search engine. It targets the gap between pgvector's CPU-bound HNSW and purpose-built vector databases, specifically the case where you already have A100/H100 hardware and want GPU acceleration without leaving Postgres. The GPU build accelerator mode (build a CAGRA index on the GPU, then export it as a standard pgvector HNSW index) is the most broadly useful feature, offering a reported 13x faster index builds with recall=1.0.

The sidecar architecture (a single CUDA context per host, shared via IPC) is the right call: per-backend CUDA initialization would be unusable at any real connection count. The fail-closed-then-fallback model is production-honest: if the GPU daemon dies, queries fall through to CPU HNSW rather than erroring. The CI strategy of running the full IPC/correctness suite against a CPU shim on every PR is smart. It catches the class of bug that actually ships (struct drift, mode labeling, manifest contract) and keeps the expensive A100 run on-demand only. The known-limitations table is refreshingly direct about what doesn't work (no CONCURRENTLY, the pgvector layout pin, DiskANN no-go) rather than hiding it.
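The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration in Python, not pg_cuvs's actual code (the extension is written in C). The names `search_with_fallback`, `gpu_search`, and `cpu_search` are hypothetical, and the sketch assumes a dead daemon surfaces as an `OSError` on the IPC socket.

```python
def search_with_fallback(query, k, gpu_search, cpu_search):
    """Try the GPU path first; if the daemon is unreachable, answer from CPU.

    gpu_search and cpu_search are hypothetical callables taking (query, k)
    and returning a list of row ids. Returns (ids, mode) so callers can see
    which path actually served the query.
    """
    try:
        return gpu_search(query, k), "gpu"
    except OSError:
        # Covers ConnectionRefusedError, BrokenPipeError, and TimeoutError,
        # which is how a dead or hung sidecar would typically show up on a
        # socket. Assumption: the real extension may detect this differently.
        return cpu_search(query, k), "cpu"
```

Returning the mode alongside the result matters for the "mode labeling" bug class mentioned above: a silent fallback that still reports "gpu" would make latency regressions very hard to diagnose.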

Nearly all benchmarks use synthetic random or synthetic clustered data; the Cohere 1M×1024 comparison in BENCHMARK.md is the only real-embedding result. Recall numbers on uniformly random vectors say little about production workloads, where embeddings cluster heavily, so the 0.978 CAGRA recall figure will likely look different on MTEB-style datasets. The extension hardcodes the pgvector on-disk page format in hnsw_export.c, so any pgvector minor update that touches HNSW internals silently breaks the export path: there is no runtime format version check, only a build-time constant. The cost-model crossover (GPU pays off at N≈50K) is stated as a single number derived from synthetic clustered data and a specific A100 configuration; on an L4 or H100 with different PCIe bandwidth and IPC latency that number shifts, and there is no tooling to measure it for your own setup. A near-zero star and fork count as of today means this has likely not been validated by anyone other than the author, on their specific GCP A100 VM.
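Given the recall caveat above, the practical move is to measure recall@k on a sample of your own embeddings before trusting any published figure. The sketch below is a pure-Python illustration with hypothetical names (`exact_top_k`, `recall_at_k`, `mean_recall`). It assumes squared L2 distance and uses brute force for ground truth, so it only suits a small sample. `approx_search` stands in for whatever index you are evaluating.

```python
import heapq

def exact_top_k(query, vectors, k):
    """Ids of the k nearest vectors by squared L2 distance (brute force)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(query, vectors[i]))
    return heapq.nsmallest(k, range(len(vectors)), key=dist)

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbours that the approximate result contains."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)

def mean_recall(queries, vectors, k, approx_search):
    """Average recall@k over a query sample; approx_search(query, k) -> ids."""
    scores = [
        recall_at_k(approx_search(q, k), exact_top_k(q, vectors, k))
        for q in queries
    ]
    return sum(scores) / len(scores)
```

Running `mean_recall` over a few hundred held-out queries from the real corpus should show whether the synthetic-data figure carries over to your data distribution.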
