// the find
amutu/zhparser
zhparser is a PostgreSQL extension for full-text search of Chinese text
zhparser is a PostgreSQL extension that adds a Chinese text search parser built on top of SCWS (Simple Chinese Word Segmentation). It plugs into Postgres's existing full-text search infrastructure so you can use `to_tsvector`, `to_tsquery`, and GIN indexes on Chinese text without a separate search engine. Aimed at developers who need Chinese FTS inside Postgres and don't want to bolt on Elasticsearch.
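As a rough illustration of what "plugs into the existing infrastructure" means, here is a minimal sketch that generates the setup SQL. The configuration name `zhcfg`, the `articles` table, and its `body` column are hypothetical. The token-type mapping (`n,v,a,i,e,l` to the `simple` dictionary) follows the example in the project README. The `%s` placeholder assumes a psycopg-style driver. The sketch only builds strings; running them is left to whatever client you use.

```python
"""Sketch: SQL statements that wire zhparser into Postgres full-text search."""


def _ident(name: str) -> str:
    # Names are interpolated into SQL text, so accept only strings that pass
    # Python's identifier check (no spaces, quotes, or semicolons).
    if not name.isidentifier():
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def zhparser_setup_sql(config="zhcfg", table="articles", column="body"):
    """Return the one-time setup statements, in execution order.

    config/table/column are hypothetical names chosen for this example.
    """
    config, table, column = _ident(config), _ident(table), _ident(column)
    return [
        "CREATE EXTENSION IF NOT EXISTS zhparser;",
        # A text search configuration that uses zhparser as its parser.
        f"CREATE TEXT SEARCH CONFIGURATION {config} (PARSER = zhparser);",
        # Token types taken from the README example; adjust to taste.
        f"ALTER TEXT SEARCH CONFIGURATION {config} "
        "ADD MAPPING FOR n,v,a,i,e,l WITH simple;",
        # Expression GIN index so @@ queries avoid a sequential scan.
        f"CREATE INDEX {table}_{column}_fts_idx ON {table} "
        f"USING gin (to_tsvector('{config}', {column}));",
    ]


def search_sql(config="zhcfg", table="articles", column="body"):
    """Return a parameterized search query (psycopg-style %s placeholder)."""
    config, table, column = _ident(config), _ident(table), _ident(column)
    return (
        f"SELECT * FROM {table} "
        f"WHERE to_tsvector('{config}', {column}) @@ to_tsquery('{config}', %s);"
    )


if __name__ == "__main__":
    for stmt in zhparser_setup_sql():
        print(stmt)
    print(search_sql())
```

The query repeats the same `to_tsvector('zhcfg', body)` expression as the index, which is what lets the planner use the GIN index.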
Native Postgres integration means you get `tsvector`, `tsquery`, GIN indexes, and ranking functions without changing your query model. Custom dictionary support (TXT or binary XDB format) lets you tune segmentation for domain-specific terms, which is useful for technical or legal corpora where the default SCWS dictionary gets words wrong. The v2.1 per-database custom word table (`zhprs_custom_word`) is a real improvement over file-based dicts: you can add terms with SQL and sync them without restarting Postgres. Docker images for multiple PG versions (15, 16 on Debian/Alpine) make local testing quick.
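To make the custom word workflow concrete, here is a minimal sketch that prepares the statements for adding terms and syncing them. It assumes the table lives at `zhparser.zhprs_custom_word` with the word as its first column, and that the sync function is `sync_zhprs_custom_word()`, as the README's example suggests; check both against your installed version. The helper name and the `%s` placeholder style (psycopg-like) are assumptions for illustration.

```python
"""Sketch: queue custom words for zhparser's per-database word table."""


def custom_word_statements(words):
    """Return (sql, params) pairs: one INSERT per new word, then the sync call.

    Returns an empty list when there is nothing to add.
    """
    # De-duplicate while preserving order, and drop blank entries.
    unique = []
    for word in words:
        word = word.strip()
        if word and word not in unique:
            unique.append(word)
    if not unique:
        return []

    stmts = [
        # Positional VALUES form, as in the README example (assumed schema).
        ("INSERT INTO zhparser.zhprs_custom_word VALUES (%s);", (word,))
        for word in unique
    ]
    # Sync step from the README; makes the table's words available to the parser.
    stmts.append(("SELECT sync_zhprs_custom_word();", ()))
    return stmts


if __name__ == "__main__":
    for sql, params in custom_word_statements(["资金压力", "区块链"]):
        print(sql, params)
```

Each pair can be passed to a driver call such as `cursor.execute(sql, params)`. The README indicates that a session should reconnect after the sync before the new words show up in segmentation, so treat the change as visible to new connections only.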
The underlying SCWS library hasn't had a meaningful update in years and the dictionary is showing its age: segmentation quality on modern internet Chinese (slang, brand names, technical terms) is noticeably worse than jieba or LTP. Installation is a two-step process (build SCWS from source, then build zhparser) with no pre-built binaries for common distros. Managed Postgres services (RDS, Cloud SQL, Supabase) only load extensions from their own allowlists, so unless your provider already ships zhparser, production deployment there is essentially impossible. The README is almost entirely in Chinese with no English translation of the configuration options or custom dictionary format, which will frustrate anyone who doesn't read Chinese. `zhparser.dict_in_memory` and `zhparser.extra_dicts` are read when a backend starts, so a change only reaches new connections: you can't hot-reload a custom dictionary in a busy production database without cycling existing connections.