// the find
toluaina/pgsync
Postgres to Elasticsearch/OpenSearch sync
PGSync is a change-data-capture (CDC) tool that reads changes from the PostgreSQL write-ahead log (WAL) or the MySQL/MariaDB binlog and keeps Elasticsearch/OpenSearch indexes in sync. You define a JSON schema describing how to denormalize your relational tables into nested documents, and PGSync generates the SQL and handles incremental updates. It's aimed at teams who want Postgres as their write path but need fast full-text or faceted search without hand-rolling ETL.
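For a sense of what that schema looks like, here is a minimal sketch of a schema.json for a hypothetical `book` index that embeds each book's publisher, built as a Python structure and printed as JSON. The key names (`database`, `index`, `nodes`, `table`, `columns`, `children`, `relationship`, `variant`, `type`) follow the examples in the PGSync README as I understand them; the database, table, and column names are invented, so check the current docs before copying it.

```python
import json

# Hypothetical bookstore example: each "book" document embeds its publisher.
# Key names mirror the schema.json examples in the PGSync README (assumption);
# the database, table, and column names are invented for illustration.
schema = [
    {
        "database": "bookstore",   # Postgres database to read from
        "index": "books",          # Elasticsearch/OpenSearch index to write to
        "nodes": {
            "table": "book",       # root table: one document per row
            "columns": ["isbn", "title", "description"],
            "children": [
                {
                    "table": "publisher",
                    "columns": ["name"],
                    "relationship": {
                        "variant": "object",   # embed as a nested object
                        "type": "one_to_one",  # one publisher per book
                    },
                }
            ],
        },
    }
]

# Print the structure as JSON; this is what you would save as schema.json.
print(json.dumps(schema, indent=2))
```

The point is that the join between `book` and `publisher` is never written by hand: the relationship block declares it, and PGSync is responsible for turning that into SQL.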
The JSON schema approach to denormalization is genuinely well thought out: you describe parent/child relationships and PGSync generates the JOIN queries, which saves real work for anyone maintaining complex nested documents. Redis-based checkpointing means a crash doesn't force a full resync, which matters for large datasets. The plugin system for transforms (including LLM embedding plugins for OpenAI, Cohere, and Anthropic) is a practical addition for teams adding semantic search alongside keyword search. Test coverage is solid, with a dedicated regression test file and fixtures, which is a good sign for something that touches production data pipelines.
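To illustrate the plugin shape, here is a sketch of a transform that goes beyond string manipulation: bucketing a numeric `price` into a derived `price_band` field. The class name, field names, and thresholds are hypothetical. A real PGSync plugin would subclass the base class the `pgsync` package provides and be registered in schema.json; that wiring is assumed and left out so the snippet runs without PGSync installed.

```python
# Hypothetical plugin sketch. A real PGSync plugin would subclass the base
# plugin class shipped in the pgsync package (assumed here, not imported), so
# this class stands alone to stay runnable.
class PriceBandPlugin:
    name = "PriceBand"  # hypothetical name you would reference from schema.json

    def transform(self, doc, **kwargs):
        """Add a derived price_band field; leave docs without a price unchanged."""
        price = doc.get("price")
        if price is None:
            return doc
        if price < 10:
            doc["price_band"] = "budget"
        elif price < 50:
            doc["price_band"] = "standard"
        else:
            doc["price_band"] = "premium"
        return doc


plugin = PriceBandPlugin()
print(plugin.transform({"isbn": "0001", "price": 12.5}))
# → {'isbn': '0001', 'price': 12.5, 'price_band': 'standard'}
```

An embedding plugin follows the same pattern: read text fields from `doc`, call an external model, and write the resulting vector back into the document before it is indexed.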
Redis is listed as optional in WAL mode, but in practice most production setups need it for checkpointing, so calling it "optional" undersells the operational complexity: you end up running Postgres, Elasticsearch, and Redis as required infrastructure. The transform system (replace, rename, concat) covers only the simplest cases; anything beyond string manipulation means writing a Python plugin, which quickly breaks the "zero code" promise. MySQL/MariaDB support is newer and the examples are all Postgres-centric, so adopters on those databases will hit underdocumented edge cases. There's no built-in way to handle schema migrations: if you add a column in Postgres, you have to update schema.json manually and re-bootstrap, which is easy to forget and leads to silent drift.
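One way to catch that drift is a periodic check that compares the columns Postgres reports against the columns named in schema.json. The sketch below is hypothetical tooling, not part of PGSync: it takes both sides as plain Python structures, and it assumes a node that omits `columns` syncs every column, so such tables are skipped. It flags every unlisted column, including ones you left out on purpose, so treat its output as a review list rather than an error.

```python
def find_schema_drift(db_columns, schema):
    """Return {table: sorted columns} for database columns that no schema node lists.

    db_columns: {table_name: [column_name, ...]} as reported by the database.
    schema: list of PGSync-style entries, each with a "nodes" tree.
    """
    listed = {}         # table -> set of columns named somewhere in the schema
    select_all = set()  # tables whose node omits "columns" (assumed to sync everything)

    def walk(node):
        # Record this node's columns, then recurse into nested children.
        if "columns" in node:
            listed.setdefault(node["table"], set()).update(node["columns"])
        else:
            select_all.add(node["table"])
        for child in node.get("children", []):
            walk(child)

    for entry in schema:
        walk(entry["nodes"])

    drift = {}
    for table, cols in listed.items():
        if table in select_all:
            continue
        missing = set(db_columns.get(table, ())) - cols
        if missing:
            drift[table] = sorted(missing)
    return drift


# Example: a "price" column was added to book but never added to schema.json.
db_columns = {
    "book": ["isbn", "title", "description", "price"],
    "publisher": ["id", "name"],
}
schema = [{
    "index": "books",
    "nodes": {
        "table": "book",
        "columns": ["isbn", "title", "description"],
        "children": [{"table": "publisher", "columns": ["id", "name"]}],
    },
}]
print(find_schema_drift(db_columns, schema))  # → {'book': ['price']}
```

In a CI job you might load schema.json with `json.load`, fetch the live columns with `SELECT table_name, column_name FROM information_schema.columns WHERE table_schema = 'public'`, and fail the build when the result is non-empty.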