// the find

sodadata/soda-core

★ 2,370 · Python · NOASSERTION · updated Jun 2026

Data Contracts engine for the modern data stack. https://www.soda.io

Soda Core is a data quality contract verification engine: you write YAML contracts defining expected schema, row counts, null rates, valid values, and freshness, then run them against your actual data sources. It supports Postgres, Snowflake, BigQuery, Databricks, DuckDB, Redshift, and more via a plugin-per-datasource architecture. The target audience is data engineers who want pipeline-level quality gates rather than ad-hoc queries.
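To make the idea concrete, here is a minimal sketch of what verifying a contract boils down to: each check becomes one aggregate SQL query whose result is compared against a threshold. It does not use Soda at all. The contract dict, its keys (min_row_count, max_missing, valid_values) and the verify function are hypothetical, and an in-memory SQLite table stands in for a real data source.

```python
import sqlite3

# Hypothetical contract, shaped loosely like a parsed YAML file.
# These keys are illustrative only, NOT Soda's actual contract schema.
CONTRACT = {
    "table": "orders",
    "min_row_count": 1,
    "columns": {
        "id": {"max_missing": 0},
        "status": {"max_missing": 0, "valid_values": ["open", "shipped", "cancelled"]},
    },
}


def verify(conn: sqlite3.Connection, contract: dict) -> list:
    """Return human-readable failures; an empty list means the contract passed."""
    table = contract["table"]  # assumes trusted identifiers: no quoting or escaping
    failures = []

    # Row count check: one COUNT(*) over the whole table.
    (rows,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if rows < contract["min_row_count"]:
        failures.append(f"{table}: row_count {rows} < {contract['min_row_count']}")

    for col, rules in contract["columns"].items():
        # Missing-value check: count NULLs and compare with the allowed maximum.
        (missing,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()
        if missing > rules.get("max_missing", 0):
            failures.append(f"{table}.{col}: missing_count {missing}")

        # Valid-values check: count non-NULL values outside the allowed list.
        valid = rules.get("valid_values")
        if valid:
            placeholders = ", ".join("?" for _ in valid)
            (invalid,) = conn.execute(
                f"SELECT COUNT(*) FROM {table} "
                f"WHERE {col} IS NOT NULL AND {col} NOT IN ({placeholders})",
                valid,
            ).fetchone()
            if invalid:
                failures.append(f"{table}.{col}: invalid_count {invalid}")
    return failures


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "open"), (2, "shipped"), (None, "lost")],
)
print(verify(conn, CONTRACT))
# → ['orders.id: missing_count 1', 'orders.status: invalid_count 1']
```

Keeping each check as a single aggregate query is what lets this kind of engine push the work down to the warehouse instead of pulling rows out to inspect them.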

The YAML contract syntax is genuinely clean and readable: defining a missing-value threshold or a valid_values list takes three lines, not a custom SQL query. The plugin architecture (soda-postgres, soda-bigquery, etc.) means you install only what you need, and each connector is independently versioned. The snapshot-based test infrastructure in soda-tests is well designed; it intercepts real database calls, records their results, then replays them in CI without requiring live connections. Support for uv is a real win for dependency management in a monorepo with this many packages.
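The record/replay pattern behind that kind of test setup can be sketched generically. This is a minimal illustration of the technique, not soda-tests' actual implementation: SnapshotExecutor and its methods are hypothetical names, results are keyed by SQL text plus parameters, and SQLite plays the "live" database during recording.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


class SnapshotExecutor:
    """Hypothetical record/replay wrapper, NOT soda-tests' actual implementation.

    Record mode (conn given): run each query live and keep its rows.
    Replay mode (no conn): serve rows from a JSON snapshot file, no database needed.
    """

    def __init__(self, snapshot_path: Path, conn: Optional[sqlite3.Connection] = None):
        self.path = snapshot_path
        self.conn = conn
        self.recording = conn is not None
        # Replay mode loads previously recorded results from disk.
        self.snapshots: dict = {} if self.recording else json.loads(snapshot_path.read_text())

    def execute(self, sql: str, params: tuple = ()) -> list:
        key = json.dumps([sql, list(params)])  # one snapshot entry per query + params
        if self.recording:
            rows = [list(row) for row in self.conn.execute(sql, params).fetchall()]
            self.snapshots[key] = rows
            return rows
        if key not in self.snapshots:
            raise KeyError(f"no snapshot recorded for: {sql}")
        return self.snapshots[key]

    def save(self) -> None:
        self.path.write_text(json.dumps(self.snapshots, indent=2))


QUERY = "SELECT COUNT(*) FROM orders WHERE id IS NULL"
snapshot_file = Path(tempfile.mkdtemp()) / "orders_snapshot.json"

# Record once against a live database (an in-memory SQLite table here).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER)")
live.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (None,)])
recorder = SnapshotExecutor(snapshot_file, conn=live)
recorded = recorder.execute(QUERY)
recorder.save()

# Replay in CI: no connection is passed, so nothing touches a database.
replayer = SnapshotExecutor(snapshot_file)
print(replayer.execute(QUERY))  # → [[1]]
```

Failing loudly on an unrecorded query is a deliberate choice in this sketch: a silent fallback would let a changed query pass CI against stale results.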

The commercial cloud offering (Soda Cloud) is tightly woven into the architecture — remote execution, contract publishing, and anomaly detection all require it, so the open-source path is deliberately limited. The v3-to-v4 package rename (soda-core-postgres → soda-postgres) with no migration tooling will silently break anyone who upgrades without reading the docs. There is no built-in concept of contract inheritance or reuse — if 40 tables share the same schema contract, you copy-paste 40 YAML files. The telemetry module phones home by default; it is opt-out, not opt-in, which will block some enterprise adoption.
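Without inheritance, a common workaround is to keep one template and generate the per-table files from it. A minimal sketch, assuming a hypothetical contract shape: the keys below (dataset, columns, row_count_min, missing_count_max) are placeholders, not Soda's real contract schema, and render_contract and write_contracts are illustrative names. It writes JSON, which YAML 1.2 parsers also accept, so no YAML library is needed.

```python
import copy
import json
import tempfile
from pathlib import Path
from typing import Optional

# Shared base contract. Keys are placeholders, NOT Soda's actual schema.
BASE_CONTRACT = {
    "columns": [
        {"name": "id", "missing_count_max": 0},
        {"name": "created_at", "missing_count_max": 0},
    ],
    "row_count_min": 1,
}


def render_contract(table: str, overrides: Optional[dict] = None) -> dict:
    """Deep-copy the base so per-table changes never mutate the shared template."""
    contract = copy.deepcopy(BASE_CONTRACT)
    contract["dataset"] = table
    contract.update(overrides or {})  # shallow override of top-level keys only
    return contract


def write_contracts(tables: list, out_dir: Path, overrides: Optional[dict] = None) -> list:
    """Write one <table>.yml per table and return the written paths."""
    overrides = overrides or {}
    paths = []
    for table in tables:
        path = out_dir / f"{table}.yml"
        # JSON output is valid YAML 1.2.
        path.write_text(json.dumps(render_contract(table, overrides.get(table)), indent=2))
        paths.append(path)
    return paths


out = Path(tempfile.mkdtemp())
written = write_contracts(
    ["orders", "customers", "payments"],
    out,
    overrides={"payments": {"row_count_min": 100}},
)
print(sorted(p.name for p in written))  # → ['customers.yml', 'orders.yml', 'payments.yml']
```

The deep copy matters: with a shallow copy, appending to one table's columns list would leak into every other generated contract. The generated files still need review, but there is now a single source of truth to edit.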
