// the find

opendatadiscovery/odd-platform

★ 1,407 · Java · Apache-2.0 · updated Jun 2026

First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.

ODD Platform is a self-hosted data catalog and lineage tool for data engineering teams. It ingests metadata from 50+ connectors (Snowflake, BigQuery, Airflow, dbt, Spark, etc.), builds end-to-end lineage graphs, and surfaces data quality test results. It targets mid-to-large data teams that need a single place to discover what data exists, who owns it, and whether it's healthy.

The connector ecosystem is genuinely wide: not just databases but also orchestrators (Airflow), BI tools (Tableau, Superset), ML platforms (SageMaker, MLflow, Kubeflow), and streaming systems (Kafka, Kinesis). The open spec (opendatadiscovery-specification) means you can write your own collector without waiting on maintainers. The backend uses R2DBC and reactive Spring throughout, so database access is non-blocking and it should hold up under concurrent ingestion load. RBAC, OAuth2/OIDC, LDAP, and Cognito are all supported; enterprise auth is not an afterthought here.

The collector agents are a separate Python project (odd-collectors), so you're operating two different runtimes (the JVM for the platform, Python for the collectors) just to get data in, which is operationally annoying. The data model is rigid around seven fixed entity types (Dataset, Transformer, Consumer, etc.); if your pipeline topology doesn't map cleanly to those, you'll need workarounds. At 1.4k stars and with a small named team, bus-factor risk is real, and community contribution is thin outside the core org. Documentation exists but is sparse on production tuning; if Postgres performance degrades with millions of entities, you're mostly on your own.
