
// the find

whylabs/whylogs

★ 2,820 · Jupyter Notebook · Apache-2.0 · updated Jan 2025

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

whylogs generates statistical profiles of datasets — distributions, cardinality, missing values — that you can diff over time to catch data drift, schema changes, and training/serving skew. It's Python-first (with a Java port that's clearly not the priority) and designed to run inline in your pipeline with minimal overhead. The primary delivery mechanism is their hosted WhyLabs platform, which the library pushes you toward hard.

Profile mergeability is the genuinely clever part: you can compute profiles in parallel across partitions and merge them, which means it scales with Spark or Dask without changing your code. The sketch-based algorithms (HLL for cardinality, KLL for distributions) keep profiles compact and fixed-size regardless of data volume. The constraints API gives you data unit tests that actually run in CI. Benchmark numbers are credible and show sub-5% overhead even at 100GB.
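To make the mergeability point concrete, here is a toy sketch in plain Python. It is not whylogs code and does not use its API: `ColumnProfile`, `profile`, and the register count are hypothetical names and choices made for illustration. It tracks a few exact statistics plus a small HyperLogLog-style register array, and shows why profiling two partitions separately and merging can give the same summary as profiling everything in one pass.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

P = 10        # assumption: 2**10 registers, chosen only for this illustration
M = 1 << P    # register count is fixed, no matter how many rows are tracked


def _hash64(value: float) -> int:
    """Deterministic 64-bit hash of a value (first 8 bytes of SHA-1)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class ColumnProfile:
    """Hypothetical mergeable summary of one numeric column."""
    count: int = 0
    nulls: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    registers: List[int] = field(default_factory=lambda: [0] * M)

    def track(self, value: Optional[float]) -> None:
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.mean += (value - self.mean) / self.count  # running mean
        h = _hash64(value)
        idx = h >> (64 - P)                    # top P bits select a register
        rest = h & ((1 << (64 - P)) - 1)       # remaining 54 bits
        rank = (64 - P) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        """Combine two profiles without revisiting the underlying rows."""
        total = self.count + other.count
        mean = 0.0 if total == 0 else (
            self.mean * self.count + other.mean * other.count) / total
        return ColumnProfile(
            count=total,
            nulls=self.nulls + other.nulls,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
            mean=mean,
            # Element-wise max is what makes the cardinality sketch mergeable.
            registers=[max(a, b) for a, b in zip(self.registers, other.registers)],
        )

    def cardinality(self) -> float:
        """Approximate distinct count (HyperLogLog with small-range correction)."""
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            return M * math.log(M / zeros)  # linear counting for small sets
        return raw


def profile(values: Iterable[Optional[float]]) -> ColumnProfile:
    """Build a profile from an iterable of values (None counts as missing)."""
    result = ColumnProfile()
    for value in values:
        result.track(value)
    return result


# Two "partitions" profiled independently, then merged.
rows = [float(i % 500) for i in range(10_000)] + [None] * 7
whole = profile(rows)
merged = profile(rows[:4000]).merge(profile(rows[4000:]))
print(merged.count, merged.nulls, merged.minimum, merged.maximum)
print(merged.registers == whole.registers, round(merged.cardinality()))
```

The design point is that every field has an associative merge (sum, min, max, weighted mean, element-wise max), so partitions can be profiled in any order and combined later. The real library uses production-grade sketches rather than anything this simple.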

The library is structured as a WhyLabs acquisition funnel — most of the interesting monitoring features (anomaly detection, alerting, dashboards) require their SaaS platform, so the open-source value is 'collect profiles locally' and not much else. Last push was January 2025, and the Java implementation is effectively abandoned (sparse tests, no real feature parity). Anonymous telemetry is on by default, which is annoying for teams with data governance requirements. The visualization layer requires a separate install and only works in Jupyter, so if you're not notebook-first it's dead weight.

