
// the find

salesforce/TransmogrifAI

★ 2,276 · Scala · BSD-3-Clause · updated Jun 2026

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.

TransmogrifAI is a Salesforce-built AutoML library for Spark that wraps SparkML in a type-safe Scala DSL. It automates feature engineering, model selection via cross-validated hyperparameter search, and basic feature validation (sanity checks, correlation filtering). The target audience is data engineering teams already running Spark who want to reduce time-to-first-model without writing boilerplate pipeline code.
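To make the feature validation step concrete, here is a minimal sketch of what a sanity check of this kind does. It is plain Python, not TransmogrifAI code: the function names, the thresholds, and the toy columns are all assumptions made up for illustration, and the library's own checks and defaults may differ.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def sanity_check(columns, label, min_variance=1e-5, max_label_corr=0.95):
    """Split features into kept names and dropped names with a reason.

    Thresholds are hypothetical values chosen for this sketch.
    """
    kept, dropped = [], {}
    for name, values in columns.items():
        if pvariance(values) < min_variance:
            dropped[name] = "low variance"
        elif abs(pearson(values, label)) > max_label_corr:
            # A feature that tracks the label almost perfectly is suspicious.
            dropped[name] = "possible label leakage"
        else:
            kept.append(name)
    return kept, dropped


label = [0, 1, 0, 1, 1, 0]
columns = {
    "age": [22, 38, 26, 35, 54, 2],
    "constant_flag": [1, 1, 1, 1, 1, 1],   # carries no information
    "refund_issued": [0, 1, 0, 1, 1, 0],   # copies the label exactly
}
kept, dropped = sanity_check(columns, label)
```

In this toy data only `age` survives: the constant column is dropped for low variance and the label copy is flagged as likely leakage.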

Compile-time type safety on features is genuinely useful: passing the wrong feature type to a transformer is a runtime crash in vanilla SparkML, but here it is a compile error. The SanityChecker, which detects label leakage and automatically removes correlated or low-variance features, is a real time saver that most teams implement badly by hand. The LOCO (Leave One Covariate Out) record-level insights tell you more than global feature importances: you get per-prediction explanations, not just averages over the whole dataset. The CLI scaffolding tool that generates a working Spark project from your Avro schema is a practical touch that cuts project setup from an afternoon to minutes.
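For readers who have not met LOCO before, the idea can be sketched in a few lines: score one record, then re-score it with each feature removed in turn, and report how much the score moved. The sketch below is plain Python with a hand-written logistic model, not TransmogrifAI's implementation. The weights, feature names, and the choice to "remove" a feature by setting it to zero are assumptions for illustration.

```python
from math import exp


def score(weights, bias, record):
    """Logistic score for a single record (a dict of feature name to value)."""
    z = bias + sum(weights[name] * value for name, value in record.items())
    return 1.0 / (1.0 + exp(-z))


def loco_insights(weights, bias, record):
    """Per-record LOCO: score change when each feature is zeroed out.

    Returns (feature, delta) pairs sorted by absolute impact, largest first.
    A positive delta means the feature pushed this record's score up.
    """
    base = score(weights, bias, record)
    deltas = {}
    for name in record:
        ablated = dict(record, **{name: 0.0})  # leave this one covariate out
        deltas[name] = base - score(weights, bias, ablated)
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical model and record, invented for this example.
weights = {"age": -0.03, "fare": 0.02, "is_female": 2.5}
bias = -1.0
record = {"age": 30.0, "fare": 50.0, "is_female": 1.0}

insights = loco_insights(weights, bias, record)
for name, delta in insights:
    print(f"{name}: {delta:+.3f}")
```

The ranking is specific to this one record. A different record with the same model can produce a different ordering, which is what separates this kind of explanation from a single global importance table.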

Pinned to Spark 2.4 and Scala 2.11, both years past end of life. In 2026 that means fighting dependency conflicts with anything modern, and it is unclear whether Salesforce is actively maintaining it for Spark 3.x, given that the last published stable release is 0.7.0. Deep learning is absent: there are no neural network model selectors, which matters for text-heavy datasets where the SmartTextVectorizer's bag-of-words style vectorization will underperform. The automation is still structured-data-only; if your features are images, time series, or raw text at scale, you're outside the happy path. Build tooling is Gradle with a large multi-module setup that needs a lot of JVM memory to compile, and new contributors routinely hit out-of-memory errors on their first build.

