
// the find

microsoft/SynapseML

★ 5,229 · Scala · MIT · updated Jun 2026

Simple and Distributed Machine Learning

SynapseML is a Spark ML library from Microsoft that puts Azure services (Azure OpenAI, Cognitive Services) and ML libraries and runtimes (LightGBM, ONNX, Vowpal Wabbit) behind a SparkML-compatible pipeline API. It's for data scientists and engineers already running Spark workloads who want to call external AI APIs or run distributed inference at scale without leaving their existing Spark/Databricks/Fabric environment. It is not a general ML framework; it's glue code, and deliberately so.

The SparkML API compatibility is the real value here: you can slot an OpenAI embedding call or a LightGBM trainer into an existing pipeline `.fit()`/`.transform()` chain without rewriting the stages around it. The HTTP-on-Spark abstraction, which fans requests to arbitrary REST endpoints out across the cluster, is genuinely useful for batch inference at scale. Running ONNX models directly on Spark workers gives hardware-accelerated inference without a separate serving tier, which is a legitimate architectural win. The binding autogeneration for PySpark and sparklyr means the Scala core gets Python and R wrappers without manual maintenance, which matters for a library this wide.
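To make the "slot it into the chain" idea concrete, here is a minimal sketch in plain Python. It is not SynapseML's API and does not use Spark: `Transformer`, `Lowercase`, `RemoteCall`, `Pipeline`, and `fake_embedding` are all hypothetical names invented for illustration, and a thread pool stands in for Spark workers. It only shows the shape of the pattern: a stage that calls an external service honors the same `transform` contract as any local stage, and does its work one partition at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


class Transformer:
    """Anything with transform(rows) -> rows, loosely mirroring the SparkML contract."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Lowercase(Transformer):
    """An ordinary local stage."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "text": str(r["text"]).lower()} for r in rows]


class RemoteCall(Transformer):
    """Hypothetical stage that sends each row to an external service.

    `call` stands in for an HTTP request. Rows are split into partitions
    and the partitions are processed concurrently, the way workers would.
    """

    def __init__(self, call: Callable[[Row], object], output_col: str,
                 partitions: int = 4) -> None:
        self.call = call
        self.output_col = output_col
        self.partitions = partitions

    def _run_partition(self, part: List[Row]) -> List[Row]:
        return [{**r, self.output_col: self.call(r)} for r in part]

    def transform(self, rows: List[Row]) -> List[Row]:
        size = max(1, -(-len(rows) // self.partitions))  # ceiling division
        parts = [rows[i:i + size] for i in range(0, len(rows), size)]
        with ThreadPoolExecutor(max_workers=self.partitions) as pool:
            # map() yields results in input order, so row order is preserved
            results = list(pool.map(self._run_partition, parts))
        return [row for part in results for row in part]


class Pipeline(Transformer):
    """Runs stages in sequence; a remote stage is just another stage."""

    def __init__(self, stages: List[Transformer]) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def fake_embedding(row: Row) -> List[int]:
    """Stub for a remote embedding endpoint (no network involved)."""
    text = str(row["text"])
    return [len(text), text.count("a")]


rows = [{"text": t} for t in ["Alpha", "Banana", "Spark", "Data"]]
pipeline = Pipeline([Lowercase(), RemoteCall(fake_embedding, "embedding", partitions=2)])
result = pipeline.transform(rows)
```

The point of the design is that the caller never sees the fan-out: swapping `fake_embedding` for a real request function changes nothing about how the pipeline is assembled or invoked.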

The Cognitive Services modules are heavily coupled to Azure: if you're not already on Azure, most of the interesting features require Azure endpoints and credentials, which makes this effectively an Azure SDK that runs on Spark. Build complexity is brutal: SBT + Spark + Scala 2.12 + a custom Azure pipeline setup means getting a dev environment working takes a non-trivial afternoon even with the Doctor scripts. The R bindings are explicitly flagged as incomplete in the README. Deep dependency on deprecated or rebranded services (Cognitive Services → Azure AI) means wrapper classes are in a constant state of partial update, so don't be surprised if a specific endpoint wrapper lags a service version behind.
