
// the find

4paradigm/OpenMLDB

★ 1,688 · C++ · Apache-2.0 · updated Jun 2026

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.

OpenMLDB is a specialized database built to solve the training/serving skew problem in ML feature engineering. You write SQL once, deploy it, and get the same feature computation for offline training and for online inference, with the online path served at millisecond latency. It's for ML teams that are tired of maintaining two separate feature pipelines: one in Python for training, one in C++ or Java for serving.
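To make the "write once, compute consistently" idea concrete, here is a minimal Python sketch of the principle, not of OpenMLDB's implementation. The function names (`window_sum`, `offline_features`, `online_feature`) and the 60-second window are hypothetical, chosen only for illustration. The point is that both the batch path and the request path call one shared feature definition, so they cannot drift apart.

```python
from bisect import bisect_left, bisect_right


def window_sum(events, ts, window_seconds):
    """Sum the amounts of events whose timestamp lies in [ts - window_seconds, ts].

    `events` is a list of (timestamp, amount) tuples sorted by timestamp.
    This is the single feature definition shared by both paths below.
    """
    times = [t for t, _ in events]
    lo = bisect_left(times, ts - window_seconds)
    hi = bisect_right(times, ts)
    return sum(amount for _, amount in events[lo:hi])


def offline_features(events, window_seconds):
    """Batch path: one feature value per historical row.

    Each row only sees events up to and including itself, which mirrors
    what the online path would have seen when that row arrived.
    """
    return [
        window_sum(events[: i + 1], t, window_seconds)
        for i, (t, _) in enumerate(events)
    ]


def online_feature(history, request, window_seconds):
    """Request path: treat the incoming row as the newest event."""
    return window_sum(history + [request], request[0], window_seconds)


events = [(0, 10.0), (30, 5.0), (70, 2.0)]
batch = offline_features(events, 60)
live = online_feature(events[:2], events[2], 60)
print(batch, live)
```

Because the last batch value and the live value come from the same function over the same rows, they agree by construction. That is the property a hand-maintained pair of pipelines tends to lose.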

The unified execution plan generator that guarantees offline/online consistency is the core value prop, and it appears to be genuinely engineered rather than just marketing (see the VLDB 2021 paper). The real-time SQL engine is built from scratch and optimized specifically for time-series window operations, which are exactly the bottleneck in production ML feature serving. SQL extensions like LAST JOIN and WINDOW UNION cover feature-engineering patterns that standard SQL can't express cleanly. The connector ecosystem is solid: Kafka, Pulsar, RocketMQ, Airflow, and DolphinScheduler integrations mean you can plug this into an existing data stack without major surgery.
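For readers who haven't met LAST JOIN, the following Python sketch illustrates the semantics as we understand them: each left row is kept exactly once and is paired with only the most recent matching right row, or with nothing if there is no match. This is an illustration under that assumption, not OpenMLDB code; the `last_join` helper and the column names are hypothetical.

```python
def last_join(left_rows, right_rows, key, order_by):
    """Pair each left row with the latest right row sharing the same key.

    "Latest" means the largest `order_by` value. Left rows without a match
    are kept with `right` set to None, so the output always has exactly
    one row per left row.
    """
    latest = {}
    for row in right_rows:
        k = row[key]
        # Keep only the right row with the largest order_by value per key.
        if k not in latest or row[order_by] >= latest[k][order_by]:
            latest[k] = row

    return [{**row, "right": latest.get(row[key])} for row in left_rows]


orders = [{"user": "a", "amount": 3}, {"user": "b", "amount": 4}]
profiles = [
    {"user": "a", "ts": 1, "city": "X"},
    {"user": "a", "ts": 5, "city": "Y"},
]
joined = last_join(orders, profiles, key="user", order_by="ts")
print(joined)
```

A plain LEFT JOIN on the same data would emit two rows for user `a`, one per profile row. Keeping the output at one row per input row is what makes this shape convenient for feature tables, where each training or request row must map to exactly one feature vector.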

This is a C++ database you have to operate yourself, not a managed service: deployment involves ZooKeeper, tablet servers, and nameservers, which is significant operational overhead before you write a single feature. Offline processing runs on a forked Spark distribution, meaning you're betting on 4Paradigm keeping that fork current with upstream Spark; that's a real long-term risk. SQL support is deliberately restricted to patterns that can be efficiently served online, so if your feature logic doesn't fit the WINDOW/LAST JOIN model you'll hit walls quickly. The English documentation is thinner than the Chinese documentation, and the community leans heavily Chinese-speaking, which matters when you're debugging production issues at 2am.
