// the find
alibaba/Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Alink is Alibaba's ML library built on top of Apache Flink, giving you batch and streaming algorithm pipelines in a single API. It covers the usual supervised/unsupervised suspects plus recommender systems, graph algorithms, and feature engineering — all designed to run distributed on a Flink cluster. It's aimed at data engineers who are already running Flink and want ML without switching stacks.
The algorithm coverage is genuinely wide: ALS, GBDT, XGBoost, Word2Vec, FM, graph embedding, and more — most teams would find what they need without rolling their own. The Pipeline API mirrors scikit-learn's fit/transform pattern, so the mental model is familiar even if the Java verbosity isn't. PyAlink gives Python users a real entry point without requiring them to write Java, and it supports Jupyter notebooks well. Running natively on Flink means you get fault tolerance, checkpointing, and horizontal scaling essentially for free.
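To make the fit/transform idea concrete, here is a minimal pure-Python sketch of the pattern. It does not use Alink or PyAlink at all: every class name below (`MeanCenterer`, `Doubler`, `Pipeline`, `PipelineModel`) is a hypothetical stand-in invented for illustration. It assumes the Spark-ML-style variant, where fitting a pipeline returns a separate fitted model rather than mutating the estimator in place.

```python
from typing import List, Sequence


class MeanCenterer:
    """Hypothetical estimator: fit() learns a mean and returns a fitted model."""

    def fit(self, rows: Sequence[float]) -> "MeanCentererModel":
        return MeanCentererModel(sum(rows) / len(rows))


class MeanCentererModel:
    """Fitted counterpart: transform() subtracts the mean learned in fit()."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, rows: Sequence[float]) -> List[float]:
        return [x - self.mean for x in rows]


class Doubler:
    """Hypothetical stateless transformer: nothing to learn, so no fit()."""

    def transform(self, rows: Sequence[float]) -> List[float]:
        return [2 * x for x in rows]


class PipelineModel:
    """A fitted pipeline: applies each fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: Sequence[float]) -> List[float]:
        current = list(rows)
        for stage in self.stages:
            current = stage.transform(current)
        return current


class Pipeline:
    """An unfitted pipeline: fit() trains each stage on the previous stage's output."""

    def __init__(self, *stages) -> None:
        self.stages = list(stages)

    def fit(self, rows: Sequence[float]) -> PipelineModel:
        fitted = []
        current = list(rows)
        for stage in self.stages:
            # Estimators expose fit(); stateless transformers are reused as-is.
            model = stage.fit(current) if hasattr(stage, "fit") else stage
            current = model.transform(current)
            fitted.append(model)
        return PipelineModel(fitted)


# Fit on training data, then apply the fitted pipeline to new data.
model = Pipeline(MeanCenterer(), Doubler()).fit([1.0, 2.0, 3.0])
result = model.transform([4.0, 5.0])
```

The point of the split is that the mean learned from the training rows is frozen inside the fitted model and reused on new rows, which is the property that lets one fitted pipeline serve both a batch job and a stream.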
The last commit was June 2024 and the most recent supported Flink version is 1.13, which is six minor releases behind current Flink (1.19 as of mid-2024), so adopting this means running outdated infrastructure. The README is mostly in Chinese with a thin English translation; the English docs are clearly an afterthought, and the external tutorial site (alinklab.cn) may or may not be maintained. PyAlink is locked to Python 3.6–3.8, all of which are end-of-life; you can't use it on any modern Python runtime without pain. There's no obvious path for GPU acceleration or integration with modern LLM-based feature pipelines, which limits its relevance for current ML workloads.