// the find

mljar/mljar-supervised

★ 3,266 · Python · MIT · updated Jun 2026

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation

mljar-supervised is a Python AutoML library for tabular data. It trains multiple algorithms (XGBoost, LightGBM, CatBoost, Random Forest, linear models, neural networks), does feature engineering, and generates a detailed Markdown report per model, including SHAP plots and decision tree visualizations. It targets data scientists who want a scikit-learn-compatible fit/predict interface without writing the full pipeline themselves. The automatic documentation is the standout differentiator over competitors like FLAML or auto-sklearn.
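For a sense of what the fit/predict interface looks like in practice, here is a minimal sketch. It assumes the package is installed (`pip install mljar-supervised`) and uses the `AutoML` class from `supervised.automl` with its `mode`, `results_path`, and `total_time_limit` arguments. `MODES` and `build_automl` are hypothetical helpers written for this example, not part of the library.

```python
# Sketch only: MODES and build_automl are illustrative helpers, not library API.
MODES = ("Explain", "Perform", "Compete", "Optuna")


def build_automl(mode="Explain", results_path=None, total_time_limit=300):
    """Return an unfitted AutoML estimator for one of the four modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    # Imported lazily so the mode check above works without the package installed.
    from supervised.automl import AutoML

    return AutoML(
        mode=mode,
        results_path=results_path,          # directory for models and Markdown reports
        total_time_limit=total_time_limit,  # training budget in seconds
    )


# Typical usage, given training and test data already loaded:
#   automl = build_automl("Explain", results_path="automl_run")
#   automl.fit(X_train, y_train)
#   predictions = automl.predict(X_test)
```

The reports described above end up under the results directory, so a named `results_path` makes them easier to find afterwards.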

The per-model Markdown reports with SHAP dependence plots, decision tree visualizations, and coefficient tables are genuinely useful: you can audit what the model learned, not just its score. The four modes (Explain/Perform/Compete/Optuna) map cleanly to real use cases and set sensible defaults for each without requiring expert configuration. Fairness-aware training with sample weighting and demographic parity metrics is built in from v1.0, not bolted on as an afterthought. Resume-on-interrupt is handled by auto-saving every model to disk, so a 48-hour training job that crashes at hour 47 restarts from where it left off instead of from scratch.
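To make the demographic parity metric concrete, here is a small self-contained sketch of the usual definition: compare the rate of positive predictions across groups of a sensitive attribute. This is not mljar-supervised's own code, and the function names are made up for the example.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Share of positive (== 1) predictions within each sensitive group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(y_pred, groups):
    """Gap between the highest and lowest group selection rate (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(y_pred, groups))                # {'a': 0.75, 'b': 0.25}
print(demographic_parity_difference(y_pred, groups))  # 0.5
```

A difference near zero means the model selects members of each group at a similar rate; it says nothing about whether those selections are accurate.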

The Mercury web app integration ties you to a second MLJAR product for deployment, which is a meaningful dependency risk if you want to productionize models outside their ecosystem. Optuna mode doesn't save intermediate models, only the final tuned one, so an interrupted tuning run loses everything. That undercuts the resume-on-interrupt feature that makes the other modes useful. The 'not-so-random-search' hyperparameter method used in the main modes is essentially random search with a small hill-climbing step; it's documented honestly but is likely to underperform Bayesian methods on expensive model families like CatBoost with large datasets. Time-series support is absent: no native handling of temporal splits, leakage prevention, or forecasting targets.
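The "random search with a small hill-climbing step" idea can be sketched in a few lines. This is a generic illustration of the technique under simple assumptions (a discrete grid per parameter, higher score is better), not the library's actual implementation; `random_then_hill_climb` and the toy `score` function are hypothetical.

```python
import random


def random_then_hill_climb(score, space, n_random=20, n_climb=10, seed=0):
    """Random search over `space`, then nudge the best point one step at a time."""
    rng = random.Random(seed)
    names = sorted(space)

    # Phase 1: plain random search over the discrete grid.
    best = {name: rng.choice(space[name]) for name in names}
    best_score = score(best)
    for _ in range(n_random - 1):
        candidate = {name: rng.choice(space[name]) for name in names}
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score

    # Phase 2: hill climbing. Move one parameter to a neighbouring grid value
    # and keep the move only if the score improves.
    for _ in range(n_climb):
        name = rng.choice(names)
        values = space[name]
        index = values.index(best[name])
        new_index = min(max(index + rng.choice((-1, 1)), 0), len(values) - 1)
        candidate = dict(best, **{name: values[new_index]})
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score

    return best, best_score


# Toy objective standing in for a validation score (higher is better).
def score(params):
    return -((params["depth"] - 6) ** 2) - (params["lr"] - 0.1) ** 2


space = {"depth": [2, 4, 6, 8], "lr": [0.01, 0.05, 0.1, 0.2]}
best, best_score = random_then_hill_climb(score, space)
print(best, best_score)
```

Neither phase learns from the pattern of past trials, which is why a model-based (Bayesian) tuner tends to pull ahead when each evaluation is expensive.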
