// the find

BayesWitnesses/m2cgen

★ 2,988 · Python · MIT · updated Aug 2024

Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies

m2cgen takes a trained scikit-learn, XGBoost, or LightGBM model and spits out pure native code in your target language: no runtime dependencies, no inference server, no numpy. Useful when you need to run predictions inside a Java service, an embedded C system, or anywhere Python can't follow. It covers 16 languages (the tagline above omits Elixir) and the major model families: linear, tree, ensemble, and boosting.
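For a linear model, "pure native code" boils down to an intercept plus a weighted sum of the input features. In m2cgen itself this step is a call like `m2cgen.export_to_java(model)`. The sketch below is not that API. It is a hand-rolled emitter, and both `emit_linear_python` and the coefficient values are hypothetical. It targets Python only so the generated function can be executed and checked.

```python
def emit_linear_python(coefs, intercept, name="score"):
    """Emit a dependency-free Python function computing
    intercept + sum(coef_i * input[i]).

    Hypothetical helper for illustration; m2cgen's real emitters differ.
    """
    terms = " + ".join(f"input[{i}] * {c!r}" for i, c in enumerate(coefs))
    body = f"{intercept!r} + {terms}" if terms else repr(intercept)
    return f"def {name}(input):\n    return {body}\n"


# Hypothetical coefficients standing in for a trained linear regression.
coefs = [0.5, -1.25, 2.0]
intercept = 3.0

source = emit_linear_python(coefs, intercept)
print(source)
# def score(input):
#     return 3.0 + input[0] * 0.5 + input[1] * -1.25 + input[2] * 2.0

# Load the generated source and call it: plain arithmetic, no imports.
namespace = {}
exec(source, namespace)
score = namespace["score"]
print(score([1.0, 2.0, 3.0]))  # 7.0
```

The generated function has no imports at all, which is why it can be version-pinned to nothing. Tree models add `if`/`else` branches, but the output stays arithmetic plus comparisons.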

The zero-dependency output is the whole point, and it actually works: generated code is just arithmetic and conditionals, with nothing to install or version-pin. Language breadth is genuinely wide, covering C, Rust, Haskell, and Elixir rather than just the obvious JVM/JS targets. The architecture is clean: assemblers build a language-agnostic AST and per-language interpreters walk it, so adding a new language is isolated work. CLI and pickle support mean you can wire it into a build pipeline without writing Python glue code.
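The assembler/interpreter split can be shown in miniature. The sketch below uses its own simplified AST. The class names `Num`, `Feature`, and `IfLess` and the helpers `assemble_stump`, `to_python`, and `to_c` are illustrative stand-ins, not m2cgen's actual types. An "assembler" turns a one-split decision tree into the AST, and two "interpreters" walk the same tree to emit Python and C. The C output assumes the generated function takes a `double input[]` parameter.

```python
from dataclasses import dataclass
from typing import Union


# A tiny language-agnostic AST (illustrative; not m2cgen's real classes).
@dataclass
class Num:
    value: float


@dataclass
class Feature:
    index: int


@dataclass
class IfLess:
    """if input[feature] < threshold then `then` else `orelse`."""
    feature: Feature
    threshold: float
    then: "Expr"
    orelse: "Expr"


Expr = Union[Num, Feature, IfLess]


def assemble_stump(index, threshold, left_value, right_value):
    """Assembler: turn a (hypothetical) one-split tree into an AST."""
    return IfLess(Feature(index), threshold, Num(left_value), Num(right_value))


def to_python(expr):
    """Python interpreter: walk the AST and emit a Python expression."""
    if isinstance(expr, Num):
        return repr(expr.value)
    if isinstance(expr, Feature):
        return f"input[{expr.index}]"
    return (f"({to_python(expr.then)} if {to_python(expr.feature)} < "
            f"{expr.threshold!r} else {to_python(expr.orelse)})")


def to_c(expr):
    """C interpreter: same AST, different surface syntax."""
    if isinstance(expr, Num):
        return repr(expr.value)
    if isinstance(expr, Feature):
        return f"input[{expr.index}]"
    return (f"(({to_c(expr.feature)} < {expr.threshold!r}) ? "
            f"{to_c(expr.then)} : {to_c(expr.orelse)})")


tree = assemble_stump(index=0, threshold=2.5, left_value=10.0, right_value=20.0)
print(to_python(tree))  # (10.0 if input[0] < 2.5 else 20.0)
print(to_c(tree))       # ((input[0] < 2.5) ? 10.0 : 20.0)
```

Supporting a third language means writing one more walker like `to_c`. The assembler and the AST stay untouched, which is the isolation the architecture buys. Deeper trees work unchanged because an `IfLess` branch can itself be another `IfLess`.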

Last commit was August 2024, and activity has been sparse for a couple of years. XGBoost and scikit-learn move fast, so version compatibility will eventually rot. Neural networks are a hard no: this only works for classical ML models, which is fine but limits the audience, since most new production models aren't linear regressions. Large ensembles (a 500-tree random forest, say) generate files with thousands of nested conditionals, which can hit compiler limits such as Java's 64 KB cap on a single method's bytecode. The FAQ also acknowledges recursion errors during code generation, and its fix ("reduce estimators") is a real operational constraint. There are no quantization or pruning hooks, so the size of the generated code is dictated entirely by the trained model.
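Recursion errors like these are what you would expect from a recursive generator walking a deeply nested expression. The sketch below assumes, without checking m2cgen's source, that an ensemble's per-tree outputs are summed as a left-deep chain of binary additions. Under that assumption, nesting depth grows linearly with the number of estimators, and so does the Python stack depth of a recursive emitter. All names (`Num`, `Add`, `left_deep_sum`, `balanced_sum`, `emit`) are hypothetical, and each "tree" is collapsed to a constant.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]


def left_deep_sum(terms):
    """Fold per-tree outputs as ((t0 + t1) + t2) + ...; depth grows with N."""
    expr = terms[0]
    for term in terms[1:]:
        expr = Add(expr, term)
    return expr


def balanced_sum(terms):
    """Split the terms in halves; depth grows with log2(N) instead."""
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return Add(balanced_sum(terms[:mid]), balanced_sum(terms[mid:]))


def emit(expr):
    """Recursive code emitter: one Python stack frame per nesting level."""
    if isinstance(expr, Num):
        return repr(expr.value)
    return f"({emit(expr.left)} + {emit(expr.right)})"


# Stand-ins for 5,000 per-tree outputs (hypothetical constants).
trees = [Num(1.0) for _ in range(5000)]

try:
    emit(left_deep_sum(trees))
    print("left-deep: ok")
except RecursionError:
    # Default recursion limit is 1000, far below a depth of ~5000.
    print("left-deep: RecursionError")

print("balanced:", eval(emit(balanced_sum(trees))))  # balanced: 5000.0
```

With a left-deep fold, cutting the estimator count is the direct way to shrink the depth, which matches the FAQ's advice. A generator that balanced the sum would avoid the limit at the cost of a different code shape. That alternative is offered here as a design note, not as something m2cgen does.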
