// the find
trekhleb/homemade-machine-learning
🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
From-scratch Python implementations of classic ML algorithms (linear regression, logistic regression, k-means, anomaly detection, and a basic MLP), each paired with an interactive Jupyter notebook and the math explained alongside the code. This is a learning resource, not a library; the implementations are intentionally naive so the mechanics stay visible. It's for people who want to understand what's happening inside sklearn, not for people who just want to use it more efficiently.
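To make "intentionally naive" concrete, here is a minimal numpy sketch of the kind of k-means loop such a resource walks through: plain Lloyd's algorithm with random initialization. It is an illustration written for this review, not the repo's code; the function names `kmeans` and `closest_centroids` are hypothetical.

```python
import numpy as np

def closest_centroids(data, centroids):
    """Index of the nearest centroid for every row of `data`."""
    # Squared Euclidean distances via broadcasting: shape (m, k).
    distances = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return distances.argmin(axis=1)

def kmeans(data, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate assigning points and moving centroids to cluster means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct training points chosen at random.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iterations):
        labels = closest_centroids(data, centroids)
        # Each centroid moves to the mean of its points; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids have stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, closest_centroids(data, centroids)
```

Every step is a visible array operation: one broadcasted distance matrix, one `argmin`, one mean per cluster. That transparency is the point of a teaching implementation, and also why it would be slow on large datasets.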
The math-to-code pairing is the real value here: each algorithm has a README explaining the formulas, then a Python file that maps those formulas directly to numpy operations — no magic, no abstraction layers hiding the gradient descent loop. The Jupyter demos are well-chosen: they use real datasets (MNIST, Fashion MNIST, Iris, World Happiness) so you can see the algorithm doing something meaningful rather than fitting a sine wave. The feature engineering utilities (polynomial expansion, sinusoid features, normalization) are implemented as reusable pieces rather than inlined, which keeps the algorithm implementations clean. Binder support means someone can run everything in a browser with zero local setup, which matters a lot for a teaching resource.
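As a rough illustration of that formula-to-numpy mapping, the sketch below writes batch gradient descent for linear regression straight from the update rule theta := theta - (alpha/m) * X^T (X theta - y), next to two small feature helpers kept separate from the algorithm. This is not the repo's code: the names `normalize`, `polynomial_features`, and `gradient_descent` are hypothetical, and the polynomial helper only expands a single column, a simplification for the example.

```python
import numpy as np

def normalize(features):
    """Z-score each column; return the statistics so test data can be scaled the same way."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant columns
    return (features - mean) / std, mean, std

def polynomial_features(x, degree):
    """Expand a single feature column into [x, x**2, ..., x**degree]."""
    return np.column_stack([x ** p for p in range(1, degree + 1)])

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for linear regression, written directly from the formulas."""
    m = X.shape[0]
    X = np.column_stack([np.ones(m), X])  # prepend the bias column x_0 = 1
    theta = np.zeros(X.shape[1])
    cost_history = []
    for _ in range(iterations):
        predictions = X @ theta  # h_theta(x) = theta^T x for every example at once
        errors = predictions - y
        # J(theta) = 1/(2m) * sum((h_theta(x) - y)^2)
        cost_history.append((errors @ errors) / (2 * m))
        # theta := theta - alpha/m * X^T (X theta - y)
        theta -= (alpha / m) * (X.T @ errors)
    return theta, cost_history
```

Typical use would be `X, mean, std = normalize(polynomial_features(x, 2))` followed by `theta, costs = gradient_descent(X, y)`. Returning `mean` and `std` from `normalize` is the design choice that lets new data be scaled with the training statistics rather than its own.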
The repo stays within Andrew Ng's Coursera curriculum circa 2017 and doesn't cover all of it: SVMs, which the course did teach, are missing, and nothing from outside the course shows up either (no tree-based methods such as decision trees, random forests, or gradient boosting, and no attention). The MLP implementation is a basic backprop loop with sigmoid activations and no mini-batching beyond what you set up yourself, so it hits its ceiling fast on anything real. The last meaningful commit appears to be years old; the codebase hasn't kept up with changes in Python or the numpy API, and requirements.txt will likely need manual pinning to avoid breakage. If you're past the basics-of-gradient-descent stage, you'll outgrow this in an afternoon.
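For a sense of scale, a basic backprop loop of the kind described fits in a few dozen lines. This is a sketch under stated assumptions, not the repo's implementation: one hidden layer, sigmoid activations on both layers, a cross-entropy loss, and full-batch gradient descent, so every update touches the whole training set. All function names (`init_params`, `forward`, `loss`, `backprop`, `train`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights break the symmetry between hidden units."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Both layers use sigmoid activations."""
    a1 = sigmoid(X @ params["W1"] + params["b1"])
    a2 = sigmoid(a1 @ params["W2"] + params["b2"])
    return a1, a2

def loss(params, X, Y):
    """Mean cross-entropy over the batch, summed over output units."""
    _, a2 = forward(params, X)
    eps = 1e-12  # keeps log() finite if an output saturates at 0 or 1
    return -np.mean(np.sum(Y * np.log(a2 + eps) + (1 - Y) * np.log(1 - a2 + eps), axis=1))

def backprop(params, X, Y):
    """Gradients of `loss` with respect to every parameter, via the chain rule."""
    m = X.shape[0]
    a1, a2 = forward(params, X)
    # With a sigmoid output and cross-entropy loss, dL/dz2 simplifies to (a2 - Y) / m.
    d2 = (a2 - Y) / m
    # Push the error back through W2, then through the hidden sigmoid: s'(z) = s(z)(1 - s(z)).
    d1 = (d2 @ params["W2"].T) * a1 * (1 - a1)
    return {
        "W2": a1.T @ d2, "b2": d2.sum(axis=0),
        "W1": X.T @ d1, "b1": d1.sum(axis=0),
    }

def train(X, Y, n_hidden=4, lr=1.0, iterations=5000, seed=0):
    """Full-batch gradient descent: every step uses the whole training set."""
    params = init_params(X.shape[1], n_hidden, Y.shape[1], seed)
    for _ in range(iterations):
        grads = backprop(params, X, Y)
        for name in params:
            params[name] -= lr * grads[name]
    return params
```

Adding mini-batches would mean slicing `X` and `Y` inside the `train` loop; nothing else in this sketch has to change, which shows how thin the gap between "basic" and "usable" really is, and how much of it is left to the reader.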