// the find

ashishpatel26/Amazing-Feature-Engineering

★ 796 · Jupyter Notebook · updated Jun 2025

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can then be used to improve the performance of machine learning algorithms. Feature engineering can be thought of as applied machine learning in its own right.

A structured reference collection covering the full feature engineering pipeline — cleaning, encoding, transformation, selection — with paired guide docs and scikit-learn notebooks for each technique. Aimed at data scientists who want a single place to look up how and why to apply a specific method, not just copy-paste code. The PDF guide is the real artifact; the notebooks are worked examples.

The paired guide-plus-demo structure is genuinely useful: every technique has both the rationale and runnable code, so you understand *why* WoE encoding exists before you implement it. Coverage of feature selection is broader than most tutorials — filter, wrapper, embedded, shuffling, and hybrid methods are all here with working examples. The Python module layout (feature_cleaning/, feature_engineering/, feature_selection/) means you can pull individual functions into a project without dragging in the whole notebook. Rare techniques like ChiMerge discretization and decision-tree-based binning are covered with actual demos, not just mentions.
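For a sense of what the WoE material is about, here is a minimal, dependency-free sketch of weight-of-evidence encoding. It is not the repository's implementation: the function name woe_encode, the smoothing parameter, and the sign convention (log of event share over non-event share) are assumptions made for illustration, and some references flip the ratio.

```python
import math
from collections import defaultdict


def woe_encode(categories, targets, smoothing=0.5):
    """Return {category: WoE} for a binary target (1 = event, 0 = non-event).

    WoE(category) = ln(share of all events in the category
                       / share of all non-events in the category).
    `smoothing` is a hypothetical additive constant that keeps the log finite
    when a category contains only events or only non-events.
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for cat, y in zip(categories, targets):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    levels = set(events) | set(non_events)
    # Totals include the smoothing mass so the shares still sum to 1.
    total_events = sum(events.values()) + smoothing * len(levels)
    total_non_events = sum(non_events.values()) + smoothing * len(levels)

    woe = {}
    for cat in levels:
        event_share = (events[cat] + smoothing) / total_events
        non_event_share = (non_events[cat] + smoothing) / total_non_events
        woe[cat] = math.log(event_share / non_event_share)
    return woe


# Toy data (made up): replace each category with its WoE value.
colors = ["red", "red", "blue", "blue", "blue", "green"]
churned = [1, 0, 0, 0, 1, 1]
mapping = woe_encode(colors, churned)
encoded = [mapping[c] for c in colors]
```

Under this convention, positive values mark categories where events are over-represented and negative values mark the opposite. In practice the mapping should be fitted on training data only, then applied to validation and test rows.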

Dependencies are pinned to Python 3.5–3.7 and scikit-learn 0.20, a 2018 release that is more than five years out of date. Several APIs have changed or been removed since then (preprocessing.Imputer, for one, gave way to impute.SimpleImputer, and sklearn.externals.joblib is gone), so expect breakage on anything modern. There's no coverage of time-series feature engineering, a significant gap for anyone working with temporal data. The feature generation section (crossing, ratio, polynomial) has guide links, but several notebooks are missing: you get the theory but not the code. This is also firmly tabular-data territory: text, image, and embedding-based feature work are absent.
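Since the feature generation notebooks are the ones missing, here is a rough sketch of what crossing, ratio, and degree-2 polynomial features look like on a single record. It uses only the standard library so it does not depend on the pinned versions. The function name generate_features, the column names, and the output naming scheme are all hypothetical, not taken from the repository.

```python
from itertools import combinations


def generate_features(row, numeric_cols, categorical_cols):
    """Return a copy of `row` (a dict) with generated features added."""
    out = dict(row)

    # Polynomial (degree 2): the square of each numeric column.
    for col in numeric_cols:
        out[f"{col}^2"] = row[col] ** 2

    for a, b in combinations(numeric_cols, 2):
        # Polynomial (degree 2): pairwise interaction term.
        out[f"{a}*{b}"] = row[a] * row[b]
        # Ratio: NaN when the denominator is zero rather than raising.
        out[f"{a}/{b}"] = row[a] / row[b] if row[b] != 0 else float("nan")

    # Crossing: concatenate pairs of categorical values into one new category.
    for a, b in combinations(categorical_cols, 2):
        out[f"{a}_x_{b}"] = f"{row[a]}_{row[b]}"

    return out


# Made-up record for illustration.
record = {"income": 6000.0, "debt": 1500.0, "city": "Pune", "device": "mobile"}
features = generate_features(record, ["income", "debt"], ["city", "device"])
```

On a current scikit-learn, PolynomialFeatures(degree=2, include_bias=False) produces the square and interaction terms for whole arrays; ratios and categorical crosses still tend to be written by hand, as above.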
