// the find
fengdu78/Data-Science-Notes
Data science notes and collected resources
A Chinese-language collection of Jupyter notebooks covering the standard data science stack: math foundations, Python, NumPy, pandas, scikit-learn, and basic deep learning. It's aimed at Chinese-speaking beginners who want worked examples alongside the theory from books like Li Hang's Statistical Learning Methods.
The coverage is logically sequenced (math, then Python, then ML), so a beginner can follow it top to bottom without jumping around. The feature engineering section is notably practical, translating a full book into runnable notebooks with real datasets. Including the CS229 linear algebra and probability notes as starting material is a good call; most similar repos skip the math entirely. The numpy-100 exercises, with hints and solutions in the same directory, make a useful self-contained drill set.
The repo hasn't been touched since August 2021, so the PyTorch and scikit-learn notebooks were written against versions that have since had breaking API changes; expect cells to fail unless you pin older dependencies. The only environment spec is a single environment.yml buried in the scikit-learn folder, with no requirements files anywhere else, so reproducing the notebooks is a manual dependency hunt. The deep learning section is thin: four PyTorch intro notebooks and a word2vec visualization don't get you far. The content is almost entirely in Chinese with no English translations, which limits its reach to a fraction of its apparent audience on GitHub.
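If you do get the notebooks running in some environment, it is worth recording what you ended up with so the hunt only happens once. The following is a minimal sketch, not something the repo provides: the `PACKAGES` list and the `pin_lines` helper are hypothetical names chosen for illustration, and the package names are simply the libraries mentioned above. It uses the standard library's `importlib.metadata` to print requirements-style pins for whatever is currently installed.

```python
from importlib import metadata

# Hypothetical list: the libraries named in this review, not a list declared by the repo.
PACKAGES = ["numpy", "pandas", "scikit-learn", "torch"]


def pin_lines(packages, lookup=metadata.version):
    """Return one requirements-style line per package.

    Installed packages become "name==version"; packages that are not
    installed become a comment line so the gap stays visible.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    # Redirect this output to a file (for example requirements.txt) to keep the pins.
    print("\n".join(pin_lines(PACKAGES)))
```

This only captures the versions you happen to have installed; it does not tell you which versions the notebooks were originally written against.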