// the find

microsoft/nlp-recipes

★ 6,437 · Python · MIT · updated Aug 2022

Natural Language Processing Best Practices & Examples

A Microsoft research repo of Jupyter notebooks and Python utilities covering classic NLP tasks (text classification, NER, summarization, QA, sentence similarity), built on Hugging Face transformers and Azure ML. Aimed at data scientists who want working examples of fine-tuning BERT/XLNet/RoBERTa rather than a production-ready library. Last commit was August 2022.

The notebook coverage is genuinely broad: eight distinct NLP scenarios, each with multiple model variants and end-to-end examples, including distributed training on Azure ML. The utils_nlp layer wraps Hugging Face's transformers in a way that reduces boilerplate for common fine-tuning patterns without hiding too much of the underlying library. Multilingual support is a first-class concern, not an afterthought: the text classification notebooks cover nine languages, including Arabic, Hindi, and Japanese. Test infrastructure is solid for a research repo, with unit, smoke, and integration tests and separate CPU/GPU CI pipelines.

Abandoned since 2022. The whole premise was 'SOTA' circa BERT/XLNet, which is now two model generations behind. Every model listed here has been superseded, and running these notebooks today will hit broken dependency pins and API incompatibilities with current transformers versions. The Azure ML coupling is deep and non-optional in many notebooks, so those notebooks are a dead end for local experimentation without an Azure subscription. NER support is English-only despite the repo's multilingual ambitions. There is no inference or deployment story beyond the ACI/AKS examples: you get fine-tuning scaffolding but nothing for serving the result.
