// the find
lonePatient/Bert-Multi-Label-Text-Classification
This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.
A fine-tuning harness for BERT, XLNet, and ALBERT on multi-label classification tasks, built around the Jigsaw toxic comment dataset. Targets ML practitioners who want a working training loop without building one from scratch. The AUC numbers (~0.99) look good on the example task.
The optimizer collection is genuinely useful — AdamW, LAMB, RAdam, Lookahead, and several others are all implemented and swappable without touching training code. Training monitor with loss curves and early stopping is included out of the box, which saves real setup time. The tips section in the README is honest and specific: the 512 token limit caveat, the TF checkpoint naming gotcha, and the multi-GPU DataParallel limitation are all things that would cost someone an afternoon to figure out independently.
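To make the 512-token caveat concrete, here is a minimal sketch of one common way to fit a long document into BERT's input window: keep the start and the end of the token list and drop the middle. This is an illustration, not the repo's own preprocessing. The function names, the `head=128` split, and the assumption of a single-segment input with one `[CLS]` and one `[SEP]` are all hypothetical choices made for the example.

```python
from typing import List

MAX_SEQ_LEN = 512  # BERT's position-embedding limit
NUM_SPECIAL = 2    # assumes a single-segment input: one [CLS] and one [SEP]


def truncate_head_tail(tokens: List[str], max_len: int = MAX_SEQ_LEN,
                       head: int = 128) -> List[str]:
    """Keep the first `head` tokens and fill the remaining budget from the end.

    Hypothetical helper: the head/tail split is an assumption for illustration.
    """
    budget = max_len - NUM_SPECIAL
    if len(tokens) <= budget:
        return list(tokens)  # already fits, nothing to drop
    head = min(head, budget)
    tail = budget - head
    # Guard tail == 0, because tokens[-0:] would return the whole list.
    return tokens[:head] + (tokens[-tail:] if tail > 0 else [])


def build_input(tokens: List[str], max_len: int = MAX_SEQ_LEN) -> List[str]:
    """Wrap the truncated tokens in the special tokens BERT expects."""
    return ["[CLS]"] + truncate_head_tail(tokens, max_len) + ["[SEP]"]
```

Calling `build_input` on a 1,000-token list returns exactly 512 tokens, with the first 128 and last 382 original tokens preserved. Plain right-truncation is the simpler alternative. Which one works better depends on where the label-relevant text sits in the document.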
Pinned to transformers==2.5.1, an early-2020 release. The current library is at 4.x and the API has changed substantially, so this won't run against a modern HuggingFace setup without significant porting work. The ALBERT model implementation is vendored locally as a full copy of the modeling files rather than imported from the library, which means it's frozen in its 2019 state and won't pick up upstream fixes. The last commit was in 2023, but the underlying dependency stack is effectively 2019 to 2020 vintage. No tests anywhere in the repo; the only validation is running the training script end to end.
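Given the pin, a quick pre-flight check can save a confusing stack trace. The sketch below compares the installed `transformers` version against the pinned one using only the standard library. The helper names and the status strings are hypothetical and not part of the repo.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED = "2.5.1"  # the version the repo pins, per the review above


def major(v: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(v.split(".")[0])


def check_pin(installed: Optional[str], pinned: str = PINNED) -> str:
    """Classify how the installed version relates to the pin (hypothetical labels)."""
    if installed is None:
        return "missing"
    if installed == pinned:
        return "ok"
    if major(installed) != major(pinned):
        return "major-mismatch"  # e.g. 4.x installed: expect API breakage
    return "minor-mismatch"


def installed_transformers() -> Optional[str]:
    """Look up the installed transformers version, or None if it is absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None
```

Running `check_pin(installed_transformers())` in the target environment before launching training shows whether porting work is likely. In practice, a dedicated virtual environment with the pinned version installed is probably the lower-effort way to get the repo running as-is.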