// the find

bojone/bert4keras

★ 5,419 · Python · Apache-2.0 · updated Nov 2024

keras implement of transformers for humans

bert4keras is a Keras-native reimplementation of BERT and a family of transformer variants (RoBERTa, ALBERT, T5, ELECTRA, GPT-2, RoFormer, and others), written by Jianlin Su, the researcher behind RoPE. It targets Chinese NLP practitioners who want to load pretrained Chinese transformer weights, then fine-tune them or modify the internals, without fighting a deep dependency chain. If you work in English with PyTorch, this library is not aimed at you.

The codebase is genuinely small and readable: the entire library is six files, and the transformer implementation in models.py is meant to be read and hacked, not treated as a black box. Su has shipped several architectural ideas here first (RoFormer/RoPE, GatedAttentionUnit, GlobalPointer) before they spread elsewhere, so the examples are not textbook exercises. Weight loading covers an unusually wide range of Chinese pretrained checkpoints, including NEZHA, LaBSE, and several Chinese GPT-2 variants that few other libraries load cleanly. The AutoRegressiveDecoder abstraction implements beam search and sampling once, so the seq2seq examples reuse it instead of reimplementing decoding each time.
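To make the decoder idea concrete, here is a minimal sketch of the pattern in plain Python: a base class owns the beam search loop, and each task only supplies a `predict` step. This is not bert4keras's code or API. `ToyDecoder`, `TableDecoder`, and the probability table are hypothetical names and data invented for illustration.

```python
import math


class ToyDecoder:
    """Hypothetical stand-in for a reusable decoder; not bert4keras's class."""

    def __init__(self, end_id, maxlen):
        self.end_id = end_id
        self.maxlen = maxlen

    def predict(self, output_ids):
        """Return {token_id: log_prob} for the next token (task-specific)."""
        raise NotImplementedError

    def beam_search(self, topk=2):
        beams = [([], 0.0)]  # (token ids so far, cumulative log-probability)
        finished = []
        for _ in range(self.maxlen):
            # Expand every live beam by every possible next token.
            candidates = []
            for ids, score in beams:
                for tok, logp in self.predict(ids).items():
                    candidates.append((ids + [tok], score + logp))
            candidates.sort(key=lambda c: c[1], reverse=True)
            # Keep the top-k; hypotheses that emitted end_id stop growing.
            beams = []
            for ids, score in candidates[:topk]:
                if ids[-1] == self.end_id:
                    finished.append((ids, score))
                else:
                    beams.append((ids, score))
            if not beams:
                break
        pool = finished or beams
        return max(pool, key=lambda c: c[1])[0]


class TableDecoder(ToyDecoder):
    """Toy 'model': next-token probabilities depend only on the last token."""

    TABLE = {
        None: {1: 0.6, 2: 0.4},
        1: {2: 0.55, 0: 0.45},
        2: {0: 0.9, 1: 0.1},
    }

    def predict(self, output_ids):
        last = output_ids[-1] if output_ids else None
        return {t: math.log(p) for t, p in self.TABLE[last].items()}


decoder = TableDecoder(end_id=0, maxlen=5)
greedy = decoder.beam_search(topk=1)
best = decoder.beam_search(topk=2)
print(greedy)  # [1, 2, 0]
print(best)    # [2, 0]
```

With a beam width of 1 the search commits to the locally best first token and ends on a lower-probability sequence, while a width of 2 recovers the better one. Swapping in a different task means writing a new `predict` only, which is the reuse the review describes.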

The recommended environment is TF 1.14 + Keras 2.3.1, both released in 2019. By 2026 that stack is essentially unmaintained, and the README explicitly warns against TF 2.3+. Activity has drifted to a PyTorch follow-up (bert4torch); the last meaningful commit here was in late 2024, and the pace before that was already slow. Documentation is almost entirely in Chinese, which is fine for the target audience but makes the project invisible to anyone outside it. There are no type annotations and no test suite, and snippets.py is a sprawling grab-bag of utilities that makes the 'light' claim feel overstated for anything beyond the examples.
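Given how narrow the supported stack is, a small guard at the top of a script can catch a mismatched environment early. The sketch below is an assumption-laden illustration: `parse_version`, `check_env`, and the three labels are hypothetical, and the only facts encoded are the ones stated above (TF 1.14 + Keras 2.3.1 recommended, TF 2.3+ warned against).

```python
import re


def parse_version(version):
    """Turn a string like '2.3.1' or '1.14.0rc1' into a tuple of ints."""
    nums = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    return tuple(nums)


def check_env(tf_version, keras_version):
    """Classify an environment; the labels are illustrative, not official."""
    tf = parse_version(tf_version)
    if tf >= (2, 3):
        return "discouraged"  # the README warns against TF 2.3+
    if tf[:2] == (1, 14) and parse_version(keras_version) == (2, 3, 1):
        return "recommended"
    return "untested"  # neither recommended nor explicitly warned against


print(check_env("1.14.0", "2.3.1"))  # recommended
print(check_env("2.4.1", "2.4.3"))   # discouraged
print(check_env("2.2.4", "2.3.1"))   # untested
```

In a real script the two arguments would typically come from `tensorflow.__version__` and `keras.__version__`.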
