// the find

lilianweng/transformer-tensorflow

★ 483 · Python · updated Mar 2023

Implementation of Transformer Model in Tensorflow

A bare TF1 implementation of the 2017 'Attention Is All You Need' paper, written by Lilian Weng as a learning exercise alongside her blog post. It trains a translation model on IWSLT15 or WMT14 and reaches roughly BLEU 20 on WMT14 with default settings. This is educational reading material, not production tooling.

The code is flat and readable: eight files and no abstraction maze, which makes it a good companion to the paper. The implementation notes section honestly flags the tricky mask and autoregressive decoding questions rather than papering over them. BLEU 20 on WMT14 is a real, measured number; it is strong evidence that the model actually trains and translates rather than just looking plausible, though it does not prove every detail matches the paper. The associated blog post fills in the conceptual gaps the code leaves.
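For readers unsure what the mask question is about, here is a minimal sketch of a causal (look-ahead) mask, the mechanism that keeps each decoder position from attending to later positions during training. It is plain Python written for this note, not code from the repository; the function names `causal_mask` and `masked_softmax` are hypothetical, and real implementations do this with tensor ops over batches and heads.

```python
import math


def causal_mask(n):
    """Return an n x n matrix; entry [i][j] is True if position i may attend to j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax over `scores`, giving zero weight where `mask` is False."""
    out = []
    for row, allowed in zip(scores, mask):
        # Blocked positions get -inf so that exp() maps them to exactly 0.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(masked)  # subtract the row max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Toy attention scores: every pair of positions scores the same.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))
for row in weights:
    print([round(w, 2) for w in row])
```

With uniform scores, row i spreads its weight evenly over positions 0 through i and gives nothing to later positions. At inference time the same constraint is met by generating one token at a time, which is why masking and autoregressive decoding tend to be discussed together.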

It's TF1 with sessions and checkpoints, so you can't run it without legacy TF or compatibility shims, and porting it to modern Keras would essentially mean rewriting it from scratch. The 'Implementation Notes' section is marked WIP and the questions listed are never answered in the README. There is no pretrained checkpoint to download, so reproducing the BLEU 20 number means paying for a full WMT14 training run. The last push was in 2023, but the core code predates that by years; if you're trying to understand transformers today, HuggingFace's annotated implementations or the official PyTorch tutorial are more relevant starting points.
