// the find
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
A PyTorch transformer library by Phil Wang (lucidrains) that bundles a working encoder/decoder/autoregressive transformer with a large menu of attention variants toggled via keyword arguments — RoPE, ALiBi, GQA, Flash Attention, memory tokens, and dozens more. It's a research sandbox, not a production training framework. The target is people who want to prototype an architecture idea from a paper without reimplementing the whole transformer from scratch.
The keyword-flag API is genuinely well designed: you get a working transformer in five lines, then opt into RoPE, GQA, or residual attention by adding a single argument. No subclassing, no patching. The feature coverage tracks the literature unusually closely: the repo was last pushed the day before this writing and already references papers from early 2025. Training scripts (enwik8, copy task, parity) are included and runnable, so you can check that a feature combination actually converges instead of trusting that the flag exists. GQA / MQA support (attn_kv_heads, attn_one_kv_head) is solid, and that is the feature most people need from a transformer library these days.
There is no versioning story. Lucidrains repos are well known for silent breaking changes between pip releases: a flag that existed in 0.27 may be renamed or removed in 0.29 with no changelog entry. The core file x_transformers/x_transformers.py is a single multi-thousand-line module, so auditing what a feature combination actually does means reading a lot of interleaved code. There are no distributed training primitives (no FSDP integration, no tensor parallelism), so it stops being useful the moment you need to scale beyond one GPU. Feature interaction documentation is almost nonexistent: combining sandwich_norm with resi_dual, or stacking multiple positional bias strategies, may produce silently incorrect behavior, and the README does not tell you which combinations are tested together.
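Since the README does not say which flag combinations are tested together, one cheap safeguard is to smoke-test the combinations you plan to use before committing to a long run. The sketch below is library-agnostic. Both smoke_test and build_and_run are hypothetical names: build_and_run stands for whatever function builds your model from keyword flags and runs one forward pass. The stand-in builder and its rejection rule are made up for the demo and say nothing about how x-transformers itself treats that pair. The harness only catches combinations that raise. It cannot detect a combination that runs but computes the wrong thing, which still needs a convergence check with the bundled training scripts.

```python
from itertools import combinations
from typing import Any, Callable, Dict, Tuple


def smoke_test(
    build_and_run: Callable[..., Any],
    flags: Dict[str, Any],
    max_combo: int = 2,
) -> Dict[Tuple[str, ...], str]:
    """Try every combination of up to `max_combo` flags and record the outcome."""
    results: Dict[Tuple[str, ...], str] = {}
    for size in range(1, max_combo + 1):
        for combo in combinations(sorted(flags), size):
            kwargs = {name: flags[name] for name in combo}
            try:
                build_and_run(**kwargs)
                results[combo] = "ok"
            except Exception as exc:  # record the failure, keep testing the rest
                results[combo] = f"{type(exc).__name__}: {exc}"
    return results


# Stand-in for a real model builder (hypothetical): the rejection rule below
# is invented for this demo and is not the library's documented behavior.
def fake_build_and_run(**kwargs: Any) -> None:
    if kwargs.get("sandwich_norm") and kwargs.get("resi_dual"):
        raise ValueError("incompatible normalization flags")


report = smoke_test(fake_build_and_run, {"sandwich_norm": True, "resi_dual": True})
for combo, outcome in report.items():
    print(combo, "->", outcome)
```

In real use, build_and_run would construct the model with the given flags and push one small batch through it. Checking the output for NaNs inside that function extends the harness at almost no cost.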