// the find

lucidrains/PaLM-rlhf-pytorch

★ 7,862 · Python · MIT · updated May 2026

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

A PyTorch research implementation of RLHF on top of a PaLM-style transformer, covering the full pipeline: pretraining, reward model, and PPO fine-tuning. This is Phil Wang (lucidrains) doing what he does: translating a paper into clean, runnable PyTorch, often before anyone else. It's a learning resource and a starting point, not a production training framework.

The pipeline is genuinely complete: you get PaLM, a binned reward model, LoRA support, and PPO in one package, which is rare for research repos this early in the RLHF wave. Flash Attention integration is real, not bolted on: it's wired into the attention module. The repo has kept pace with the field, adding GRPO, FlowRL, and TPO modules as newer RL methods emerged, so it's not a 2022 snapshot. lucidrains' code style is unusually readable for ML research code.

There is no trained model and no pretrained weights anywhere. The README says so openly, but it means you cannot run inference on anything useful without first training a model yourself, which at a useful scale could cost millions of dollars in compute. Several items in the todo list have sat unchecked for three-plus years (wandb, memmapped PPO memory, web feedback interface), which suggests the original roadmap has stalled even though the repo is still maintained. It was marked 'wip' at creation and never graduated from that status. The reward model approach (binned classification) is already somewhat dated given how much the field has moved toward DPO and process reward models.
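For readers unfamiliar with the term, "binned classification" means the reward head predicts a distribution over discrete rating buckets instead of a single scalar. The sketch below is a plain-Python illustration of that idea, not the repo's code: the five-bin setup, the function names, and the choice of argmax as the final reward are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Turn raw bin scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def binned_reward_loss(logits, target_bin):
    """Cross-entropy against the human-labeled rating bin (hypothetical helper)."""
    probs = softmax(logits)
    return -math.log(probs[target_bin])

def predicted_reward(logits):
    """Use the most likely bin as the discrete reward (an assumed convention)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical head output for one (prompt, response) pair with 5 rating bins.
logits = [0.1, 0.2, 2.5, 0.3, -1.0]
loss = binned_reward_loss(logits, target_bin=2)
reward = predicted_reward(logits)
```

A scalar-regression or pairwise-preference reward model would swap the cross-entropy for a different loss; methods like DPO drop the separate reward model entirely, which is why the binned approach reads as dated.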
