// the find

lilianweng/multi-armed-bandit

★ 418 · Python · updated May 2024

Play with the solutions to the multi-armed-bandit problem.

A companion repo to Lilian Weng's 2018 blog post on multi-armed bandit algorithms. It implements several classic solvers (epsilon-greedy, UCB, Thompson Sampling, etc.) against a Bernoulli bandit. Useful if you're reading the post and want runnable code alongside it.

The code is clean and pedagogically structured: a Solver base class with concrete subclasses makes it easy to follow along with the theory. Thompson Sampling is included, which many toy implementations skip. The regret and estimated-vs-true probability plots are genuinely useful for building intuition. Lilian Weng's original blog post is one of the clearest explanations of the problem available, so the companion code benefits from that context.
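To give a feel for that structure, here is a minimal sketch of a solver base class with two concrete subclasses running against a Bernoulli bandit. It is written from scratch for illustration and is not the repo's code: the method names (`pull`, `choose_arm`, `run`), the default `eps=0.1`, the seeds, and the smoothed estimate in the epsilon-greedy solver are all assumptions made here.

```python
import random


class BernoulliBandit:
    """K arms; arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.best_prob = max(self.probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1 if self.rng.random() < self.probs[arm] else 0


class Solver:
    """Base class (hypothetical layout): subclasses only pick the next arm."""

    def __init__(self, bandit, seed=0):
        self.bandit = bandit
        self.rng = random.Random(seed)
        n = len(bandit.probs)
        self.wins = [0] * n    # observed rewards of 1 per arm
        self.losses = [0] * n  # observed rewards of 0 per arm
        self.regret = 0.0

    def choose_arm(self):
        raise NotImplementedError

    def run(self, steps):
        for _ in range(steps):
            arm = self.choose_arm()
            reward = self.bandit.pull(arm)
            self.wins[arm] += reward
            self.losses[arm] += 1 - reward
            # Expected regret: gap between the best arm and the chosen one.
            self.regret += self.bandit.best_prob - self.bandit.probs[arm]
        return self.regret


class EpsilonGreedy(Solver):
    def __init__(self, bandit, eps=0.1, seed=0):
        super().__init__(bandit, seed)
        self.eps = eps

    def choose_arm(self):
        n = len(self.wins)
        if self.rng.random() < self.eps:
            return self.rng.randrange(n)  # explore uniformly
        # Exploit: smoothed win rate, so unpulled arms start at 0.5
        # (an assumption of this sketch, not a claim about the repo).
        est = [(w + 1) / (w + l + 2) for w, l in zip(self.wins, self.losses)]
        return max(range(n), key=est.__getitem__)


class ThompsonSampling(Solver):
    def choose_arm(self):
        # Sample each arm's success rate from its Beta(wins+1, losses+1)
        # posterior and play the arm with the largest sample.
        samples = [self.rng.betavariate(w + 1, l + 1)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)


bandit = BernoulliBandit([0.2, 0.5, 0.8])
for solver in (EpsilonGreedy(bandit), ThompsonSampling(bandit)):
    print(type(solver).__name__, round(solver.run(2000), 1))
```

Each subclass overrides only the arm-selection step, so adding another solver means writing one method. The printed numbers are cumulative expected regret; plotting them per step would give the kind of regret curve the repo's plots show.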

This is a companion to a 2018 blog post, not a library: there is no package, no tests, and no docs beyond the README pointer. The Bernoulli bandit assumption is baked in, so you can't swap in a Gaussian bandit without rewriting. There is no contextual bandit support (LinUCB, neural epsilon-greedy) at all, which is where most real-world applications live. The repo has been dead in substance since 2018, with only a minor touch in 2024; if you need anything beyond the basics, you've already outgrown it.
