finds.dev

// the find

lilianweng/multi-armed-bandit

★ 418 · Python · updated May 2024

Play with the solutions to the multi-armed-bandit problem.

A companion repo to Lilian Weng's 2018 blog post on multi-armed bandit algorithms. It implements several classic solvers (epsilon-greedy, UCB, Thompson Sampling, etc.) against a Bernoulli bandit. Useful if you're reading the post and want runnable code alongside it.
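For readers who want the core loop before opening the repo, here is a minimal, self-contained sketch of a Bernoulli bandit with epsilon-greedy and UCB1 solvers. It is written from the standard textbook definitions, not copied from the repo. The names `BernoulliBandit`, `epsilon_greedy` and `ucb1`, and defaults such as `eps=0.1`, are illustrative assumptions.

```python
import math
import random


class BernoulliBandit:
    """Hypothetical helper (not the repo's API): arm i pays 1 with hidden probability probs[i], else 0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, i):
        return 1 if self.rng.random() < self.probs[i] else 0


def epsilon_greedy(bandit, steps, eps=0.1, seed=None):
    """With probability eps explore a random arm, otherwise exploit the best current estimate."""
    rng = random.Random(seed)
    k = len(bandit.probs)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(steps):
        if rng.random() < eps:
            i = rng.randrange(k)
        else:
            i = max(range(k), key=lambda a: estimates[a])
        r = bandit.pull(i)
        counts[i] += 1
        # Incremental sample mean: Q <- Q + (r - Q) / n
        estimates[i] += (r - estimates[i]) / counts[i]
    return estimates, counts


def ucb1(bandit, steps):
    """Play the arm maximizing estimate + sqrt(2 ln t / n_i) (the UCB1 bonus)."""
    k = len(bandit.probs)
    counts = [0] * k
    estimates = [0.0] * k
    for t in range(1, steps + 1):
        if t <= k:
            i = t - 1  # play each arm once so no count is zero in the bonus term
        else:
            i = max(
                range(k),
                key=lambda a: estimates[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = bandit.pull(i)
        counts[i] += 1
        estimates[i] += (r - estimates[i]) / counts[i]
    return estimates, counts
```

For example, `ucb1(BernoulliBandit([0.2, 0.5, 0.8], seed=0), 5000)` returns per-arm estimates and pull counts. The 0.8 arm should end up with the large majority of pulls, while epsilon-greedy keeps spending roughly `eps` of its steps on random exploration indefinitely.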

The code is clean and pedagogically structured: a Solver base class with concrete subclasses makes it easy to follow along with the theory. Thompson Sampling is included, which many toy implementations skip. The regret and estimated-vs-true probability plots are genuinely useful for building intuition. Lilian Weng's original blog post is one of the clearest explanations of the problem available, so the companion code benefits from that context.
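The base-class-plus-subclasses layout can be sketched roughly as follows, assuming a base class that owns pull counts and cumulative regret while each subclass only decides which arm to pull and updates its own beliefs. This is a hypothetical reconstruction of the pattern, not the repo's code; `Solver`, `ThompsonSampling`, `step`, `run`, `regrets` and `posterior_means` are illustrative names. Regret here is expected regret: the gap between the best arm's probability and the chosen arm's, accumulated per step, which is the kind of curve a regret plot shows.

```python
import random


class BernoulliBandit:
    """Hypothetical helper: arm i pays 1 with probability probs[i]."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.best_prob = max(self.probs)
        self.rng = random.Random(seed)

    def pull(self, i):
        return 1 if self.rng.random() < self.probs[i] else 0


class Solver:
    """Illustrative base class: shared bookkeeping; subclasses implement step()."""

    def __init__(self, bandit):
        self.bandit = bandit
        self.counts = [0] * len(bandit.probs)
        self.regret = 0.0
        self.regrets = []  # cumulative expected regret after each step, for plotting

    def step(self):
        """Choose an arm, pull it, update beliefs, and return the arm index."""
        raise NotImplementedError

    def run(self, steps):
        for _ in range(steps):
            i = self.step()
            self.counts[i] += 1
            # Expected regret of this pull: best arm's probability minus the chosen arm's.
            self.regret += self.bandit.best_prob - self.bandit.probs[i]
            self.regrets.append(self.regret)


class ThompsonSampling(Solver):
    """Beta(alpha, beta) posterior per arm; sample every posterior and play the argmax."""

    def __init__(self, bandit, seed=None):
        super().__init__(bandit)
        k = len(bandit.probs)
        self.alpha = [1.0] * k  # Beta(1, 1) is a uniform prior on each arm's probability
        self.beta = [1.0] * k
        self.rng = random.Random(seed)

    def step(self):
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        i = max(range(len(samples)), key=lambda a: samples[a])
        r = self.bandit.pull(i)
        # Conjugate Beta-Bernoulli update: a success bumps alpha, a failure bumps beta.
        self.alpha[i] += r
        self.beta[i] += 1 - r
        return i

    def posterior_means(self):
        """Estimated probability per arm, for an estimated-vs-true comparison."""
        return [a / (a + b) for a, b in zip(self.alpha, self.beta)]
```

A new solver under this design only has to implement `step()`; counting, regret tracking and the data behind the plots stay in one place.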

This is a 2018 blog post companion, not a library: there's no package, no tests, and no docs beyond the README pointer. The Bernoulli bandit assumption is baked in, so you can't swap in a Gaussian or contextual bandit without rewriting the solvers. There is no contextual bandit support at all (LinUCB, neural epsilon-greedy), which is where most real-world applications live. In substance it has been dead since 2018, with only a minor touch in 2024; if you need anything beyond the basics, you've already outgrown it.
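The Bernoulli coupling is concrete in Thompson Sampling: the usual Beta posterior update (`alpha += r`, `beta += 1 - r`) only makes sense when every reward is 0 or 1. As a rough illustration of what a Gaussian rewrite would involve, here is a sketch assuming normally distributed rewards with a known noise level `sigma` and a Normal prior on each arm's mean. The class names, the prior defaults (`prior_mean=0.0`, `prior_var=1.0`) and the `choose`/`update` interface are hypothetical and not part of the repo.

```python
import random


class GaussianBandit:
    """Hypothetical arm model: reward ~ Normal(means[i], sigma**2)."""

    def __init__(self, means, sigma=1.0, seed=None):
        self.means = list(means)
        self.sigma = sigma
        self.rng = random.Random(seed)

    def pull(self, i):
        return self.rng.gauss(self.means[i], self.sigma)


class GaussianThompson:
    """Thompson Sampling with a Normal prior on each arm's mean and known noise sigma (assumption)."""

    def __init__(self, k, sigma=1.0, prior_mean=0.0, prior_var=1.0, seed=None):
        self.sigma2 = sigma ** 2
        self.prior_mean = prior_mean
        self.prior_prec = 1.0 / prior_var
        self.n = [0] * k
        self.sums = [0.0] * k
        self.rng = random.Random(seed)

    def posterior(self, i):
        """Conjugate Normal update: precisions add, mean is the precision-weighted average."""
        prec = self.prior_prec + self.n[i] / self.sigma2
        mean = (self.prior_mean * self.prior_prec + self.sums[i] / self.sigma2) / prec
        return mean, 1.0 / prec

    def choose(self):
        # Draw one sample from each arm's posterior and play the largest.
        draws = []
        for i in range(len(self.n)):
            mean, var = self.posterior(i)
            draws.append(self.rng.gauss(mean, var ** 0.5))
        return max(range(len(draws)), key=lambda a: draws[a])

    def update(self, i, reward):
        self.n[i] += 1
        self.sums[i] += reward
```

Both the posterior state and the sampling step change, which is why the review treats a different reward model as a rewrite rather than a parameter swap.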

