// the find
amueller/word_cloud
A little word cloud generator in Python
A Python library for generating word clouds from text, with support for custom shapes (masks), colors, fonts, and right-to-left scripts like Arabic. It's been around since 2012 and is the de facto standard for this niche — if you need a word cloud in Python, this is what you reach for.
The mask feature is genuinely useful: pass an image as a mask and the cloud takes on any silhouette, with words drawn only where the mask is not pure white, which opens up presentation use cases that a plain rectangle wouldn't. Arabic and CJK support is included with bundled fonts and working examples, not just a checkbox. The CLI (`wordcloud_cli`) covers the common one-shot use case without writing any Python. The free-space search at the core of the layout is written in Cython (`query_integral_image.pyx`), so it's fast enough that it doesn't feel like a toy despite the simple premise.
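To make the mask idea concrete, here is a minimal sketch that shapes a cloud into a circle. The helper names (`circle_mask_rows`, `render_in_circle`) and the output filename are made up for illustration; the only library calls are `WordCloud(background_color=..., mask=...)`, `generate`, and `to_file`. It assumes numpy and wordcloud are installed, and that pure white (255) marks the area to keep empty.

```python
def circle_mask_rows(size=400, margin=10):
    """Return a size x size grid: 0 inside a centered circle, 255 outside.

    wordcloud treats pure white (255) as masked out, so words can only
    land on the 0-valued cells inside the circle.
    """
    center = (size - 1) / 2
    radius = size / 2 - margin
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


def render_in_circle(text, out_path="circle_cloud.png", size=400):
    """Render `text` as a circular word cloud and save it to `out_path`."""
    # Imported lazily so the mask helper above works without these packages.
    import numpy as np
    from wordcloud import WordCloud

    mask = np.array(circle_mask_rows(size), dtype=np.uint8)
    # When a mask is given, the output takes its dimensions from the mask.
    cloud = WordCloud(background_color="white", mask=mask)
    cloud.generate(text).to_file(out_path)
    return out_path
```

For a real silhouette you would more likely load an image into an array (for example with Pillow and `numpy.array`) than build the grid by hand; the circle just keeps the sketch self-contained.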
Word clouds are a pretty maligned visualization: they encode frequency in font size, which humans read poorly, and they throw away word order entirely. You're shipping something that data-vis people will roll their eyes at. The tokenization is basic: a regex word split plus stopword removal, with plural normalization and two-word collocations as the only extras, and no stemming, lemmatization, or real phrase extraction unless you preprocess yourself. The `.pyx` file compiles to a C extension, which means installs on uncommon platforms or Python versions without prebuilt wheels will require a compiler; the README acknowledges this, but it's still a friction point. Development has slowed significantly: recent activity is intermittent maintenance rather than active feature work.
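If you do need to preprocess yourself, one approach is to count terms on your own and hand the counts to `generate_from_frequencies`, bypassing the built-in tokenizer entirely. The sketch below is illustrative only: `crude_stem`, `build_frequencies`, `render_frequencies`, and the tiny stopword set are hypothetical stand-ins, and the suffix stripping is deliberately naive. In practice you would swap in a real stemmer or lemmatizer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; wordcloud ships a longer one as wordcloud.STOPWORDS.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}


def crude_stem(word):
    """Strip a few common English suffixes. A stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_frequencies(text):
    """Lowercase, tokenize, drop stopwords, stem, and count."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    stems = (crude_stem(t) for t in tokens if t not in STOPWORDS)
    return Counter(stems)


def render_frequencies(freqs, out_path="stemmed_cloud.png"):
    """Draw a cloud from precomputed counts and save it to `out_path`."""
    # Imported lazily so the counting code runs without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white")
    cloud.generate_from_frequencies(freqs).to_file(out_path)
    return out_path
```

With this split, "cat" and "cats" or "jumped" and "jumping" collapse into a single entry before the library ever sees them, so the font sizes reflect the merged counts.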