// the find

retentioneering/retentioneering-tools

★ 889 · Python · NOASSERTION · updated Dec 2025

Retentioneering: product analytics, data-driven CJM optimization, marketing analytics, web analytics, transaction analytics, graph visualization, process mining, and behavioral segmentation in Python. Predictive analytics over clickstream, AB tests, machine learning, and Markov Chain simulations.

Retentioneering is a Python library for analyzing user clickstream data — think funnel analysis, but with actual path traversal, behavioral clustering, and Markov chain modeling on top of pandas DataFrames. It's aimed at product analysts who want to go beyond simple funnels and understand how users actually move through an app, not just whether they hit predefined steps.
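To make the funnel-versus-path distinction concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not Retentioneering's API. The event names, the sample `events` rows, and the `user_paths` and `transition_probabilities` helpers are all made up for illustration, and it assumes a clickstream is nothing more than rows of (user, event, timestamp).

```python
from collections import Counter, defaultdict

# Hypothetical clickstream rows: (user_id, event, timestamp in seconds).
events = [
    ("u1", "main", 0), ("u1", "catalog", 10), ("u1", "cart", 25), ("u1", "purchase", 40),
    ("u2", "main", 5), ("u2", "catalog", 12), ("u2", "main", 30), ("u2", "catalog", 41),
    ("u3", "main", 7), ("u3", "cart", 15),
]

def user_paths(rows):
    """Group events by user and order each user's events by time."""
    by_user = defaultdict(list)
    for user, event, ts in rows:
        by_user[user].append((ts, event))
    return {u: [e for _, e in sorted(evs)] for u, evs in by_user.items()}

def transition_probabilities(paths):
    """First-order Markov estimate: P(next | current) from observed adjacent pairs."""
    counts = defaultdict(Counter)
    for path in paths.values():
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
        for cur, nxts in counts.items()
    }

paths = user_paths(events)
probs = transition_probabilities(paths)

# A funnel only asks "who reached cart?"; the path view also shows how they got there.
reached_cart = sorted(u for u, p in paths.items() if "cart" in p)
print(reached_cart)
print(probs["main"])
```

In this toy data a funnel would report two of three users reaching the cart, while the transition table additionally shows that most exits from `main` go to `catalog` and that one user bounced back and forth between the two.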

The preprocessing graph is genuinely well-designed: you build a DAG of transformations (split sessions, label churned users, collapse loops) that's reproducible and exportable to config — no more notebook spaghetti that nobody can re-run. The Eventstream abstraction wrapping pandas is clean; it adds the metadata layer needed for sequence analysis without forcing you off familiar tooling. The visualization suite (transition graph, Step Sankey, step matrix) renders in Jupyter with interactive controls and is noticeably better than rolling your own with matplotlib. There's real statistical testing support for A/B comparisons on path metrics, which most tools in this space skip entirely.
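The idea of a reproducible chain of preprocessing steps can be sketched without the library. What follows is a simplified linear pipeline, not a full DAG and not Retentioneering's actual API: `split_sessions`, `collapse_loops`, and `run_pipeline` are hypothetical helpers written for this example, the loop handling is a deliberately crude stand-in, and the 30-minute timeout is an assumed value.

```python
from functools import partial

# Hypothetical single-user event stream: (event, timestamp in seconds).
stream = [
    ("main", 0), ("catalog", 60), ("catalog", 90), ("cart", 120),
    ("main", 4000), ("catalog", 4030),
]

def split_sessions(rows, timeout):
    """Tag each event with a session index; a gap longer than `timeout` starts a new session."""
    out, session, prev_ts = [], 0, None
    for event, ts in rows:
        if prev_ts is not None and ts - prev_ts > timeout:
            session += 1
        out.append((event, ts, session))
        prev_ts = ts
    return out

def collapse_loops(rows):
    """Drop an event that repeats the previous one within the same session."""
    out = []
    for row in rows:
        if out and out[-1][0] == row[0] and out[-1][2] == row[2]:
            continue
        out.append(row)
    return out

def run_pipeline(rows, steps):
    """Apply steps in order; the step list itself acts as the reproducible 'config'."""
    for step in steps:
        rows = step(rows)
    return rows

steps = [partial(split_sessions, timeout=30 * 60), collapse_loops]
result = run_pipeline(stream, steps)
for row in result:
    print(row)
```

Because the steps live in an ordered list rather than in scattered notebook cells, the same list can be re-run on a fresh export and yield the same result, which is the property the preprocessing graph is praised for above.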

889 stars for a library that's been around since 2015 is a signal that adoption never broke through. The likely reason is the Jupyter-first design: the library basically doesn't exist outside notebooks, with no CLI and no pipeline-friendly way to produce visualizations without a running display kernel. The Markov chain and ML clustering features are surface-level: you get k-means and basic sequence vectorization, not anything that would replace a real ML workflow. There's no async or streaming support, so processing large clickstream exports means blocking pandas jobs, and the event volumes a mid-size product generates won't fit without significant sampling. Documentation is thorough, but the guides are mostly screenshots of Jupyter output, which makes it hard to understand what the underlying data model actually looks like without running the code yourself.
