// the find
nerevu/riko
A Python stream processing engine modeled after Yahoo! Pipes
riko is a Python library for building data pipelines using a composable pipe model, directly inspired by Yahoo! Pipes. It targets single-machine ETL tasks, RSS feed aggregation, and lightweight data mashups — not distributed workloads. If you remember Yahoo! Pipes fondly and want that mental model in code, this is it.
The operator/processor distinction is well thought out: processors handle one item at a time and can be parallelized, while operators work on the whole stream, so you know from a pipe's type whether it parallelizes cleanly. The SyncPipe chaining API reads naturally, and the `emit` pattern for flattening nested results is a clean solution to a real annoyance. PyPy support is explicit and tested, which matters for throughput on CPU-bound transforms. The ~40 built-in modules cover the practical 80% of feed/data manipulation without needing to wire anything up yourself.
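To see why the split between per-item and whole-stream steps matters, here is a rough standard-library sketch. It does not use riko's API; `add_length`, `sort_by_length`, and the sample items are hypothetical names made up for illustration. The per-item step can be farmed out to workers because items are independent, while the whole-stream step cannot start until it has seen everything.

```python
from concurrent.futures import ThreadPoolExecutor


def add_length(item):
    """Processor-style step (hypothetical): reads one item, returns a new item."""
    return {**item, "length": len(item["title"])}


def sort_by_length(stream):
    """Operator-style step (hypothetical): needs the whole stream before emitting."""
    return sorted(stream, key=lambda item: item["length"])


items = [{"title": "riko"}, {"title": "Yahoo! Pipes"}, {"title": "ETL"}]

# Items are independent, so the per-item step can be mapped across workers.
# Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    measured = list(pool.map(add_length, items))

# The whole-stream step sees the full collection at once.
ordered = sort_by_length(measured)
titles = [item["title"] for item in ordered]
```

In this sketch, only `add_length` is a safe candidate for parallel execution; that is the kind of guarantee the processor label is meant to give you up front.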
Async support is built on Twisted, not asyncio. That's a significant dependency drag in 2024 and will feel alien to anyone who learned async after Python 3.5. The pull-based iterator model means a single consumer per stream; the `split` module works around this, but awkwardly. Python version support caps at 3.9 per the README, with no sign of 3.11/3.12 testing, and the Travis CI badge is dead. The XPath-based scraping examples already fail against real sites, which undermines the tutorial experience.
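The single-consumer limitation is a property of pull-based iterators in general, and plain Python generators show it well. The sketch below uses only the standard library; `feed` is a hypothetical stand-in for a stream of items, and `itertools.tee` is a stdlib analogue for duplicating a stream, not riko's `split` module.

```python
from itertools import tee


def feed():
    """Hypothetical stand-in for a pull-based stream of feed items."""
    for title in ("first", "second", "third"):
        yield {"title": title}


stream = feed()
first_pass = [item["title"] for item in stream]
# The generator is exhausted after one pass, so a second consumer gets nothing.
second_pass = [item["title"] for item in stream]

# tee buffers items internally so two consumers can iterate independently.
left, right = tee(feed(), 2)
left_titles = [item["title"] for item in left]
right_titles = [item["title"] for item in right]
```

Any duplication scheme of this kind has to buffer items that one consumer has read and the other has not, which is part of why the workaround feels heavier than simply iterating twice.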