// the find
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Pathway is a Python-fronted stream processing framework with a Rust engine based on Differential Dataflow. You write Python dataflow code; the Rust engine executes it as incremental computations with real multithreading. The LLM/RAG xpack makes it a credible option for anyone building live AI pipelines that need to stay in sync with changing source data.
The batch/stream unification is real: the same pipeline code runs against static files locally and a Kafka stream in production, which eliminates the usual dual-codebase maintenance problem. The incremental computation model means updates propagate only through affected subgraphs rather than reprocessing full windows, so the cost of an update tracks the size of the change rather than the size of the data. The LLM xpack includes an in-memory vector index that updates as documents change, so RAG pipelines don't go stale between re-index runs, something LangChain and LlamaIndex leave to you. The Airbyte connector integration gives access to 300+ sources without writing custom connectors.
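To make the incremental model concrete, here is a toy sketch in plain Python. It is not Pathway's API and says nothing about how the Rust engine is implemented; the class `IncrementalSum` and the `(key, value, diff)` update shape are assumptions chosen for illustration. It shows a per-key sum that is maintained from insertions (`diff = +1`) and retractions (`diff = -1`), so correcting one row touches only that row's key.

```python
from collections import defaultdict


class IncrementalSum:
    """Toy per-key sum maintained from (key, value, diff) updates.

    Hypothetical illustration only; this is not Pathway's API.
    """

    def __init__(self):
        self.totals = defaultdict(int)  # key -> current sum
        self.counts = defaultdict(int)  # key -> number of live rows

    def apply(self, updates):
        """Apply a batch of deltas and return the set of keys that were touched."""
        touched = set()
        for key, value, diff in updates:
            # diff is +1 for an insertion and -1 for a retraction
            self.totals[key] += value * diff
            self.counts[key] += diff
            touched.add(key)
        # Drop keys that no longer have any live rows
        for key in touched:
            if self.counts[key] == 0:
                del self.totals[key]
                del self.counts[key]
        return touched


def full_recompute(rows):
    """Baseline: rebuild every per-key sum from scratch."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


rows = [("eu", 10), ("us", 5), ("eu", 7)]
agg = IncrementalSum()
agg.apply([(key, value, +1) for key, value in rows])

# Correct one row: retract ("us", 5) and insert ("us", 8)
touched = agg.apply([("us", 5, -1), ("us", 8, +1)])
corrected_rows = [("eu", 10), ("us", 8), ("eu", 7)]
```

After the correction, only the `"us"` key is touched, and the maintained totals match what `full_recompute(corrected_rows)` would produce by scanning every row again.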
The BSL 1.1 license is the elephant in the room: 'exactly once' consistency is enterprise-only, so the free tier gives you 'at least once' for anything that matters in production, a meaningful limitation that the README buries. It runs on macOS and Linux only; Windows requires a VM, which is a non-starter for a chunk of developers who might otherwise evaluate it. The Python API is declarative and dataflow-oriented, so anyone who isn't already thinking in terms of tables and transformations will find the mental model harder to adopt than just writing a consumer loop. Distributed mode requires the enterprise offering, meaning horizontal scaling is paywalled.
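For readers weighing the 'at least once' limitation, the practical consequence is that a record may be delivered more than once, and anything downstream has to tolerate that. The sketch below is a generic plain-Python illustration, not Pathway code; `IdempotentSink`, the record ids, and the sample deliveries are all hypothetical. It assumes each record carries a stable unique id, which is the usual precondition for deduplicating on the consumer side.

```python
class IdempotentSink:
    """Hypothetical sink that ignores records it has already applied."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def write(self, record_id, amount):
        """Apply the record once; return False if it was a duplicate."""
        if record_id in self.seen_ids:
            return False
        self.seen_ids.add(record_id)
        self.total += amount
        return True


# Under at-least-once delivery, record "b" arrives twice
deliveries = [("a", 10), ("b", 5), ("b", 5), ("c", 1)]

# A naive consumer double-counts the redelivered record
naive_total = sum(amount for _, amount in deliveries)

# The idempotent sink applies each record id only once
sink = IdempotentSink()
results = [sink.write(record_id, amount) for record_id, amount in deliveries]
```

Here the naive total includes the duplicate while `sink.total` does not. Whether this kind of deduplication is feasible depends on the sink; it works for keyed upserts and is much harder for side effects such as sending emails.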