// the find

mara/mara-pipelines

★ 2,086 · Python · MIT · updated Dec 2023

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

Mara Pipelines is a Python ETL framework that sits between shell scripts and Airflow: pipelines are defined in code and executed locally via multiprocessing, with Postgres for state and a Flask web UI for monitoring and triggering runs. It is aimed at small-to-medium data teams who find Airflow's operational overhead unjustifiable but have outgrown pure scripts.

The web UI is genuinely useful: live output streaming, per-task runtime history over 30 days, and a dependency graph that's actually readable. Cost-based priority queues are a smart default: tasks that historically take longer get scheduled first, which tends to reduce total wall time without any manual tuning. The GNU make-style dependency model (a task waits for its upstream nodes to complete; no data is passed between them) is simpler to reason about than DAG frameworks that try to pass data between tasks. Single-machine multiprocessing means standard Python debugging tools work, with no distributed tracing required.
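To make the scheduling claim concrete, here is a minimal stdlib-only sketch of completion-based dependencies combined with "longest historical runtime first" ordering. It is a toy simulation, not Mara's implementation: the `simulate` function, the task names, and the runtimes are all hypothetical, and it assumes an acyclic dependency graph.

```python
import heapq

def simulate(tasks, deps, cost, workers, prioritize=True):
    """Simulate running tasks on `workers` parallel slots; return total wall time.

    tasks: list of task ids, in declaration order
    deps:  dict mapping a task to the set of upstream tasks that must finish first
    cost:  dict mapping a task to its historical runtime (hypothetical numbers)
    """
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    ready = [t for t in tasks if not remaining[t]]
    running = []  # min-heap of (finish_time, task)
    now, done = 0, 0

    while done < len(tasks):
        if prioritize:
            # Cost-based priority: historically slowest ready task goes first.
            ready.sort(key=lambda t: -cost[t])
        # Fill free worker slots from the front of the ready list.
        while ready and len(running) < workers:
            task = ready.pop(0)
            heapq.heappush(running, (now + cost[task], task))
        # Advance time to the next task completion.
        now, finished = heapq.heappop(running)
        done += 1
        # Make-style dependencies: a node becomes ready once every
        # upstream node has completed. No data is passed along.
        for t in tasks:
            if finished in remaining[t]:
                remaining[t].discard(finished)
                if not remaining[t]:
                    ready.append(t)
    return now

# Hypothetical pipeline: four short loads, one long load, one final report.
TASKS = ["s1", "s2", "s3", "s4", "big", "report"]
DEPS = {"report": {"big", "s4"}}
COST = {"s1": 2, "s2": 2, "s3": 2, "s4": 2, "big": 8, "report": 1}

print(simulate(TASKS, DEPS, COST, workers=2, prioritize=False))  # 13
print(simulate(TASKS, DEPS, COST, workers=2, prioritize=True))   # 9
```

With two workers, declaration order leaves the long task until last and one worker idles while it finishes; starting it first lets the short tasks fill the other slot in the meantime.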

The last commit was in December 2023 and activity has been sparse for two years, so this is effectively in maintenance mode rather than active development. The hard Postgres dependency for pipeline state is an odd coupling: you're pulling in a database just to run ETL jobs, which is fine for teams already on Postgres but a real barrier otherwise. No Windows support (execution is fork-based) means your CI or dev environment needs a fork-capable OS such as Linux, or Docker. The Flask integration story requires wiring in mara-app and studying example projects; there's no zero-config quickstart that just works.
