finds.dev

// the find

AgnostiqHQ/covalent

★ 861 · Python · Apache-2.0 · updated May 2026

Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.

Covalent is a Python workflow orchestration framework for ML/HPC workloads. It uses decorators (@ct.electron, @ct.lattice) to build DAGs from ordinary functions and dispatches them to heterogeneous backends such as AWS Batch, SLURM, Dask, and IBM Quantum. The key pitch is that moving from local execution to a cloud cluster takes a single change to a decorator argument. It is aimed at researchers and ML engineers who need to fan out compute without rewriting their code around a specific scheduler.
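To make the "one decorator argument" idea concrete, here is a toy, pure-Python sketch of the pattern. It is not Covalent's code: electron(executor=...) is a hypothetical decorator factory whose argument alone decides where the wrapped function runs, LocalExecutor and ThreadExecutor are invented names, and the thread pool merely stands in for a remote backend such as SLURM or AWS Batch.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class LocalExecutor:
    """Hypothetical backend: run the task inline in the calling process."""

    def run(self, fn: Callable[..., Any], *args, **kwargs) -> Any:
        return fn(*args, **kwargs)


class ThreadExecutor:
    """Hypothetical stand-in for a remote backend: submit to a worker pool."""

    def __init__(self, workers: int = 2):
        self.workers = workers

    def run(self, fn: Callable[..., Any], *args, **kwargs) -> Any:
        # A fresh pool per call keeps the toy short; a real backend would reuse one.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return pool.submit(fn, *args, **kwargs).result()


def electron(executor=None):
    """Toy decorator factory (not Covalent's API): the executor picks the backend."""
    backend = executor if executor is not None else LocalExecutor()

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return backend.run(fn, *args, **kwargs)

        return wrapper

    return decorate


def train_step(x: int) -> int:
    return x * 2


# Same function body; only the decorator argument differs.
local_step = electron()(train_step)
pooled_step = electron(executor=ThreadExecutor(workers=4))(train_step)
print(local_step(21), pooled_step(21))  # 42 42
```

The design choice being illustrated is that the task function never knows where it runs; all placement logic lives behind the executor object. A real remote backend would additionally have to serialize the function and its inputs, stage files, and poll for completion.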

The decorator API is the cleanest part: annotate functions, wrap them in a lattice, and Covalent infers the dependency graph automatically, with no explicit DAG construction and no task IDs to wire up by hand. Per-task dependency injection (DepsPip, DepsBash) is genuinely useful, since you can specify that one task needs PyTorch and another needs a different CUDA version without building a monolithic environment. The plugin executor architecture keeps backends decoupled: each backend is a separate installable package, so a local run does not drag in AWS dependencies. File transfer strategies (S3, GCS, rsync, Azure Blob) are first-class objects that compose with tasks, which beats the usual approach of manually staging inputs before dispatch.
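The automatic graph inference described above can be illustrated with a toy re-implementation. This is a sketch under stated assumptions, not how Covalent is built internally: the hypothetical electron decorator returns a placeholder Node when called while a lattice is being traced, and any argument that is itself a Node becomes an edge. Running the lattice then executes the recorded nodes in order, substituting real results for placeholders.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Node:
    """Placeholder returned by a traced task call; records one graph vertex."""
    fn: Callable[..., Any]
    args: tuple
    kwargs: dict
    id: int


# Nodes recorded while a lattice body is being traced; None means "not tracing".
_trace: Optional[List[Node]] = None


def electron(fn):
    """Toy @electron: inside a lattice, record a Node instead of running fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if _trace is None:
            return fn(*args, **kwargs)  # plain call outside any lattice
        node = Node(fn, args, kwargs, len(_trace))
        _trace.append(node)
        return node
    return wrapper


def _deps(node: Node) -> List[int]:
    # An edge exists wherever an argument is another task's placeholder.
    # Toy limitation: Nodes nested inside lists or dicts are not detected.
    values = list(node.args) + list(node.kwargs.values())
    return [v.id for v in values if isinstance(v, Node)]


def lattice(fn):
    """Toy @lattice: trace fn once to build the DAG, then execute it."""
    def build(*args, **kwargs):
        global _trace
        _trace = []
        try:
            output = fn(*args, **kwargs)
            nodes = _trace
        finally:
            _trace = None
        edges: Dict[int, List[int]] = {n.id: _deps(n) for n in nodes}
        return nodes, edges, output

    @functools.wraps(fn)
    def run(*args, **kwargs):
        nodes, _, output = build(*args, **kwargs)
        results: Dict[int, Any] = {}

        def resolve(value):
            return results[value.id] if isinstance(value, Node) else value

        # Trace order is already topological: a call can only receive
        # placeholders that earlier calls returned.
        for n in nodes:
            call_args = [resolve(v) for v in n.args]
            call_kwargs = {k: resolve(v) for k, v in n.kwargs.items()}
            results[n.id] = n.fn(*call_args, **call_kwargs)
        return resolve(output)

    run.build = build  # expose the inferred graph for inspection
    return run


@electron
def add(x, y):
    return x + y


@electron
def square(x):
    return x * x


@lattice
def workflow(a, b):
    return square(add(a, b))


nodes, edges, _ = workflow.build(2, 3)
print(edges)           # {0: [], 1: [0]}
print(workflow(2, 3))  # 25
```

Nothing in workflow names a task ID or an edge; the graph falls out of ordinary data flow between calls, which is the property the decorator API is praised for here.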

Covalent requires a persistent local dispatcher server process (covalent start); if that process dies, you lose in-flight workflow state and have to restart from scratch, which is a significant ops burden for anything running overnight. The README badges list Python 3.8–3.10 only, so check whether Python 3.11 and 3.12+ are supported before adopting it. The default storage is SQLite, which is fine for development but will bottleneck at scale and is not highly available; production use effectively requires running the Covalent server somewhere durable. Task data is passed between steps via pickle, the classic Python distributed footgun: objects that cannot be pickled (open file handles, locks, live connections) fail at dispatch time, not at definition time, and the error messages are often unhelpful.
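Because serialization errors surface only when a workflow is dispatched, one cheap mitigation is to test task inputs eagerly. The sketch below uses the standard-library pickle module as a stand-in; the serializer Covalent actually uses may accept or reject a somewhat different set of objects, and pickling_error and Trainer are hypothetical names, not part of Covalent.

```python
import pickle
import threading
from typing import Any, Optional


def pickling_error(obj: Any) -> Optional[str]:
    """Return None if obj survives a pickle round trip, else the error text."""
    try:
        pickle.loads(pickle.dumps(obj))
        return None
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        return f"{type(exc).__name__}: {exc}"


class Trainer:
    def __init__(self):
        # Constructing an object that holds a lock raises nothing...
        self.lock = threading.Lock()


# Plain data round-trips fine.
print(pickling_error({"lr": 1e-3, "layers": [64, 32]}))  # None
# ...the failure only appears once the object is serialized.
print(pickling_error(Trainer()))
```

Calling a check like this when a workflow is assembled moves the failure from dispatch time back toward definition time, with an error message that names the offending type.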
