// the find

AbsaOSS/spline

★ 659 · Scala · Apache-2.0 · updated Jun 2026

Data Lineage Tracking And Visualization Solution

Spline is a data lineage tracking system for Apache Spark: it intercepts Spark query plans and records which jobs read from which sources, wrote to which targets, and applied which transformations. It stores the lineage graph in ArangoDB and exposes it through a REST API, with a separate UI on top. For teams running Spark pipelines who need audit trails or want to debug why a dataset changed, it is the most mature open-source option in this space.

The ArangoDB graph model is a genuinely good fit for lineage data: traversing upstream and downstream dependencies is a natural graph query, not a recursive CTE nightmare. The Foxx microservice pattern (business logic runs inside ArangoDB itself) cuts a network hop on the hot query path. The agent model is clean: a Spark listener captures execution plans at the source without you changing pipeline code. The versioning strategy is honest: the project separates app semver from DB schema version, which prevents the usual lie where a minor version bump quietly breaks your schema.
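To make the "natural graph query" point concrete, here is a minimal sketch of an upstream lineage lookup as a plain breadth-first traversal. It runs over a toy in-memory graph, not Spline's actual data model or an ArangoDB query; the dataset names, the `DERIVED_FROM` mapping, and the `upstream` helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical toy lineage: each dataset maps to the datasets it was derived from.
# This is an illustration only, not Spline's schema.
DERIVED_FROM = {
    "report.parquet": ["joined.parquet"],
    "joined.parquet": ["orders.csv", "customers.csv"],
    "orders.csv": [],
    "customers.csv": [],
}


def upstream(dataset, derived_from):
    """Return every dataset that `dataset` transitively depends on, in BFS order."""
    found = []
    visited = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in visited:  # guard against cycles and shared ancestors
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


print(upstream("report.parquet", DERIVED_FROM))
# prints ['joined.parquet', 'orders.csv', 'customers.csv']
```

In a relational store the same question usually means a recursive CTE over an edge table; in a graph database the traversal is a first-class operation, which is the fit the review is pointing at.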

ArangoDB is a significant operational bet: most data teams already run Postgres or a cloud warehouse, and adding a separate graph database for lineage metadata is a real infrastructure cost that the README undersells. The Foxx service layer means your query logic is TypeScript running inside the database process, which makes it harder to test, profile, and debug than a normal service. Spark is the only first-class citizen; support for other engines exists but feels bolted on. At 659 stars, community momentum is thin despite corporate backing from ABSA; if ABSA deprioritizes it, you're maintaining a fork of a niche tool with an unusual database dependency.
