finds.dev

// the find

oxnr/awesome-bigdata

★ 14,441 · MIT · updated May 2026

A curated list of awesome big data frameworks, resources and other awesomeness.

A community-maintained list of big data tools, databases, and frameworks organized by category — distributed systems, stream processing, storage formats, ML, etc. It's a breadth-first reference, not a depth-first guide. Useful as a starting point when you're mapping a problem space, less useful when you need to actually choose something.

Covers the full stack from HDFS to vector databases, including newer entries such as Delta Lake, Apache Paimon, and DuckDB, which suggest the list is still being maintained. The section distinguishing Key-Map from Columnar data models, with its reference to Abadi's blog, is genuinely educational rather than just a link dump. The Lakehouse Table Formats section shows that someone is tracking where the ecosystem has moved in the last few years.

There is no quality signal: Apache Gearpump (effectively dead), Facebook Corona (a 2012 blog post), and ClickHouse are given equal weight. The 'Interesting Papers' section stops at 2016, which suggests that part of the list is no longer curated. Several entries are dead projects (Tuktu, Stratosphere, AddThis Hydra) with no indication that they are abandoned. The directory tree is just README.md, so there is nothing here except the list itself; if you want evaluation or comparison, you're on your own.
