// the find
oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
A community-maintained list of big data tools, databases, and frameworks organized by category: distributed systems, stream processing, storage formats, ML, and so on. It's a breadth-first reference, not a depth-first guide. Useful as a starting point when you're mapping a problem space, less useful when you actually need to choose something.
Covers the full stack from HDFS to vector databases, including newer entries like Delta Lake, Apache Paimon, and DuckDB that suggest the list is still being maintained. The Key-Map vs. Columnar distinction section, with its reference to Abadi's blog, is genuinely educational, not just a link dump. The Lakehouse Table Formats section is a sign someone is paying attention to where the ecosystem has moved in the last few years.
No quality signal: Apache Gearpump (effectively dead), Facebook Corona (a 2012 blog post), and ClickHouse are given equal weight. The 'Interesting Papers' section stops at 2016, which suggests that part of the list is no longer curated. Several entries are dead projects (Tuktu, Stratosphere, AddThis Hydra) with no indication they're abandoned. The directory tree is just README.md, so there's nothing here except the list itself; if you want evaluation or comparison, you're on your own.