// the find

heibaiying/BigData-Notes

★ 16,905 · Java · updated Jan 2024

A Beginner's Guide to Big Data :star:

A Chinese-language study guide covering the core Hadoop ecosystem: HDFS, MapReduce, Spark, Flink, HBase, Kafka, Storm, ZooKeeper, and more. It is a structured learning path for developers new to big data, with setup guides, concept explanations, and working code samples. The target audience is Chinese-speaking developers starting out with distributed systems.

The breadth of coverage is genuinely useful: 12 technologies, each with theory notes and runnable code samples, so you don't have to hunt across 12 sets of official docs to get started. The Flink section covers state management and checkpointing, not just 'hello world'. Code samples are organized as real Maven projects with proper package structure rather than loose snippets. The learning roadmap document gives a sensible order in which to study the topics.

The last push was in January 2024, and the content shows its age: Flink examples predate the DataStream API 2.0 changes, Kafka examples use older consumer group patterns, and the Spark Streaming content ignores Structured Streaming, which has been the preferred API for years. The repo is essentially a snapshot of 2020-era big data stacks. There is no English content at all, which cuts out most of the GitHub audience. Storm is presented as a peer to Flink and Spark Streaming, which will mislead beginners about the current landscape; Storm is effectively legacy at this point.
