// the find

heibaiying/BigData-Notes

★ 16,905 · Java · updated Jan 2024

Big Data Beginner's Guide :star:

A Chinese-language study guide covering the Hadoop ecosystem and the tools around it: HDFS, MapReduce, Spark, Flink, HBase, Kafka, Storm, ZooKeeper, and more. It lays out a structured learning path with setup guides, concept explanations, and working code samples, aimed at Chinese-speaking developers who are new to big data and distributed systems.

The breadth of coverage is the main draw: 12 technologies, each with theory notes and runnable code samples, so you don't have to hunt across 12 sets of official docs to get started. The Flink section covers state management and checkpointing, not just 'hello world'. Code samples are organized as real Maven projects with proper package structure rather than loose snippets. The learning roadmap document gives a sensible order in which to study the topics.

Last push was January 2024 and the content shows its age: Flink examples predate the DataStream API 2.0 changes, Kafka examples use older consumer group patterns, and the Spark Streaming content ignores Structured Streaming, which has been the preferred API for years. The repo is essentially a snapshot of 2020-era big data stacks. There is no English content at all, which cuts out most of the GitHub audience. Storm is presented as a peer to Flink and Spark Streaming, which will mislead beginners about the current landscape; Storm is effectively legacy at this point.

View on GitHub →