// the find

wangzhiwubigdata/God-Of-BigData

★ 10,477 · updated Aug 2023

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

A Chinese-language big data interview and learning resource collection covering the Java-centric big data stack: Flink, Spark, Hadoop, HBase, Hive, Kafka, Zookeeper, plus foundational Java/JVM/NIO content. The repo is a curated index of the author's blog posts and WeChat articles, with some actual Markdown files for the JVM and Flink beginner series. Aimed at Chinese developers preparing for data engineering interviews or building a mental map of the ecosystem.

The learning path is structured sensibly: it starts with Java fundamentals (concurrency, JVM, NIO), works through distributed systems theory (Paxos, Raft, CAP), then covers each framework in order of dependency. The in-repo Markdown content for JVM internals and the Flink beginner series (entries 1–17) is written out in full and readable without leaving GitHub. The Flink section goes deeper than most overviews, covering watermarks, state backends, window internals, and production case studies from named companies. The star count (10k+) and fork count (3.2k) suggest it has been useful as a reference for its target audience.

The most important 'systematic summaries' (the ones listed for Hadoop, Hive, Spark, Flink, HBase, and Kafka) are all locked behind a paid 知识星球 (Knowledge Planet) subscription; you get a link to a paywall, not content. Most of the 'advanced' and 'practical' articles are external WeChat links that require a WeChat account and may disappear; link rot is already visible in parts of the README. The repo hasn't been touched since August 2023, so the Flink content still references the deprecated DataSet API and pre-1.17 APIs, and nothing covers Flink's Table API evolution or the Kafka 3.x changes. There are no runnable code examples anywhere in the repo; it is entirely prose, so nothing it describes can be compiled or tested as-is.
