// the find

collabH/bigdata-growth

★ 1,776 · Python · MIT · updated Apr 2026

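A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.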

A Chinese-language personal knowledge base covering the Hadoop/Flink/Spark ecosystem, built by someone who clearly works in big data professionally. It's notes, mind maps, and source-code analysis accumulated over years, not a library, framework, or tool. The target audience is Chinese-speaking engineers studying for interviews or getting up to speed on a specific technology.

Flink coverage is genuinely deep: source-code walkthroughs of checkpoint mechanics, network backpressure, and the TaskExecutor memory model go well beyond surface-level docs. The data lake section covers Hudi, Iceberg, and Paimon side by side with real integration examples (Flink+Hudi+Alluxio), which is rare. There is production content too: actual tuning XMind files, Debezium gotchas, and Kudu schema design anti-patterns. The repo is still actively maintained as of April 2026, which matters given how fast Flink and Paimon move.

The repo is entirely in Chinese with no English content anywhere, which immediately gates out most of the world. A significant fraction of the content is locked in .xmind binary files, which you can't read in the browser: GitHub renders them as blobs, so you need the XMind desktop app or a compatible tool. Feature tracking stops at the Flink 1.14 new-features notes, so anything post-1.15 (Unified Source API, materialized tables, disaggregated state backend) is absent. The 'AI skill' section is marketing fluff: it's just a system prompt wrapper for Alibaba's Lingma IDE, not anything reusable.
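On the .xmind point: files saved by recent XMind versions are ordinarily ZIP archives with a `content.json` entry inside, so the outline text may be recoverable with a few lines of Python. The sketch below works under that assumption and has not been checked against this repository's files. The function names are illustrative, and older files that only carry `content.xml` are not handled.

```python
from __future__ import annotations

import json
import zipfile
from typing import Iterator


def iter_topic_titles(topic: dict, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, title) for a topic and all of its attached subtopics."""
    yield depth, topic.get("title", "")
    # Assumption: subtopics live under children -> attached, as in the
    # JSON layout written by recent XMind versions.
    for child in topic.get("children", {}).get("attached", []):
        yield from iter_topic_titles(child, depth + 1)


def xmind_outline(path) -> list[str]:
    """Return an indented text outline for every sheet in a .xmind file.

    Assumes a top-level content.json entry; an archive without one
    (for example an older content.xml-only file) raises KeyError.
    """
    with zipfile.ZipFile(path) as archive:
        sheets = json.loads(archive.read("content.json").decode("utf-8"))

    lines = []
    for sheet in sheets:
        for depth, title in iter_topic_titles(sheet["rootTopic"]):
            lines.append("  " * depth + title)
    return lines
```

Calling `print("\n".join(xmind_outline("notes.xmind")))` on a local copy (the filename here is a placeholder) would dump each mind map as an indented outline, which is also easier to run through a translator than the binary file.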
