// the find
abhishek-ch/around-dataengineering
A Data Engineering & Machine Learning Knowledge Hub
A personal knowledge dump of data engineering and ML resources, organized as a long README full of LinkedIn post links, sketchnotes, and a handful of actual markdown docs. It's a bookmark collection, not a library: useful if you want a map of the data engineering landscape circa 2021-2022, but don't expect runnable code or deep technical content.
The breadth of topic coverage is genuinely good: Kafka, Iceberg, Flink, Trino, Kubernetes, CDC, and distributed systems fundamentals are all represented. The leveled structure (Level 0, Level 1, Core) suggests a rough learning order. A few actual docs in the /docs folder (FoundationDB, CockroachDB, Iceberg) are well-written summaries worth reading. The spark-kubernetes directory has working YAML for running Spark on K8s, which is more concrete than most of the repo.
The vast majority of links point to the author's own LinkedIn posts, which are image carousels: you click through to read a PDF screenshot, not actual content. The last push was in February 2024, and the bulk of the content dates to 2021; anything about Flink, Iceberg, or dbt is several releases behind. There's almost no code: one Airflow DAG skeleton and some Kubernetes YAML. Anyone expecting a structured curriculum or reusable tooling will be disappointed; this is closer to a personal notes folder made public.