// the find

DataTalksClub/data-engineering-zoomcamp

★ 42,319 · Jupyter Notebook · updated Jun 2026

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

A free 9-week course teaching data engineering fundamentals by building an end-to-end pipeline. Covers Docker, Terraform, Kestra, BigQuery, dbt, Spark, and Kafka through structured modules with homework. Aimed at developers and analysts who know basic Python/SQL but have no data engineering background.

42k stars and 8k forks suggest a large, active community and plenty of peer help when you get stuck. The syllabus spans the actual modern DE stack: not toy examples, but real tools (Kestra for orchestration, dbt for transformation, Kafka for streaming) connected into a coherent end-to-end pipeline. The live cohort structure, with deadlines, peer review, and a certificate, gives it accountability that self-paced courses usually lack. The materials are genuinely open: all lectures are pre-recorded and freely available, with no paywall.

Heavy GCP dependency throughout (BigQuery, GCS): if you're on AWS or Azure, you'll constantly be adapting examples yourself. The course adds new tools each year (Bruin showed up in 2026 alongside Kestra), so older community answers and Stack Overflow threads go stale quickly. It's a broad survey, not a deep one: you finish knowing how to wire the tools together but not why certain design decisions were made, so production incidents will still surprise you. There's no testing module either. dbt tests get a mention, but there's nothing on data quality frameworks, CI for pipelines, or how to handle schema drift in Kafka topics.
