// the find

OBenner/data-engineering-interview-questions

★ 1,657 · Python · updated Jan 2026

More than 2,000 data engineering interview questions.

A flat collection of 2,000+ interview questions organized by data engineering tool (Spark, Kafka, Airflow, Flink, and a dozen others), plus theory sections on CDC, data modeling, and system design. Aimed squarely at data engineers preparing for job interviews, not at people learning these tools from scratch. Each topic lives in its own markdown file, so you can study one stack at a time.

Wide technology coverage that matches what interviewers are actually asking about in 2025–2026, including Iceberg, Hudi, and Delta alongside the older Hadoop-era stack. The theory sections (CDC, data modeling, observability, cost optimization) cover the conceptual questions that trip up candidates who only know the tools. A CI workflow, repo-checks.yml, runs automated checks on the repository, which should help with upkeep.

Questions are listed without answers in most sections: useful for self-quizzing, but not for someone who wants to understand why an answer is correct. Hadoop, Flume, and Impala sections still get equal billing with Iceberg and dbt, which skews the coverage toward a stack most new hires won't touch. There are no difficulty tiers or common-vs-rare tags, so you can't tell the questions every interviewer asks from the ones that come up once a year. And the README doesn't make clear what the Python section covers; whether it's idiomatic pipeline patterns or generic Python fundamentals makes a big difference to its usefulness.
