// the find
apache/flink-training
Apache Flink Training Exercises
The official Apache Flink training exercises repo, maintained by the Flink committers. It covers the core DataStream API concepts (filtering, stateful joins, windowed aggregations, and ProcessFunction timers) through four progressively harder exercises built around a NYC taxi dataset. It is aimed at developers who want to learn Flink properly, not at anyone looking for a quick tutorial.
Each exercise ships with an Exercise stub, a Solution, and both unit and integration tests, so you can check your work without guessing. The taxi data generators are self-contained and deterministic, so a failing run can be reproduced locally and debugged. The test harness (ComposedPipeline, ParallelTestSource) is genuinely useful: it lets you test individual operators without spinning up a full cluster. Both Java and Scala variants are maintained, with Scala off by default to avoid dependency noise.
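To make the point about deterministic generators concrete, here is a minimal plain-Java sketch of the general idea, not the repo's actual generator code. The class `DeterministicRides`, the `Ride` type, its field names, and the coordinate ranges are all hypothetical and chosen only for illustration. The assumption being illustrated is that each event is a pure function of its id, so two runs produce the same stream and a failing test can be replayed exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical sketch, not the repo's generator: every field of an event is
// derived from its id alone, so any run reproduces the same stream.
class DeterministicRides {

    /** Illustrative event type; the field names here are assumptions. */
    static final class Ride {
        final long rideId;
        final long driverId;
        final float startLon;
        final float startLat;

        Ride(long rideId, long driverId, float startLon, float startLat) {
            this.rideId = rideId;
            this.driverId = driverId;
            this.startLon = startLon;
            this.startLat = startLat;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Ride)) return false;
            Ride r = (Ride) o;
            return rideId == r.rideId
                    && driverId == r.driverId
                    && Float.compare(startLon, r.startLon) == 0
                    && Float.compare(startLat, r.startLat) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(rideId, driverId, startLon, startLat);
        }

        @Override
        public String toString() {
            return "Ride(" + rideId + ", driver=" + driverId
                    + ", lon=" + startLon + ", lat=" + startLat + ")";
        }
    }

    /** Builds the ride for a given id; the id is the only source of randomness. */
    static Ride rideFor(long rideId) {
        Random rnd = new Random(rideId); // seeded by id: no clock, no shared state
        long driverId = 2013000000L + rnd.nextInt(200);  // made-up id range
        float lon = -74.05f + rnd.nextFloat() * 0.35f;   // made-up bounding box
        float lat = 40.60f + rnd.nextFloat() * 0.30f;
        return new Ride(rideId, driverId, lon, lat);
    }

    /** Generates rides 1..n in order. */
    static List<Ride> firstRides(int n) {
        List<Ride> rides = new ArrayList<>();
        for (long id = 1; id <= n; id++) {
            rides.add(rideFor(id));
        }
        return rides;
    }

    public static void main(String[] args) {
        // Two independent runs yield equal streams.
        System.out.println(firstRides(5).equals(firstRides(5))); // prints "true"
        firstRides(3).forEach(System.out::println);
    }
}
```

Because nothing depends on wall-clock time or shared mutable state, a test that fails on ride 42 will fail on ride 42 every time, which is the property that makes stepping through an operator in a debugger practical.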
There are only four exercises, and they top out at ProcessFunction with timers. Nothing covers async I/O, the Table API, SQL, or connector configuration, which are the parts that trip people up in real jobs. The repo still requires Java 8 or 11 even though Flink 1.18 and later have added support for Java 17 and, more recently, 21, so you may have to pin an older JDK or fight the build on a modern setup. There's no coverage of checkpointing configuration or state backends, which is where most production Flink bugs actually live. The exercises are thin on the 'why': Discussion docs exist for only two of the four labs, and they're short.