
// the find

apache/flink-training

★ 1,036 · Java · Apache-2.0 · updated May 2026

Apache Flink Training Exercises

The official Apache Flink training exercises repo, maintained by the Flink committers. It covers the core DataStream API concepts (filtering, stateful joins, windowed aggregations, and ProcessFunction timers) through four progressively harder exercises built around a NYC taxi dataset. It is aimed at developers who want to learn Flink properly rather than skim a quick tutorial.

Each exercise ships with an Exercise stub, a Solution, and both unit and integration tests, so you can check your work without guessing. The taxi data generators are self-contained and deterministic, so a failing run can be reproduced locally. The test harness (ComposedPipeline, ParallelTestSource) lets you test individual operators without spinning up a full cluster. Both Java and Scala variants are maintained, with Scala off by default to avoid dependency noise.
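To illustrate why deterministic generators matter for debugging, here is a minimal plain-Java sketch of the general idea: derive every field of an event from its id, so the same id always yields the same event. The class and method names (`DeterministicRides`, `Ride`, `rideFor`) and the value ranges are hypothetical, chosen for illustration; this is not the repo's generator code and it does not use any Flink API.

```java
import java.util.Random;

public class DeterministicRides {

    // Hypothetical event type for illustration; not the repo's ride class.
    static final class Ride {
        final long rideId;
        final long driverId;
        final float startLon;
        final float startLat;

        Ride(long rideId, long driverId, float startLon, float startLat) {
            this.rideId = rideId;
            this.driverId = driverId;
            this.startLon = startLon;
            this.startLat = startLat;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Ride)) {
                return false;
            }
            Ride other = (Ride) o;
            return rideId == other.rideId
                    && driverId == other.driverId
                    && Float.compare(startLon, other.startLon) == 0
                    && Float.compare(startLat, other.startLat) == 0;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(rideId);
        }
    }

    // Every field is derived from the ride id alone, so the same id
    // produces the same ride on every run and on every machine.
    static Ride rideFor(long rideId) {
        Random rnd = new Random(rideId); // seeded by the id, never by the clock
        long driverId = 2013000000L + rnd.nextInt(200); // made-up id range
        float lon = -74.05f + rnd.nextFloat() * 0.35f;  // rough, assumed bounding box
        float lat = 40.60f + rnd.nextFloat() * 0.30f;
        return new Ride(rideId, driverId, lon, lat);
    }

    public static void main(String[] args) {
        Ride first = rideFor(42L);
        Ride second = rideFor(42L);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Because nothing depends on wall-clock time or an unseeded random source, a test that fails on one ride id will fail on the same id again, which is what makes stepping through an operator locally practical.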

There are only four exercises, and they top out at ProcessFunction with timers. Nothing covers async I/O, the Table API, SQL, or connector configuration, which are the parts that trip people up in real jobs. The repo still requires Java 8 or 11 even though Flink 1.18+ supports Java 17/21, so you may fight toolchain issues on a modern setup. There's no coverage of checkpointing configuration or state backends, which is where most production Flink bugs actually live. The exercises are also thin on the 'why': Discussion docs exist for only two of the four labs, and they're short.
