// the find
apache/flink
Apache Flink
Apache Flink is a distributed stream processing engine that handles both streaming and batch workloads on the same runtime. It's the go-to choice when you need exactly-once guarantees, event-time semantics, and stateful processing at scale — think fraud detection pipelines, real-time aggregations, or CDC ingestion. Not for teams running a few hundred events per second; this is infrastructure for serious data engineering work.
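Exactly-once processing with checkpointing is genuinely well-implemented: checkpoints use asynchronous barrier snapshotting, a variant of the Chandy-Lamport algorithm, rather than bolted-on per-record transactions (end-to-end exactly-once into external systems still needs replayable sources and transactional or idempotent sinks). Event-time windowing with watermarks actually handles out-of-order data correctly, something most stream processors get wrong, and stragglers can be kept via allowed lateness or routed to a side output instead of vanishing. The unified batch/stream API means you can run the same job in either mode without rewriting logic, provided the sources are bounded when you run in batch mode. SQL support via Flink SQL is production-grade, including streaming joins and temporal table lookups.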
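Watermarks are easier to reason about in a toy model. The sketch below is not Flink's API: `event_time_windows` is a hypothetical, stdlib-only simulation that assumes tumbling windows, a bounded-out-of-orderness watermark (largest timestamp seen minus a fixed bound), and zero allowed lateness. Flink's own watermark generator differs in details, for example it emits watermarks periodically rather than after every event.

```python
from collections import defaultdict


def event_time_windows(events, window_size, max_out_of_orderness):
    """Toy model of event-time tumbling windows driven by a watermark.

    events: (event_time, value) pairs in *arrival* order, possibly out of order.
    Returns (fired, late):
      fired: (window_start, window_end, total) in the order windows close
      late:  events whose window the watermark had already passed (dropped)
    """
    open_windows = defaultdict(int)  # window_start -> running sum
    fired, late = [], []
    watermark = float("-inf")
    max_seen = float("-inf")

    for event_time, value in events:
        start = event_time - event_time % window_size
        # A window's last timestamp is start + size - 1. If the watermark has
        # already reached it, the window is closed and this event is late.
        if start + window_size - 1 <= watermark:
            late.append((event_time, value))
        else:
            open_windows[start] += value

        # Watermark: "no event older than this is expected any more".
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness

        # Fire every open window whose last timestamp the watermark has passed.
        for s in sorted(open_windows):
            if s + window_size - 1 <= watermark:
                fired.append((s, s + window_size, open_windows.pop(s)))

    return fired, late


fired, late = event_time_windows(
    [(1, 1), (12, 1), (7, 1), (16, 1), (3, 1), (25, 1)],
    window_size=10,
    max_out_of_orderness=5,
)
print(fired)  # [(0, 10, 2), (10, 20, 2)]
print(late)   # [(3, 1)]
```

Event 7 arrives after event 12 but still lands in window [0, 10), because the watermark (7) has not yet passed that window's last timestamp (9). Event 3 arrives after the watermark has moved to 11, so it is dropped; in Flink you would typically raise allowed lateness or send such events to a side output rather than lose them silently.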
Operational complexity is real: you need to understand TaskManagers, JobManagers, slots, parallelism, and memory fractions before anything works reliably in production, because the defaults will OOM you. State backend tuning (RocksDB vs. heap) requires profiling your specific workload; the wrong choice means either GC pressure (large state on the heap) or serialization and write-amplification costs (RocksDB). Most connectors have been externalized to separate repos with their own release cadences, which means connector version hell when upgrading Flink itself. The Python API (PyFlink) is a second-class citizen: it works, but you'll hit missing features and worse performance compared to the Java/Scala API.
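Slot and managed-memory sizing comes down to simple arithmetic once you apply two rules from the Flink docs: with default slot sharing, a job needs as many slots as its most parallel operator, and a TaskManager splits its managed memory evenly across its slots. The helper `plan_task_managers` and its inputs are hypothetical. It ignores the rest of the memory model (task heap, network buffers, JVM overhead), which also has to be sized, and the slot count changes if you put operators in separate slot sharing groups.

```python
import math


def plan_task_managers(operator_parallelism, slots_per_tm, managed_memory_mb):
    """Back-of-the-envelope sizing under Flink's default slot sharing.

    operator_parallelism: dict of operator name -> parallelism (hypothetical job)
    slots_per_tm: the value you would set for taskmanager.numberOfTaskSlots
    managed_memory_mb: managed memory available to one TaskManager
    """
    # With default slot sharing, one slot can host one subtask of every
    # operator, so the job needs as many slots as its most parallel operator.
    slots_needed = max(operator_parallelism.values())
    task_managers = math.ceil(slots_needed / slots_per_tm)
    # Managed memory (used e.g. by RocksDB and batch sorting) is split
    # evenly across the slots of a TaskManager.
    managed_per_slot_mb = managed_memory_mb / slots_per_tm
    return {
        "slots_needed": slots_needed,
        "task_managers": task_managers,
        "managed_per_slot_mb": managed_per_slot_mb,
    }


plan = plan_task_managers(
    {"source": 4, "enrich": 8, "sink": 2}, slots_per_tm=4, managed_memory_mb=2048
)
print(plan)  # {'slots_needed': 8, 'task_managers': 2, 'managed_per_slot_mb': 512.0}
```

The trade-off this exposes: packing more slots into each TaskManager reduces the number of JVMs to run, but every slot's share of managed memory shrinks, which is exactly where a RocksDB-heavy job starts to struggle.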