// the find

miguno/kafka-storm-starter

★ 722 · Scala · NOASSERTION · updated Mar 2022

[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

A circa-2014 code sample showing how to wire Kafka 0.8, Storm 0.9, and Spark Streaming 1.1 together, with Avro as the serialization format. The author has explicitly marked it as no longer maintained. The pinned versions are several major releases behind current: Kafka has moved on to 3.x and later, Storm has faded from new deployments, and Spark Streaming (the DStream API) has been superseded by Structured Streaming.

The integration tests are genuinely good: they spin up embedded ZooKeeper, Kafka, and Storm in-process rather than requiring a running cluster, which was non-trivial to set up correctly in 2014. The generic `AvroDecoderBolt[T]` and `AvroScheme[T]` are parameterized on the record type, so there is no need to copy-paste a new bolt or scheme for each Avro schema. Test output is BDD-style, with Given/When/Then steps that make the flow readable. The README is honest about limitations and documents known upstream bugs with JIRA links.
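To illustrate why a type-parameterized decoder removes the per-schema copy-paste, here is a minimal sketch. It does not use the Storm or Avro APIs and is not the repo's code: `Codec`, `DecoderStage`, `Click`, the fields of `Tweet`, and the toy text encodings are all hypothetical stand-ins chosen to keep the example self-contained.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.util.Try

// Hypothetical stand-in for an Avro binary codec (not a Storm/Avro type).
trait Codec[T] {
  def decode(bytes: Array[Byte]): Try[T]
}

// One generic stage, reused for every record type with a Codec in scope.
class DecoderStage[T](implicit codec: Codec[T]) {
  // Some(record) on success, None when the payload cannot be decoded.
  def process(bytes: Array[Byte]): Option[T] = codec.decode(bytes).toOption
}

// Illustrative record types; the fields are assumptions, not real schemas.
final case class Tweet(user: String, text: String)
final case class Click(url: String)

object Codecs {
  // Toy encoding: "user|text" as UTF-8.
  implicit val tweetCodec: Codec[Tweet] = new Codec[Tweet] {
    def decode(bytes: Array[Byte]): Try[Tweet] = Try {
      val Array(user, text) = new String(bytes, UTF_8).split("\\|", 2)
      Tweet(user, text)
    }
  }
  // Toy encoding: the URL as UTF-8.
  implicit val clickCodec: Codec[Click] = new Codec[Click] {
    def decode(bytes: Array[Byte]): Try[Click] =
      Try(Click(new String(bytes, UTF_8)))
  }
}

object DecoderDemo extends App {
  import Codecs._
  // Same class, two record types: no second decoder class is written.
  val tweets = new DecoderStage[Tweet]
  val clicks = new DecoderStage[Click]
  println(tweets.process("alice|hello".getBytes(UTF_8)))
  println(clicks.process("https://example.org".getBytes(UTF_8)))
  println(tweets.process("malformed".getBytes(UTF_8)))
}
```

Adding a new record type under this design means supplying one more `Codec` instance; the stage itself stays untouched.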

The project is dead: Kafka 0.8, Storm 0.9, and Scala 2.10 have all been end-of-life for years, and nothing here compiles against current dependencies without significant rewriting. Storm itself has largely lost to Flink and Spark Structured Streaming for new stream processing work. The Kryo serializer for Avro types is hardcoded to a single `Tweet` schema, and the author acknowledges not finding a way to make it generic, so every new schema type requires a new hand-rolled serializer. The integration tests synchronize with fixed `Thread.sleep()` calls instead of polling for a condition, which makes them flaky on slow machines.
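The polling alternative to fixed sleeps can be sketched as follows. This is a minimal illustration, not code from the repo: `Polling.waitUntil`, its parameters, and the demo queue are hypothetical names. A fixed sleep always pays the full delay and still fails if the machine is slower than the guess; a polling wait returns as soon as the condition holds and fails only after a generous deadline.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.annotation.tailrec
import scala.concurrent.duration._

object Polling {
  /** Re-checks `condition` every `interval` until it is true or `timeout`
    * has elapsed. Returns true if the condition was met, false on timeout. */
  def waitUntil(timeout: FiniteDuration, interval: FiniteDuration = 50.millis)(
      condition: => Boolean): Boolean = {
    val deadline = timeout.fromNow
    @tailrec
    def loop(): Boolean =
      if (condition) true
      else if (deadline.isOverdue()) false
      else {
        Thread.sleep(interval.toMillis)
        loop()
      }
    loop()
  }
}

object PollingDemo extends App {
  // Stand-in for "messages the test topology has emitted so far".
  val received = new ConcurrentLinkedQueue[String]()

  // Simulated asynchronous producer with an unpredictable delay.
  val producer = new Thread(new Runnable {
    def run(): Unit = {
      Thread.sleep(100)
      received.add("tweet")
    }
  })
  producer.start()

  // Instead of Thread.sleep(5000) followed by an assertion:
  val arrived = Polling.waitUntil(5.seconds)(!received.isEmpty)
  assert(arrived, "message did not arrive within 5 seconds")
}
```

ScalaTest's `Eventually` trait offers the same idea as a library feature, so a hand-rolled helper like this is mainly useful where that dependency is unavailable.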
