// the find

simplesteph/medium-blog-kafka-udemy

★ 246 · Java · updated Dec 2023

Supporting repository for the blog post at https://medium.com/@stephane.maarek/how-to-use-apache-kafka-to-transform-a-batch-pipeline-into-a-real-time-one-831b48a6ad85

A Java demo accompanying a blog post and Udemy course, showing how to replace a batch ETL pipeline with real-time Kafka Streams — pulling Udemy course reviews, routing them through fraud detection, and aggregating stats into Postgres via Kafka Connect. It's a teaching artifact, not a production library. Audience is developers who just finished a Kafka beginner course and want a concrete end-to-end example.

Uses Avro schemas with Schema Registry rather than plain JSON, which is the right call for anything beyond toy demos. The multi-module Maven layout mirrors how you'd actually structure a real Kafka Streams project — producer, streams processors, and connectors are properly separated. Kafka Connect sink to Postgres means no custom consumer code for the write path, which is exactly how you'd do this in practice. The fraud detection branch via a separate KStream topic is a clean illustration of stream branching without overcomplicating it.
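The branching idea can be sketched without any Kafka dependencies. The snippet below is a plain-Java analogy, not the repository's code and not the Kafka Streams API: each record is tested once and routed to exactly one of two named outputs. The `Review` shape, the rating-range rule, and the `reviews-valid` / `reviews-fraud` names are all hypothetical. In Kafka Streams itself the same routing is typically written with `KStream#split()` (or the older `branch`), and the repository may do it differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java analogy of stream branching; it does not use Kafka Streams.
final class ReviewBranchingSketch {

    // Hypothetical record shape; the real project uses Avro-generated classes.
    static final class Review {
        final long id;
        final double rating;

        Review(long id, double rating) {
            this.id = id;
            this.rating = rating;
        }
    }

    // Hypothetical fraud rule, for illustration only:
    // a rating outside the 0..5 range is treated as suspicious.
    static final Predicate<Review> LOOKS_FRAUDULENT =
            r -> r.rating < 0.0 || r.rating > 5.0;

    // Routes every review to exactly one named output, like a two-way branch.
    static Map<String, List<Review>> branch(List<Review> reviews) {
        Map<String, List<Review>> outputs = new LinkedHashMap<>();
        outputs.put("reviews-fraud", new ArrayList<>());
        outputs.put("reviews-valid", new ArrayList<>());
        for (Review r : reviews) {
            String target = LOOKS_FRAUDULENT.test(r) ? "reviews-fraud" : "reviews-valid";
            outputs.get(target).add(r);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Review> input = Arrays.asList(
                new Review(1L, 4.5),
                new Review(2L, 9.0),
                new Review(3L, 0.0));
        // Print how many reviews landed in each output.
        branch(input).forEach((name, list) ->
                System.out.println(name + ": " + list.size()));
    }
}
```

Keeping the rule in a single predicate means downstream consumers only ever see one of the two outputs, which is the property that makes the branch easy to reason about.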

Pinned to Confluent Platform 3.3.0 from 2017. The ZooKeeper-based topic creation commands (`--zookeeper localhost:2181`) were deprecated years ago and removed in Kafka 3.0 in favor of `--bootstrap-server`, so this won't run as-is on any modern Kafka. The setup requires the Confluent CLI, a Docker Compose Postgres, and a manual PATH change, with no fallback or troubleshooting guidance if anything goes wrong. The Udemy REST API it polls has changed since 2017, and the credentials and endpoint are hardcoded assumptions, so the producer almost certainly doesn't work anymore. There are no tests for the Streams topology itself, only a hollow REST client test file, and the test directory is full of `.DS_Store` files that should never have been committed.

View on GitHub →