
// the find

simplesteph/kafka-connect-github-source

★ 464 · Java · MIT · updated May 2023

Get a stream of issues and pull requests for your chosen GitHub repository

A Kafka Connect source connector that polls GitHub's issues API and streams issue/PR updates into a Kafka topic, keyed by issue number — making the topic a natural fit for log compaction. This is explicitly a teaching project bundled with a Udemy course, not production infrastructure.
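Why keying by issue number suits compaction can be shown with a small simulation. This is only a sketch of the end result of compaction (the newest value per key survives), not of how Kafka's log cleaner works internally; the `CompactionSketch` and `IssueUpdate` names are made up for illustration and do not come from the repo.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompactionSketch {
    /** One record on the topic: key = issue number, value = the issue's state at that update. */
    static final class IssueUpdate {
        final int issueNumber;
        final String state;

        IssueUpdate(int issueNumber, String state) {
            this.issueNumber = issueNumber;
            this.state = state;
        }
    }

    /** Mimics what a compacted topic converges to: only the newest value per key is kept. */
    static Map<Integer, String> compact(List<IssueUpdate> log) {
        Map<Integer, String> latest = new LinkedHashMap<>();
        for (IssueUpdate update : log) {
            // Later records for the same issue number overwrite earlier ones.
            latest.put(update.issueNumber, update.state);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<IssueUpdate> log = Arrays.asList(
            new IssueUpdate(1, "open"),
            new IssueUpdate(2, "open"),
            new IssueUpdate(1, "closed"));
        // Issue 1 appears twice in the log but only once after compaction.
        System.out.println(compact(log));
    }
}
```

A consumer reading the compacted topic from the beginning would see one record per issue, carrying its most recent state.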

The connector handles offsets correctly: it starts from the configured `since.timestamp` and then tracks each issue's `updated_at`, so restarts don't reprocess the full history. The record schemas are defined explicitly in `GitHubSchemas.java` rather than inferred, which is the right call for a Connect connector. `BatchSizeValidator` and `TimestampValidator` show proper `ConfigDef` validation patterns, the kind of thing most tutorial connectors skip. The standalone Docker run script makes it easy to test without a full Kafka cluster setup.
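The restart behaviour described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the repo's code: the `SinceResolver` class, the `resolveSince` method, and the `updated_at` offset key are hypothetical, and the stored offset is modelled as the kind of plain map a Connect source task gets back from the framework.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.Map;

final class SinceResolver {
    // Hypothetical offset key; the real connector may store its offset under a different name.
    static final String UPDATED_AT_KEY = "updated_at";

    /**
     * Picks the timestamp to use as the lower bound for the next poll.
     *
     * @param storedOffset    the last committed offset, or null/empty on the very first run
     * @param configuredSince the value of the since.timestamp setting
     */
    static Instant resolveSince(Map<String, Object> storedOffset, Instant configuredSince) {
        if (storedOffset == null || !storedOffset.containsKey(UPDATED_AT_KEY)) {
            // No committed offset yet: fall back to the configured starting point.
            return configuredSince;
        }
        // Resume from the updated_at of the last record that was committed.
        return Instant.parse(storedOffset.get(UPDATED_AT_KEY).toString());
    }

    public static void main(String[] args) {
        Instant configured = Instant.parse("2017-01-01T00:00:00Z");
        Map<String, Object> offset =
            Collections.<String, Object>singletonMap(UPDATED_AT_KEY, "2020-05-05T10:00:00Z");

        System.out.println("first run: " + resolveSince(null, configured));
        System.out.println("restart:   " + resolveSince(offset, configured));
    }
}
```

The point of the design is that the configured timestamp only matters once; after the first committed record, the stored offset decides where polling resumes.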

Targets Java 8, which Kafka deprecated in 3.0 and dropped in 4.0, so expect friction on a current cluster. Authentication uses basic username/password rather than a GitHub token, and the README still references the password auth flow that GitHub removed from its REST API in November 2020. The deployment section is literally 'TODO'. Only one task is supported (`tasks.max=1`, with no path to parallelism), so it can't scale across repos or be partitioned in any meaningful way.
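For readers weighing the auth complaint, the difference between the two schemes comes down to how the `Authorization` header is built. The sketch below uses only the Java standard library; the `GitHubAuthHeaders` class and the placeholder credentials are hypothetical and are not taken from the connector's source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class GitHubAuthHeaders {
    /** HTTP Basic scheme: "Basic " followed by base64("username:password"). */
    static String basic(String username, String password) {
        String raw = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Token scheme: the token is sent verbatim after "Bearer ". */
    static String bearer(String token) {
        return "Bearer " + token;
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        System.out.println("Authorization: " + basic("user", "pass"));
        System.out.println("Authorization: " + bearer("ghp_exampleToken"));
    }
}
```

Moving a connector like this to token auth would mostly mean accepting a token in the config and sending the second header form instead of the first.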
