// the find
simplesteph/kafka-connect-github-source
Get a stream of issues and pull requests for your chosen GitHub repository
A Kafka Connect source connector that polls GitHub's issues API and streams issue/PR updates into a Kafka topic. Records are keyed by issue number, which makes the topic a natural fit for log compaction. This is explicitly a teaching project bundled with a Udemy course, not production infrastructure.
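To see why keying by issue number suits compaction, here is a minimal plain-Java sketch of the end state a compacted topic converges toward: one latest value per key. The class and method names are hypothetical and nothing here comes from the repository. Real compaction runs asynchronously on the broker and may keep older records for a while.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of "latest value per key"; not code from the connector. */
final class CompactionSketch {

    /** Replays (issueNumber, state) updates in order and keeps only the newest state per issue. */
    static Map<Integer, String> compact(int[] issueNumbers, String[] states) {
        Map<Integer, String> latest = new LinkedHashMap<>();
        for (int i = 0; i < issueNumbers.length; i++) {
            // A later update for the same issue number replaces the earlier one.
            latest.put(issueNumbers[i], states[i]);
        }
        return latest;
    }
}
```

Feeding it updates for issues 1, 2, and then 1 again leaves two entries, with issue 1 holding its most recent state. A consumer reading the compacted topic from the start would eventually see the same thing.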
The connector handles offset management correctly: it starts from the configured `since.timestamp` and then tracks each issue's `updated_at`, so restarts don't reprocess the full history. The record schemas are defined explicitly in `GitHubSchemas.java` rather than inferred, which is the right call for a Connect connector. `BatchSizeValidator` and `TimestampValidator` show proper `ConfigDef` validation patterns, the kind of thing most tutorial connectors skip. The standalone Docker run script makes it easy to test without a full Kafka cluster setup.
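The resume logic described above can be sketched roughly as follows. This is a plain-Java approximation under stated assumptions, not the repository's code: the class name, method names, and the `updated_at` offset key are all hypothetical. In a real Connect task the map would travel as the source offset on each `SourceRecord` and be read back through the task context's offset storage reader.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing timestamp-based resume; names are illustrative only. */
final class OffsetSketch {
    // Assumed key under which the resume point is stored in the offset map.
    static final String UPDATED_AT_KEY = "updated_at";

    /** Chooses the timestamp to send as GitHub's `since` parameter on the next poll. */
    static Instant resumeFrom(Map<String, Object> storedOffset, Instant configuredSince) {
        if (storedOffset == null || !storedOffset.containsKey(UPDATED_AT_KEY)) {
            return configuredSince; // first run: fall back to the configured since.timestamp
        }
        return Instant.parse((String) storedOffset.get(UPDATED_AT_KEY));
    }

    /** Builds the offset to commit after a batch: the newest updated_at seen so far. */
    static Map<String, Object> nextOffset(Instant current, List<Instant> batchUpdatedAt) {
        Instant newest = current;
        for (Instant t : batchUpdatedAt) {
            if (t.isAfter(newest)) {
                newest = t;
            }
        }
        return Collections.singletonMap(UPDATED_AT_KEY, newest.toString());
    }
}
```

On first start there is no stored offset, so polling begins at the configured timestamp. After a restart the stored `updated_at` takes over, which is what keeps the connector from replaying the whole issue history.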
Requires Java 8, an aging target that Kafka itself deprecated in 3.0 and dropped in 4.0, so expect friction on a current cluster. Authentication is basic username/password rather than a GitHub token, and the README still references the password auth flow that GitHub shut off for its REST API in late 2020. The deployment section of the README is literally 'TODO'. Only one task is supported (`tasks.max=1` with no path to parallelism), so it can't scale across repos or be partitioned in any meaningful way.
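For comparison, token-based access to the issues endpoint looks roughly like this. It is a sketch only: the class and method names are hypothetical, it uses the JDK's `java.net.http` client (Java 11+, so not a drop-in for the connector's Java 8 build), and it only constructs the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch of a token-authenticated GitHub issues request; not the connector's code. */
final class GitHubTokenRequestSketch {

    /** Builds a GET for a repository's issues updated at or after `since` (ISO 8601). */
    static HttpRequest issuesRequest(String owner, String repo, String since, String token) {
        URI uri = URI.create("https://api.github.com/repos/" + owner + "/" + repo
                + "/issues?state=all&since=" + since);
        return HttpRequest.newBuilder(uri)
                // A personal access token replaces the username/password pair.
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/vnd.github+json")
                .GET()
                .build();
    }
}
```

Calling `issuesRequest("apache", "kafka", "2017-01-01T00:00:00Z", token)` yields a request that could be handed to an `HttpClient`. The owner and repository names here are just example values.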