// the find
kevwan/go-stash
go-stash is a high performance, free and open source server-side data processing pipeline that ingests data from Kafka, processes it, and then sends it to ElasticSearch.
go-stash is a Kafka-to-Elasticsearch pipeline in Go, positioned as a drop-in replacement for Logstash with significantly higher throughput. It's a single binary with YAML config, so if you're already running the ELK stack and Logstash is your bottleneck, this is worth a look.
Built on go-zero, so the concurrency model is well-tested rather than hand-rolled. The config exposes the right knobs: separate tuning for Kafka connections, consumer goroutines, and processor goroutines means you can actually match it to your partition count and CPU core count. Single-binary deployment is a real operational win over Logstash's JVM dependency chain. The 5x throughput claim is plausible given the GC and memory-overhead differences between Go and the JVM (startup time barely matters for a long-running pipeline), and the benchmark numbers (150k/s) are specific enough to be useful.
Filter primitives are extremely thin: drop, remove_field, transfer, and add URI field make up the entire pipeline. There's no grok parsing, no date parsing, no mutate, and no conditional branching beyond drop conditions. Anything more complex than stripping fields before writing to ES means you're preprocessing elsewhere or forking this. The codebase is small enough (six filter files) that it's essentially a weekend project dressed up as infrastructure, which means you own it the moment you hit an edge case.
No dead-letter queue or poison-pill handling is visible, so what happens when a malformed message breaks serialization is an open question. Documentation is translated from Chinese and shows it: the Offset field docs say the default is 'last', but the example config sets 'first', which is the kind of inconsistency that bites you at 3am.
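To show how little that filter set covers, and what you would end up adding yourself after forking, here is a hypothetical Go sketch of a drop / remove_field / transfer chain with a dead-letter path for undecodable messages. The `Filter` type, the helper names, and the in-memory dead-letter slice are all assumptions for illustration. go-stash's actual filter interface and config syntax may differ, and it exposes no dead-letter handling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a decoded log message (hypothetical representation).
type Event = map[string]interface{}

// Filter returns the (possibly modified) event, or nil to drop it.
// Hypothetical signature; not go-stash's real filter interface.
type Filter func(Event) Event

// dropIf drops events whose string field equals value (a simplified drop condition).
func dropIf(field, value string) Filter {
	return func(e Event) Event {
		if s, ok := e[field].(string); ok && s == value {
			return nil
		}
		return e
	}
}

// removeFields deletes the named fields before indexing.
func removeFields(fields ...string) Filter {
	return func(e Event) Event {
		for _, f := range fields {
			delete(e, f)
		}
		return e
	}
}

// transfer moves a field's value to a new key.
func transfer(from, to string) Filter {
	return func(e Event) Event {
		if v, ok := e[from]; ok {
			e[to] = v
			delete(e, from)
		}
		return e
	}
}

// process decodes each message and runs the chain; undecodable input (or a
// bare JSON null) is parked in deadLetters instead of failing the batch.
func process(raw [][]byte, chain []Filter) (kept []Event, deadLetters [][]byte) {
	for _, msg := range raw {
		var e Event
		if err := json.Unmarshal(msg, &e); err != nil || e == nil {
			deadLetters = append(deadLetters, msg) // poison pill: park it, keep going
			continue
		}
		for _, f := range chain {
			if e = f(e); e == nil {
				break // dropped by a filter
			}
		}
		if e != nil {
			kept = append(kept, e)
		}
	}
	return kept, deadLetters
}

func main() {
	raw := [][]byte{
		[]byte(`{"level":"debug","msg":"noise"}`),
		[]byte(`{"level":"info","msg":"hello","host":"web-1"}`),
		[]byte(`{not json`),
	}
	chain := []Filter{dropIf("level", "debug"), removeFields("host"), transfer("msg", "message")}
	kept, dead := process(raw, chain)
	fmt.Println(len(kept), len(dead)) // 1 1
	fmt.Println(kept[0]["message"])   // hello
}
```

In a real fork, the dead-letter slice would more plausibly be a separate Kafka topic, so poison pills can be inspected and replayed rather than stalling a partition.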