
// the find

matanolabs/matano

★ 1,677 · Rust · Apache-2.0 · updated Jan 2025

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

Matano is a serverless security data lake that runs entirely in your AWS account. It ingests logs from 50+ sources, normalizes them to ECS via VRL transforms, stores everything in Apache Iceberg tables on S3, and lets you write Python detections that fire in real time. It's aimed at security teams who are paying too much for Splunk or Elastic and want to own their data. The last push was January 2025, which is a yellow flag for an infrastructure project.

Apache Iceberg as the storage format is the right call: your data stays in S3 in an open format, you can query it from Athena, Snowflake, or Trino without copying anything, and you're not locked into Matano's query layer. The VRL transform pipeline (borrowed from the Vector ecosystem) is expressive and well suited to log parsing, and it doesn't require you to manage any transformation servers. The remote cache primitive in detections (shown in the 'never-before-seen IP' example) gives you the kind of stateful detection that most open-source SIEM tools fumble. Running fully serverless on Lambda/SQS/S3 means the idle cost when you're not ingesting much is close to zero.
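To make the stateful-detection idea concrete, here is a minimal sketch of 'never-before-seen IP' logic in plain Python. It does not reproduce Matano's detection or remote cache API: `InMemoryCache` is a hypothetical stand-in for a persistent key-value cache, and the flat dotted keys on the record (`event.action`, `user.name`, `source.ip`) are assumed for illustration.

```python
class InMemoryCache:
    """Hypothetical stand-in for a remote key-value cache (not Matano's API)."""

    def __init__(self) -> None:
        self._store: dict[str, set[str]] = {}

    def get(self, key: str) -> set[str]:
        # Return a copy so callers cannot mutate cached state by accident.
        return set(self._store.get(key, set()))

    def add(self, key: str, value: str) -> set[str]:
        self._store.setdefault(key, set()).add(value)
        return set(self._store[key])


# Maps a user name to the set of source IPs seen for that user so far.
user_ips = InMemoryCache()


def detect(record: dict) -> bool:
    """Return True when a known user logs in from an IP not seen before."""
    if record.get("event.action") != "ConsoleLogin":
        return False
    if record.get("event.outcome") != "success":
        return False

    user = record.get("user.name")
    ip = record.get("source.ip")
    if not user or not ip:
        return False

    known = user_ips.get(user)
    user_ips.add(user, ip)

    # The first sighting of a user only seeds the cache; alert on later new IPs.
    return bool(known) and ip not in known
```

The point of the cache is that state outlives a single invocation. In a serverless setup each detection run starts cold, so without a shared store there is nothing to compare the current IP against.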

The last commit was January 2025 and commit frequency has clearly dropped. The managed cloud SIEM product seems to have absorbed the team's attention, so you'd be adopting a project that may be in maintenance mode at best. The polyglot build is brutal: Rust for the hot path, TypeScript for the CDK infra, Java/Kotlin for Iceberg table management, Python for detections, and Node for the CLI. Getting this running locally for development or debugging requires all of it. Alert destinations are just SNS and Slack; if you need PagerDuty, Jira, or anything else, you're writing your own SNS consumer. There's no built-in investigation or case management UI. You query Athena directly, which is fine for a data lake but means your analysts need SQL.
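For a sense of what "writing your own SNS consumer" involves, here is a rough sketch of a Lambda handler that unpacks SNS records and builds PagerDuty Events API v2 trigger bodies. The `Records[].Sns.Message` path is the standard SNS-to-Lambda event shape. The alert fields `title` and `severity` are assumptions, since Matano's alert schema is not described here, and `EXAMPLE_KEY` is a placeholder routing key.

```python
import json


def alerts_from_sns_event(event: dict) -> list[dict]:
    """Extract and JSON-decode the message body of each SNS record."""
    alerts = []
    for rec in event.get("Records", []):
        alerts.append(json.loads(rec["Sns"]["Message"]))
    return alerts


def to_pagerduty_event(alert: dict, routing_key: str) -> dict:
    """Build a PagerDuty Events API v2 'trigger' body from one alert.

    The 'title' and 'severity' keys on the alert are assumed, not confirmed.
    """
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": alert.get("title", "Matano alert"),
            "source": "matano",
            "severity": alert.get("severity", "warning"),
        },
    }


def handler(event: dict, context: object) -> list[dict]:
    # A real consumer would POST each body to the paging service;
    # this sketch only builds and returns them.
    return [
        to_pagerduty_event(alert, routing_key="EXAMPLE_KEY")
        for alert in alerts_from_sns_event(event)
    ]
```

The code itself is small. The real cost is that you now own its deployment, retries, and credentials for every destination you add.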
