// the find
matanolabs/matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
Matano is a serverless security data lake that runs entirely in your AWS account. It ingests logs from 50+ sources, normalizes them to ECS via VRL transforms, stores everything in Apache Iceberg on S3, and lets you write Python detections that fire in real time. It's aimed at security teams who are paying too much for Splunk or Elastic and want to own their data. The last push was January 2025, which is a yellow flag for an infrastructure project.
Apache Iceberg as the storage format is genuinely the right call: your data stays in S3 in an open format, you can query it from Athena, Snowflake, or Trino without copying anything, and you're not locked into Matano's query layer. The VRL transform pipeline (borrowed from the Vector ecosystem) is expressive and well-suited to log parsing, and it doesn't require you to manage any transformation servers. The remote cache primitive in detections (shown in the 'never-before-seen IP' example) is the kind of stateful detection capability that most open-source SIEM tools fumble. Fully serverless on Lambda/SQS/S3 means the idle cost when you're not ingesting much is close to zero.
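To make the stateful-detection point concrete, here is a minimal sketch of the 'never-before-seen IP' idea. It is not Matano's API: `InMemoryStringSetCache` is a hypothetical in-process stand-in for a remote cache, and the flat record dict with dotted ECS-style keys is an assumption made so the example runs on its own.

```python
class InMemoryStringSetCache:
    """Hypothetical stand-in for a remote key -> set-of-strings cache."""

    def __init__(self):
        self._sets = {}

    def get(self, key):
        # Return a copy so callers cannot mutate cached state by accident.
        return set(self._sets.get(key, set()))

    def add(self, key, value):
        self._sets.setdefault(key, set()).add(value)
        return set(self._sets[key])


seen_ips = InMemoryStringSetCache()


def detect(record, cache=seen_ips):
    """Return True when a user logs in successfully from an IP not seen before.

    `record` is assumed to be a flat dict keyed by dotted ECS-style field names.
    """
    if record.get("event.action") != "ConsoleLogin":
        return False
    if record.get("event.outcome") != "success":
        return False

    user = record.get("user.name")
    ip = record.get("source.ip")
    if not user or not ip:
        return False

    known = cache.get(user)
    cache.add(user, ip)

    # Stay quiet on a user's first login: there is no baseline to compare with.
    return bool(known) and ip not in known
```

The state that matters lives outside the function, which is why a serverless detection needs a shared cache at all: each invocation sees one record and has to ask what came before it.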
The last commit was January 2025 and commit frequency has clearly dropped. The managed cloud SIEM product seems to have absorbed the team's attention, so you'd be adopting a project that may be in maintenance mode at best. The polyglot build is brutal: Rust for the hot path, TypeScript for the CDK infra, Java/Kotlin for Iceberg table management, Python for detections, Node for the CLI. Getting this running locally for development or debugging requires all of it. Alert destinations are just SNS and Slack; if you need PagerDuty, Jira, or anything else, you're writing your own SNS consumer. There's no built-in investigation or case management UI: you query Athena directly, which is fine for a data lake but means your analysts need SQL.
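For a sense of what "writing your own SNS consumer" involves, here is a rough sketch of a Lambda handler that forwards SNS-delivered alerts to the PagerDuty Events API v2. The alert field names (`id`, `title`, `severity`) and the `PAGERDUTY_ROUTING_KEY` environment variable are assumptions for illustration; check the actual alert payload before relying on any of them.

```python
import json
import os
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
PAGERDUTY_SEVERITIES = {"critical", "error", "warning", "info"}


def build_pagerduty_event(alert, routing_key):
    """Map one alert dict to a PagerDuty Events API v2 'trigger' event.

    The alert keys used here ("id", "title", "severity") are assumed, not
    taken from Matano's documented alert schema.
    """
    severity = alert.get("severity", "warning")
    if severity not in PAGERDUTY_SEVERITIES:
        severity = "warning"  # fall back when the source uses another scale

    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": str(alert.get("title", "Security alert"))[:1024],
            "source": "matano",
            "severity": severity,
            "custom_details": alert,
        },
    }
    if alert.get("id"):
        # Reusing the alert id lets PagerDuty group repeats of the same alert.
        event["dedup_key"] = str(alert["id"])
    return event


def post_event(pd_event):
    """POST one event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(pd_event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def handler(event, context, send=post_event):
    """Lambda entry point for an SNS subscription; `send` is injectable for tests."""
    routing_key = os.environ.get("PAGERDUTY_ROUTING_KEY", "")
    forwarded = 0
    for record in event.get("Records", []):
        alert = json.loads(record["Sns"]["Message"])
        send(build_pagerduty_event(alert, routing_key))
        forwarded += 1
    return {"forwarded": forwarded}
```

It's not much code, but you also own the retries, the dead-letter queue, and the severity mapping, and you repeat the exercise for every destination you add.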