// the find
ClickHouse/ClickBench
ClickBench: a Benchmark For Analytical Databases
ClickBench is a reproducible OLAP benchmark built around a single flat table of ~100M web analytics rows, covering 43 queries across cold and hot runs. It's the de facto reference for comparing analytical databases — DuckDB, BigQuery, Snowflake, ClickHouse itself, and 60+ others have entries. If you're evaluating a database for clickstream or event analytics workloads, this is the benchmark you run.
The reproducibility story is genuinely good: a shell script, a public dataset, a fresh Ubuntu VM, and you have results in 20 minutes. The scoring methodology is sound — geometric mean of per-query ratios avoids the trap of one outlier query dominating the summary. The dataset comes from real production traffic with realistic data distributions, which means compression ratios, index selectivity, and skew all reflect reality rather than rand(). The lukewarm vs. true-cold-run distinction is a welcome addition that most benchmarks don't bother making.
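To make the geometric-mean point concrete, here is a minimal sketch comparing it with an arithmetic mean over the same per-query ratios. The timings, the baseline, and the function names are hypothetical, chosen for illustration only. This is not ClickBench's actual scoring code, which has details not covered here.

```python
import math

def per_query_ratios(times, baseline):
    # Ratio of this system's time to the baseline time, query by query.
    return [t / b for t, b in zip(times, baseline)]

def arithmetic_mean_ratio(times, baseline):
    ratios = per_query_ratios(times, baseline)
    return sum(ratios) / len(ratios)

def geomean_ratio(times, baseline):
    # Geometric mean computed via logs: exp(mean(log(ratio))).
    ratios = per_query_ratios(times, baseline)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings in seconds for four queries (not real ClickBench data).
baseline = [0.10, 0.20, 0.50, 1.00]
system = [0.10, 0.20, 0.50, 20.00]  # matches baseline except one 20x outlier

print(f"arithmetic mean of ratios: {arithmetic_mean_ratio(system, baseline):.2f}")
print(f"geometric mean of ratios:  {geomean_ratio(system, baseline):.2f}")
```

With these made-up numbers the arithmetic mean comes out at 5.75, while the geometric mean is about 2.11. The single outlier still counts, but it no longer swamps the three queries where the two systems tie.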
The single flat table is a significant structural bias: anything designed around star or snowflake schemas (traditional warehouses, Redshift, Snowflake) is penalized for a workload shape it wasn't built for. 100M rows is small by 2026 standards; a system that struggles at 1B rows looks identical to one that handles it fine. There is no concurrency testing at all, so the numbers say nothing about multi-user or mixed read/write workloads, which is where production systems actually fall over. Vendor-submitted results are largely self-reported and not audited. The benchmark is honest about this, but it means the leaderboard is a mix of carefully optimized runs and whatever someone happened to submit two years ago.