// the find
boschresearch/CNC_Machining
data set for process monitoring on CNC machines
A research dataset from Bosch containing tri-axial accelerometer readings (2 kHz) from three real brownfield CNC milling machines, covering 15 machining processes across 6 time periods from 2018–2021, with good/bad labels for anomaly detection. It's a companion to a CIRP 2022 paper and exists primarily to support ML researchers benchmarking process monitoring models on industrial data. If you need real-world vibration data that isn't simulated or sanitized, this is one of the few publicly available options.
This is real industrial data from production machines, not a lab setup, so it has the noise, drift, and irregularities that synthetic datasets hide. The temporal spread across six half-yearly snapshots lets you study concept drift and seasonal variation, which most anomaly detection benchmarks ignore entirely. The folder structure (machine/process/label/file) is sensible and navigable without a metadata file. Binary good/bad labels are simple but unambiguous, making it easy to plug into standard classification or anomaly detection pipelines.
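To make the "navigable without a metadata file" point concrete, here is a minimal sketch that indexes such a tree using only the path components. It assumes the four-level machine/process/label/file order described above; `Sample`, `index_dataset`, and `label_counts` are hypothetical names, not part of the repo's own loader, and no file extension or naming scheme is assumed.

```python
from collections import Counter
from pathlib import Path
from typing import NamedTuple


class Sample(NamedTuple):
    machine: str
    process: str
    label: str
    path: Path


def index_dataset(root):
    """Return one Sample per file found at root/machine/process/label/file.

    Assumption: the layout is exactly three directory levels above each
    recording. Files at any other depth are ignored.
    """
    root = Path(root)
    samples = []
    for path in sorted(root.glob("*/*/*/*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Skip hidden files and directories such as .git internals.
        if any(part.startswith(".") for part in parts):
            continue
        machine, process, label = parts[:3]
        samples.append(Sample(machine, process, label, path))
    return samples


def label_counts(samples):
    """Count files per (machine, process, label) to expose class balance."""
    return Counter((s.machine, s.process, s.label) for s in samples)
```

Point `index_dataset` at the directory that holds the machine folders rather than the repository root, then print `label_counts(...)` to see how many good and bad recordings each process has before choosing an evaluation scheme.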
The dataset is severely class-imbalanced: a quick scan of the tree shows many processes have 1–4 bad samples against 10–15 good ones, which makes meaningful precision/recall numbers hard to get without careful stratification. There is no information on what 'bad' actually means for each process (tool wear, chatter, wrong feed rate?), and that context matters enormously if you want to do anything beyond binary classification. The only code provided is a data loader and a visualization notebook; there are no baseline model implementations to compare against. The repo was last pushed in 2024, but the data itself hasn't changed since 2021, so it's not a living benchmark.
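With so few bad samples, the number of usable cross-validation folds is capped by the minority-class count. The following is a minimal, stdlib-only sketch of stratified fold assignment that makes the cap explicit. `stratified_folds` is a hypothetical helper, and this is not the evaluation protocol from the accompanying paper.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign each sample a fold index, spreading every class round-robin.

    Raises ValueError when some class has fewer samples than n_folds,
    because at least one fold would then contain no sample of that class.
    """
    if not labels:
        raise ValueError("labels is empty")

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    smallest = min(len(indices) for indices in by_class.values())
    if n_folds < 2 or n_folds > smallest:
        raise ValueError(
            f"n_folds must be between 2 and {smallest} (smallest class size)"
        )

    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize which samples share a fold
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds
    return fold_of


# Illustrative counts only: 12 good and 3 bad samples for one process.
labels = ["good"] * 12 + ["bad"] * 3
folds = stratified_folds(labels, n_folds=3)
```

In this illustrative case each of the three folds gets exactly one bad sample, so per-fold precision and recall rest on a single positive. Pooling predictions across folds before computing metrics, or pooling across processes, is usually more informative than averaging per-fold scores.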