// the find

shubhomoydas/ad_examples

★ 874 · Python · MIT · updated May 2024

A collection of anomaly detection methods (iid/point-based, graph and time series) including active learning for anomaly detection/discovery, bayesian rule-mining, description for diversity/explanation/interpretability. Analysis of incorporating label feedback with ensemble and tree-based detectors. Includes adversarial attacks with Graph Convolutional Network.

A research codebase implementing Active Anomaly Discovery (AAD) — a human-in-the-loop approach where analyst feedback re-weights an ensemble of tree-based detectors to surface anomalies faster than passive scoring. Covers iid, streaming, and time series settings, plus GAN-based detection and adversarial graph attacks. Aimed at ML researchers and practitioners who need more than just a black-box anomaly score.
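To make the re-weighting loop concrete, here is a minimal standard-library sketch. It is not the repo's API and not the published AAD optimization. It makes two loud assumptions: random "ferns" (fixed lists of threshold tests) stand in for isolation trees, and a simple additive weight bump stands in for AAD's actual update. Every function name below is hypothetical.

```python
import math
import random
from collections import Counter

def build_fern(rng, n_features, n_splits=3):
    # Stand-in for a tree: a fixed list of random (feature, threshold) tests.
    return [(rng.randrange(n_features), rng.uniform(-2.0, 2.0)) for _ in range(n_splits)]

def leaf_of(fern, x):
    # Leaf id = tuple of test outcomes for point x.
    return tuple(x[f] > t for f, t in fern)

def fit_ensemble(data, n_ferns=50, seed=0):
    rng = random.Random(seed)
    ferns = [build_fern(rng, len(data[0])) for _ in range(n_ferns)]
    rarity = []
    for fern in ferns:
        counts = Counter(leaf_of(fern, x) for x in data)
        # Sparsely populated leaves get a higher base anomaly value.
        rarity.append({leaf: -math.log(c / len(data)) for leaf, c in counts.items()})
    return ferns, rarity

def features(ferns, rarity, x):
    # One sparse feature per (fern, leaf) the point falls into.
    feats = {}
    for i, fern in enumerate(ferns):
        leaf = leaf_of(fern, x)
        feats[(i, leaf)] = rarity[i].get(leaf, 0.0)
    return feats

def score(weights, feats):
    # A missing weight means the uniform default of 1.0.
    return sum(weights.get(k, 1.0) * v for k, v in feats.items())

def feedback_loop(data, labels, budget=10, lr=0.5, seed=0):
    ferns, rarity = fit_ensemble(data, seed=seed)
    feats = [features(ferns, rarity, x) for x in data]
    weights, queried, seen = {}, [], set()
    for _ in range(budget):
        remaining = [i for i in range(len(data)) if i not in seen]
        top = max(remaining, key=lambda i: score(weights, feats[i]))
        queried.append(top)
        seen.add(top)
        # Simplified update (an assumption, not AAD's loss): raise the weights of
        # leaves holding a confirmed anomaly, lower them for a confirmed nominal.
        direction = 1.0 if labels[top] == 1 else -1.0
        for k in feats[top]:
            weights[k] = max(0.0, weights.get(k, 1.0) + direction * lr)
    return queried, weights
```

Here `labels` plays the analyst: each round the top-scored unlabeled point is "shown" and its label shifts the weights of the leaves it occupies, so later rankings favor regions already confirmed as anomalous.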

The AAD algorithm itself is the genuine contribution here — the core idea of treating ensemble node memberships as a feature space and using analyst labels to reweight them is well-motivated and backed by peer-reviewed publications with real benchmarks. Streaming support with KL-divergence-based tree replacement (Mode 1) is a practical design: you're not just sliding a window, you're replacing trees that have drifted while keeping weights for stable ones. The description/ruleset layer is surprisingly useful — using the same tree structure that produces anomaly scores to also generate interpretable subspace rules means you get explanations without a second model. GLAD's glocalized weighting addresses a real problem: global detectors fail in heterogeneous feature spaces, and the AFSS network learning per-region ensemble relevance is a principled fix.
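The streaming idea can be pictured with a small sketch: for each tree, compare how the previous window and the new window distribute over its leaves, and flag the tree for replacement when the KL divergence is large. This is an illustration under stated assumptions, not the repo's code. The function names, the smoothing constant, the direction of the divergence, and the fixed `threshold` are all hypothetical, and how the project actually sets its cutoff is not shown here.

```python
import math
from collections import Counter

def leaf_distribution(leaf_ids, support, eps=1e-6):
    # Smoothed share of points per leaf, so empty leaves never give log(0).
    counts = Counter(leaf_ids)
    total = len(leaf_ids) + eps * len(support)
    return {leaf: (counts.get(leaf, 0) + eps) / total for leaf in support}

def kl_divergence(p, q):
    # KL(p || q) over a shared support.
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

def trees_to_replace(old_leaves_per_tree, new_leaves_per_tree, threshold):
    # Each argument holds, per tree, the leaf id assigned to every point in a window.
    drifted = []
    for t, (old, new) in enumerate(zip(old_leaves_per_tree, new_leaves_per_tree)):
        support = set(old) | set(new)
        p = leaf_distribution(old, support)
        q = leaf_distribution(new, support)
        if kl_divergence(p, q) > threshold:
            drifted.append(t)
    return drifted

# Tree 0 sees the same leaf mix in both windows; tree 1 sees a clear shift.
old_window = [[0, 0, 1, 1], [0, 0, 0, 1]]
new_window = [[0, 1, 0, 1], [1, 1, 1, 1]]
print(trees_to_replace(old_window, new_window, threshold=0.5))
```

Only the flagged trees would be rebuilt on the new window. The rest keep their structure, which is what lets their learned weights carry over.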

The dependency on TensorFlow 1.15.4 is a hard stop for anyone in 2024: that release line is years past EOL, has no official wheels for Python 3.8 or later, and its GPU path is effectively broken on modern CUDA. The bash-script-driven experiment runner (`aad.sh` with a 12-argument command line) is a research artifact, not an API; there's no clean programmatic interface beyond the demo files, so integrating this into a real pipeline requires significant archaeology. The codebase hasn't been updated since May 2024 and still requires scikit-learn 0.23, which is incompatible with current numpy, so the requirements will conflict on a fresh environment. Documentation is extensive but organized like a research paper appendix rather than a user guide; finding where to hook in a custom dataset or custom detector requires reading several source files.
