// the find

WenjieDu/PyPOTS

★ 2,019 · Python · BSD-3-Clause · updated Jun 2026

A Python toolkit/library for reality-centric machine/deep learning & data mining on partially-observed time series, with 50+ SOTA neural network models for scientific analysis tasks (imputation, classification, clustering, forecasting, anomaly detection, cleaning) on incomplete industrial irregularly-sampled multivariate TS with NaN missing values

PyPOTS is a Python library for machine learning on time series with missing values. It wraps 50+ published neural network architectures (Transformer variants, diffusion models, GRU-based models, etc.) under a unified sklearn-style API covering imputation, forecasting, classification, clustering, and anomaly detection. The target audience is researchers and engineers who deal with real-world sensor data where gaps are the norm, not the exception.
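To make the "unified sklearn-style API" idea concrete, here is a minimal sketch of why a shared interface makes model swaps cheap. The two classes are toy stand-ins written for this illustration, not PyPOTS models. The `fit`/`impute` method pair and the `{"X": ...}` dictionary are assumptions chosen to resemble that style of interface, and real models would take hyperparameters and array inputs.

```python
import math
from typing import Protocol

Series = list[float]  # one univariate series; NaN marks a missing step


class Imputer(Protocol):
    """Hypothetical shared interface: every model exposes the same two calls."""

    def fit(self, train: dict) -> None: ...
    def impute(self, test: dict) -> list[Series]: ...


class MeanImputer:
    """Toy stand-in: fills gaps with the mean of observed training values."""

    def fit(self, train: dict) -> None:
        observed = [v for s in train["X"] for v in s if not math.isnan(v)]
        self.mean_ = sum(observed) / len(observed)

    def impute(self, test: dict) -> list[Series]:
        return [[self.mean_ if math.isnan(v) else v for v in s] for s in test["X"]]


class LocfImputer:
    """Toy stand-in: carries the last observed value forward (0.0 before any)."""

    def fit(self, train: dict) -> None:
        pass  # nothing to learn

    def impute(self, test: dict) -> list[Series]:
        out = []
        for s in test["X"]:
            last, filled = 0.0, []
            for v in s:
                last = last if math.isnan(v) else v
                filled.append(last)
            out.append(filled)
        return out


def run(model: Imputer, data: dict) -> list[Series]:
    """The experiment code only ever touches the shared interface."""
    model.fit(data)
    return model.impute(data)


data = {"X": [[1.0, math.nan, 3.0], [math.nan, 5.0, 6.0]]}
mean_filled = run(MeanImputer(), data)  # swap the model on this line...
locf_filled = run(LocfImputer(), data)  # ...and nothing else changes
```

Because `run` depends only on the interface, an ablation study reduces to looping over a list of model objects.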

The unified API spans very different model families: you can swap SAITS for TimesNet or CSDI with a one-line change, which is genuinely useful for ablation studies. The ecosystem split (TSDB for datasets, PyGrinder for synthetic missingness, BenchPOTS for benchmarking) is clean architecture, with each piece doing a single job. Hyperparameter tuning via Optuna is built in, not bolted on, and the migration from NNI to Optuna in v2 was the right call given NNI's stagnation. The ORT+MIT adaptation strategy (borrowed from SAITS) for applying forecasting-only models to partially-observed time series (POTS) is well documented and consistently applied, rather than leaving each model to handle masks differently.
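For readers unfamiliar with ORT+MIT, the idea can be sketched in plain Python. In the SAITS formulation, the observed reconstruction task (ORT) scores the model on values it was shown, while the masked imputation task (MIT) scores it on observed values that were deliberately hidden from the input. The helper names, the 20% hold-out rate, the zero-fill convention and the unweighted sum below are illustrative assumptions, not PyPOTS code.

```python
import math
import random


def split_observed(series, holdout_rate=0.2, seed=0):
    """Hide a fraction of the observed values so they can be scored later."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if not math.isnan(v)]
    n_holdout = max(1, round(holdout_rate * len(observed)))
    held_out = set(rng.sample(observed, n_holdout))
    kept = set(observed) - held_out
    # The model input zero-fills both truly missing and held-out positions.
    model_input = [v if i in kept else 0.0 for i, v in enumerate(series)]
    return model_input, sorted(kept), sorted(held_out)


def masked_mae(pred, target, indices):
    """Mean absolute error restricted to the given positions."""
    return sum(abs(pred[i] - target[i]) for i in indices) / len(indices)


def ort_mit_loss(pred, series, kept, held_out):
    ort = masked_mae(pred, series, kept)      # error on values the model saw
    mit = masked_mae(pred, series, held_out)  # error on values hidden from it
    return ort, mit, ort + mit


series = [1.0, math.nan, 3.0, 4.0, math.nan, 6.0, 7.0]
model_input, kept, held_out = split_observed(series)

# Stand-in "reconstruction" that is off by exactly 1.0 wherever truth is known.
pred = [0.0 if math.isnan(v) else v + 1.0 for v in series]
ort, mit, total = ort_mit_loss(pred, series, kept, held_out)
```

Truly missing positions never enter either loss, since there is no ground truth to compare against. That is what lets a model with no native notion of missingness be trained on incomplete data.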

Many models in the table are originally forecasting architectures retrofitted for imputation via the ORT+MIT trick. They are flagged with 🧑‍🔧, but users can easily miss what the flag implies: the POTS support is a wrapper rather than native design, and benchmark numbers won't match the original papers. Anomaly detection is the weakest task: all supported models are just reconstruction-error thresholders with no built-in threshold selection, so you get raw anomaly scores and have to work out cutoffs yourself. The dependency footprint is heavy. Installing for a single model still pulls in the full PyTorch and optional-extras dependency graph, and optional dependencies (torch-geometric for Raindrop, transformers for the LLM models) are not cleanly isolated, so you only find out about them at import time. Clustering support is stuck at two models (CRLI and VaDER) from 2019-2021 and is clearly not a maintained priority.
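If you do end up choosing cutoffs yourself, one common starting point is a quantile of the scores seen on training data, given a rough guess at the anomaly rate. The sketch below uses the nearest-rank quantile. The function names, the sample scores and the 10% rate are made up for illustration and are not part of PyPOTS.

```python
import math


def quantile_threshold(train_scores, expected_anomaly_rate):
    """Score at the (1 - rate) quantile of the training scores (nearest rank)."""
    if not 0.0 < expected_anomaly_rate < 1.0:
        raise ValueError("expected_anomaly_rate must be strictly between 0 and 1")
    ranked = sorted(train_scores)
    rank = math.ceil((1.0 - expected_anomaly_rate) * len(ranked))
    return ranked[max(rank, 1) - 1]


def flag_anomalies(scores, threshold):
    """Mark every score strictly above the threshold as anomalous."""
    return [s > threshold for s in scores]


# Illustrative reconstruction errors; one training point is an obvious outlier.
train_scores = [0.1, 0.2, 0.15, 0.12, 0.18, 0.11, 0.14, 0.13, 0.9, 0.16]
threshold = quantile_threshold(train_scores, expected_anomaly_rate=0.1)
flags = flag_anomalies([0.12, 0.95, 0.2], threshold)
```

A quantile cutoff is only as good as the assumed rate. When labelled anomalies are available, tuning the threshold against a validation metric is usually the safer choice.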
