// the find

VoltaML/voltaML

★ 1,176 · Python · Apache-2.0 · updated Nov 2022

⚡VoltaML is a lightweight library to convert and run your ML/DL deep learning models in high performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.

VoltaML wraps TensorRT, ONNX, TorchScript, and TVM compilation behind a unified Python API so you can convert a PyTorch model to an optimized inference engine in a few lines. It targets CV and NLP models specifically: YOLO variants, ResNets, and BERT-family transformers. The benchmark numbers are real and impressive: 5–13x speedups for classification models, 6–7x for transformers.

The benchmark tables are specific and honest: tested on an RTX 2080Ti at batch size 1, with actual model names and millisecond figures, not just percentage claims. QDQ (quantize/dequantize) support for BERT-family transformers is non-trivial, and the per-model QDQ files show real implementation depth. Triton Server integration stubs exist for anyone who needs to scale beyond single-GPU inference. A Docker image exists so you don't have to fight the TensorRT installation yourself.
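To make the "millisecond figures, not just percentage claims" point concrete, here is a minimal sketch of how a batch-size-1 latency number and the resulting speedup factor can be computed. It is not VoltaML's benchmark code; the function names `measure_latency_ms` and `speedup` are hypothetical, and the callables you pass in stand for whatever baseline and optimized inference calls you want to compare.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up runs are discarded so lazy initialization and caches do not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    # Placeholder workloads (assumption): swap in real single-sample inference calls.
    slow = lambda: sum(i * i for i in range(20_000))
    fast = lambda: sum(i * i for i in range(2_000))
    base = measure_latency_ms(slow)
    opt = measure_latency_ms(fast)
    print(f"baseline {base:.3f} ms, optimized {opt:.3f} ms, speedup {speedup(base, opt):.1f}x")
```

For GPU inference the timed callable should block until the device has finished (for example by calling `torch.cuda.synchronize()` inside it), otherwise the timer only measures kernel launch overhead.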

Dead since November 2022. TensorRT 8.4.1.2 and PyTorch 1.12 are ancient; TensorRT 10 dropped many of the older APIs, so nothing here is likely to build against it without major surgery. The repo is mostly vendored YOLOv5/YOLOv6 source code dumped into voltaml/models, which inflates the apparent size and makes the actual voltaml abstraction layer (compile.py, build_engine.py, ~500 lines total) look thinner than advertised. No tests. INT8 calibration is documented as a feature, but the calibration.cache file checked into the repo suggests it was never really wired up for end-user workflows. The enterprise upsell checklist at the bottom, with multiple unchecked items, is a bad sign for a repo that stopped getting commits right after.
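If you want to try the repo anyway, a quick pre-flight check against the versions named above can save a failed build. The sketch below is an assumption-laden illustration: the distribution names `tensorrt` and `torch` and the helper names `major` and `check_pins` are not taken from the repo, and it only compares major versions, which is a coarse heuristic.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Versions quoted in the review; the distribution names are assumptions.
PINNED: Dict[str, str] = {"tensorrt": "8.4.1.2", "torch": "1.12"}


def major(version: str) -> int:
    """Extract the leading major version number, e.g. '8.4.1.2' -> 8."""
    return int(version.split(".")[0])


def check_pins(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return human-readable problems; an empty list means majors match."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (review cites {wanted})")
            continue
        if major(found) != major(wanted):
            problems.append(f"{name}: found {found}, review cites {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["major versions match the pins"]:
        print(line)
```

Passing a custom `lookup` makes the check easy to exercise without the packages installed.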
