// the find

scikit-hep/awkward

★ 965 · Python · BSD-3-Clause · updated Jun 2026

Manipulate JSON-like data with NumPy-like idioms.

Awkward Array lets you operate on nested, variable-length data structures (think JSON or particle physics event data) using NumPy-style vectorized operations. It's the backbone of the scikit-hep ecosystem and is used heavily at CERN for LHC data analysis. If your data has ragged arrays (lists of lists where inner lengths differ), this is the library that makes NumPy idioms actually work on it.

The performance numbers are significant: 90x faster and 10x less memory than pure Python for the motivating example, and it gets faster still in Numba JIT contexts. The Apache Arrow interop is solid: you can round-trip through PyArrow, use Parquet I/O, and hand off to other Arrow-compatible tools without copying data. The architecture is clean: a thin Python layer over a compiled C++ kernel library (awkward-cpp), so the hot paths are in C++ while the API stays Pythonic. Development is active and NSF-funded, with a real contributor base (40+ contributors) and support through Python 3.14.

The split into two packages (awkward and awkward-cpp) means `pip install git+https://...` doesn't work for dev installs. You need nox and a C++ compiler, which is a genuine onboarding friction point outside the particle physics community. GPU support exists but only through a separate awkward-cuda package that's nowhere near feature parity with the CPU path, so don't assume vectorized operations just work on CUDA. The target audience is narrow: if your data isn't genuinely nested/ragged, pandas or plain NumPy carries less cognitive overhead. Error messages for type mismatches in complex nested operations can be cryptic, especially when union types are involved.
