// the find

microsoft/onnxruntime-inference-examples

★ 1,660 · C++ · MIT · updated Feb 2026

Examples for using ONNX Runtime for machine learning inferencing.

A collection of ONNX Runtime (ORT) usage examples covering C/C++, C#, JavaScript, mobile (Android, iOS, MAUI), and quantization workflows. It's the reference starting point for anyone integrating ORT into a new project or targeting a specific execution provider (EP) such as QNN, OpenVINO, or SNPE. It is not a library, just runnable samples.

The C++ examples cover the major execution providers: CPU, CUDA, OpenVINO, QNN, and SNPE each have a working sample with its own CMakeLists, so you're not starting from scratch for any major hardware target. The ort_tutorial series is well structured: numbered folders (10_, 20_, 30_, 40_) build on each other, from EP selection through device tensor transfers to CUDA stream synchronization and EP context caching. The JS side includes non-trivial demos (Whisper, Segment Anything, Stable Diffusion Turbo) that show real model pipelines, not just 'load a toy model and print output'. CI runs for C++ (Linux/Windows) and mobile, so those examples are at least mechanically verified to build.
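To make the "EP selection" step concrete, here is a minimal sketch of the usual pattern: walk a preference list and fall back to the CPU provider when nothing else is available. It is not taken from the repo. `pick_provider` is a hypothetical helper, and the sketch assumes the list of available providers is supplied by the caller as plain strings (in a real ORT build that list would typically come from `Ort::GetAvailableProviders()`).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not part of ONNX Runtime: return the first entry of
// `preferred` that also appears in `available`. If none match, fall back to
// the CPU provider, on the assumption that it is always present.
std::string pick_provider(const std::vector<std::string>& preferred,
                          const std::vector<std::string>& available) {
    for (const auto& name : preferred) {
        if (std::find(available.begin(), available.end(), name) != available.end()) {
            return name;
        }
    }
    return "CPUExecutionProvider";
}
```

Called with a preference list of `{"QNNExecutionProvider", "CUDAExecutionProvider"}` on a machine that only reports CUDA and CPU, this picks `CUDAExecutionProvider`. Registering the chosen provider on the session options is EP-specific and is left out here.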

Most examples target older vision models (SqueezeNet, MobileNetV2, YOLOv3, all published by 2018). Outside the JS demos, nothing here shows how to run a modern transformer or LLM with ORT, which is increasingly why people come to this runtime. The accuracy_tool in c_cxx is QNN-only, with no documentation on extending it to other EPs, despite being the most practically useful thing in the repo. The JS examples each bundle their own webpack config rather than sharing one, so when onnxruntime-web ships a breaking change you get 12 separate places to update. The mobile section scatters its Android and iOS examples (the latter in both Objective-C and Swift) across subfolders, with no top-level guide explaining which to use for a given scenario, so you'll spend more time navigating than coding.
