This is the run-inference step of the Choose your journey flow. After your model is prepared, choose how to execute it on the device. The right path depends on your programming language, your model format, and whether you are building a full camera or video pipeline. Every runtime listed here can target the Qualcomm Kryo CPU, Adreno GPU, and Hexagon NPU (HTP), so the choice comes down to which one fits your application.

LiteRT

High-performance on-device inference from Python or C++ using Qualcomm AI Engine Direct delegates. Best for quick, Python-friendly workflows.

QAIRT SDK C++ APIs

Low-level C++ control over model execution and the inference backend (QNN or SNPE).

Qualcomm IM SDK

Build real-time camera, video, and vision pipelines with GStreamer, using zero-copy buffers and GPU pre- and post-processing. Use it when inference is one stage of an AI media pipeline.

Building a robotics application? The Qualcomm Intelligent Robotics (QIR) SDK adds ROS-based modules and hardware-accelerated nodes on top of these runtimes.
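
The guidance above can be read as a small decision rule. The sketch below is only an illustration of that rule, not part of any Qualcomm SDK: the `Requirements` class, its field names, and the `choose_runtime` function are hypothetical, and the ordering of the checks (media pipeline first, then low-level C++ control, then LiteRT as the default) is an assumption drawn from the descriptions on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical summary of what an application needs (illustrative only)."""
    language: str                    # "python" or "cpp"
    media_pipeline: bool = False     # camera/video/vision pipeline built on GStreamer
    low_level_control: bool = False  # direct control over execution and backend


def choose_runtime(req: Requirements) -> str:
    """Return the name of the runtime that best matches the stated needs."""
    language = req.language.lower()
    if language not in ("python", "cpp"):
        raise ValueError(f"unsupported language: {req.language!r}")

    # A full camera/video pipeline points to the IM SDK regardless of language.
    if req.media_pipeline:
        return "Qualcomm IM SDK"

    # Low-level control over execution and the backend is a C++-only path.
    if req.low_level_control and language == "cpp":
        return "QAIRT SDK C++ APIs"

    # Otherwise LiteRT covers both Python and C++ workflows.
    return "LiteRT"


if __name__ == "__main__":
    print(choose_runtime(Requirements(language="python")))
    print(choose_runtime(Requirements(language="cpp", low_level_control=True)))
    print(choose_runtime(Requirements(language="cpp", media_pipeline=True)))
```

A robotics application would run the same selection and then layer the QIR SDK on top of whichever runtime it returns.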