Skip to main content
This guide covers AI application development using the tools, runtimes, and frameworks supported on Qualcomm Dragonwing IoT platforms. It is intended for developers who want to train or fine-tune models, prepare them for deployment, and build AI applications on Qualcomm hardware. Most decisions in this guide come down to two questions:
  • How will you prepare your model? Convert, quantize, compile, or fine-tune it for the target hardware.
  • How will you run inference? The runtime or integration path that executes the model on the device.
You can bring pretrained models from ONNX, PyTorch, TensorFlow, or LiteRT and run them efficiently across the Qualcomm Kryo™ CPU, Adreno™ GPU, and Hexagon™ NPU (HTP).

Choose your journey

Use the flowchart below to find the path that fits your application. It walks you through whether you already have a model, how to prepare it (purple), and how to run inference (blue), then links you to the right guide. Every highlighted box is clickable.
On-device generative AI availability depends on your Qualcomm Linux release. See the GenAI workflow page for the current support status.

AI architecture

The following diagram illustrates the AI application development architecture on Qualcomm platforms. AI/ML developer workflow architecture

Prepare your model

Before a model runs efficiently on Qualcomm hardware, it is converted to an executable format and, for the Hexagon NPU (HTP), quantized to a supported precision. (LiteRT models are an exception: they run directly through the AI Engine Direct delegate.) Choose the preparation tool that matches your starting point.
ToolUse it toOutput
Qualcomm AI HubDownload a preoptimized model, or bring your own model (BYOM) and have QAIRT compile, convert, and quantize it in the cloud for your target chipset, with no local toolchain required.Ready-to-run LiteRT or Qualcomm AI Engine Direct model
Qualcomm AI Runtime SDK (QAIRT)The local alternative to AI Hub BYOM: convert, quantize, and compile models from TensorFlow, PyTorch, LiteRT, or ONNX yourself. Integrates AI Engine Direct and the Neural Processing SDK.Quantized model / compiled context binary
Qualcomm AI Model Efficiency Toolkit (AIMET)Recover accuracy lost during quantization using post-training quantization (PTQ) and quantization-aware training (QAT).Higher-accuracy quantized model
Edge ImpulseBuild, train, or fine-tune models from your own audio, image, and sensor data.Trained model in your chosen format

Run inference

After your model is prepared, choose how to execute it on the device. The runtime you pick depends on your language, model format, and whether you are building a full camera/video pipeline.
Runtime / integrationBest for
LiteRTHigh-performance on-device inference from Python or C++, using Qualcomm AI Engine Direct delegates.
QAIRT SDK C++ APIsLow-level C++ control over model execution and the inference backend.
Qualcomm Intelligent Multimedia SDK (IM SDK)High-performance camera, video, and vision pipelines that combine capture, preprocessing, inference, and rendering, with zero-copy buffers and GPU pre/post-processing.
Qualcomm® GenAI Inference Engine (Genie)Running and orchestrating on-device generative AI (LLMs, multimodal) workflows.
Building a robotics application? The Qualcomm® Intelligent Robotics (QIR) SDK adds ROS-based modules and hardware-accelerated nodes on top of these runtimes.

AI hardware

Qualcomm platforms include the following hardware accelerators for AI inference:
  • Qualcomm Kryo™ CPU — High-performance CPU with best-in-class power efficiency.
  • Qualcomm Adreno™ GPU: Balanced power and performance for AI workloads, accelerated with OpenCL kernels. Also used for model pre- and post-processing (for example, the IM SDK runs resize, color conversion, and overlay on the GPU).
  • Qualcomm Hexagon™ Tensor Processor (HTP): Also known as NPU/DSP/HMX. Optimized for low-power, high-performance AI inference. For best performance, quantize pretrained models to a supported precision.