Summary
Use the LiteRT runtime with the QNN delegate to accelerate AI inference on the NPU of Qualcomm® Dragonwing™ devices. This guide demonstrates the end-to-end workflow by deploying a quantized object detection model (YOLOX) that processes video input and outputs annotated frames with bounding boxes, either saved to a file or streamed to a display.
What you’ll learn:
- Configure a Dragonwing device and deploy the QIM SDK Docker environment
- Run a pre-built object detection application accelerated on the NPU
- Understand the application code to adapt it for your own models and use cases
Prerequisites
Ensure you have the following before proceeding:
| Requirement | Details |
|---|---|
| Hardware | Qualcomm® Dragonwing™ device with NPU support |
| Host machine | Linux or macOS with SSH client and Docker support |
| Network | Wi-Fi or Ethernet connectivity on the target device |
| Software | Docker installed on the target device |
Step 1: Configure the device
Enable Wi-Fi and SSH
The device requires an internet connection to download artifacts needed for the sample application. If SSH and Wi-Fi are already configured, skip this step. Otherwise, follow the Set up an SSH connection guide to enable Wi-Fi and SSH on the device.
Enable camera support (CamX)
If you plan to use camera input, enable CamX on the platform:
The device will reboot after this step. Wait for it to come back online before continuing.
Step 2: Set up the Docker environment
Pull the QIM SDK container image
On the target device, pull the latest QIM SDK Docker image:
Create required directories
Create directories for storing artifacts, configuration files, models, and media:
Clone the SDK tools repository
On your host machine, clone the QIM SDK Debian repository:
Copy configuration files to the device
Transfer the CDI and environment files from your host machine to the target device:
Replace <hardware> with the appropriate identifier for your target device (check the repository for available options) and <IP_ADDRESS> with your device’s IP address. For instance, if the target device is Qualcomm Dragonwing™ RB3 Gen 2, replace <hardware> with qcs6490.
Start the container
Launch the QIM SDK container on the target device:
Access the container as root
To verify you are logged in as root, run whoami inside the container. The output should be root.
Step 3: Install dependencies
Inside the container (as root), install the LiteRT runtime and required packages.
Install Python tooling
Create a virtual environment and install Python packages
Install GStreamer and GTK dependencies
These packages are required for video display output via Wayland:
Step 4: Download the application and model artifacts
Still inside the container, set up the object detection application:
Create the application directory
Download the application script
Download the model, labels, and sample video
Exit the root shell
Exit the root shell and re-enter the container as the qimsdk user to run the application:
Step 5: Run the object detection application
Enter the container as the standard user
Activate the Python environment
Run the application
- Output to file
- Output to display
Run the application and save the output as a video file:
Once processing is complete, retrieve the output video:
To copy the file to your host machine:
Code walkthrough: Object detection with OpenCV and LiteRT
This section explains the object_detection.py application. Use it as a reference for building custom inference applications with LiteRT on Qualcomm Dragonwing devices.
The postprocessing in the following code is designed for object detection models from Qualcomm AI Hub.
For custom models, update the postprocessing logic to match the model’s output format and requirements.
Import packages
Parse output arguments
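The script itself is not reproduced on this page, so the following is only a minimal sketch of how the two output modes (file or display) might be selected on the command line. The flag names --output and --output-path and their defaults are hypothetical; check object_detection.py for the arguments it actually accepts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for choosing where annotated frames go (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Object detection output options")
    parser.add_argument(
        "--output",
        choices=["file", "display"],
        default="file",
        help="Save annotated frames to a video file or stream them to a display",
    )
    parser.add_argument(
        "--output-path",
        default="output.mp4",  # placeholder file name
        help="Destination file, used only when --output is 'file'",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--output", "display"])
```

Restricting --output with choices lets argparse reject unsupported modes before any model or video resources are opened.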
Configure model parameters
Load the model with the QNN delegate
The QNN delegate enables inference on the NPU:
Set up video capture and preprocessing
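Because the model is quantized, preprocessing typically ends by converting pixel values into the integer range the input tensor expects. The sketch below shows that arithmetic for a single value, assuming an 8-bit unsigned input. In LiteRT the scale and zero point are reported in the quantization field of interpreter.get_input_details(). The function names and the example values are illustrative, and the actual script may instead pass uint8 pixels to the model directly.

```python
def quantize(value: float, scale: float, zero_point: int) -> int:
    """Map a real value to uint8 using affine quantization."""
    q = round(value / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a uint8 tensor value back to a real value."""
    return (q - zero_point) * scale


# Example values only; real ones come from the model's tensor details.
SCALE, ZERO_POINT = 0.5, 10
q = quantize(2.0, SCALE, ZERO_POINT)    # 2.0 / 0.5 + 10 = 14
x = dequantize(q, SCALE, ZERO_POINT)    # (14 - 10) * 0.5 = 2.0
```

In a real pipeline the same formula would be applied to the whole frame array at once, and dequantize would be applied to the output tensors before postprocessing.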
Configure the output pipeline
Run inference in the main loop
Read each video frame, run inference, apply NMS, and draw bounding boxes:
Clean up resources
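One way to make sure cleanup happens even when the loop raises an error is to register each release call up front. This is a minimal sketch of that pattern, not the script's own code: FakeResource is a hypothetical stand-in for objects that expose release(), such as OpenCV's VideoCapture and VideoWriter.

```python
from contextlib import ExitStack


class FakeResource:
    """Stand-in for an object with release(), such as cv2.VideoCapture."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.released = False

    def release(self) -> None:
        self.released = True


def run_with_cleanup(resources, work):
    """Run work() and release every resource afterwards, even on error."""
    with ExitStack() as stack:
        for res in resources:
            stack.callback(res.release)  # callbacks run in reverse order on exit
        return work()


capture = FakeResource("capture")
writer = FakeResource("writer")
result = run_with_cleanup([capture, writer], lambda: "done")
```

A plain try/finally block achieves the same result; ExitStack is convenient when the set of resources depends on the chosen output mode.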
Troubleshooting
| Issue | Solution |
|---|---|
| docker pull fails | Verify the device has internet access. Check DNS settings and proxy configuration. |
| QNN delegate fails to load | Ensure the CDI and environment files match your hardware. Verify the container was started with --device qualcomm.com/device=qimsdk. |
| No video output on display | Confirm Wayland is running and a display is connected. Try the file output mode first to verify inference works. |
| Model download fails | Check network connectivity. The model is hosted on Hugging Face and may require proxy settings in some environments. |
| Low FPS or slow inference | Verify the model is running on the NPU (HTP backend). Check that backend_type is set to 'htp' in the delegate options. |
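When checking the delegate options mentioned in the table, it can help to see where backend_type is passed. The following is a sketch under assumptions rather than the application's actual code: it assumes the LiteRT Python package is ai_edge_litert and that the delegate library is named libQnnTFLiteDelegate.so. Confirm both against your container before relying on them.

```python
def qnn_delegate_options(backend_type: str = "htp") -> dict:
    """Build the options dict passed to the delegate; 'htp' targets the NPU."""
    return {"backend_type": backend_type}


def load_interpreter(model_path: str, delegate_lib: str = "libQnnTFLiteDelegate.so"):
    """Create a LiteRT interpreter that offloads to the QNN delegate.

    Assumes the ai_edge_litert package and the delegate library are installed,
    which is normally only true on a configured target device.
    """
    # Imported lazily so the helper above can be used without the runtime present.
    from ai_edge_litert.interpreter import Interpreter, load_delegate

    delegate = load_delegate(delegate_lib, qnn_delegate_options())
    interpreter = Interpreter(model_path=model_path, experimental_delegates=[delegate])
    interpreter.allocate_tensors()
    return interpreter


options = qnn_delegate_options()
```

If load_delegate raises an error, the interpreter is never created, which makes a missing or mismatched delegate library visible immediately instead of silently falling back to the CPU.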
Next steps
- Try different models: Replace the YOLOX model with other quantized models from Qualcomm AI Hub for tasks like image classification, pose estimation, or segmentation.
- Use live camera input: Modify the application to use a camera feed instead of a pre-recorded video by changing the VIDEO_IN path to a device capture source.
- Tune detection parameters: Adjust CONF_THRES and NMS_IOU_THRES to optimize detection accuracy for your use case.
- Build custom applications: Use the code walkthrough as a template to create your own inference pipelines targeting the Dragonwing NPU.
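To see how the two detection thresholds interact when tuning, here is a minimal, class-agnostic sketch of confidence filtering followed by greedy non-maximum suppression. It is a generic illustration in pure Python, not the postprocessing used by the application; the threshold values, boxes, and scores are placeholders.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thres, iou_thres):
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in kept):
            kept.append(i)
    return kept


CONF_THRES, NMS_IOU_THRES = 0.5, 0.45  # placeholder values
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_and_nms(boxes, scores, CONF_THRES, NMS_IOU_THRES)
```

Raising CONF_THRES removes more weak detections before suppression, while raising NMS_IOU_THRES tolerates more overlap and therefore keeps more boxes around the same object.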

