Run a LiteRT model on the NPU - Qualcomm Dragonwing Documentation

Summary

Use the LiteRT runtime with the QNN delegate to accelerate AI inference on the NPU of Qualcomm® Dragonwing™ devices. This guide demonstrates the end-to-end workflow by deploying a quantized object detection model (YOLOX) that processes video input and outputs annotated frames with bounding boxes, either saved to a file or streamed to a display.

What you’ll learn:

Configure a Dragonwing device and deploy the QIM SDK Docker environment
Run a pre-built object detection application accelerated on the NPU
Understand the application code to adapt it for your own models and use cases

Prerequisites

Ensure you have the following before proceeding:

Requirement	Details
Hardware	Qualcomm® Dragonwing™ device with NPU support
Host machine	Linux or macOS with SSH client and Docker support
Network	Wi-Fi or Ethernet connectivity on the target device
Software	Docker installed on the target device

Step 1: Configure the device

Enable Wi-Fi and SSH

The device requires an internet connection to download the artifacts needed for the sample application. If SSH and Wi-Fi are already configured, skip this step. Otherwise, follow the "Set up an SSH connection" guide to enable Wi-Fi and SSH on the device.

Enable camera support (CamX)

If you plan to use camera input, enable CamX on the platform:

echo -n "camx" > /var/data
efivar -n 882f8c2b-9646-435f-8de5-f208ff80c1bd-VendorDtbOverlays -w -f /var/data
efivar -n 882f8c2b-9646-435f-8de5-f208ff80c1bd-VendorDtbOverlays -p
sync
reboot

The device will reboot after this step. Wait for it to come back online before continuing.

Step 2: Set up the Docker environment

Pull the QIM SDK container image

On the target device, pull the latest QIM SDK Docker image:

cd $HOME

docker pull artifacts.codelinaro.org/iot-solutions-microservices/qimsdk:latest

Create required directories

On the target device, create directories for storing artifacts, configuration files, models, and media:

mkdir -p /etc/cdi /etc/docker/env /etc/models /etc/labels /etc/media /root/media /root/models /root/labels /root/configs

Clone the SDK tools repository

On your host machine, clone the QIM SDK Debian repository:

git clone https://git.codelinaro.org/clo/le/sdk-tools.git -b imsdk-tools.lnx.1.0
cd sdk-tools/qimsdk-debian/

Copy configuration files to the device

Transfer the CDI and environment files from your host machine to the target device:

scp -r cdi/<hardware>_qli_2x_qimsdk.json root@<IP_ADDRESS>:/etc/cdi/qimsdk.json

scp -r env/<hardware>_qli_2x_qimsdk.env root@<IP_ADDRESS>:/etc/docker/env/qimsdk.env

Replace <hardware> with the appropriate identifier for your target device (check the repository for available options) and <IP_ADDRESS> with your device’s IP address.

For instance, if the target device is Qualcomm Dragonwing™ RB3 Gen 2, then replace <hardware> with qcs6490.

Start the container

Launch the QIM SDK container on the target device:

docker run -it -d \
   --net host \
   --env-file /etc/docker/env/qimsdk.env \
   --device qualcomm.com/device=qimsdk \
   -h qimsdk \
   --name qimsdk \
   artifacts.codelinaro.org/iot-solutions-microservices/qimsdk:latest

Access the container as root

export DOCKER_ID=$(docker ps -aq --filter "name=qimsdk")
docker exec -it ${DOCKER_ID} sh

To verify you are logged in as root, run whoami inside the container. The output should be root.

Step 3: Install dependencies

Inside the container (as root), install the LiteRT runtime and required packages.

Install Python tooling

apt update
apt install python3-pip python3-venv

Create a virtual environment and install Python packages

python3 -m venv venv-litert-demo --system-site-packages

. venv-litert-demo/bin/activate

pip3 install ai-edge-litert Pillow opencv-python

Install GStreamer and GTK dependencies

These packages are required for video display output via Wayland:

apt install -y libgstreamer1.0-dev gstreamer1.0-plugins-ugly gstreamer1.0-libav \
              gstreamer1.0-alsa gstreamer1.0-gtk3 python3-gi python3-gi-cairo \
              gir1.2-gtk-3.0 python3-full pkg-config cmake libcairo2-dev \
              libgirepository1.0-dev gir1.2-glib-2.0 build-essential python3-dev \
              meson

Step 4: Download the application and model artifacts

Still inside the container, set up the object detection application:

Create the application directory

mkdir -p /etc/apps/ && cd /etc/apps/

Download the application script

curl -L https://raw.githubusercontent.com/qualcomm/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/applications/LiteRT/object_detection.py -o /etc/apps/object_detection.py

Download the model, labels, and sample video

curl -L https://raw.githubusercontent.com/qualcomm/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/artifacts/labels/coco_labels.txt -o /etc/labels/coco_labels.txt

curl -L https://raw.githubusercontent.com/qualcomm/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/artifacts/videos/video.mp4 -o /etc/media/video.mp4

curl -L https://huggingface.co/qualcomm/Yolo-X/resolve/v0.30.5/Yolo-X_w8a8.tflite -o /etc/models/yolox_quantized.tflite

Exit the root shell

You need to exit and re-enter the container as the qimsdk user to run the application:

exit

Step 5: Run the object detection application

Enter the container as the standard user

docker exec -it ${DOCKER_ID} bash

Activate the Python environment

. venv-litert-demo/bin/activate

Run the application

cd /etc/apps

Output to file

Run the application and save the output as a video file:

python3 object_detection.py --output file

Once processing is complete, exit the container and copy the output video to the target device's file system:

exit

docker cp ${DOCKER_ID}:/etc/apps/output_object_detection.mp4 /etc/media/output_object_detection.mp4

To copy the file to your host machine, run the following on the host:

scp root@<IP_ADDRESS>:/etc/media/output_object_detection.mp4 .

Output to display

To stream the output directly to a connected display via Wayland:

python3 object_detection.py --output wayland

Ensure a display is connected to the device and Wayland is running before using this mode.

Code walkthrough: Object detection with OpenCV and LiteRT

This section explains the object_detection.py application. Use this as a reference to build custom inference applications with LiteRT on Qualcomm Dragonwing devices.

The postprocessing in the following code is designed for object detection models from Qualcomm AI Hub. For custom models, update the postprocessing logic to match the model’s output format and requirements.

Import packages

#!/usr/bin/env python3
import cv2
import numpy as np
import argparse
import ai_edge_litert.interpreter as tflite
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Parse output arguments

parser = argparse.ArgumentParser(description="Run object detection and output to file or Wayland.")
parser.add_argument("--output", choices=["file", "wayland"], default="file",
                    help="Choose output mode: 'file' (default) or 'wayland'")
args = parser.parse_args()

Configure model parameters

MODEL_PATH = "/etc/models/yolox_quantized.tflite"  # YOLOX quantized model
LABEL_PATH = "/etc/labels/coco_labels.txt"
VIDEO_IN = "/etc/media/video.mp4"
VIDEO_OUT = "output_object_detection.mp4"
DELEGATE_PATH = "libQnnTFLiteDelegate.so"

FRAME_W, FRAME_H = 1600, 900
FPS_OUT = 30
CONF_THRES = 0.25
NMS_IOU_THRES = 0.50
BOX_SCALE = 3.2108588218688965
BOX_ZP = 31.0
SCORE_SCALE = 0.0038042240776121616
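
The constants `BOX_SCALE`, `BOX_ZP`, and `SCORE_SCALE` are the quantization parameters of this particular model's output tensors, applied as real = scale * (q - zero_point). As a minimal sketch, assuming the model file carries per-tensor quantization metadata, the same values could be read from the `quantization` entry of each output detail instead of being hardcoded. The helper name `dequantize` and the sample detail dictionary are hypothetical; the dictionary only stands in for one entry of `interpreter.get_output_details()`. This sketch is separate from the application script.

```python
import numpy as np

def dequantize(q_values, detail):
    """Convert quantized tensor values to float32 using per-tensor parameters.

    `detail` stands in for one entry of interpreter.get_output_details();
    its "quantization" field is a (scale, zero_point) tuple.
    """
    scale, zero_point = detail["quantization"]
    if scale == 0:  # (0.0, 0) means the tensor is not quantized
        return q_values.astype(np.float32)
    return np.float32(scale) * (q_values.astype(np.float32) - np.float32(zero_point))

# Hypothetical detail entry reusing the box parameters from the script.
box_detail = {"quantization": (3.2108588218688965, 31)}
boxes_q = np.array([[31, 41, 131, 231]], dtype=np.uint8)
boxes = dequantize(boxes_q, box_detail)
print(boxes)
```

Reading the parameters at runtime avoids editing constants when the model is replaced, provided the replacement model uses the same output layout.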

Load the model with the QNN delegate

The QNN delegate enables inference on the NPU:

delegate_options = {'backend_type': 'htp'}
delegate = tflite.load_delegate(DELEGATE_PATH, delegate_options)
interpreter = tflite.Interpreter(model_path=MODEL_PATH, experimental_delegates=[delegate])
interpreter.allocate_tensors()

in_det = interpreter.get_input_details()
out_det = interpreter.get_output_details()
in_h, in_w = in_det[0]["shape"][1:3]

with open(LABEL_PATH) as f:
    labels = [line.strip() for line in f]
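
If the delegate library cannot be loaded, the script stops with an exception. The following is a minimal sketch of a CPU fallback, assuming that `load_delegate` raises `ValueError` (or `OSError`) on failure, as it does in the TensorFlow Lite Python API that `ai_edge_litert` mirrors. The helper name `create_interpreter` is hypothetical, and the runtime module is passed in as a parameter so the helper does not depend on a specific import.

```python
def create_interpreter(runtime, model_path, delegate_path, options=None):
    """Return (interpreter, used_delegate).

    `runtime` is the module that provides load_delegate() and Interpreter,
    for example ai_edge_litert.interpreter (imported as tflite in the script).
    """
    options = options or {"backend_type": "htp"}
    try:
        delegate = runtime.load_delegate(delegate_path, options)
    except (ValueError, OSError) as err:
        # Assumption: a failed load surfaces as ValueError or OSError.
        print(f"QNN delegate unavailable ({err}); falling back to CPU")
        return runtime.Interpreter(model_path=model_path), False
    interpreter = runtime.Interpreter(
        model_path=model_path, experimental_delegates=[delegate]
    )
    return interpreter, True
```

In the application this could be called as `interpreter, on_npu = create_interpreter(tflite, MODEL_PATH, DELEGATE_PATH)`, and logging `on_npu` makes an unintended CPU run easy to spot.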

Set up video capture and preprocessing

cap = cv2.VideoCapture(VIDEO_IN)
sx, sy = FRAME_W / in_w, FRAME_H / in_h
frame_rs = np.empty((FRAME_H, FRAME_W, 3), np.uint8)
input_tensor = np.empty((1, in_h, in_w, 3), np.uint8)
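
The factors `sx` and `sy` map box coordinates from the model's input resolution to the output frame. The sketch below illustrates that mapping on its own, assuming a 640x640 model input (the application reads the real size from `in_det` at runtime). The helper name `scale_boxes` is hypothetical.

```python
import numpy as np

def scale_boxes(boxes_xyxy, in_size, frame_size):
    """Map (x1, y1, x2, y2) boxes from model-input pixels to frame pixels."""
    in_w, in_h = in_size
    frame_w, frame_h = frame_size
    scaled = boxes_xyxy.astype(np.float32)  # astype returns a copy
    scaled[:, [0, 2]] *= frame_w / in_w
    scaled[:, [1, 3]] *= frame_h / in_h
    scaled = scaled.astype(np.int32)
    # Keep the corners inside the frame so drawing stays within bounds.
    scaled[:, [0, 2]] = np.clip(scaled[:, [0, 2]], 0, frame_w - 1)
    scaled[:, [1, 3]] = np.clip(scaled[:, [1, 3]], 0, frame_h - 1)
    return scaled

# Assumed 640x640 input; the box extends past the right edge of the input.
boxes = np.array([[320.0, 320.0, 700.0, 640.0]], dtype=np.float32)
print(scale_boxes(boxes, (640, 640), (1600, 900)))
```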

Configure the output pipeline

if args.output == "file":
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out_writer = cv2.VideoWriter(VIDEO_OUT, fourcc, FPS_OUT, (FRAME_W, FRAME_H))
else:
    Gst.init(None)
    # Build the caps from the same constants used for the frame buffers.
    pipeline = Gst.parse_launch(
        'appsrc name=src is-live=true block=true format=time '
        f'caps=video/x-raw,format=BGR,width={FRAME_W},height={FRAME_H},framerate={FPS_OUT}/1 '
        '! videoconvert ! waylandsink'
    )
    appsrc = pipeline.get_by_name('src')
    pipeline.set_state(Gst.State.PLAYING)

frame_cnt = 0

Run inference in the main loop

Read each video frame, run inference, apply NMS, and draw bounding boxes:

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_cnt += 1

    cv2.resize(frame, (FRAME_W, FRAME_H), dst=frame_rs)
    cv2.resize(frame_rs, (in_w, in_h), dst=input_tensor[0])

    interpreter.set_tensor(in_det[0]['index'], input_tensor)
    interpreter.invoke()

    boxes_q = interpreter.get_tensor(out_det[0]['index'])[0]
    scores_q = interpreter.get_tensor(out_det[1]['index'])[0]
    classes_q = interpreter.get_tensor(out_det[2]['index'])[0]

    boxes = BOX_SCALE * (boxes_q.astype(np.float32) - BOX_ZP)
    scores = SCORE_SCALE * scores_q.astype(np.float32)
    classes = classes_q.astype(np.int32)

    mask = scores >= CONF_THRES
    if np.any(mask):
        boxes_f = boxes[mask]
        scores_f = scores[mask]
        classes_f = classes[mask]

        x1, y1, x2, y2 = boxes_f.T
        boxes_cv2 = np.column_stack((x1, y1, x2 - x1, y2 - y1))

        idx_cv2 = cv2.dnn.NMSBoxes(
            bboxes=boxes_cv2.tolist(),
            scores=scores_f.tolist(),
            score_threshold=CONF_THRES,
            nms_threshold=NMS_IOU_THRES
        )

        if len(idx_cv2):
            idx = idx_cv2.flatten()
            sel_boxes = boxes_f[idx]
            sel_scores = scores_f[idx]
            sel_classes = classes_f[idx]

            sel_boxes[:, [0, 2]] *= sx
            sel_boxes[:, [1, 3]] *= sy
            sel_boxes = sel_boxes.astype(np.int32)

            sel_boxes[:, [0, 2]] = np.clip(sel_boxes[:, [0, 2]], 0, FRAME_W - 1)
            sel_boxes[:, [1, 3]] = np.clip(sel_boxes[:, [1, 3]], 0, FRAME_H - 1)

            for (x1i, y1i, x2i, y2i), sc, cl in zip(sel_boxes, sel_scores, sel_classes):
                cv2.rectangle(frame_rs, (x1i, y1i), (x2i, y2i), (0, 255, 0), 2)
                lab = labels[cl] if cl < len(labels) else str(cl)
                cv2.putText(frame_rs, f"{lab} {sc:.2f}", (x1i, max(10, y1i - 5)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    if args.output == "file":
        out_writer.write(frame_rs)
    else:
        data = frame_rs.tobytes()
        buf = Gst.Buffer.new_allocate(None, len(data), None)
        buf.fill(0, data)
        buf.duration = Gst.util_uint64_scale_int(1, Gst.SECOND, FPS_OUT)
        timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) * Gst.MSECOND
        buf.pts = buf.dts = int(timestamp)
        appsrc.emit('push-buffer', buf)
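
The call to `cv2.dnn.NMSBoxes` hides the suppression logic. For reference, the following plain NumPy sketch shows greedy non-maximum suppression on (x1, y1, x2, y2) boxes. It illustrates the technique only and is not OpenCV's implementation; the function name `nms_xyxy` is hypothetical.

```python
import numpy as np

def nms_xyxy(boxes, scores, iou_thres=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    x1, y1, x2, y2 = boxes.T
    areas = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        iw = np.maximum(0.0, np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]))
        inter = iw * ih
        union = areas[best] + areas[rest] - inter
        iou = np.where(union > 0, inter / np.maximum(union, 1e-9), 0.0)
        order = rest[iou <= iou_thres]  # survivors go to the next round
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms_xyxy(boxes, scores))
```

In this example the first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is dropped at the default threshold, while the third box does not overlap and is kept.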

Clean up resources

cap.release()
if args.output == "file":
    out_writer.release()
    print(f"Done - processed video saved to {VIDEO_OUT}")
else:
    appsrc.emit('end-of-stream')
    pipeline.set_state(Gst.State.NULL)
    print("Done - video streamed to Wayland sink")

Troubleshooting

Issue	Solution
`docker pull` fails	Verify the device has internet access. Check DNS settings and proxy configuration.
QNN delegate fails to load	Ensure the CDI and environment files match your hardware. Verify the container was started with `--device qualcomm.com/device=qimsdk`.
No video output on display	Confirm Wayland is running and a display is connected. Try the `file` output mode first to verify inference works.
Model download fails	Check network connectivity. The model is hosted on Hugging Face and may require proxy settings in some environments.
Low FPS or slow inference	Verify the model is running on the NPU (HTP backend). Check that `backend_type` is set to `'htp'` in the delegate options.
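
To put numbers on the "Low FPS or slow inference" case, per-frame latency can be measured around `interpreter.invoke()`. The following is a minimal sketch using a hypothetical helper, `measure_latency_ms`, with a stand-in workload in place of the real interpreter.

```python
import time

def measure_latency_ms(run_once, warmup=3, runs=20):
    """Return the average wall-clock time of run_once() in milliseconds."""
    for _ in range(warmup):  # first calls often include one-time setup cost
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) * 1000.0 / runs

# Stand-in workload; in the application this would be interpreter.invoke.
latency = measure_latency_ms(lambda: sum(range(10_000)))
print(f"average latency: {latency:.3f} ms")
```

Comparing the value with and without the delegate gives a rough indication of whether inference is actually being offloaded.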

Next steps

Try different models: Replace the YOLOX model with other quantized models from Qualcomm AI Hub for tasks like image classification, pose estimation, or segmentation.
Use live camera input: Modify the application to use a camera feed instead of a pre-recorded video by changing the VIDEO_IN path to a device capture source.
Tune detection parameters: Adjust CONF_THRES and NMS_IOU_THRES to optimize detection accuracy for your use case.
Build custom applications: Use the code walkthrough as a template to create your own inference pipelines targeting the Dragonwing NPU.