Summary

Use the LiteRT runtime with the QNN delegate to accelerate AI inference on the NPU of Qualcomm® Dragonwing™ devices. This guide demonstrates the end-to-end workflow by deploying a quantized object detection model (YOLOX) that processes video input and outputs annotated frames with bounding boxes, either saved to a file or streamed to a display.

What you’ll learn:
  • Configure a Dragonwing device and deploy the QIM SDK Docker environment
  • Run a pre-built object detection application accelerated on the NPU
  • Understand the application code to adapt it for your own models and use cases

Prerequisites

Ensure you have the following before proceeding:
Requirement | Details
Hardware | Qualcomm® Dragonwing™ device with NPU support
Host machine | Linux or macOS with SSH client and Docker support
Network | Wi-Fi or Ethernet connectivity on the target device
Software | Docker installed on the target device

Step 1: Configure the device

Enable Wi-Fi and SSH

The device requires an internet connection to download the artifacts needed for the sample application. Follow "Set up an SSH connection" to enable Wi-Fi and SSH on the device. If both are already configured, skip this step.

Enable camera support (CamX)

If you plan to use camera input, enable CamX on the platform:
echo -n "camx" > /var/data
efivar -n 882f8c2b-9646-435f-8de5-f208ff80c1bd-VendorDtbOverlays -w -f /var/data
efivar -n 882f8c2b-9646-435f-8de5-f208ff80c1bd-VendorDtbOverlays -p
sync
reboot
The device will reboot after this step. Wait for it to come back online before continuing.

Step 2: Set up the Docker environment

Pull the QIM SDK container image

On the target device, pull the latest QIM SDK Docker image:
cd $HOME
docker pull artifacts.codelinaro.org/iot-solutions-microservices/qimsdk:latest

Create required directories

Create directories for storing artifacts, configuration files, models, and media:
mkdir -p /etc/cdi /etc/docker/env /etc/models /etc/labels /etc/media /root/media /root/models /root/labels /root/configs

Clone the SDK tools repository

On your host machine, clone the QIM SDK Debian repository:
git clone https://git.codelinaro.org/clo/le/sdk-tools.git -b imsdk-tools.lnx.1.0
cd sdk-tools/qimsdk-debian/

Copy configuration files to the device

Transfer the CDI and environment files from your host machine to the target device:
scp -r cdi/<hardware>_qli_2x_qimsdk.json root@<IP_ADDRESS>:/etc/cdi/qimsdk.json
scp -r env/<hardware>_qli_2x_qimsdk.env root@<IP_ADDRESS>:/etc/docker/env/qimsdk.env
Replace <hardware> with the appropriate identifier for your target device (check the repository for available options) and <IP_ADDRESS> with your device’s IP address.

For instance, if the target device is Qualcomm Dragonwing™ RB3 Gen 2, then replace <hardware> with qcs6490.

Start the container

Launch the QIM SDK container on the target device:
docker run -it -d \
   --net host \
   --env-file /etc/docker/env/qimsdk.env \
   --device qualcomm.com/device=qimsdk \
   -h qimsdk \
   --name qimsdk \
   artifacts.codelinaro.org/iot-solutions-microservices/qimsdk:latest

Access the container as root

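# Select the container by name so that other containers on the device are not picked up
export DOCKER_ID=$(docker ps -aq --filter "name=qimsdk")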
docker exec -it ${DOCKER_ID} sh
To verify you are logged in as root, run whoami inside the container. The output should be root.

Step 3: Install dependencies

Inside the container (as root), install the LiteRT runtime and required packages.

Install Python tooling

apt update
apt install -y python3-pip python3-venv

Create a virtual environment and install Python packages

python3 -m venv venv-litert-demo --system-site-packages
. venv-litert-demo/bin/activate
pip3 install ai-edge-litert Pillow opencv-python

Install GStreamer and GTK dependencies

These packages are required for video display output via Wayland:
apt install -y libgstreamer1.0-dev gstreamer1.0-plugins-ugly gstreamer1.0-libav \
              gstreamer1.0-alsa gstreamer1.0-gtk3 python3-gi python3-gi-cairo \
              gir1.2-gtk-3.0 python3-full pkg-config cmake libcairo2-dev \
              libgirepository1.0-dev gir1.2-glib-2.0 build-essential python3-dev \
              meson

Step 4: Download the application and model artifacts

Still inside the container, set up the object detection application:

Create the application directory

mkdir -p /etc/apps/ && cd /etc/apps/

Download the application script

curl -L https://raw.githubusercontent.com/qualcomm/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/applications/LiteRT/object_detection.py -o /etc/apps/object_detection.py

Download the model, labels, and sample video

curl -L https://raw.githubusercontent.com/qualcomm/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/artifacts/labels/coco_labels.txt -o /etc/labels/coco_labels.txt
curl -L https://raw.githubusercontent.com/qualcomm/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/artifacts/videos/video.mp4 -o /etc/media/video.mp4
curl -L https://huggingface.co/qualcomm/Yolo-X/resolve/v0.30.5/Yolo-X_w8a8.tflite -o /etc/models/yolox_quantized.tflite
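
Before leaving the root shell, it can help to confirm that every artifact landed where the application expects it. The following is a minimal sketch and not part of the sample application: `missing_artifacts` is a hypothetical helper, and the paths are the ones used by the download commands above. It only checks that each file exists and is not empty, so it will not catch a download that saved an error page.

```python
from pathlib import Path

# Paths taken from the download commands in this step.
EXPECTED_ARTIFACTS = [
    "/etc/apps/object_detection.py",
    "/etc/labels/coco_labels.txt",
    "/etc/media/video.mp4",
    "/etc/models/yolox_quantized.tflite",
]


def missing_artifacts(paths):
    """Return the paths that do not exist or are empty files."""
    missing = []
    for p in map(Path, paths):
        if not p.is_file() or p.stat().st_size == 0:
            missing.append(str(p))
    return missing


if __name__ == "__main__":
    problems = missing_artifacts(EXPECTED_ARTIFACTS)
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All artifacts present")
```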

Exit the root shell

You need to exit and re-enter the container as the qimsdk user to run the application:
exit

Step 5: Run the object detection application

Enter the container as the standard user

docker exec -it ${DOCKER_ID} bash

Activate the Python environment

. venv-litert-demo/bin/activate

Run the application

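Change to the application directory, then run the application and save the output as a video file:
cd /etc/apps
python3 object_detection.py --output file
To stream to a connected display instead, use --output wayland.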
Once processing is complete, exit the container and copy the output video from the container to the device’s file system:
exit
docker cp ${DOCKER_ID}:/etc/apps/output_object_detection.mp4 /etc/media/output_object_detection.mp4
To copy the file to your host machine:
scp root@<IP_ADDRESS>:/etc/media/output_object_detection.mp4 .

Code walkthrough: Object detection with OpenCV and LiteRT

This section explains the object_detection.py application. Use this as a reference to build custom inference applications with LiteRT on Qualcomm Dragonwing devices.
The postprocessing in the following code is designed for object detection models from Qualcomm AI Hub. For custom models, update the postprocessing logic to match the model’s output format and requirements.

Import packages

#!/usr/bin/env python3
import cv2
import numpy as np
import argparse
import ai_edge_litert.interpreter as tflite
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Parse output arguments

parser = argparse.ArgumentParser(description="Run object detection and output to file or Wayland.")
parser.add_argument("--output", choices=["file", "wayland"], default="file",
                    help="Choose output mode: 'file' (default) or 'wayland'")
args = parser.parse_args()
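
Because `--output` is declared with `choices`, argparse rejects any other value before the model is loaded. The sketch below rebuilds the same parser inside a hypothetical `build_parser` function and passes explicit argument lists, so the behavior can be checked without running the full application.

```python
import argparse


def build_parser():
    # Same definition as in object_detection.py, wrapped in a function
    # (build_parser is a name introduced here for illustration).
    parser = argparse.ArgumentParser(
        description="Run object detection and output to file or Wayland.")
    parser.add_argument("--output", choices=["file", "wayland"], default="file",
                        help="Choose output mode: 'file' (default) or 'wayland'")
    return parser


parser = build_parser()
print(parser.parse_args([]).output)                       # file
print(parser.parse_args(["--output", "wayland"]).output)  # wayland

# An unsupported value makes argparse print a usage message and exit.
try:
    parser.parse_args(["--output", "x11"])
except SystemExit as exc:
    print("rejected, exit status", exc.code)
```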

Configure model parameters

MODEL_PATH = "/etc/models/yolox_quantized.tflite"  # YOLOX quantized model
LABEL_PATH = "/etc/labels/coco_labels.txt"
VIDEO_IN = "/etc/media/video.mp4"
VIDEO_OUT = "output_object_detection.mp4"
DELEGATE_PATH = "libQnnTFLiteDelegate.so"

FRAME_W, FRAME_H = 1600, 900
FPS_OUT = 30
CONF_THRES = 0.25
NMS_IOU_THRES = 0.50
BOX_SCALE = 3.2108588218688965
BOX_ZP = 31.0
SCORE_SCALE = 0.0038042240776121616
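
`BOX_SCALE`, `BOX_ZP`, and `SCORE_SCALE` are quantization parameters for this particular model's output tensors. The main loop uses them to map raw integer outputs to real values with `real = scale * (q - zero_point)`. The sketch below shows that arithmetic on plain numbers. It also shows a hypothetical `dequantize_from_detail` helper that reads the parameters from the `quantization` entry of a tensor-details dictionary (the kind returned by `get_output_details()`) rather than hardcoding them. The stand-in dictionary and the sample raw value 131 are made up for illustration.

```python
import math

BOX_SCALE = 3.2108588218688965
BOX_ZP = 31.0
SCORE_SCALE = 0.0038042240776121616
CONF_THRES = 0.25


def dequantize(q, scale, zero_point=0.0):
    """Map a quantized value (or NumPy array) back to a real value."""
    return scale * (q - zero_point)


def dequantize_from_detail(q, detail):
    """Same mapping, with parameters read from a tensor-details dict."""
    scale, zero_point = detail["quantization"]
    return dequantize(q, scale, zero_point)


# A raw box coordinate of 131 is 100 quantization steps above the zero point.
print(dequantize(131, BOX_SCALE, BOX_ZP))

# Smallest raw score that passes the confidence threshold.
print(math.ceil(CONF_THRES / SCORE_SCALE))  # 66

# Stand-in for one entry of interpreter.get_output_details().
fake_detail = {"quantization": (BOX_SCALE, 31)}
print(dequantize_from_detail(131, fake_detail))
```

Reading the parameters from the tensor details is one way to avoid editing constants when swapping in a model with different quantization, assuming the model file carries them.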

Load the model with the QNN delegate

The QNN delegate enables inference on the NPU:
delegate_options = {'backend_type': 'htp'}
delegate = tflite.load_delegate(DELEGATE_PATH, delegate_options)
interpreter = tflite.Interpreter(model_path=MODEL_PATH, experimental_delegates=[delegate])
interpreter.allocate_tensors()

in_det = interpreter.get_input_details()
out_det = interpreter.get_output_details()
in_h, in_w = in_det[0]["shape"][1:3]

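# Read one class label per line; the context manager closes the file afterwards.
with open(LABEL_PATH) as f:
    labels = [line.strip() for line in f]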
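
If the delegate library cannot be loaded, the script above stops with an exception. During bring-up it can be convenient to fall back to CPU execution with a warning instead. The following is one possible sketch, not the sample application's behavior: `load_delegate_or_none` and `delegates_argument` are hypothetical helpers, and the loader function is passed in as a parameter so that the example runs without the LiteRT package installed. On the device you would pass `tflite.load_delegate`.

```python
def load_delegate_or_none(load_delegate, path, options):
    """Try to load a delegate; return None so the caller can fall back to CPU.

    load_delegate is the loader function to call, for example
    tflite.load_delegate from ai_edge_litert.interpreter.
    """
    try:
        return load_delegate(path, options)
    except (ValueError, OSError) as exc:
        print(f"Warning: could not load {path}: {exc}; running on CPU")
        return None


def delegates_argument(delegate):
    """Build the list passed as experimental_delegates."""
    return [delegate] if delegate is not None else []


# Stand-in loader that always fails, to exercise the fallback path.
def failing_loader(path, options):
    raise OSError("library not found")


delegate = load_delegate_or_none(
    failing_loader, "libQnnTFLiteDelegate.so", {"backend_type": "htp"})
print(delegates_argument(delegate))  # []
```

CPU execution is typically much slower than the NPU, so treat the warning as a setup problem to fix rather than a steady state.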

Set up video capture and preprocessing

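cap = cv2.VideoCapture(VIDEO_IN)
if not cap.isOpened():
    raise SystemExit(f"Cannot open input video: {VIDEO_IN}")
# Scale factors from model-input coordinates to output-frame coordinates.
sx, sy = FRAME_W / in_w, FRAME_H / in_h
# Buffers are allocated once and reused for every frame.
frame_rs = np.empty((FRAME_H, FRAME_W, 3), np.uint8)
input_tensor = np.empty((1, in_h, in_w, 3), np.uint8)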
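
The model reports boxes in the coordinate space of its input tensor, while drawing happens on the `FRAME_W` x `FRAME_H` frame, so `sx` and `sy` are the per-axis factors that map one space to the other. The sketch below applies the same scale-then-clip steps that the main loop performs, on a single box. The 640 x 640 input size is an assumption for illustration (the application reads the real size from `in_det`), and `scale_box` is a hypothetical helper.

```python
FRAME_W, FRAME_H = 1600, 900
IN_W, IN_H = 640, 640  # assumed model input size, for illustration only

sx, sy = FRAME_W / IN_W, FRAME_H / IN_H  # 2.5 and 1.40625


def scale_box(box, sx, sy, frame_w, frame_h):
    """Scale an (x1, y1, x2, y2) box to frame pixels and clip it to the frame."""
    x1, y1, x2, y2 = box
    xs = [int(x1 * sx), int(x2 * sx)]
    ys = [int(y1 * sy), int(y2 * sy)]
    xs = [min(max(v, 0), frame_w - 1) for v in xs]
    ys = [min(max(v, 0), frame_h - 1) for v in ys]
    return xs[0], ys[0], xs[1], ys[1]


print(scale_box((100, 200, 300, 400), sx, sy, FRAME_W, FRAME_H))
# (250, 281, 750, 562)
print(scale_box((-10, 0, 700, 650), sx, sy, FRAME_W, FRAME_H))
# (0, 0, 1599, 899)
```

Because the two factors differ whenever the frame and the model input have different aspect ratios, the frame is stretched rather than letterboxed before inference.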

Configure the output pipeline

if args.output == "file":
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out_writer = cv2.VideoWriter(VIDEO_OUT, fourcc, FPS_OUT, (FRAME_W, FRAME_H))
else:
    Gst.init(None)
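    # Build the caps from the constants above so the pipeline stays in sync with them.
    pipeline = Gst.parse_launch(
        'appsrc name=src is-live=true block=true format=time '
        f'caps=video/x-raw,format=BGR,width={FRAME_W},height={FRAME_H},framerate={FPS_OUT}/1 '
        '! videoconvert ! waylandsink'
    )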
    appsrc = pipeline.get_by_name('src')
    pipeline.set_state(Gst.State.PLAYING)

frame_cnt = 0
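
GStreamer expresses timestamps and durations in nanoseconds, which is why the main loop multiplies the capture position by `Gst.MSECOND` and scales `Gst.SECOND` by the frame rate. The arithmetic can be checked without GStreamer installed. In the sketch below, the constants `SECOND_NS` and `MSECOND_NS` stand in for `Gst.SECOND` and `Gst.MSECOND`, and the helper names are hypothetical.

```python
SECOND_NS = 1_000_000_000   # value of Gst.SECOND
MSECOND_NS = 1_000_000      # value of Gst.MSECOND
FPS_OUT = 30


def frame_duration_ns(fps):
    """Duration of one frame in nanoseconds, rounded down."""
    return SECOND_NS // fps


def pts_from_msec(position_msec):
    """Convert a position in milliseconds to a nanosecond timestamp."""
    return int(position_msec * MSECOND_NS)


print(frame_duration_ns(FPS_OUT))  # 33333333
print(pts_from_msec(1000.0))       # 1000000000
```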

Run inference in the main loop

Read each video frame, run inference, apply NMS, and draw bounding boxes:
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_cnt += 1

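    # Resize to the output resolution, then to the model's input size.
    cv2.resize(frame, (FRAME_W, FRAME_H), dst=frame_rs)
    cv2.resize(frame_rs, (in_w, in_h), dst=input_tensor[0])

    # Copy the frame into the input tensor and run inference.
    interpreter.set_tensor(in_det[0]['index'], input_tensor)
    interpreter.invoke()

    # Fetch the quantized outputs for the single batch item.
    boxes_q = interpreter.get_tensor(out_det[0]['index'])[0]
    scores_q = interpreter.get_tensor(out_det[1]['index'])[0]
    classes_q = interpreter.get_tensor(out_det[2]['index'])[0]

    # Dequantize boxes and scores to real values; class IDs are used as integers.
    boxes = BOX_SCALE * (boxes_q.astype(np.float32) - BOX_ZP)
    scores = SCORE_SCALE * scores_q.astype(np.float32)
    classes = classes_q.astype(np.int32)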

    mask = scores >= CONF_THRES
    if np.any(mask):
        boxes_f = boxes[mask]
        scores_f = scores[mask]
        classes_f = classes[mask]

        # Convert (x1, y1, x2, y2) corners to the (x, y, w, h) layout passed to NMSBoxes.
        x1, y1, x2, y2 = boxes_f.T
        boxes_cv2 = np.column_stack((x1, y1, x2 - x1, y2 - y1))

        idx_cv2 = cv2.dnn.NMSBoxes(
            bboxes=boxes_cv2.tolist(),
            scores=scores_f.tolist(),
            score_threshold=CONF_THRES,
            nms_threshold=NMS_IOU_THRES
        )

        if len(idx_cv2):
            idx = idx_cv2.flatten()
            sel_boxes = boxes_f[idx]
            sel_scores = scores_f[idx]
            sel_classes = classes_f[idx]

            sel_boxes[:, [0, 2]] *= sx
            sel_boxes[:, [1, 3]] *= sy
            sel_boxes = sel_boxes.astype(np.int32)

            sel_boxes[:, [0, 2]] = np.clip(sel_boxes[:, [0, 2]], 0, FRAME_W - 1)
            sel_boxes[:, [1, 3]] = np.clip(sel_boxes[:, [1, 3]], 0, FRAME_H - 1)

            for (x1i, y1i, x2i, y2i), sc, cl in zip(sel_boxes, sel_scores, sel_classes):
                cv2.rectangle(frame_rs, (x1i, y1i), (x2i, y2i), (0, 255, 0), 2)
                lab = labels[cl] if cl < len(labels) else str(cl)
                cv2.putText(frame_rs, f"{lab} {sc:.2f}", (x1i, max(10, y1i - 5)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    if args.output == "file":
        out_writer.write(frame_rs)
    else:
        data = frame_rs.tobytes()
        buf = Gst.Buffer.new_allocate(None, len(data), None)
        buf.fill(0, data)
        buf.duration = Gst.util_uint64_scale_int(1, Gst.SECOND, FPS_OUT)
        timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) * Gst.MSECOND
        buf.pts = buf.dts = int(timestamp)
        appsrc.emit('push-buffer', buf)
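
Non-maximum suppression keeps the highest-scoring box in each cluster of overlapping detections and discards the rest. The application delegates this step to `cv2.dnn.NMSBoxes`. The pure-Python sketch below is a simplified greedy version written only to illustrate the idea; it is not OpenCV's implementation, and edge cases such as threshold ties may be handled differently. Boxes use the same `(x, y, width, height)` layout that the main loop builds in `boxes_cv2`, and the sample boxes and scores are made up.

```python
def iou_xywh(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou_xywh(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 100, 100), (15, 15, 100, 100), (300, 300, 50, 50)]
scores = [0.90, 0.80, 0.70]
print(greedy_nms(boxes, scores, score_threshold=0.25, iou_threshold=0.50))
# [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed at a threshold of 0.50. Raising `NMS_IOU_THRES` keeps more overlapping boxes, and lowering it removes more.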

Clean up resources

cap.release()
if args.output == "file":
    out_writer.release()
    print(f"Done - processed video saved to {VIDEO_OUT}")
else:
    appsrc.emit('end-of-stream')
    pipeline.set_state(Gst.State.NULL)
    print("Done - video streamed to Wayland sink")

Troubleshooting

Issue | Solution
docker pull fails | Verify the device has internet access. Check DNS settings and proxy configuration.
QNN delegate fails to load | Ensure the CDI and environment files match your hardware. Verify the container was started with --device qualcomm.com/device=qimsdk.
No video output on display | Confirm Wayland is running and a display is connected. Try the file output mode first to verify inference works.
Model download fails | Check network connectivity. The model is hosted on Hugging Face and may require proxy settings in some environments.
Low FPS or slow inference | Verify the model is running on the NPU (HTP backend). Check that backend_type is set to 'htp' in the delegate options.
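
When chasing a low frame rate, it helps to time the inference call on its own, separately from video decoding and drawing. The following is a minimal timing sketch that uses only the standard library. `measure_latency_ms` is a hypothetical helper, and the `time.sleep` stand-in would be replaced on the device by a function that calls `interpreter.invoke()`.

```python
import time


def measure_latency_ms(fn, warmup=3, runs=20):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):      # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in workload; on the device, pass a function that calls
# interpreter.invoke() instead.
mean_ms = measure_latency_ms(lambda: time.sleep(0.002), warmup=1, runs=5)
print(f"mean latency: {mean_ms:.2f} ms")
```

Comparing the result with and without the delegate is one way to check whether the delegate is actually speeding up inference.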

Next steps

  • Try different models: Replace the YOLOX model with other quantized models from Qualcomm AI Hub for tasks like image classification, pose estimation, or segmentation.
  • Use live camera input: Modify the application to use a camera feed instead of a pre-recorded video by changing the VIDEO_IN path to a device capture source.
  • Tune detection parameters: Adjust CONF_THRES and NMS_IOU_THRES to optimize detection accuracy for your use case.
  • Build custom applications: Use the code walkthrough as a template to create your own inference pipelines targeting the Dragonwing NPU.