End-to-end examples
Here’s a list of example applications (in Python) that implement models from AI Hub, ready to run on the NPU of your Dragonwing board:
To run other models, keep reading!
Finding supported models
Models in AI Hub are categorized by the supported Qualcomm chipset. To see models that will run on your development kit, go to the model list and, under ‘Chipset’, select the entry for your board:
- RB3 Gen 2 Vision Kit: ‘Qualcomm QCS6490 (Proxy)’
- RUBIK Pi 3: ‘Qualcomm QCS6490 (Proxy)’
- IQ-9075 EVK: ‘Qualcomm QCS9075 (Proxy)’
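If you script against several boards, the mapping above can be kept as a small lookup table. This is only a sketch: the dictionary and helper names are hypothetical, and the entries are copied from the list above.

```python
# Board name -> value to pick under 'Chipset' on the AI Hub model list.
# AI_HUB_CHIPSET and boards_sharing_chipset are hypothetical names for illustration.
AI_HUB_CHIPSET = {
    "RB3 Gen 2 Vision Kit": "Qualcomm QCS6490 (Proxy)",
    "RUBIK Pi 3": "Qualcomm QCS6490 (Proxy)",
    "IQ-9075 EVK": "Qualcomm QCS9075 (Proxy)",
}


def boards_sharing_chipset(board: str) -> list[str]:
    """Return every listed board that uses the same chipset filter as `board`."""
    chipset = AI_HUB_CHIPSET[board]
    return sorted(name for name, value in AI_HUB_CHIPSET.items() if value == chipset)


print(boards_sharing_chipset("RUBIK Pi 3"))  # ['RB3 Gen 2 Vision Kit', 'RUBIK Pi 3']
```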
Deploying a model to NPU (Python)
As an example, let’s deploy the Lightweight-Face-Detection model.
Running the example repository
All AI Hub models come with an example repository. This is a good starting point, as it shows exactly how to run the model: what the input to your network should look like, and how to interpret the output (here, how to map the output tensor to bounding boxes). The example repositories do NOT run on the NPU or GPU yet; they run without acceleration. Let’s see what our input/output should look like before we move this model to the NPU.
On the AI Hub page for Lightweight-Face-Detection, click “Model repository”. This links you to a README file with instructions on how to run the example repository. To deploy this model, open the terminal on your development board, or an ssh session to your development board:
Download a test image
Download an image with a face (640x480 resolution, JPG format) onto your development board:

Input resolution: AI Hub models require correctly sized inputs. You can find the required resolution under “Technical Details > Input resolution”. The page lists it as HEIGHT x WIDTH, so the 480x640 shown for this model means an image that is 640 pixels wide and 480 pixels high (640x480 in width x height notation). Alternatively, inspect the size of the input tensor in the TFLite or ONNX file.
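Because the AI Hub page uses HEIGHT x WIDTH while most image tools expect (width, height), the two values are easy to swap. A minimal sketch of the conversion, assuming the page text has the form "480x640"; the helper name `hw_to_wh` is hypothetical and not part of AI Hub:

```python
def hw_to_wh(resolution: str) -> tuple[int, int]:
    """Convert an AI Hub 'HEIGHTxWIDTH' string to a (width, height) tuple."""
    height_text, width_text = resolution.lower().replace(" ", "").split("x")
    return int(width_text), int(height_text)


# AI Hub lists Lightweight-Face-Detection as 480x640 (HEIGHT x WIDTH).
print(hw_to_wh("480x640"))  # (640, 480)
```

The returned tuple is in the order that resize functions in common image libraries usually take, so it can be passed on without reordering.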
Run the example
Follow the instructions under ‘Example & Usage’ for the Lightweight-Face-Detection model:
You can find the output image in out/FaceDetLitebNet_output.png. If you’re connected over ssh, you can copy the output image back to your host computer via:
We have a working model. For reference, on the IQ-9075 EVK, running this model takes 106.86ms per inference.
Porting the model to NPU
Now that we have a working reference model, let’s run it on the NPU. There are three parts that you need to implement:
- Preprocess the data: convert the image into features that you can pass to the neural network.
- Run inference: export the model to ONNX or TFLite, and run the model through LiteRT or ONNX Runtime.
- Postprocess the output: convert the output of the neural network to bounding boxes of faces.
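The three parts above compose into one pipeline. The sketch below shows that structure and times only the inference stage, which is the number to compare against the unaccelerated reference. `run_pipeline` and the stand-in stages are hypothetical; in a real application the stages would wrap your image loading, the LiteRT or ONNX Runtime call, and the bounding-box decoding.

```python
import time
from typing import Any, Callable


def run_pipeline(
    image: Any,
    preprocess: Callable[[Any], Any],
    infer: Callable[[Any], Any],
    postprocess: Callable[[Any], Any],
) -> tuple[Any, float]:
    """Run the three stages in order; return (result, inference time in ms)."""
    features = preprocess(image)
    start = time.perf_counter()
    raw_output = infer(features)
    inference_ms = (time.perf_counter() - start) * 1000.0
    return postprocess(raw_output), inference_ms


# Stand-in stages so the sketch runs without a model or a board.
result, inference_ms = run_pipeline(
    image=[0, 128, 255],
    preprocess=lambda pixels: [p / 255.0 for p in pixels],
    infer=lambda features: [f * 2.0 for f in features],
    postprocess=lambda outputs: max(outputs),
)
print(result)  # 2.0
```

Keeping the stages as separate callables makes it straightforward to swap the inference backend later without touching the preprocessing or postprocessing code.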
Preprocessing inputs
For image models, most AI Hub models take a matrix of (HEIGHT, WIDTH, CHANNELS) (LiteRT) or (CHANNELS, HEIGHT, WIDTH) (ONNX), scaled from 0..1. If the model expects 1 channel, convert the image to grayscale first. If your model is quantized (most likely), you’ll also need to read zero_point and scale, and scale the pixels accordingly. This is easy in LiteRT, because the model contains the quantization parameters, but ONNX does not have these. For quantized models you’ll typically end up with data scaled linearly to 0..255 (uint8) or -128..127 (int8), so that’s relatively easy. A function that demonstrates all this in Python can be found below in the example code (def load_image_litert).
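The quantization arithmetic described above can be sketched with NumPy. This is not the tutorial's load_image_litert function. `quantize_input` and `dequantize_output` are hypothetical helpers, the leading batch dimension is an assumption, and the scale and zero_point values in the usage lines are made up. With LiteRT, the real values are typically read from the "quantization" entry of the interpreter's input and output details.

```python
import numpy as np


def quantize_input(pixels_0_1, scale, zero_point, dtype=np.uint8, channels_first=False):
    """Quantize a float (H, W, C) image in 0..1 to an integer input tensor."""
    limits = np.iinfo(dtype)
    # Affine quantization: q = round(x / scale) + zero_point, clipped to the dtype range.
    q = np.round(np.asarray(pixels_0_1, dtype=np.float32) / scale) + zero_point
    q = np.clip(q, limits.min, limits.max).astype(dtype)
    if channels_first:
        q = np.transpose(q, (2, 0, 1))  # (H, W, C) -> (C, H, W)
    # Assumption: the model expects a leading batch dimension of size 1.
    return q[np.newaxis, ...]


def dequantize_output(q, scale, zero_point):
    """Inverse mapping for quantized outputs: x = (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale


# Hypothetical parameters: scale=1/255 and zero_point=0 map 0..1 onto 0..255.
image = np.zeros((480, 640, 1), dtype=np.float32)
image[0, 0, 0] = 1.0
tensor = quantize_input(image, scale=1 / 255, zero_point=0)
print(tensor.shape, tensor.dtype, tensor[0, 0, 0, 0])  # (1, 480, 640, 1) uint8 255
```

For an int8 model the same call with `dtype=np.int8` and `zero_point=-128` yields values in -128..127. The inverse function is what a postprocessing step would apply to quantized output tensors before decoding them.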
Postprocessing outputs
The same applies to postprocessing. For example, there’s no standard way of mapping the output of a neural network to bounding boxes (to detect faces in this case). For Lightweight-Face-Detection you can find the code here: face_det_lite/app.py#L77. If you’re targeting Python, it’s often easiest to copy the postprocessing code into your application, as AI Hub has a lot of dependencies that you might not want. In addition, the postprocessing code operates on PyTorch tensors while your inference runs under LiteRT or ONNX Runtime, so you’ll need to change some small aspects. We’ll show this just below in the end-to-end example.
End-to-end example (Python)
With the explanation behind us, let’s look at some code.
Download the model
The NPU only supports uint8/int8 quantized models. Fortunately AI Hub contains pre-quantized and optimized models already. You can either:
- Download the model for this tutorial (mirrored on CDN):
- Or, for any other model, download the model from AI Hub and push to your development board:
  - Go to Lightweight-Face-Detection.
  - Click “Download model”.
  - Select “TFLite” for runtime, and “w8a8” for precision. If your model is only available in ONNX format, see Run models using ONNX Runtime for instructions. The same principles as in this tutorial apply.
  - Download the model.
  - If you’re not downloading the model directly on your Dragonwing development board, push the model over ssh:
Create the inference script
Create a new file face_detection.py. This file contains the model invocation, plus the preprocessing and postprocessing code from the AI Hub example (see inline comments).
face_detection.py (full source)
Run on NPU

