> ## Documentation Index
> Fetch the complete documentation index at: https://dragonwingdocs.qualcomm.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Qualcomm AI Hub

> Deploy pretrained AI models from Qualcomm AI Hub to the NPU of your Dragonwing board using LiteRT or ONNX Runtime.

Qualcomm [AI Hub](https://aihub.qualcomm.com) contains a large collection of pretrained AI models that are optimized to run on Dragonwing hardware.

## End-to-end examples

Here's a list of example applications (in Python) that implement models from AI Hub, ready to run on the NPU of your Dragonwing board:

* [Lightweight-Face-Detection](https://github.com/qualcomm/ai-hub-models/tree/main/qai_hub_models/models/face_det_lite)

To run other models, keep reading!

## Finding supported models

Models in AI Hub are categorized by the supported Qualcomm chipset. To see models that will run on your development kit:

<Steps>
  <Step title="Go to the model list">
    Go to the [model list](https://aihub.qualcomm.com/iot/models).
  </Step>

  <Step title="Select your chipset">
    Under 'Chipset', select:

    * RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490 (Proxy)'
    * RUBIK Pi 3: 'Qualcomm QCS6490 (Proxy)'
    * IQ-9075 EVK: 'Qualcomm QCS9075 (Proxy)'
  </Step>

  <Step title="Filter for quantized models">
    Under 'Model precision', select: 'Quantized'. The NPU on your Dragonwing board only runs quantized models.
  </Step>
</Steps>

## Deploying a model to NPU (Python)

As an example, let's deploy the [Lightweight-Face-Detection](https://aihub.qualcomm.com/iot/models/face_det_lite) model.

### Running the example repository

All AI Hub models come with an example repository. This is a good starting point, as it shows exactly how to *run* the model. It shows what the input to your network should look like, and how to interpret the output (here, to map the output tensor to bounding boxes). The example repositories do *NOT* run on the NPU or GPU yet - they run without acceleration. Let's see what our input/output should look like before we move this model to the NPU.

On the AI Hub page for [Lightweight-Face-Detection](https://aihub.qualcomm.com/iot/models/face_det_lite), click "Model repository". This links you to a [README](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/face_det_lite/README.md) file with instructions on how to run the example repository.

To deploy this model, open the terminal on your development board, or an ssh session to your development board:

<Steps>
  <Step title="Set up environment">
    Create a new `venv` and install some base packages:

    ```shell theme={null}
    mkdir -p ~/aihub-demo
    cd ~/aihub-demo

    python3 -m venv .venv
    source .venv/bin/activate

    pip3 install numpy setuptools Cython shapely
    ```
  </Step>

  <Step title="Download a test image">
    Download an image with a face (640x480 resolution, JPG format) onto your development board:

    ```shell theme={null}
    wget https://cdn.edgeimpulse.com/qc-ai-docs/example-images/three-people-640-480.jpg
    ```

    <Frame caption="Input image with three people [source](https://www.pexels.com/photo/three-people-looking-excited-5622566/)">
      <img src="https://mintlify.s3.us-west-1.amazonaws.com/qualcomm-prod/images/ai-workflows/aihub-three-people-in.jpg" />
    </Frame>

    <Note>**Input resolution:** AI Hub models require correctly sized inputs. You can find the required resolution under "Technical Details > Input resolution" (in *HEIGHT x WIDTH* (here 480x640 => 640x480 for wxh)); or inspect the size of the input tensor in the TFLite or ONNX file.</Note>
  </Step>

  <Step title="Run the example">
    Follow the instructions under 'Example & Usage' for the Facial Landmark Detection model:

    ```shell theme={null}
    # Install the example (add --no-build-isolation)
    pip3 install --no-build-isolation "qai-hub-models[face-det-lite]"

    # Run the example
    #    Use --help to see all options
    python3 -m qai_hub_models.models.face_det_lite.demo --quantize w8a8 --image ./three-people-640-480.jpg --output-dir out/
    ```

    You can find the output image in `out/FaceDetLitebNet_output.png`.

    If you're connected over ssh, you can copy the output image back to your host computer via:

    ```shell theme={null}
    # Find IP via: ifconfig | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'
    # Then: (replace 192.168.1.148 by the IP address of your development kit)

    scp ubuntu@192.168.1.148:~/aihub-demo/out/FaceDetLitebNet_output.png ~/Downloads/FaceDetLitebNet_output.png
    ```

    <Frame caption="Lightweight face detection output">
      <img src="https://mintlify.s3.us-west-1.amazonaws.com/qualcomm-prod/images/ai-workflows/aihub-three-people-annotated.png" />
    </Frame>

    We have a working model. For reference, on the IQ9-EVK, running this model takes 106.86ms per inference.
  </Step>
</Steps>

### Porting the model to NPU

Now that we have a working reference model, let's run it on the NPU. There are three parts that you need to implement:

1. **Preprocess** the data — convert the image into features that you can pass to the neural network.
2. **Run inference** — export the model to ONNX or TFLite, and run the model through [LiteRT](/ai-workflows/lite-rt) or [ONNX Runtime](/ai-workflows/onnxruntime).
3. **Postprocess** the output — convert the output of the neural network to bounding boxes of faces.

The model is straightforward, as you can read in the [LiteRT](/ai-workflows/lite-rt) and [ONNX Runtime](/ai-workflows/onnxruntime) pages. However, the pre- and post-processing code might not be...

#### Preprocessing inputs

For image models most AI Hub models take a matrix of `(HEIGHT, WIDTH, CHANNELS)` (LiteRT) or `(CHANNELS, HEIGHT, WIDTH)` (ONNX) scaled from 0..1. If you have 1 channel, convert the image to grayscale first. If your model is quantized (most likely) you'll also need to read zero\_point and scale, and scale the pixels accordingly (this is easy in LiteRT as they contain the quantization parameters, but ONNX does not have these). Typically you'll end up with data scaled linearly 0..255 (uint8) or -128..127 (int8) for quantized models - so that's relatively easy. A function that demonstrates all this in Python can be found below in the example code (`def load_image_litert`).

<Warning>
  *HOWEVER...* This is not guaranteed; and this is where the AI Hub example code comes in. Every AI Hub example contains the exact code used to scale inputs. In our current example - Lightweight-Face-Detection - the input is shaped `(480, 640, 1)`. However, if you look at the [preprocessing code](https://github.com/quic/ai-hub-models/blob/8cdeb11df6cc835b9b0b0cf9b602c7aa83ebfaf8/qai_hub_models/models/face_det_lite/app.py#L70) the data is not converted to grayscale, but instead only the blue channel of an RGB image is taken:

  ```python theme={null}
  img_array = img_array.astype("float32") / 255.0
  img_array = img_array[np.newaxis, ...]
  img_tensor = torch.Tensor(img_array)
  img_tensor = img_tensor[:, :, :, -1]        # HERE WE TAKE BLUE CHANNEL, NOT CONVERT TO GRAYSCALE
  ```

  These kind of things are very easy to get wrong. So if you see non-matching results between your implementation and the AI Hub example: read the code. This applies even more for non-image inputs (e.g. audio). Use the demo code to understand what the model actually expects.
</Warning>

#### Postprocessing outputs

The same applies to postprocessing. For example, there's no standard way of mapping the output of a neural network to bounding boxes (to detect faces in this case). For Lightweight-Face-Detection you can find the code here: [face\_det\_lite/app.py#L77](https://github.com/quic/ai-hub-models/blob/8cdeb11df6cc835b9b0b0cf9b602c7aa83ebfaf8/qai_hub_models/models/face_det_lite/app.py#L77).

If you're targeting Python, it's often easiest to copy the postprocessing code into your application; as AI Hub has a lot of dependencies that you might not want. In addition, the postprocessing code operates on PyTorch tensors, and your inference runs under LiteRT or ONNX Runtime; thus, you'll need to change some small aspects. We'll show this just below in the end-to-end example.

### End-to-end example (Python)

With the explanation behind us, let's look at some code.

<Steps>
  <Step title="Set up base requirements">
    Open a terminal on your development board:

    ```shell theme={null}
    # Create a new fresh directory
    mkdir -p ~/aihub-npu
    cd ~/aihub-npu

    # Create a new venv
    python3 -m venv .venv
    source .venv/bin/activate

    # Install the LiteRT runtime (to run models) and Pillow (to parse images)
    pip3 install ai-edge-litert==1.3.0 Pillow

    # Download an example image
    wget https://cdn.edgeimpulse.com/qc-ai-docs/example-images/three-people-640-480.jpg
    ```
  </Step>

  <Step title="Download the model">
    The NPU only supports uint8/int8 quantized models. Fortunately AI Hub contains pre-quantized and optimized models already. You can either:

    * Download the model for this tutorial (mirrored on CDN):

      ```shell theme={null}
      wget https://cdn.edgeimpulse.com/qc-ai-docs/models/face_det_lite-lightweight-face-detection-w8a8.tflite
      ```

    * Or, for any other model, download the model from AI Hub and push to your development board:

      1. Go to [Lightweight-Face-Detection](https://aihub.qualcomm.com/iot/models/face_det_lite).

      2. Click "Download model".

      3. Select "TFLite" for runtime, and "w8a8" for precision.

             <Frame caption="Downloading w8a8 quantized model from AI Hub in TFLite format">
               <img src="https://mintlify.s3.us-west-1.amazonaws.com/qualcomm-prod/images/ai-workflows/aihub-download.png" />
             </Frame>

         If your model is only available in ONNX format, see [Run models using ONNX Runtime](/ai-workflows/onnxruntime) for instructions. The same principles as in this tutorial apply.

      4. Download the model.

      5. If you're not downloading the model directly on your Dragonwing development board, push the model over ssh:

         ```shell theme={null}
         # Find your board's IP
         ifconfig | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'

         # Push the .tflite file (replace IP)
         scp face_det_lite-lightweight-face-detection-w8a8.tflite ubuntu@192.168.1.253:~/face_det_lite-lightweight-face-detection-w8a8.tflite
         ```
  </Step>

  <Step title="Create the inference script">
    Create a new file `face_detection.py`. This file contains the model invocation, plus the preprocessing and postprocessing code from the AI Hub example (see inline comments).

    <Accordion title="face_detection.py (full source)">
      ```python theme={null}
      import numpy as np
      from ai_edge_litert.interpreter import Interpreter, load_delegate
      from PIL import Image, ImageDraw
      import os, time, sys

      def curr_ms():
          return round(time.time() * 1000)

      # Paths
      IMAGE_IN = 'three-people-640-480.jpg'
      IMAGE_OUT = 'three-people-640-480-overlay.jpg'
      MODEL_PATH = 'face_det_lite-lightweight-face-detection-w8a8.tflite'

      # If we pass in --use-npu we offload to NPU
      use_npu = True if len(sys.argv) >= 2 and sys.argv[1] == '--use-npu' else False

      experimental_delegates = []
      if use_npu:
          experimental_delegates = [load_delegate("libQnnTFLiteDelegate.so", options={"backend_type":"htp"})]

      # Load TFLite model and allocate tensors
      interpreter = Interpreter(
          model_path=MODEL_PATH,
          experimental_delegates=experimental_delegates
      )
      interpreter.allocate_tensors()

      # Get input and output tensor details
      input_details = interpreter.get_input_details()
      output_details = interpreter.get_output_details()

      # === BEGIN PREPROCESSING ===

      # Load an image (using Pillow) and make it in the right format that the interpreter expects (e.g. quantize)
      # All AI Hub image models use 0..1 inputs to start.
      def load_image_litert(interpreter, path, single_channel_behavior: str = 'grayscale'):
          d = interpreter.get_input_details()[0]
          shape = [int(x) for x in d["shape"]]  # e.g. [1, H, W, C] or [1, C, H, W]
          dtype = d["dtype"]
          scale, zp = d.get("quantization", (0.0, 0))

          if len(shape) != 4 or shape[0] != 1:
              raise ValueError(f"Unexpected input shape: {shape}")

          # Detect layout
          if shape[1] in (1, 3):   # [1, C, H, W]
              layout, C, H, W = "NCHW", shape[1], shape[2], shape[3]
          elif shape[3] in (1, 3): # [1, H, W, C]
              layout, C, H, W = "NHWC", shape[3], shape[1], shape[2]
          else:
              raise ValueError(f"Cannot infer layout from shape {shape}")

          # Load & resize
          img = Image.open(path).convert("RGB").resize((W, H), Image.BILINEAR)
          arr = np.array(img)
          if C == 1:
              if single_channel_behavior == 'grayscale':
                  gray = np.asarray(Image.fromarray(arr).convert('L'))
              elif single_channel_behavior in ('red', 'green', 'blue'):
                  ch_idx = {'red': 0, 'green': 1, 'blue': 2}[single_channel_behavior]
                  gray = arr[:, :, ch_idx]
              else:
                  raise ValueError(f"Invalid single_channel_behavior: {single_channel_behavior}")
              arr = gray[..., np.newaxis]

          # HWC -> correct layout
          if layout == "NCHW":
              arr = np.transpose(arr, (2, 0, 1))  # (C,H,W)

          # Scale 0..1 (all AI Hub image models use this)
          arr = (arr / 255.0).astype(np.float32)

          # Quantize if needed
          if scale and float(scale) != 0.0:
              q = np.rint(arr / float(scale) + int(zp))
              if dtype == np.uint8:
                  arr = np.clip(q, 0, 255).astype(np.uint8)
              else:
                  arr = np.clip(q, -128, 127).astype(np.int8)

          return np.expand_dims(arr, 0)  # add batch

      # This model looks like grayscale, but AI Hub inference actually takes the BLUE channel
      # see https://github.com/quic/ai-hub-models/blob/8cdeb11df6cc835b9b0b0cf9b602c7aa83ebfaf8/qai_hub_models/models/face_det_lite/app.py#L70
      input_data = load_image_litert(interpreter, IMAGE_IN, single_channel_behavior='blue')

      # === END PREPROCESSING (input_data contains right data) ===

      # Set tensor and run inference
      interpreter.set_tensor(input_details[0]['index'], input_data)

      # Run once to warmup
      interpreter.invoke()

      # Then run 10x
      start = curr_ms()
      for i in range(0, 10):
          interpreter.invoke()
      end = curr_ms()

      # === BEGIN POSTPROCESSING ===

      # Grab 3 output tensors and dequantize
      q_output_0 = interpreter.get_tensor(output_details[0]['index'])
      scale_0, zero_point_0 = output_details[0]['quantization']
      hm = ((q_output_0.astype(np.float32) - zero_point_0) * scale_0)[0]

      q_output_1 = interpreter.get_tensor(output_details[1]['index'])
      scale_1, zero_point_1 = output_details[1]['quantization']
      box = ((q_output_1.astype(np.float32) - zero_point_1) * scale_1)[0]

      q_output_2 = interpreter.get_tensor(output_details[2]['index'])
      scale_2, zero_point_2 = output_details[2]['quantization']
      landmark = ((q_output_2.astype(np.float32) - zero_point_2) * scale_2)[0]

      # Taken from https://github.com/quic/ai-hub-models/blob/8cdeb11df6cc835b9b0b0cf9b602c7aa83ebfaf8/qai_hub_models/utils/bounding_box_processing.py#L369
      def get_iou(boxA: np.ndarray, boxB: np.ndarray) -> float:
          xA = max(boxA[0], boxB[0])
          yA = max(boxA[1], boxB[1])
          xB = min(boxA[2], boxB[2])
          yB = min(boxA[3], boxB[3])
          inter_area = max(0, xB - xA + 1) * max(0, yB - yA + 1)
          boxA_area = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
          boxB_area = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
          return inter_area / float(boxA_area + boxB_area - inter_area)

      # Taken from https://github.com/quic/ai-hub-models/blob/8cdeb11df6cc835b9b0b0cf9b602c7aa83ebfaf8/qai_hub_models/models/face_det_lite/utils.py
      class BBox:
          def __init__(self, label, xyrb, score=0, landmark=None, rotate=False):
              self.label = label
              self.score = score
              self.landmark = landmark
              self.x, self.y, self.r, self.b = xyrb
              self.rotate = rotate
              minx = min(self.x, self.r)
              maxx = max(self.x, self.r)
              miny = min(self.y, self.b)
              maxy = max(self.y, self.b)
              self.x, self.y, self.r, self.b = minx, miny, maxx, maxy

          @property
          def width(self): return self.r - self.x + 1
          @property
          def height(self): return self.b - self.y + 1
          @property
          def box(self): return [self.x, self.y, self.r, self.b]
          @box.setter
          def box(self, newvalue): self.x, self.y, self.r, self.b = newvalue
          @property
          def haslandmark(self): return self.landmark is not None
          @property
          def xywh(self): return [self.x, self.y, self.width, self.height]

      def nms(objs, iou=0.5):
          if objs is None or len(objs) <= 1:
              return objs
          objs = sorted(objs, key=lambda obj: obj.score, reverse=True)
          keep = []
          flags = [0] * len(objs)
          for index, obj in enumerate(objs):
              if flags[index] != 0:
                  continue
              keep.append(obj)
              for j in range(index + 1, len(objs)):
                  if flags[j] == 0 and get_iou(np.array(obj.box), np.array(objs[j].box)) > iou:
                      flags[j] = 1
          return keep

      def detect(hm, box, landmark, threshold=0.2, nms_iou=0.2, stride=8):
          def _sigmoid(x):
              out = np.empty_like(x, dtype=np.float32)
              np.negative(x, out=out)
              np.exp(out, out=out)
              out += 1.0
              np.divide(1.0, out, out=out)
              return out

          def _maxpool3x3_same(x_hw):
              H, W = x_hw.shape
              pad = 1
              xpad = np.pad(x_hw, ((pad, pad), (pad, pad)), mode='constant', constant_values=-np.inf)
              s0, s1 = xpad.strides
              shape = (H, W, 3, 3)
              strides = (s0, s1, s0, s1)
              windows = np.lib.stride_tricks.as_strided(xpad, shape=shape, strides=strides, writeable=False)
              return windows.max(axis=(2, 3))

          def _topk_desc(values_flat, k):
              if k <= 0:
                  return np.array([], dtype=values_flat.dtype), np.array([], dtype=np.int64)
              k = min(k, values_flat.size)
              idx_part = np.argpartition(-values_flat, k - 1)[:k]
              order = np.argsort(-values_flat[idx_part])
              idx_sorted = idx_part[order]
              return values_flat[idx_sorted], idx_sorted

          hm = _sigmoid(hm.astype(np.float32, copy=False))
          hm_hw = hm[..., 0]
          hm_pool = _maxpool3x3_same(hm_hw)
          keep = (hm_hw >= hm_pool)
          candidate_scores = np.where(keep, hm_hw, 0.0).ravel()
          num_candidates = int(keep.sum())
          k = min(num_candidates, 2000)
          scores_k, flat_idx_k = _topk_desc(candidate_scores, k)

          H, W = hm_hw.shape
          ys = (flat_idx_k // W).astype(np.int32)
          xs = (flat_idx_k %  W).astype(np.int32)

          objs = []
          for cx, cy, score in zip(xs, ys, scores_k):
              if score < threshold:
                  break
              x, y, r, b = box[cy, cx].astype(np.float32, copy=False)
              cxcycxcy = np.array([cx, cy, cx, cy], dtype=np.float32)
              xyrb = (cxcycxcy + np.array([-x, -y, r, b], dtype=np.float32)) * float(stride)
              xyrb = xyrb.astype(np.int32, copy=False).tolist()
              x5y5 = landmark[cy, cx].astype(np.float32, copy=False)
              x5y5 = x5y5 + np.array([cx]*5 + [cy]*5, dtype=np.float32)
              x5y5 *= float(stride)
              box_landmark = list(zip(x5y5[:5].tolist(), x5y5[5:].tolist()))
              objs.append(BBox("0", xyrb=xyrb, score=float(score), landmark=box_landmark))

          if nms_iou != -1:
              return nms(objs, iou=nms_iou)
          return objs

      dets = detect(hm, box, landmark, threshold=0.55, nms_iou=-1, stride=8)
      res = []
      for n in range(0, len(dets)):
          xmin, ymin, w, h = dets[n].xywh
          score = dets[n].score
          L, R, T, B = int(xmin), int(xmin + w), int(ymin), int(ymin + h)
          W, H = int(w), int(h)

          if L < 0: L = 0
          if T < 0: T = 0
          if R >= 640: R = 640 - 1
          if B >= 480: B = 480 - 1

          b_Left = L - int(W * 0.05)
          b_Top = T - int(H * 0.05)
          b_Width = int(W * 1.1)
          b_Height = int(H * 1.1)

          if b_Left >= 0 and b_Top >= 0 and b_Width - 1 + b_Left < 640 and b_Height - 1 + b_Top < 480:
              L, T, W, H = b_Left, b_Top, b_Width, b_Height
              R, B = W - 1 + L, H - 1 + T

          print(f'Found face: x={L}, y={T}, w={W}, h={H}, score={score}')
          res.append([L, T, W, H, score])

      # === END POSTPROCESSING ===

      # Create output image with bounding boxes
      input_reshaped = input_data.reshape(input_data.shape[1:])
      if input_reshaped.shape[2] == 1:
          input_reshaped = np.squeeze(input_reshaped, axis=-1)

      img_out = Image.fromarray(input_reshaped).convert("RGB")
      draw = ImageDraw.Draw(img_out)
      for bb in res:
          L, T, W, H, score = bb
          draw.rectangle([L, T, L + W, T + H], outline="#00FF00", width=3)
      img_out.save(IMAGE_OUT)

      print('')
      print(f'Inference took (on average): {(end - start) / 10}ms. per image')
      ```
    </Accordion>
  </Step>

  <Step title="Run on CPU">
    ```shell theme={null}
    python3 face_detection.py

    # INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    # Found face: x=120, y=186, w=62, h=79, score=0.8306506276130676
    # Found face: x=311, y=125, w=66, h=81, score=0.8148472309112549
    # Found face: x=424, y=173, w=64, h=86, score=0.8093323111534119
    #
    # Inference took (on average): 29.8ms. per image
    ```

    This already brings down our time per inference from 106.86ms to 29.8ms.
  </Step>

  <Step title="Run on NPU">
    ```shell theme={null}
    python3 face_detection.py --use-npu

    # Found face: x=312, y=125, w=64, h=81, score=0.8202381134033203
    # Found face: x=120, y=186, w=62, h=78, score=0.8202381134033203
    # Found face: x=421, y=173, w=67, h=86, score=0.8093323111534119
    #
    # Inference took (on average): 1.5ms. per image
    ```

    By quantizing this model and porting it to the NPU we've sped the model up 71x. You're not limited to Python either — the [LiteRT](/ai-workflows/lite-rt) page has C++ examples as well.
  </Step>
</Steps>
