# Run AI models using SNPE or QNN

> Run inference on converted AI models using SNPE or QNN on CPU, GPU, or HTP on Qualcomm Dragonwing IoT platforms.

<Tabs>
  <Tab title="Neural Processing Engine">
    ## Deploy a model using Neural Processing Engine

    A model DLC (quantized or non-quantized) can be deployed through a SNPE-enabled app (an application written using the SNPE C/C++ APIs). SNPE offers APIs to load a DLC, select a runtime to run the model, and perform inference.

    SNPE provides a prebuilt [snpe-net-run](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_tools.html#snpe-net-run) tool (an application written using the C APIs) that can load an arbitrary model DLC and run it on provided inputs. The tool takes the following inputs:

    * **Model file:** DLC model file generated by the SNPE converter tool or `snpe-dlc-graph-prepare` tool (if running on HTP).
    * **Input list:** Text file, like the `input_list.txt` file used during quantization, except that the raw input files in this list are used for inference. For simplicity, this example reuses the quantization input list for inference.
    * **Runtime:** Select the runtime on which to run the model on the target. The available options are CPU, GPU, and DSP (HTP).

    <Note>
      See `snpe-net-run --help` for more details.
    </Note>
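
The input list can also be generated with a short script. The following is a minimal sketch, assuming a single-input model whose list holds one raw-file path per line. `write_input_list` is a hypothetical helper written for this page and is not part of the SNPE SDK.

```python
from pathlib import Path


def write_input_list(raw_dir, list_path):
    """Write one .raw file path per line and return the number of entries.

    Assumption: the model has a single input, so each line of the list
    is the path of one raw input file.
    """
    raw_files = sorted(Path(raw_dir).glob("*.raw"))
    lines = [str(path) for path in raw_files]
    # End the file with a newline only when there is at least one entry.
    Path(list_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

For example, `write_input_list("/tmp/RandomInputsForInceptionV3", "input_list.txt")` would list the sample inputs used later on this page, provided they are stored as `.raw` files in that directory.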

    ### Run SNPE .dlc: x86 host computer

    A converted SNPE .dlc can be run using the `snpe-net-run` tool, which takes a .dlc and an input list as arguments. Running a SNPE DLC on x86 is intended purely for debugging.

    For example, the following command loads the `inception_v3.dlc` model and runs it on the x86 CPU. It writes output files to `~/models/output_x86/`.

    <Note>
      The default runtime in SNPE is CPU. When executing a model using `snpe-net-run`, there is no need to specify the CPU runtime.
    </Note>

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-net-run --container ~/models/inception_v3.dlc --input_list ~/models/input_list.txt --output_dir ~/models/output_x86
    ```

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-run-host.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=6ef09339329645d36fee43c493672696" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-run-host.png" />
    </Frame>

    ### Prepare a SNPE model to run on target

    To run the model on the target, `snpe-net-run` requires the model .dlc, the SNPE runtime libraries, and an input list.

    <Note>
      Before running on target, ensure that the SNPE SDK binaries and libraries are pushed to the target.
    </Note>

    Use the artifacts from `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2` and `${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2`.

    Use the correct DSP Hexagon architecture libraries for Qualcomm evaluation kits:

    * **IQ-8275:** Use v75 libraries
    * **IQ-9075:** Use v73 libraries
    * **Qualcomm Dragonwing™ RB3 Gen 2:** Use v68 libraries

    | File                                           | Source location                                  |
    | ---------------------------------------------- | ------------------------------------------------ |
    | snpe-net-run                                   | `${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2`     |
    | libSNPE.so                                     | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | libSnpeHtpPrepare.so                           | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | `libSnpeHtpV<dsp-arch>Stub.so`                 | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | `libSnpeHtpV<dsp-arch>Skel.so`                 | `${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned` |
    | inception\_v3\_quantized\_with\_htp\_cache.dlc | `~/models`                                       |
    | inception\_v3.dlc                              | `~/models`                                       |
    | input\_list.txt                                | `~/models`                                       |
    | Inception V3 sample input images               | `/tmp/RandomInputsForInceptionV3`                |

    Run the following `scp` commands on the host computer to copy the SNPE SDK libraries and binaries to the device.

    <Note>
      Replace `<dsp-arch>` with `75` for IQ-8275, `73` for IQ-9075, and `68` for Qualcomm Dragonwing™ RB3 Gen 2.
    </Note>

    ```shell theme={null}
    scp ${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2/snpe-net-run \
        ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libSNPE.so \
        ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libSnpeHtpV<dsp-arch>Stub.so \
        ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libSnpeHtpPrepare.so \
        ${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned/libSnpeHtpV<dsp-arch>Skel.so \
        root@[ip-addr]:/opt/
    ```
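
The `<dsp-arch>` substitution can be sketched in Python. This is an illustration only: `DSP_ARCH` and `snpe_htp_files` are hypothetical names, and the board-to-architecture mapping simply restates the list in this section.

```python
# Hexagon architecture per evaluation kit, as listed in this section.
DSP_ARCH = {
    "IQ-8275": "75",
    "IQ-9075": "73",
    "RB3 Gen 2": "68",
}


def snpe_htp_files(board, qairt_root="${QAIRT_ROOT}"):
    """Return the HTP-specific SNPE library paths for a given board."""
    arch = DSP_ARCH[board]
    lib_dir = f"{qairt_root}/lib/aarch64-oe-linux-gcc11.2"
    return [
        f"{lib_dir}/libSnpeHtpV{arch}Stub.so",
        f"{lib_dir}/libSnpeHtpPrepare.so",
        f"{qairt_root}/lib/hexagon-v{arch}/unsigned/libSnpeHtpV{arch}Skel.so",
    ]
```

Calling `snpe_htp_files("IQ-9075")` yields the v73 stub and skel paths that the first `scp` command expects, with `${QAIRT_ROOT}` left for the shell to expand.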

    ```shell theme={null}
    scp ~/models/inception_v3.dlc \
        ~/models/inception_v3_quantized_with_htp_cache.dlc \
        ~/models/input_list.txt \
        root@[ip-addr]:/opt/
    ```

    ```shell theme={null}
    scp -r /tmp/RandomInputsForInceptionV3 root@[ip-addr]:/tmp/
    ```

    ```shell theme={null}
    ssh root@[ip-addr]
    ```

    ```shell theme={null}
    cd /opt
    ```

    The steps above prepare the device for model execution. The following sections describe how to run the model on each available runtime.

    ### Run SNPE .dlc: Arm-based CPU

    Set up the environment variables before executing the model so that the SNPE binaries and libraries are accessible.

    <Note>
      The default runtime in SNPE is CPU. When executing a model using `snpe-net-run`, there is no need to specify the CPU runtime.
    </Note>

    ```shell theme={null}
    export LD_LIBRARY_PATH=/opt:$LD_LIBRARY_PATH
    ```

    ```shell theme={null}
    export PATH=/opt/:$PATH
    ```

    ```shell theme={null}
    snpe-net-run --container inception_v3.dlc --input_list input_list.txt --output_dir output_cpu
    ```

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-run-cpu.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=1ec08606f802e1f8ab58819c3ab29c12" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-run-cpu.png" />
    </Frame>

    ### Run SNPE DLC: GPU

    To run a model on the GPU runtime, use `inception_v3.dlc` and save outputs in `output_gpu`.

    In the terminal of the target device, run the following commands to enable the GPU delegate and backend:

    ```shell theme={null}
    ln -sf /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so
    export OCL_ICD_FILENAMES=/usr/lib/libOpenCL_adreno.so.1
    ```

    <Note>
      To run on GPU, specify the runtime with the `--use_gpu` command line argument.
    </Note>

    ```shell theme={null}
    export LD_LIBRARY_PATH=/opt:$LD_LIBRARY_PATH
    ```

    ```shell theme={null}
    export PATH=/opt:$PATH
    ```

    ```shell theme={null}
    snpe-net-run --container inception_v3.dlc --input_list input_list.txt --output_dir output_gpu --use_gpu
    ```

    ### Run SNPE DLC: HTP

    To run a model on the HTP backend, use `inception_v3_quantized_with_htp_cache.dlc` and save outputs in `output_htp`.

    <Note>
      To run on HTP, specify the runtime with the `--use_dsp` command line argument.
    </Note>

    ```shell theme={null}
    export LD_LIBRARY_PATH=/opt:$LD_LIBRARY_PATH
    ```

    ```shell theme={null}
    export PATH=/opt:$PATH
    ```

    ```shell theme={null}
    export ADSP_LIBRARY_PATH="/opt;/usr/lib/rfsa/adsp;/dsp"
    ```

    ```shell theme={null}
    snpe-net-run --container inception_v3_quantized_with_htp_cache.dlc --input_list input_list.txt --output_dir output_htp --use_dsp
    ```

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-run-htp.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=7df1cf021e3e505010f97a4b138a2ae0" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-run-htp.png" />
    </Frame>
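
The three on-target runs differ only in the model file, the runtime flag, and the extra `ADSP_LIBRARY_PATH` variable for HTP. The sketch below gathers those differences in one place. It assumes the files were copied to `/opt` as described above; `snpe_env` and `snpe_command` are hypothetical helpers, and only flags already shown on this page are used.

```python
import os

# Runtime-specific flags and output directories used on this page.
RUNTIME_FLAGS = {"cpu": [], "gpu": ["--use_gpu"], "dsp": ["--use_dsp"]}
OUTPUT_DIRS = {"cpu": "output_cpu", "gpu": "output_gpu", "dsp": "output_htp"}


def snpe_env(runtime, base_env=None, install_dir="/opt"):
    """Return a copy of the environment with the SNPE paths prepended."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("LD_LIBRARY_PATH", "PATH"):
        current = env.get(var, "")
        # Avoid a trailing ':' when the variable was previously unset.
        env[var] = f"{install_dir}:{current}" if current else install_dir
    if runtime == "dsp":
        # Semicolon-separated search path for the Hexagon skel library.
        env["ADSP_LIBRARY_PATH"] = f"{install_dir};/usr/lib/rfsa/adsp;/dsp"
    return env


def snpe_command(runtime, container, input_list="input_list.txt"):
    """Build the snpe-net-run argument list for the chosen runtime."""
    return [
        "snpe-net-run",
        "--container", container,
        "--input_list", input_list,
        "--output_dir", OUTPUT_DIRS[runtime],
        *RUNTIME_FLAGS[runtime],
    ]
```

On the target, the result could be passed to `subprocess.run(snpe_command("dsp", "inception_v3_quantized_with_htp_cache.dlc"), env=snpe_env("dsp"), check=True)`, assuming a Python interpreter is available there.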

    ### Validate output

    For each raw input file fed to `snpe-net-run` (through the input\_list file), an output folder is generated. It contains the output tensor saved as one or more raw files whose size matches the model's output layer, as shown in the image below.

    <Note>
      The [Netron](https://netron.app/) tool was used to visualize the model.
    </Note>

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-validate-output.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=cf5904d6d40ef92c2f223bad379bb182" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-validate-output.png" />
    </Frame>

    In the `inception_v3` example, the output raw file is a binary file that contains probabilities for 1000 classification classes.
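
A quick sanity check before any postprocessing is to confirm the file size: 1000 values stored as float32 (4 bytes each) should occupy 4000 bytes. The following is a minimal sketch of that check; `has_expected_size` is a hypothetical helper, and the float32 assumption reflects the default output format of `snpe-net-run`.

```python
import os

NUM_CLASSES = 1000      # classification classes in the inception_v3 example
BYTES_PER_FLOAT32 = 4   # assumption: outputs are saved as float32 (the default)


def has_expected_size(raw_path, num_values=NUM_CLASSES):
    """Return True if the raw file holds exactly num_values float32 values."""
    return os.path.getsize(raw_path) == num_values * BYTES_PER_FLOAT32
```

A mismatch usually means the file was written with a different data type or belongs to a different output tensor, so it is worth checking before reshaping the data.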

    A Python script can read the file as a NumPy array for postprocessing and output validation. The example below checks whether the HTP prediction is the same as the CPU prediction.

    1. Copy the output files from the target device to the host computer so the outputs can be validated.

       ```shell theme={null}
       scp -r root@<ip-addr of the target device>:/opt/output_cpu <path>
       ```

       ```shell theme={null}
       scp -r root@<ip-addr of the target device>:/opt/output_htp <path>
       ```

       <Note>
         Use `<path>` from the above commands in the following Python script (`compare.py`).
       </Note>

    2. Once outputs from on-device inference are copied to the host computer, prepare a script to load the output tensors saved in `output_htp` and `output_cpu` to compare.

       In this example, both `output_htp` and `output_cpu` are copied to the `~/models` directory.

       The following is an example script to compare output from one of the example inputs used. Outputs from `snpe-net-run` can be loaded into NumPy `ndarrays` using the `numpy.fromfile(…)` API.

       <Note>
         By default, `snpe-net-run` saves the output tensors as raw binary files in **float32** format.
       </Note>

       ```python theme={null}
       # python postprocessing script (compare.py)

       import numpy as np

       htp_output_file_path = "<path>/output_htp/Result_1/875.raw"
       cpu_output_file_path = "<path>/output_cpu/Result_1/875.raw"

       htp_output = np.fromfile(htp_output_file_path, dtype=np.float32)
       htp_output = htp_output.reshape(1,1000)

       cpu_output = np.fromfile(cpu_output_file_path, dtype=np.float32)
       cpu_output = cpu_output.reshape(1,1000)

       # np.argmax returns the index (class ID) with the highest probability.
       cls_id_htp = np.argmax(htp_output)
       cls_id_cpu = np.argmax(cpu_output)

       # Compare the CPU prediction with the HTP prediction.
       print("CPU prediction {}\nHTP prediction {}".format(cls_id_cpu, cls_id_htp))
       ```

    **Output:**

    ```
    (qairt) ██████████: ~$ python compare.py

    CPU prediction 879
    HTP prediction 21
    ```
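
Comparing only the top-1 class can hide how close or far apart the two outputs are. The sketch below adds a top-k overlap and a maximum absolute difference. It is an illustration under stated assumptions: the helpers are hypothetical, the raw files are assumed to hold float32 values in the host's native byte order, and only the Python standard library is used so the example runs without NumPy.

```python
from array import array


def read_float32(path):
    """Read a raw file of float32 values (native byte order) into a list."""
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return list(values)


def top_k(values, k=5):
    """Return the indices of the k largest values, highest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


def compare_outputs(cpu_values, htp_values, k=5):
    """Summarize how closely two output tensors agree."""
    cpu_top, htp_top = top_k(cpu_values, k), top_k(htp_values, k)
    return {
        "top1_match": cpu_top[0] == htp_top[0],
        "topk_overlap": len(set(cpu_top) & set(htp_top)),
        "max_abs_diff": max(abs(c - h) for c, h in zip(cpu_values, htp_values)),
    }
```

Passing the lists returned by `read_float32` for the CPU and HTP result files to `compare_outputs` gives a slightly richer picture than a single `argmax` comparison.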

    ### Deployment using SNPE APIs

    The SNPE SDK provides C/C++ APIs for developing applications that run a model on the chosen hardware (CPU, GPU, or HTP) with acceleration. See the [sample application](https://docs.qualcomm.com/doc/80-63442-10/topic/usergroup8.html) that demonstrates the SNPE C/C++ APIs.

    ### Deployment using Qualcomm Intelligent Multimedia SDK (IM SDK)

    To improve the developer experience when building entire use case pipelines (stream from camera, preprocess images, perform inference, etc.), SNPE has been integrated into the Qualcomm IM SDK as a plugin (`qtimlsnpe`).

    The plugin has been developed on top of the SNPE C APIs and provides SNPE capabilities (load and run models). With the `qtimlsnpe` plugin, you can use your converted model DLC in a Qualcomm IM SDK pipeline to realize the use case.

    See the [Qualcomm IM SDK overview](../topic/develop-your-own-application-im-sdk) for instructions on how to deploy SNPE DLC using the Qualcomm IM SDK.
  </Tab>

  <Tab title="AI Engine Direct">
    ## Deploy a model using AI Engine Direct

    A model .so (quantized or non-quantized) can be deployed through a QNN-enabled app (an application written using the QNN C/C++ APIs). QNN offers APIs to load a model .so dynamically and run the model on hardware with the selected backend.

    QNN provides a prebuilt tool ([qnn-net-run](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html#qnn-net-run)) that can dynamically load this model .so and perform inference on a selected backend using provided inputs.

    For CPU, GPU, or HTP execution, `qnn-net-run` requires the following arguments:

    * **Model file:** .so file generated by `qnn-model-lib-generator`
    * **Backend file:** .so file for the targeted backend
      * `libQnnCpu.so` for the CPU backend.
      * `libQnnGpu.so` for the GPU backend.
      * `libQnnHtp.so` for the HTP backend.
    * **Input list:** Text file like the input\_list.txt file used during quantization, except input raw files in this list are used for inference. For simplicity, this example uses the same input list used for quantization for inference.
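
The relationship between these arguments can be sketched as a small command builder. `BACKEND_LIBS` and `qnn_command` are hypothetical names introduced for illustration; the flags are the ones used in the `qnn-net-run` commands on this page.

```python
# Backend name to backend library, as listed above.
BACKEND_LIBS = {
    "cpu": "libQnnCpu.so",
    "gpu": "libQnnGpu.so",
    "htp": "libQnnHtp.so",
}


def qnn_command(backend, model, input_list="input_list.txt", output_dir=None):
    """Build the qnn-net-run argument list for the chosen backend."""
    if backend not in BACKEND_LIBS:
        raise ValueError(f"unknown backend: {backend!r}")
    return [
        "qnn-net-run",
        "--model", model,
        "--backend", BACKEND_LIBS[backend],
        "--input_list", input_list,
        # Default to output_<backend>, matching the examples on this page.
        "--output_dir", output_dir or f"output_{backend}",
    ]
```

For instance, `qnn_command("htp", "libinception_v3_quantized.so")` produces the argument list for the HTP run, while the CPU and GPU runs use the non-quantized `libinception_v3.so`.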

    ### Run QNN .so: x86 host computer

    A converted QNN .so can be run using the `qnn-net-run` tool, which takes a model .so, a backend library .so, and an input list as arguments.

    For example, the following command loads the `libinception_v3.so` model and runs it on the x86 CPU. It writes output files to `~/models/output_qnn_x86/`.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/qnn-net-run --model ~/models/libs/x86_64-linux-clang/libinception_v3.so --backend ${QAIRT_ROOT}/lib/x86_64-linux-clang/libQnnCpu.so --input_list ~/models/input_list.txt --output_dir ~/models/output_qnn_x86
    ```

    ### Prepare a QNN model to run on target

    To run the model on the target, `qnn-net-run` requires the model .so, the QNN binaries and runtime libraries, and an input list.

    <Note>
      Before running on target, ensure that the QNN SDK binaries and runtime libraries are pushed to the target.
    </Note>

    Use the artifacts from `${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2` and `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`.

    Use the correct DSP Hexagon architecture libraries for Qualcomm evaluation kits:

    * **IQ-8275:** Use v75 libraries
    * **IQ-9075:** Use v73 libraries
    * **Qualcomm Dragonwing™ RB3 Gen 2:** Use v68 libraries

    | File                           | Source location                                  |
    | ------------------------------ | ------------------------------------------------ |
    | qnn-net-run                    | `${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2`     |
    | libQnnHtp.so                   | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | libQnnCpu.so                   | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | libQnnGpu.so                   | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | libQnnHtpPrepare.so            | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | `libQnnHtpV<dsp-arch>Stub.so`  | `${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2`     |
    | libinception\_v3\_quantized.so | `~/models/libs/aarch64-oe-linux-gcc11.2`         |
    | libinception\_v3.so            | `~/models/libs/aarch64-oe-linux-gcc11.2`         |
    | `libQnnHtpV<dsp-arch>Skel.so`  | `${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned` |
    | `libqnnhtpv<dsp-arch>.cat`     | `${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned` |
    | libQnnSaver.so                 | `${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned` |
    | libQnnSystem.so                | `${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned` |

    Run the following `scp` commands on the host computer to copy QNN SDK libraries and binaries to the device.

    <Note>
      Replace `<dsp-arch>` with `75` for IQ-8275, `73` for IQ-9075, and `68` for Qualcomm Dragonwing™ RB3 Gen 2.
    </Note>

    ```shell theme={null}
    scp ${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2/qnn-* \
        ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libQnn*.so \
        ${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned/* \
        root@[ip-addr]:/opt/
    ```

    ```shell theme={null}
    scp ~/models/libs/aarch64-oe-linux-gcc11.2/* ~/models/input_list.txt root@[ip-addr]:/opt/
    ```

    ```shell theme={null}
    scp -r /tmp/RandomInputsForInceptionV3 root@[ip-addr]:/tmp/
    ```

    ```shell theme={null}
    ssh root@[ip-addr]
    ```

    ```shell theme={null}
    cd /opt/
    ```

    The steps above prepare the device for model execution. The following sections describe how to run the model on each available backend.

    ### Run QNN .so: Arm-based CPU

    When running a model on an Arm-based CPU, `qnn-net-run` requires the model .so, the backend .so library (`libQnnCpu.so`), and an input list.

    For Qualcomm Dragonwing™ RB3 Gen 2 targets, the model .so must be cross-compiled with the `aarch64-oe-linux-gcc11.2` toolchain.

    For example, the following commands write output files to `/opt/output_cpu/`.

    ```shell theme={null}
    export LD_LIBRARY_PATH=/opt/:$LD_LIBRARY_PATH
    ```

    ```shell theme={null}
    export PATH=/opt:$PATH
    ```

    ```shell theme={null}
    qnn-net-run --model libinception_v3.so --backend libQnnCpu.so --input_list input_list.txt --output_dir output_cpu
    ```

    ### Run a QNN model on GPU

    In the terminal of the target device, run the following commands to enable the GPU delegate and backend:

    ```shell theme={null}
    ln -sf /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so
    export OCL_ICD_FILENAMES=/usr/lib/libOpenCL_adreno.so.1
    ```

    When executing a model on the Adreno GPU, `qnn-net-run` requires the model .so, the backend .so library (`libQnnGpu.so`), and an input list.

    For example, the following commands write output files to `/opt/output_gpu/`.

    ```shell theme={null}
    export LD_LIBRARY_PATH=/opt/:$LD_LIBRARY_PATH
    ```

    ```shell theme={null}
    export PATH=/opt:$PATH
    ```

    ```shell theme={null}
    qnn-net-run --model libinception_v3.so --backend libQnnGpu.so --input_list input_list.txt --output_dir output_gpu
    ```

    ### Run QNN model on HTP backend

    To run a model on the HTP backend, use the `libinception_v3_quantized.so` model library with the `libQnnHtp.so` backend library, and save the outputs in the `output_htp` directory.

    ```shell theme={null}
    export LD_LIBRARY_PATH=/opt/:$LD_LIBRARY_PATH
    ```

    ```shell theme={null}
    export PATH=/opt:$PATH
    ```

    ```shell theme={null}
    export ADSP_LIBRARY_PATH="/opt/;/usr/lib/rfsa/adsp;/dsp"
    ```

    ```shell theme={null}
    qnn-net-run --model libinception_v3_quantized.so --backend libQnnHtp.so --input_list input_list.txt --output_dir output_htp
    ```

    ### Validate output

    For each raw input file fed to `qnn-net-run` (through the input\_list file), an output folder is generated that contains raw output file(s) whose size matches the model's output layer as shown in the image below.

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/qnn-validate-output.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=73eb80fcefd55073188178853484808a" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/qnn-validate-output.png" />
    </Frame>
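
One simple completeness check is to confirm that the number of result folders matches the number of entries in the input list. The following is a minimal sketch with hypothetical helper names. It assumes that result folders are named `Result_<N>` and that lines starting with `#` or `%` in the input list are directives rather than inputs; both are assumptions, not guarantees from this page.

```python
from pathlib import Path


def count_inferences(input_list_path):
    """Count input-list lines that describe an inference."""
    count = 0
    for line in Path(input_list_path).read_text().splitlines():
        stripped = line.strip()
        # Assumption: blank lines and '#' or '%' lines are not inputs.
        if stripped and not stripped.startswith(("#", "%")):
            count += 1
    return count


def count_result_dirs(output_dir):
    """Count the Result_<N> folders inside the output directory."""
    return sum(1 for p in Path(output_dir).glob("Result_*") if p.is_dir())


def outputs_complete(input_list_path, output_dir):
    """Return True if every listed input produced a result folder."""
    return count_inferences(input_list_path) == count_result_dirs(output_dir)
```

Running `outputs_complete("input_list.txt", "output_htp")` after a run flags a partially completed batch before any tensors are compared.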

    In this `inception_v3` example, the raw output file is a binary file that contains the probabilities for 1000 classification classes.

    You can use a Python script to read the file as a NumPy array for postprocessing and output validation. The example script below checks whether the HTP prediction is the same as the CPU prediction.

    ```shell theme={null}
    scp -r root@<ip-addr of the target device>:/opt/output_cpu <path>
    ```

    ```shell theme={null}
    scp -r root@<ip-addr of the target device>:/opt/output_htp <path>
    ```

    <Note>
      Use the `<path>` from the above commands in the following Python script.
    </Note>

    Once the outputs from the `qnn-net-run` tool are copied from the device to the host computer, create a simple Python script to load outputs from CPU and HTP execution and compare them using NumPy.

    ```python theme={null}
    # Python postprocessing script (compare.py)

    import numpy as np

    htp_output_file_path = "<path>/output_htp/Result_1/_875.raw"
    cpu_output_file_path = "<path>/output_cpu/Result_1/_875.raw"

    htp_output = np.fromfile(htp_output_file_path, dtype=np.float32)
    htp_output = htp_output.reshape(1,1000)

    cpu_output = np.fromfile(cpu_output_file_path, dtype=np.float32)
    cpu_output = cpu_output.reshape(1,1000)

    cls_id_htp = np.argmax(htp_output)
    cls_id_cpu = np.argmax(cpu_output)

    # Let's compare CPU output vs HTP output
    print("CPU prediction {} \n HTP prediction {}".format(cls_id_cpu, cls_id_htp))
    ```

    **Output:**

    ```
    (qairt) ██████████: ~$ python compare.py

    CPU prediction 879
    HTP prediction 892
    ```
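
A single input gives only one data point. To see how often the two backends agree across the whole input list, the comparison can be repeated for every result folder. The sketch below is an illustration under assumptions: `agreement_rate` is a hypothetical helper, the folders are assumed to be named `Result_<N>`, the output file name `_875.raw` is taken from the script above, and the values are read as native-order float32 using only the standard library.

```python
from array import array
from pathlib import Path


def argmax_of_raw(path):
    """Return the index of the largest float32 value in a raw file."""
    values = array("f")
    values.frombytes(Path(path).read_bytes())
    return max(range(len(values)), key=values.__getitem__)


def agreement_rate(cpu_dir, htp_dir, output_name="_875.raw"):
    """Fraction of results where CPU and HTP predict the same class."""
    matches = total = 0
    for cpu_result in sorted(Path(cpu_dir).glob("Result_*")):
        cpu_file = cpu_result / output_name
        htp_file = Path(htp_dir) / cpu_result.name / output_name
        # Skip results that are missing on either side.
        if not (cpu_file.is_file() and htp_file.is_file()):
            continue
        total += 1
        matches += argmax_of_raw(cpu_file) == argmax_of_raw(htp_file)
    return matches / total if total else 0.0
```

For example, `agreement_rate("<path>/output_cpu", "<path>/output_htp")` returns a value between 0.0 and 1.0, which is more informative than a single pair of predictions.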

    ### Deployment using QNN APIs

    The Qualcomm AI Engine Direct SDK provides C/C++ APIs for developing applications that load a compiled model .so and run it on a chosen backend (CPU, GPU, or HTP) with acceleration. See the [sample application](https://docs.qualcomm.com/doc/80-63442-10/topic/sample_app.html) that demonstrates the QNN C/C++ APIs.

    ### Deployment using Qualcomm IM SDK

    To improve the developer experience when building entire use case pipelines (stream from camera, preprocess images, perform inference, etc.), QNN has been integrated into the Qualcomm IM SDK as the `qtimlqnn` plugin.

    The plugin has been developed on top of QNN C APIs and provides capabilities to dynamically load and run models. With the `qtimlqnn` plugin, developers can use their converted and compiled model library in the Qualcomm IM SDK pipeline to realize their use case.

    See the [Qualcomm IM SDK overview](../topic/develop-your-own-application-im-sdk) for instructions on how to deploy QNN models using Qualcomm IM SDK.
  </Tab>
</Tabs>
