Deploy a model using Neural Processing Engine

A model DLC (quantized or non-quantized) can be deployed through a SNPE-enabled app, that is, an application written using the SNPE C/C++ APIs. SNPE offers APIs to load a DLC, select a runtime to run the model, and perform inference.
SNPE also provides a prebuilt snpe-net-run tool (an application written using the C APIs) that can load an arbitrary model DLC and run it on the provided inputs. It takes the following inputs:
  • Model file: DLC model file generated by the SNPE converter tool or snpe-dlc-graph-prepare tool (if running on HTP).
  • Input list: Text file, like the input_list.txt file used during quantization, except input raw files in this list are used for inference. For simplicity in this example, the same input list used for quantization is used for inference.
  • Runtime: User must select a specific runtime to run the model on target. Available runtime options are CPU, GPU, and DSP (HTP).
See snpe-net-run --help for more details.
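As a rough illustration of the input list described above, the sketch below writes one raw-file path per line. That layout is an assumption for a single-input model such as inception_v3, and the helper name write_input_list is hypothetical rather than part of the SNPE SDK.

```python
from pathlib import Path


def write_input_list(raw_dir, list_path):
    """Write one .raw path per line (assumed single-input list layout)."""
    raw_files = sorted(Path(raw_dir).glob("*.raw"))
    if not raw_files:
        raise FileNotFoundError(f"no .raw files found in {raw_dir}")
    lines = [str(p) for p in raw_files]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return lines


# Example call (paths are illustrative):
# write_input_list("/tmp/RandomInputsForInceptionV3", "input_list.txt")
```

Sorting the paths keeps the order of the Result_N output folders reproducible between runs.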

Run SNPE .dlc: x86 host computer

A converted SNPE .dlc can be run using the snpe-net-run tool, which takes a .dlc and an input list as arguments. Running SNPE DLC on x86 is purely for debugging purposes. For example, the following command loads the inception_v3.dlc model, runs it on the x86 CPU, and writes the output files to ~/models/output_x86.
The default runtime in SNPE is CPU. When executing a model using snpe-net-run, there is no need to specify the CPU runtime.
${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-net-run --container ~/models/inception_v3.dlc --input_list ~/models/input_list.txt --output_dir ~/models/output_x86

Prepare a SNPE model to run on target

To run the model on target and generate outputs, snpe-net-run needs the model .dlc, the SNPE runtime libraries, and an input list.
Before running on target, ensure that the SNPE SDK binaries and libraries are pushed to the target.
Use artifacts from ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2 and ${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2.
Use the correct DSP Hexagon architecture libraries for Qualcomm evaluation kits:
  • IQ-8275: Use v75 libraries
  • IQ-9075: Use v73 libraries
  • Qualcomm Dragonwing™ RB3 Gen 2: Use v68 libraries
File | Source location
snpe-net-run | ${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2
libSNPE.so | ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2
libSnpeHtpPrepare.so | ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2
libSnpeHtpV<dsp-arch>Stub.so | ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2
libSnpeHtpV<dsp-arch>Skel.so | ${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned
inception_v3_quantized_with_htp_cache.dlc | ~/models
inception_v3.dlc | ~/models
input_list.txt | ~/models
Inception V3 sample input images | /tmp/RandomInputsForInceptionV3
The following scp commands need to be executed on the host computer to copy SNPE SDK libraries and binaries to the device.
Replace <dsp-arch> with 75 for IQ-8275, 73 for IQ-9075, and 68 for Qualcomm Dragonwing™ RB3 Gen 2.
scp ${QAIRT_ROOT}/bin/aarch64-oe-linux-gcc11.2/snpe-net-run \
    ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libSNPE.so \
    ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libSnpeHtpV<dsp-arch>Stub.so \
    ${QAIRT_ROOT}/lib/aarch64-oe-linux-gcc11.2/libSnpeHtpPrepare.so \
    ${QAIRT_ROOT}/lib/hexagon-v<dsp-arch>/unsigned/libSnpeHtpV<dsp-arch>Skel.so \
    root@[ip-addr]:/opt/
scp ~/models/inception_v3.dlc \
    ~/models/inception_v3_quantized_with_htp_cache.dlc \
    ~/models/input_list.txt \
    root@[ip-addr]:/opt/
scp -r /tmp/RandomInputsForInceptionV3 root@[ip-addr]:/tmp/
ssh root@[ip-addr]
cd /opt
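The per-kit file selection used in the scp commands can also be written down programmatically. The following is a minimal sketch that only restates the kit-to-architecture mapping and paths given above; the names DSP_ARCH and snpe_files_to_push are hypothetical and not part of the SDK.

```python
# DSP architecture per evaluation kit, as listed above.
DSP_ARCH = {
    "IQ-8275": 75,
    "IQ-9075": 73,
    "RB3 Gen 2": 68,
}

TOOLCHAIN = "aarch64-oe-linux-gcc11.2"


def snpe_files_to_push(qairt_root, kit):
    """Return the SDK paths copied to the target for the given kit."""
    arch = DSP_ARCH[kit]
    lib = f"{qairt_root}/lib/{TOOLCHAIN}"
    return [
        f"{qairt_root}/bin/{TOOLCHAIN}/snpe-net-run",
        f"{lib}/libSNPE.so",
        f"{lib}/libSnpeHtpV{arch}Stub.so",
        f"{lib}/libSnpeHtpPrepare.so",
        f"{qairt_root}/lib/hexagon-v{arch}/unsigned/libSnpeHtpV{arch}Skel.so",
    ]
```

The returned list could be passed to scp in a single call, which avoids editing the <dsp-arch> placeholder by hand for each kit.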
The above steps prepared the device for model execution. The following sections provide details on running the model on the available runtimes.

Run SNPE .dlc: Arm-based CPU

Set up the environment variables before executing the model so that the SNPE binaries and libraries are accessible.
The default runtime in SNPE is CPU. When executing a model using snpe-net-run, there is no need to specify the CPU runtime.
export LD_LIBRARY_PATH=/opt:$LD_LIBRARY_PATH
export PATH=/opt/:$PATH
snpe-net-run --container inception_v3.dlc --input_list input_list.txt --output_dir output_cpu

Run SNPE DLC: GPU

To run a model on the GPU runtime, use inception_v3.dlc and save outputs in output_gpu.
In the terminal of the target device, run the following commands to enable the GPU delegate and backend:
ln -sf /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so
export OCL_ICD_FILENAMES=/usr/lib/libOpenCL_adreno.so.1
To run on GPU, specify the runtime with the --use_gpu command line argument.
export LD_LIBRARY_PATH=/opt:$LD_LIBRARY_PATH
export PATH=/opt:$PATH
snpe-net-run --container inception_v3.dlc --input_list input_list.txt --output_dir output_gpu --use_gpu

Run SNPE DLC: HTP

To run a model on the HTP backend, use inception_v3_quantized_with_htp_cache.dlc and save outputs in output_htp.
To run on HTP, specify the runtime with the --use_dsp command line argument.
export LD_LIBRARY_PATH=/opt:$LD_LIBRARY_PATH
export PATH=/opt:$PATH
export ADSP_LIBRARY_PATH="/opt;/usr/lib/rfsa/adsp;/dsp"
snpe-net-run --container inception_v3_quantized_with_htp_cache.dlc --input_list input_list.txt --output_dir output_htp --use_dsp
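The CPU, GPU, and HTP invocations shown in these sections differ only in the model file, the output directory, and the runtime flag. The sketch below captures that pattern, using only the flags that appear in the commands above. The helper name snpe_net_run_args is hypothetical, and the environment variables (LD_LIBRARY_PATH, PATH, and ADSP_LIBRARY_PATH for HTP) still have to be exported separately.

```python
# Runtime flag per backend, taken from the commands above (CPU is the default).
RUNTIME_FLAGS = {"cpu": [], "gpu": ["--use_gpu"], "htp": ["--use_dsp"]}


def snpe_net_run_args(container, input_list, runtime="cpu"):
    """Build the snpe-net-run argument list for one runtime."""
    if runtime not in RUNTIME_FLAGS:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return [
        "snpe-net-run",
        "--container", container,
        "--input_list", input_list,
        "--output_dir", f"output_{runtime}",
        *RUNTIME_FLAGS[runtime],
    ]
```

If Python is available on the target, the returned list could be passed to subprocess.run(..., check=True). Otherwise it serves as a template for the shell commands.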

Validate output

For each input raw file fed to snpe-net-run (via the input_list file), an output folder is generated that contains the output tensors saved as raw files, whose size matches the model’s output layer as shown in the image below.
The Netron tool was used to visualize the model.
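Because a raw file holds only tensor data, its expected size follows from the output shape. As a worked example, assuming the default float32 output and the (1, 1000) output shape of inception_v3, the file should be 1 × 1000 × 4 = 4000 bytes. The helper names below are hypothetical.

```python
import os

FLOAT32_BYTES = 4


def expected_raw_size(shape, bytes_per_element=FLOAT32_BYTES):
    """Bytes in a headerless raw tensor file with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def raw_size_matches(path, shape):
    """True if the file on disk has exactly the expected number of bytes."""
    return os.path.getsize(path) == expected_raw_size(shape)
```

A size mismatch usually means the file was read with the wrong data type or output shape, so it is a cheap check to run before any postprocessing.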
In the inception_v3 example, the output raw file is a binary file that contains probabilities for 1000 classification classes. A Python script reads the file as a NumPy array for postprocessing and output validation. The example below checks whether the HTP prediction is the same as the CPU prediction.
  1. Copy the output files from the target device to the host computer so the outputs can be validated.
    scp -r root@<ip-addr of the target device>:/opt/output_cpu <path>
    
    scp -r root@<ip-addr of the target device>:/opt/output_htp <path>
    
    Use <path> from the above commands in the following Python script (compare.py).
  2. Once outputs from on-device inference are copied to the host computer, prepare a script to load the output tensors saved in output_htp and output_cpu to compare. In this example, both output_htp and output_cpu are copied to the ~/models directory. The following is an example script to compare output from one of the example inputs used. Outputs from snpe-net-run can be loaded into NumPy ndarrays using the numpy.fromfile(…) API.
    By default, snpe-net-run saves the output tensors as raw binary files in float32 format.
    # python postprocessing script (compare.py)
    
    import numpy as np
    
    htp_output_file_path = "<path>/output_htp/Result_1/875.raw"
    cpu_output_file_path = "<path>/output_cpu/Result_1/875.raw"
    
    htp_output = np.fromfile(htp_output_file_path, dtype=np.float32)
    htp_output = htp_output.reshape(1,1000)
    
    cpu_output = np.fromfile(cpu_output_file_path, dtype=np.float32)
    cpu_output = cpu_output.reshape(1,1000)
    
    # np.argmax gives the cls_id with highest probability from tensor.
    
    cls_id_htp = np.argmax(htp_output)
    cls_id_cpu = np.argmax(cpu_output)
    
    # Let's compare CPU output vs HTP output
    
    print("CPU prediction {}\nHTP prediction {}".format(cls_id_cpu, cls_id_htp))
    
Output:
(qairt) ██████████: ~$ python compare.py

CPU prediction 879
HTP prediction 21
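A top-1 comparison alone gives little detail when the two runtimes disagree. The sketch below is one possible extension that also reports top-k overlap and the largest element-wise difference. It uses the standard-library array module instead of NumPy so that it has no dependencies, and it assumes headerless float32 files in the host's native byte order. All function names are hypothetical.

```python
from array import array


def load_raw_float32(path):
    """Read a headerless float32 file into a flat list of Python floats."""
    values = array("f")  # native byte order, assumed to match the target
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return list(values)


def top_k(scores, k=5):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def compare_outputs(cpu_scores, htp_scores, k=5):
    """Summarize agreement between two score vectors of equal length."""
    cpu_top, htp_top = top_k(cpu_scores, k), top_k(htp_scores, k)
    return {
        "top1_match": cpu_top[0] == htp_top[0],
        "topk_overlap": len(set(cpu_top) & set(htp_top)),
        "max_abs_diff": max(abs(c - h) for c, h in zip(cpu_scores, htp_scores)),
    }
```

For example, compare_outputs(load_raw_float32(cpu_output_file_path), load_raw_float32(htp_output_file_path)) would summarize the two files used in compare.py.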

Deployment using SNPE APIs

The SNPE SDK provides C/C++ APIs for developing applications that run a model on the chosen hardware (CPU, GPU, or HTP) with acceleration. See the sample application that demonstrates the SNPE C/C++ APIs.

Deployment using Qualcomm Intelligent Multimedia SDK (IM SDK)

To improve the developer experience when building entire use case pipelines (stream from camera, preprocess images, perform inference, etc.), SNPE has been integrated into the Qualcomm IM SDK as a plugin (qtimlsnpe). The plugin has been developed on top of the SNPE C APIs and provides SNPE capabilities (load and run models). With the qtimlsnpe plugin, you can use your converted model DLC in a Qualcomm IM SDK pipeline to realize the use case. See the Qualcomm IM SDK overview for instructions on how to deploy SNPE DLC using the Qualcomm IM SDK.