
Port a model using Qualcomm Neural Processing Engine SDK

Model conversion

A pretrained 32-bit floating-point model from PyTorch, ONNX, TensorFlow, or TFLite is passed to the SNPE converter tools (snpe-<framework>-to-dlc), which convert it to a Qualcomm-specific intermediate representation called a deep learning container (DLC).
In addition to the source model, the converters require details about it, such as the input node name, the corresponding input dimensions, and any output tensor names (for models with multiple outputs).
Refer to converters for all available configurable parameters, or view the command-line help by running:
snpe-<framework>-to-dlc --help
Output:
required arguments:

-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
    The names and dimensions of the network input layers specified in the format
    [input_name comma-separated-dimensions], for example:
    'data' 1,224,224,3
     Note that the quotes should always be included in order to handle special
     characters, spaces, etc.
     For multiple inputs specify multiple --input_dim on the command line like:
     --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
--out_node OUT_NAMES, --out_name OUT_NAMES
     Name of the graph's output Tensor Names. Multiple output names should be
     provided separately like:
     --out_name out_1 --out_name out_2
--input_network INPUT_NETWORK, -i INPUT_NETWORK
     Path to the source framework model.
If the yaml package is not present in your working environment, install it using the following command:
pip install pyyaml
The following example uses an ONNX model (inception_v3_opset16.onnx) downloaded from the ONNX Model Zoo.
Download the model as inception_v3.onnx to your workspace. This example uses the ~/models directory.
Run the following command to generate the inception_v3.dlc model.
${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-onnx-to-dlc --input_network ~/models/inception_v3.onnx --output_path ~/models/inception_v3.dlc --input_dim 'x' 1,3,299,299
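When the conversion is scripted, the same command line can be assembled programmatically. The following is a minimal sketch, not part of the SDK: the helper name build_convert_command is hypothetical, and it uses only the options shown above (--input_network, --output_path, --input_dim, --out_name). It assumes QAIRT_ROOT is set in the environment, as in the command above.

```python
import os
import shlex


def build_convert_command(model_path, output_path, input_dims, out_names=()):
    """Assemble the argument list for snpe-onnx-to-dlc (hypothetical helper).

    input_dims maps each input name to its dimensions,
    for example {"x": (1, 3, 299, 299)}.
    """
    # Assumption: QAIRT_ROOT is set by the SDK environment setup.
    tool = os.path.join(
        os.environ.get("QAIRT_ROOT", ""),
        "bin", "x86_64-linux-clang", "snpe-onnx-to-dlc",
    )
    cmd = [tool, "--input_network", model_path, "--output_path", output_path]
    # One --input_dim option per network input
    for name, dims in input_dims.items():
        cmd += ["--input_dim", name, ",".join(str(d) for d in dims)]
    # One --out_name option per output tensor
    for name in out_names:
        cmd += ["--out_name", name]
    return cmd


cmd = build_convert_command(
    os.path.expanduser("~/models/inception_v3.onnx"),
    os.path.expanduser("~/models/inception_v3.dlc"),
    {"x": (1, 3, 299, 299)},
)
# Print the command for review before running it
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the converter. Because the arguments are passed as a list, no shell quoting of the input names is needed.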

Model quantization

To run a model on the Hexagon Tensor Processor (HTP), the converted DLC must be quantized. SNPE provides a tool (snpe-dlc-quant) that quantizes a DLC model to an INT8/INT16 DLC using its own quantization algorithm. For more information about SNPE quantization, see Quantized models.
Quantization in SNPE involves two steps:
  1. Quantization of weights and biases within the model. Quantization of weights and biases is a static step, i.e., no additional input data is required from the user.
  2. Quantization of activation layers (or layers with no weights). Quantizing activation layers requires a set of input images from a training dataset as calibration data. These calibration dataset images are input as a list of preprocessed image files in .raw format. The file sizes of these input .raw files must match the input size of the model.
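To illustrate why the second step needs calibration data, here is a minimal sketch of a generic min/max affine quantization scheme in plain Python. It is an illustration only and is not the algorithm that snpe-dlc-quant implements. The function names are hypothetical. The point it shows is that the value range of an activation is unknown until sample inputs have been observed, whereas the range of the weights can be read directly from the model.

```python
def calibrate_range(samples):
    """Return the (min, max) range observed across all calibration samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # Widen the range to include zero so that 0.0 maps to an exact integer
    return min(lo, 0.0), max(hi, 0.0)


def quantize(values, lo, hi, bits=8):
    """Map floats in [lo, hi] to unsigned integers with the given bit width."""
    levels = 2 ** bits - 1
    # Guard against a degenerate range (all calibration values identical)
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(levels, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [(x - zero_point) * scale for x in q]


# Two made-up "activation" samples stand in for calibration data
lo, hi = calibrate_range([[0.0, 0.5, 1.0], [0.25, 0.75, 2.0]])
q, scale, zero_point = quantize([0.0, 1.0, 2.0], lo, hi)
print(q, dequantize(q, scale, zero_point))
```

In this scheme, values inside the calibrated range are recovered to within half a quantization step, which is why calibration inputs should be representative of the data the model will see at inference time.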
Inputs to snpe-dlc-quant are a converted DLC model and a plain text file with the paths to the calibration dataset images. This input list holds paths to preprocessed images saved as NumPy arrays in .raw format. The size of each preprocessed image must match the input resolution of the model.
The output of the snpe-dlc-quant tool is a quantized DLC.
[ --input_dlc=<val> ]
             Path to the dlc container
             containing the model for which fixed-point encoding metadata should be generated.
             This argument is required.
[ --input_list=<val> ] Path to a file
             specifying the trial inputs. This file should be a plain text file, containing one
             or more absolute file paths per line. These files will be taken to constitute the
             trial set. Each path is expected to point to a binary file containing one trial
             input in the 'raw' format, ready to be consumed by the tool without any further
             modifications. This is similar to how input is provided to snpe-net-run
             application.
[ --output_dlc=<val> ] Path at which the
             metadata-included quantized model container should be written. If this argument is
             omitted, the quantized model will be written at
             <unquantized_model_name>_quantized.dlc.
Use the Netron graph visualization tool to identify the model's input and output layer dimensions.
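Once the input dimensions are known, the requirement that each .raw file matches the model input size can be checked before quantization. The following is a minimal sketch with a hypothetical helper name, check_input_list. It assumes a single-input model with one path per line in the input list, and float32 data (4 bytes per element).

```python
import os
from math import prod


def check_input_list(list_path, input_shape, bytes_per_element=4):
    """Return a list of problems found in an input list (empty if none).

    Assumes one absolute path per line and float32 .raw files.
    """
    expected_size = prod(input_shape) * bytes_per_element
    problems = []
    with open(list_path) as f:
        paths = [line.strip() for line in f if line.strip()]
    for path in paths:
        if not os.path.isabs(path):
            problems.append("{}: not an absolute path".format(path))
        if not os.path.isfile(path):
            problems.append("{}: file not found".format(path))
        elif os.path.getsize(path) != expected_size:
            problems.append(
                "{}: {} bytes, expected {}".format(
                    path, os.path.getsize(path), expected_size
                )
            )
    return problems


# Example call; reports only if the list file exists in the current directory
if os.path.isfile("input_list.txt"):
    for problem in check_input_list("input_list.txt", (1, 299, 299, 3)):
        print(problem)
```

For an input of shape 1,299,299,3 stored as float32, each file should be 1 x 299 x 299 x 3 x 4 = 1,072,812 bytes.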
For demonstration purposes, the quantization process can be evaluated with random input files. The simple Python script below generates such files for the inception_v3.onnx model. Save the script as generate_random_input.py in your ~/models/ workspace directory and run it using python ~/models/generate_random_input.py on your host computer.
The script also creates an input_list.txt file that holds the paths to the calibration inputs used to quantize the model.
import os
import numpy as np

# Directory that receives the generated .raw files
BASE_PATH = "/tmp/RandomInputsForInceptionV3"
# Number of random inputs to generate
NUM_IMAGES = 10
# Write input_list.txt next to this script (~/models/ in this example),
# regardless of the directory the script is run from
INPUT_LIST_PATH = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), "input_list.txt"
)

os.makedirs(BASE_PATH, exist_ok=True)

input_path_list = []

# Save each random float32 tensor as a binary .raw file
for img in range(NUM_IMAGES):
    filename = os.path.join(BASE_PATH, "input_{}.raw".format(img))
    random_tensor = np.random.random((1, 299, 299, 3)).astype(np.float32)
    random_tensor.tofile(filename)
    input_path_list.append(filename)

# Save the absolute paths, one per line, as the input list
with open(INPUT_LIST_PATH, "w") as f:
    for path in input_path_list:
        f.write(path + "\n")
The script generates 10 sample input files in the /tmp/RandomInputsForInceptionV3/ directory and an input_list.txt file that contains the path to each generated sample.
With all required inputs to the snpe-dlc-quant tool available, the model can now be quantized.
${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quant --input_dlc ~/models/inception_v3.dlc --output_dlc ~/models/inception_v3_quantized.dlc --input_list ~/models/input_list.txt
This generates a quantized inception_v3 DLC model (inception_v3_quantized.dlc). By default, the model is quantized to an INT8 bit width.
To use 16-bit quantization instead of the default INT8, pass the --act_bitwidth 16 and/or --weights_bitwidth 16 options to the snpe-dlc-quant tool.
Refer to the snpe-dlc-quant tool documentation, or run snpe-dlc-quant --help, to view all available customizations, including quantization modes and optimizations.

Model optimization

A quantized DLC requires a graph preparation step that optimizes the model for execution on the HTP. For this purpose, SNPE provides the snpe-dlc-graph-prepare tool, which takes a quantized model and hardware-specific details, such as the chipset, as input.
Optimizations for hardware such as the HTP depend on the specific version of the HTP present on the chipset. To ensure that the correct set of optimizations is applied to the execution graph, it is important to provide the correct chipset ID to the snpe-dlc-graph-prepare tool.
Based on the HTP version and chipset ID, the tool creates a cache that contains an execution strategy for running the DLC on the HTP hardware. Without this step, network initialization incurs additional overhead because the SNPE runtime has to create an execution strategy on the fly.
${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-dlc-graph-prepare --input_dlc ~/models/inception_v3_quantized.dlc --output_dlc ~/models/inception_v3_quantized_with_htp_cache.dlc --htp_socs qcs6490
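The quantization and graph preparation steps are chained: the output DLC of snpe-dlc-quant is the input DLC of snpe-dlc-graph-prepare. The sketch below shows that hand-off under stated assumptions. The helper name build_htp_commands is hypothetical, the _with_htp_cache suffix simply follows the example command above, and QAIRT_ROOT is assumed to be set. Only options that appear in this guide are used.

```python
import os
import shlex

# Assumption: QAIRT_ROOT is set by the SDK environment setup.
TOOLS = os.path.join(os.environ.get("QAIRT_ROOT", ""), "bin", "x86_64-linux-clang")


def build_htp_commands(dlc_path, input_list, soc,
                       act_bitwidth=None, weights_bitwidth=None):
    """Return the quantize and graph-prepare commands, in run order."""
    stem, _ = os.path.splitext(dlc_path)
    quantized = stem + "_quantized.dlc"
    prepared = stem + "_quantized_with_htp_cache.dlc"

    quant = [
        os.path.join(TOOLS, "snpe-dlc-quant"),
        "--input_dlc", dlc_path,
        "--output_dlc", quantized,
        "--input_list", input_list,
    ]
    # Optional bit width overrides; the tool defaults to INT8 when omitted
    if act_bitwidth is not None:
        quant += ["--act_bitwidth", str(act_bitwidth)]
    if weights_bitwidth is not None:
        quant += ["--weights_bitwidth", str(weights_bitwidth)]

    # The quantized DLC from the first step feeds the second step
    prepare = [
        os.path.join(TOOLS, "snpe-dlc-graph-prepare"),
        "--input_dlc", quantized,
        "--output_dlc", prepared,
        "--htp_socs", soc,
    ]
    return [quant, prepare]


models = os.path.expanduser("~/models")
for cmd in build_htp_commands(
    os.path.join(models, "inception_v3.dlc"),
    os.path.join(models, "input_list.txt"),
    "qcs6490",
):
    # Print each command for review before running it
    print(shlex.join(cmd))
```

Running each command with subprocess.run(cmd, check=True) would stop the chain at the first failing step, so graph preparation never runs on a missing or partial quantized DLC.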

HTP cache information

After the snpe-dlc-graph-prepare step completes, an HTP cache record is added to the DLC. This cache information can be viewed using the snpe-dlc-info tool.
${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-dlc-info -i ~/models/inception_v3_quantized_with_htp_cache.dlc