# Convert and quantize AI models

> Convert and quantize AI models from PyTorch, TensorFlow, ONNX, or LiteRT for deployment on Qualcomm hardware using SNPE or QNN tools.

<Tabs>
  <Tab title="Neural Processing Engine">
    ## Port a model using Qualcomm Neural Processing Engine SDK

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-model-porting-flow.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=fe4f06da8bfcb05e031d8e3f1af56bff" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-model-porting-flow.png" />
    </Frame>

    ### Model conversion

    A pretrained 32-bit floating-point (FP32) model from PyTorch, ONNX, TensorFlow, or TFLite is passed to an SNPE converter tool (`snpe-<framework>-to-dlc`), which converts it to a deep learning container (DLC), a Qualcomm-specific intermediate representation of the model.

    In addition to the input model from a source framework, the converters require additional details about the input model, such as the input node name, its corresponding input dimensions, and any output tensor names (for models with multiple outputs).

    Refer to [converters](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_tools.html) for all available configurable parameters or see the command line help by running:

    ```shell theme={null}
    snpe-<framework>-to-dlc --help
    ```

    **Output:**

    ```
    required arguments:

    -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
        The names and dimensions of the network input layers specified in the format
        [input_name comma-separated-dimensions], for example:
        'data' 1,224,224,3
         Note that the quotes should always be included in order to handle special
         characters, spaces, etc.
         For multiple inputs specify multiple --input_dim on the command line like:
         --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
    --out_node OUT_NAMES, --out_name OUT_NAMES
         Name of the graph's output Tensor Names. Multiple output names should be
         provided separately like:
         --out_name out_1 --out_name out_2
    --input_network INPUT_NETWORK, -i INPUT_NETWORK
         Path to the source framework model.
    ```
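
    As a sketch of how the options above combine, the following Python helper assembles a converter argument list for a model with several inputs and outputs. `build_converter_args` is a hypothetical name, not part of the SDK, and it uses only the flags shown in the help output. It builds the list without running the converter.

    ```python
    def build_converter_args(tool, model_path, inputs, out_names=()):
        """Assemble an argv list for a snpe-<framework>-to-dlc style converter.

        `inputs` maps each input name to its dimensions, e.g. {"data": (1, 224, 224, 3)}.
        Hypothetical helper for illustration; only flags from the help text are used.
        """
        args = [tool, "--input_network", model_path]
        for name, dims in inputs.items():
            # Each input gets its own --input_dim NAME D1,D2,... pair
            args += ["--input_dim", name, ",".join(str(d) for d in dims)]
        for name in out_names:
            # Each output tensor name is passed with a separate --out_name
            args += ["--out_name", name]
        return args


    args = build_converter_args(
        "snpe-onnx-to-dlc",
        "model.onnx",
        {"data1": (1, 224, 224, 3), "data2": (1, 50, 100, 3)},
        out_names=["out_1", "out_2"],
    )
    print(" ".join(args))
    ```

    Passing such a list to `subprocess.run` instead of a single shell string avoids quoting problems when input names contain special characters.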

    <Note>
      If the `yaml` package is not present in your working environment, install it using the following command:

      ```shell theme={null}
      pip install pyyaml
      ```
    </Note>

    The following example uses an ONNX model (*inception\_v3\_opset16.onnx*) downloaded from the [ONNX Model Zoo](https://github.com/onnx/models/blob/main/Computer_Vision/inception_v3_Opset16_timm/inception_v3_Opset16.onnx).

    Download the model as `inception_v3.onnx` to your workspace. In this example, we download the model to the `~/models` directory.

    Run the following command to generate the `inception_v3.dlc` model.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-onnx-to-dlc --input_network ~/models/inception_v3.onnx --output_path ~/models/inception_v3.dlc --input_dim 'x' 1,3,299,299
    ```

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-model-conversion.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=ef6a19676caaf9eae17a32cb56701f42" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-model-conversion.png" />
    </Frame>

    ### Model quantization

    To run a model on Hexagon Tensor Processor (HTP), the converted DLC must be quantized. SNPE offers a tool (`snpe-dlc-quant`) to quantize a DLC model to INT8/INT16 DLC using its own quantization algorithm. For more information about SNPE quantization, see [Quantized models](https://docs.qualcomm.com/doc/80-63442-10/topic/quantized_models.html).

    The quantization process in SNPE requires two steps:

    1. **Quantization of weights and biases within the model.**

       Quantization of weights and biases is a static step, i.e., no additional input data is required from the user.

    2. **Quantization of activation layers (or layers with no weights).**

       Quantizing activation layers requires a set of input images from a training dataset as calibration data. These calibration dataset images are input as a list of preprocessed image files in `.raw` format. The file sizes of these input `.raw` files must match the input size of the model.
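
    To illustrate why activations need calibration data, the following sketch applies a generic min/max affine quantization in plain Python. This is an assumption made for illustration: it is not SNPE's own algorithm, and the function names are hypothetical. The point is that the value range observed on sample inputs determines the scale, so unrepresentative inputs produce a poor encoding.

    ```python
    def compute_encoding(values, bitwidth=8):
        """Derive (scale, zero_point) from the min/max of observed values."""
        levels = 2 ** bitwidth - 1
        lo = min(min(values), 0.0)  # extend the range so that 0.0 is covered
        hi = max(max(values), 0.0)
        scale = (hi - lo) / levels if hi > lo else 1.0
        zero_point = round(-lo / scale)
        return scale, zero_point


    def quantize(values, scale, zero_point, bitwidth=8):
        """Map floats to integers in [0, 2**bitwidth - 1], clamping out-of-range values."""
        levels = 2 ** bitwidth - 1
        return [min(max(round(v / scale) + zero_point, 0), levels) for v in values]


    def dequantize(quantized, scale, zero_point):
        """Map quantized integers back to approximate float values."""
        return [(q - zero_point) * scale for q in quantized]


    # Stand-in for activation values observed while running calibration inputs
    observed = [-0.8, -0.1, 0.0, 0.4, 1.2]
    scale, zero_point = compute_encoding(observed)
    quantized = quantize(observed, scale, zero_point)
    restored = dequantize(quantized, scale, zero_point)
    max_error = max(abs(a - b) for a, b in zip(observed, restored))
    print(scale, zero_point, quantized, max_error)
    ```

    Weights are known ahead of time, so their range can be computed without any input data. Activation ranges only become visible when real inputs flow through the network.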

    The inputs to `snpe-dlc-quant` are a converted DLC model and a plain text file (the input list) that contains the paths to the calibration files. Each path points to a preprocessed image saved as a NumPy array in `.raw` format, with a size that matches the input resolution of the model.

    The output of the `snpe-dlc-quant` tool is a quantized DLC.

    ```
    [ --input_dlc=<val> ]
                 Path to the dlc container
                 containing the model for which fixed-point encoding metadata should be generated.
                 This argument is required.
    [ --input_list=<val> ] Path to a file
                 specifying the trial inputs. This file should be a plain text file, containing one
                 or more absolute file paths per line. These files will be taken to constitute the
                 trial set. Each path is expected to point to a binary file containing one trial
                 input in the 'raw' format, ready to be consumed by the tool without any further
                 modifications. This is similar to how input is provided to snpe-net-run
                 application.
    [ --output_dlc=<val> ] Path at which the
                 metadata-included quantized model container should be written. If this argument is
                 omitted, the quantized model will be written at
                 <unquantized_model_name>_quantized.dlc.
    ```
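
    Because a size mismatch or a relative path in the input list is easy to miss, a quick check before quantization can help. The following is a minimal sketch, assuming a single-input model, float32 data (4 bytes per element), and one path per line. `expected_raw_bytes` and `check_input_list` are hypothetical helper names.

    ```python
    import os


    def expected_raw_bytes(shape, bytes_per_element=4):
        """Byte size of one raw tensor; 4 bytes per element assumes float32 data."""
        count = 1
        for dim in shape:
            count *= dim
        return count * bytes_per_element


    def check_input_list(list_path, shape):
        """Return a list of problems found in the input list (empty if none)."""
        expected = expected_raw_bytes(shape)
        problems = []
        with open(list_path) as f:
            paths = [line.strip() for line in f if line.strip()]
        for path in paths:
            if not os.path.isabs(path):
                problems.append(f"{path}: not an absolute path")
            elif not os.path.isfile(path):
                problems.append(f"{path}: file not found")
            elif os.path.getsize(path) != expected:
                problems.append(f"{path}: {os.path.getsize(path)} bytes, expected {expected}")
        return problems


    print(expected_raw_bytes((1, 299, 299, 3)))  # 1 * 299 * 299 * 3 * 4 bytes
    ```

    For example, `check_input_list("input_list.txt", (1, 299, 299, 3))` returns an empty list when every listed file is an absolute path to a file of the expected size.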

    <Note>
      Use the [Netron](https://github.com/lutzroeder/netron) graph visualization tool to identify the model's input/output layer dimensions.
    </Note>

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-model-quant-properties.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=4df5f96b82cdc7519d0a0b6289114f6e" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-model-quant-properties.png" />
    </Frame>

    For demonstration purposes, you can evaluate the quantization process with random input files. The following Python script generates them for the `inception_v3.onnx` model. Save the script as `generate_random_input.py` in your workspace `~/models/` directory and run it from that directory on your host computer (`cd ~/models && python generate_random_input.py`), because the script writes `input_list.txt` to the current working directory.

    The script also creates an `input_list.txt` file that holds the paths to the calibration files used to quantize the model.

    ```python theme={null}
    import os
    import numpy as np

    # Directory that holds the generated .raw calibration files
    BASE_PATH = "/tmp/RandomInputsForInceptionV3"
    os.makedirs(BASE_PATH, exist_ok=True)

    # Generate 10 random inputs and save them as raw float32 files
    NUM_IMAGES = 10
    input_path_list = []

    for img in range(NUM_IMAGES):
        filename = os.path.join(BASE_PATH, "input_{}.raw".format(img))
        random_tensor = np.random.random((1, 299, 299, 3)).astype(np.float32)
        random_tensor.tofile(filename)
        input_path_list.append(filename)

    # Write input_list.txt (one path per line) to the current working directory
    with open("input_list.txt", "w") as f:
        for path in input_path_list:
            f.write(path + "\n")
    ```

    The script above generates 10 sample input files in the `/tmp/RandomInputsForInceptionV3/` directory and an `input_list.txt` file, written to the current working directory, that contains the path to each generated sample.

    Now that all needed inputs to the `snpe-dlc-quant` tool are available, the model can be quantized.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quant --input_dlc ~/models/inception_v3.dlc --output_dlc ~/models/inception_v3_quantized.dlc --input_list ~/models/input_list.txt
    ```

    This generates a quantized inception\_v3 DLC model (`inception_v3_quantized.dlc`). By default, the model is quantized to INT8.

    To use 16-bit quantization instead of the default INT8, pass the `--act_bitwidth 16` and/or `--weights_bitwidth 16` options to the `snpe-dlc-quant` tool.

    Refer to the [snpe-dlc-quant](https://docs.qualcomm.com/doc/80-63442-10/topic/SNPE_general_tools.html#snpe-dlc-quant) tool documentation, or run `snpe-dlc-quant --help` to view all available customizations including quantization modes, optimizations, etc.

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-model-quantization.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=a724f830bf9d7552eace6c8c10c7e0e5" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-model-quantization.png" />
    </Frame>

    #### Model optimization

    A quantized DLC requires a graph preparation step that optimizes the model for execution on HTP. For this step, SNPE provides the `snpe-dlc-graph-prepare` tool, which takes a quantized model and hardware-specific details, such as the chipset, as input.

    <Note>
      Optimizations for hardware, such as HTP, depend on the specific version of HTP present on the chipset. To ensure that the correct set of optimizations is applied to the execution graph for optimal utilization of the HTP, provide the correct chipset ID to the `snpe-dlc-graph-prepare` tool.
    </Note>

    Based on the HTP version and chipset ID, the tool creates a cache that contains an execution strategy for running the model DLC on the HTP hardware. Without this step, network initialization incurs additional overhead because the SNPE runtime has to create an execution strategy on the fly.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-dlc-graph-prepare --input_dlc ~/models/inception_v3_quantized.dlc --output_dlc ~/models/inception_v3_quantized_with_htp_cache.dlc --htp_socs qcs6490
    ```
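
    The three SNPE steps in this section (convert, quantize, prepare) can be chained from a script. The following is a minimal sketch that reuses the flags and file naming from the commands above. `snpe_htp_pipeline` and `run_all` are hypothetical helpers, the `/opt/qairt` fallback is a placeholder, and the commands are only printed unless `dry_run=False` is passed.

    ```python
    import os
    import subprocess


    def snpe_htp_pipeline(qairt_root, onnx_path, input_name, input_dims, input_list, soc):
        """Return the three SNPE command lines used in this section, in order."""
        bin_dir = os.path.join(qairt_root, "bin", "x86_64-linux-clang")
        base = os.path.splitext(onnx_path)[0]
        dlc = base + ".dlc"
        quantized = base + "_quantized.dlc"
        prepared = base + "_quantized_with_htp_cache.dlc"
        return [
            [os.path.join(bin_dir, "snpe-onnx-to-dlc"), "--input_network", onnx_path,
             "--output_path", dlc, "--input_dim", input_name, input_dims],
            [os.path.join(bin_dir, "snpe-dlc-quant"), "--input_dlc", dlc,
             "--output_dlc", quantized, "--input_list", input_list],
            [os.path.join(bin_dir, "snpe-dlc-graph-prepare"), "--input_dlc", quantized,
             "--output_dlc", prepared, "--htp_socs", soc],
        ]


    def run_all(commands, dry_run=True):
        """Print each command; execute it only when dry_run is False."""
        for cmd in commands:
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop at the first failing step


    models = os.path.expanduser("~/models")
    commands = snpe_htp_pipeline(
        os.environ.get("QAIRT_ROOT", "/opt/qairt"),  # fallback path is a placeholder
        os.path.join(models, "inception_v3.onnx"), "x", "1,3,299,299",
        os.path.join(models, "input_list.txt"), "qcs6490",
    )
    run_all(commands)
    ```

    The sketch expands `~` with `os.path.expanduser` because `subprocess.run` with an argument list does not go through a shell, so `~/models` would otherwise be passed to the tools literally.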

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-model-optimization.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=fe7de1520a22727014afbda949ddb6ff" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-model-optimization.png" />
    </Frame>

    #### HTP cache information

    Once the `snpe-dlc-graph-prepare` step is completed, the HTP cache record is added to the DLC. This cache information can be viewed using the `snpe-dlc-info` tool.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/snpe-dlc-info -i ~/models/inception_v3_quantized_with_htp_cache.dlc
    ```

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/snpe-cache.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=71a85bb8e4ca9d677375f69aea5c2766" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/snpe-cache.png" />
    </Frame>
  </Tab>

  <Tab title="AI Engine Direct">
    ## Port a model using AI Engine Direct

    ### Model conversion and quantization

    A pretrained FP32 model from PyTorch, ONNX, TensorFlow, or TFLite is passed to the QNN converter tool (`qnn-<framework>-converter`), which converts it to a QNN graph representation in the form of a high-level, readable C++ graph.

    To accelerate the model on HTP, the model must be quantized. Quantization can be done in the same step as conversion. This static quantization requires a calibration dataset.

    To enable quantization along with conversion, pass the calibration input list with the `--input_list INPUT_LIST` option.

    For more information, see [quantization support](https://docs.qualcomm.com/doc/80-63442-10/topic/quantization.html).

    The following example uses an ONNX model (`inception_v3_opset16.onnx`) downloaded from the [ONNX Model Zoo](https://github.com/onnx/models/blob/main/Computer_Vision/inception_v3_Opset16_timm/inception_v3_Opset16.onnx).

    Download the model as `inception_v3.onnx` to your workspace. In this example, the model is downloaded to the `~/models` directory.

    #### Model conversion: CPU backend

    To convert the model to run on an x86 or Arm-based CPU, run the following command to generate `inception_v3.cpp` and `inception_v3.bin`.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter --input_network ~/models/inception_v3.onnx --output_path ~/models/inception_v3.cpp --input_dim 'x' 1,3,299,299
    ```

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/qnn-conversion-output.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=2ddaf9dc7a08bc525626c3f913e0af72" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/qnn-conversion-output.png" />
    </Frame>

    The `inception_v3.cpp` file contains a high-level graph representation of the converted model.

    The `inception_v3.bin` file contains weights/biases from the model.

    #### Model conversion and quantization: HTP backend

    To run the model on HTP, the quantization step is required.

    For quantization in the AI Engine Direct (QNN) SDK, a representative dataset of 50 to 200 images from a training dataset is provided to the QNN converter as a calibration dataset. The images in the calibration dataset are preprocessed (resized, normalized, etc.) and saved as NumPy arrays in `.raw` format. The size of these input `.raw` files must match the input size of the model.
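
    As a sketch of how such a calibration list might be assembled, the following helper samples a fixed number of already-preprocessed `.raw` files from a directory and writes their absolute paths to an input list. The helper name, the directory layout, and the default of 100 files (an arbitrary value within the 50 to 200 range) are assumptions for illustration.

    ```python
    import os
    import random


    def write_calibration_list(raw_dir, list_path, count=100, seed=0):
        """Sample up to `count` .raw files from raw_dir and write their absolute paths."""
        candidates = sorted(
            os.path.join(raw_dir, name)
            for name in os.listdir(raw_dir)
            if name.endswith(".raw")
        )
        rng = random.Random(seed)  # fixed seed keeps the selection reproducible
        chosen = rng.sample(candidates, min(count, len(candidates)))
        with open(list_path, "w") as f:
            for path in sorted(chosen):
                f.write(os.path.abspath(path) + "\n")
        return len(chosen)
    ```

    The function returns the number of files written, which makes it easy to notice when the directory holds fewer files than requested.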

    <Note>
      Use the Netron graph visualization tool to identify the model's input/output layer dimensions.
    </Note>

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/qnn-conversion-model-props.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=7cbbb9024d52a2353789c264ff29cd41" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/qnn-conversion-model-props.png" />
    </Frame>

    For demonstration purposes, you can evaluate the quantization process with random input files. The following Python script generates them for the `inception_v3.onnx` model. Save the script as `generate_random_input.py` in the `~/models/` directory and run it from that directory (`cd ~/models && python generate_random_input.py`), because the script writes `input_list.txt` to the current working directory.

    The script also creates an `input_list.txt` file that lists the paths to the calibration files used to quantize the model.

    ```python theme={null}
    import os
    import numpy as np

    # Directory that holds the generated .raw calibration files
    BASE_PATH = "/tmp/RandomInputsForInceptionV3"
    os.makedirs(BASE_PATH, exist_ok=True)

    # Generate 10 random inputs and save them as raw float32 files
    NUM_IMAGES = 10
    input_path_list = []

    for img in range(NUM_IMAGES):
        filename = os.path.join(BASE_PATH, "input_{}.raw".format(img))
        random_tensor = np.random.random((1, 299, 299, 3)).astype(np.float32)
        random_tensor.tofile(filename)
        input_path_list.append(filename)

    # Write input_list.txt (one path per line) to the current working directory
    with open("input_list.txt", "w") as f:
        for path in input_path_list:
            f.write(path + "\n")
    ```

    Run the following command to convert and quantize.

    By default, the model is quantized to INT8. You can specify `--act_bitwidth 16` and/or `--weights_bitwidth 16` to use INT16 quantization.

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter --input_network ~/models/inception_v3.onnx --output_path ~/models/inception_v3_quantized.cpp --input_list ~/models/input_list.txt --input_dim "x" 1,3,299,299
    ```

    This generates `inception_v3_quantized.cpp` and `inception_v3_quantized.bin` files in the `~/models/` directory.

    See [qnn-\<framework>-converter](https://docs.qualcomm.com/doc/80-63442-10/topic/general_tools.html) or run `qnn-<framework>-converter --help` to view all available customizations to quantization, including quantization modes, optimizations, etc.

    ### Model compilation

    Once the conversion/quantization step is complete, `qnn-model-lib-generator` is used to compile the generated C++ graph into a shared object (.so) enabling the model to be dynamically loaded by an application to perform inference.

    For x86, the Clang compiler toolchain is used to compile the C++ graph into a .so library. For a Linux Embedded device, such as Qualcomm Dragonwing™ RB3 Gen 2 and IQ-9075, the appropriate compiler toolchain (`aarch64-oe-linux-gcc11.2`) must be used.

    #### Compiling a model to run on x86

    1. Install the GNU standard C++ library development package for GCC version 12.

       ```shell theme={null}
       sudo apt install libstdc++-12-dev
       ```

       This package includes standard library headers (like `<limits>`, `<vector>`, and `<string>`), static libraries, and support files necessary to compile and link C++ programs.

    2. Generate a shared object model to run on an x86-based Linux machine.

       ```shell theme={null}
       ${QAIRT_ROOT}/bin/x86_64-linux-clang/qnn-model-lib-generator -c ~/models/inception_v3.cpp -b ~/models/inception_v3.bin -o ~/models/libs/ -t x86_64-linux-clang
       ```

       This generates `libinception_v3.so`, using the Clang-14 compiler toolchain to compile the C++ graph to a QNN model `.so` compatible with the x86 host computer.

    <Frame>
      <img src="https://mintcdn.com/qualcomm-prod/WwC9kmcnKl9Ef7de/Key-Documents/AI-Developer-Workflow/_images/qnn-port-x86.png?fit=max&auto=format&n=WwC9kmcnKl9Ef7de&q=85&s=65188e2dda9de1b73a5e90102e852d04" width="55%" data-path="Key-Documents/AI-Developer-Workflow/_images/qnn-port-x86.png" />
    </Frame>

    #### Compiling a model to run on target

    When compiling a model for on-device execution (aarch64 architecture), it is important to use the right cross-compiler toolchain to ensure the compiled shared object (.so) is compatible with the device OS.

    Compiling a model `.cpp` file to a `.so` library for the target requires a cross-compiler toolchain. Instructions to install the appropriate toolchain are available under [Download and install the Platform SDK](https://dragonwingdocs.qualcomm.com/Key-Documents/Flash-Guide/obtain-prebuilts#esdk).

    After installing the Platform SDK, set up the cross-compiler environment in a new command line terminal.

    1. Source the environment setup script under `$SDK_ROOT`.

       ```shell theme={null}
       source $SDK_ROOT/environment-setup-armv8a-qcom-linux
       ```

    2. Check that the environment is set up properly.

       ```shell theme={null}
       echo $SDKTARGETSYSROOT
       ```

       ```shell theme={null}
       echo $TARGET_PREFIX
       ```

       If these environment variables are empty, repeat step 1 in a new command line terminal.

    3. Set up the QAIRT environment.

       ```shell theme={null}
       source ${QAIRT_ROOT}/bin/envsetup.sh
       ```
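
    The environment check in the steps above can also be scripted. The following is a minimal sketch, assuming Python is started from the same terminal in which the setup scripts were sourced so that it inherits the variables. `missing_env_vars` is a hypothetical helper, and including `QAIRT_ROOT` in the list is an assumption based on its use in the commands on this page.

    ```python
    import os

    REQUIRED_VARS = ("SDKTARGETSYSROOT", "TARGET_PREFIX", "QAIRT_ROOT")


    def missing_env_vars(names=REQUIRED_VARS, env=None):
        """Return the names of variables that are unset or empty."""
        env = os.environ if env is None else env
        return [name for name in names if not env.get(name)]


    missing = missing_env_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("Cross-compiler and QAIRT environment variables are set.")
    ```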

    #### Compiling a model to run on Arm-based CPU

    Once the cross-compiler is set up, use the following command to generate `libinception_v3.so` in `~/models/libs/aarch64-oe-linux-gcc11.2`. The output directory (`~/models/libs`) is passed to the `qnn-model-lib-generator` tool through the `-o` argument, and the toolchain through the `-t` argument.

    <Note>
      The compiler toolchain used here is `aarch64-oe-linux-gcc11.2`.
    </Note>

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/qnn-model-lib-generator -c ~/models/inception_v3.cpp -b ~/models/inception_v3.bin -o ~/models/libs -t aarch64-oe-linux-gcc11.2
    ```

    #### Compiling a model to run on HTP

    To run the model on HTP, the following command generates `libinception_v3_quantized.so` in `~/models/libs/aarch64-oe-linux-gcc11.2`.

    <Note>
      The compiler toolchain used here is `aarch64-oe-linux-gcc11.2`.
    </Note>

    ```shell theme={null}
    ${QAIRT_ROOT}/bin/x86_64-linux-clang/qnn-model-lib-generator -c ~/models/inception_v3_quantized.cpp -b ~/models/inception_v3_quantized.bin -o ~/models/libs/ -t aarch64-oe-linux-gcc11.2
    ```
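
    The compilation commands in this section differ only in the model files and the `-t` target. The following sketch builds the command for a given target and derives where the library is expected to land. `lib_generator_command` is a hypothetical helper, the paths are placeholders, and the `<out_dir>/<target>/lib<model>.so` layout is assumed from the output locations named in this section.

    ```python
    import os


    def lib_generator_command(qairt_root, cpp_path, out_dir, target):
        """Build a qnn-model-lib-generator command and the library path it should produce."""
        base = os.path.splitext(cpp_path)[0]
        tool = os.path.join(qairt_root, "bin", "x86_64-linux-clang", "qnn-model-lib-generator")
        cmd = [tool, "-c", cpp_path, "-b", base + ".bin", "-o", out_dir, "-t", target]
        # Output layout assumed from this section: <out_dir>/<target>/lib<model>.so
        lib_path = os.path.join(out_dir, target, "lib" + os.path.basename(base) + ".so")
        return cmd, lib_path


    for target in ("x86_64-linux-clang", "aarch64-oe-linux-gcc11.2"):
        cmd, lib_path = lib_generator_command(
            "/opt/qairt",  # placeholder for ${QAIRT_ROOT}
            "/home/user/models/inception_v3.cpp",
            "/home/user/models/libs",
            target,
        )
        print(" ".join(cmd))
        print("expected output:", lib_path)
    ```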
  </Tab>
</Tabs>
