Benchmark a LiteRT model - Qualcomm Dragonwing Documentation

The LiteRT open-source framework provides the benchmark_model tool to measure model execution performance on hardware using delegates. This tool is installed on the target device along with other LiteRT artifacts. The tool measures and reports the following performance metrics:

Initialization time
Inference time (warm-up and steady state)
Memory usage during initialization
Overall memory usage

Prerequisites

Before running the benchmark, ensure you have the following:

An Ubuntu 22.04 host computer
A Qualcomm development kit

Set up model files

Download the sample model, labels file, and test image:
- BMP test image (grace_hopper.bmp)
- MobileNet LiteRT model and its labels file (mobilenet_v1_1.0_224/labels.txt)

On the host computer, download and extract the MobileNet model archives:

wget http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz

tar -xvf mobilenet_v1_1.0_224_quant.tgz

wget https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz

tar -xvf mobilenet_v1_1.0_224_frozen.tgz
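Before you copy anything to the device, you can confirm that the files named in the scp command below are present on the host. The following bash sketch is one way to do that. check_artifacts is a hypothetical helper, its file list simply mirrors the scp command in this guide, and it assumes grace_hopper.bmp has already been downloaded into the same directory.

```bash
#!/usr/bin/env bash
# check_artifacts (hypothetical helper): list any file that the scp step expects
# but that is missing from a directory (default: the current directory).
# Returns 0 when every file is present and 1 otherwise.
check_artifacts() {
    local dir="${1:-.}"
    local status=0
    local f
    for f in \
        mobilenet_v1_1.0_224_quant.tflite \
        grace_hopper.bmp \
        mobilenet_v1_1.0_224/labels.txt \
        mobilenet_v1_1.0_224.tflite; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

Running check_artifacts in the download directory before the scp command prints one "missing:" line per absent file, so a failed copy can be traced to a skipped download step.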

On the target device, create the artifacts directory:

ssh root@<IP_ADDRESS_OF_TARGET_DEVICE>

mount -o remount,rw /

mkdir -p /etc/artifacts

exit
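If you prefer not to open an interactive session, the remount and mkdir steps can also be sent as a single remote command. This is a sketch that assumes root SSH access to the device, as in the steps above. prepare_target and the SSH_CMD override are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# prepare_target (hypothetical helper): remount the root filesystem read-write
# and create /etc/artifacts with one non-interactive SSH command.
# SSH_CMD defaults to ssh; set it to echo for a dry run that only prints the call.
prepare_target() {
    local ip="$1"
    if [ -z "$ip" ]; then
        echo "usage: prepare_target <ip-address>" >&2
        return 2
    fi
    # The remote command is single-quoted so that && is evaluated on the device.
    ${SSH_CMD:-ssh} "root@${ip}" 'mount -o remount,rw / && mkdir -p /etc/artifacts'
}
```

For example, prepare_target <IP_ADDRESS_OF_TARGET_DEVICE> performs both steps, and prefixing the call with SSH_CMD=echo prints the command without contacting the device.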

From the host computer, copy the model, image, and label files to the device:

scp mobilenet_v1_1.0_224_quant.tflite grace_hopper.bmp mobilenet_v1_1.0_224/labels.txt mobilenet_v1_1.0_224.tflite root@<IP_ADDRESS_OF_TARGET_DEVICE>:/etc/artifacts

On the target device, set up the GPU libraries:

export OCL_ICD_FILENAMES=/usr/lib/libOpenCL_adreno.so.1

ln -sf /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so
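The export above affects only the shell it runs in. The bash sketch below checks, in the current shell, the three things this setup step establishes: the Adreno OpenCL library exists, the libOpenCL.so symlink resolves, and OCL_ICD_FILENAMES has the expected value. check_opencl_setup and the LIB_DIR override are hypothetical.

```bash
#!/usr/bin/env bash
# check_opencl_setup (hypothetical helper): verify, in the current shell, the
# state created by the GPU library setup step. LIB_DIR defaults to /usr/lib.
# Prints one line per problem and returns 1 if any problem was found.
check_opencl_setup() {
    local lib_dir="${LIB_DIR:-/usr/lib}"
    local status=0
    if [ ! -e "$lib_dir/libOpenCL_adreno.so.1" ]; then
        echo "missing: $lib_dir/libOpenCL_adreno.so.1"
        status=1
    fi
    # [ -e ] follows symlinks, so a dangling libOpenCL.so link also fails here.
    if [ ! -e "$lib_dir/libOpenCL.so" ]; then
        echo "missing or dangling: $lib_dir/libOpenCL.so"
        status=1
    fi
    # An exported variable does not carry over to a new SSH session.
    if [ "${OCL_ICD_FILENAMES:-}" != "$lib_dir/libOpenCL_adreno.so.1" ]; then
        echo "OCL_ICD_FILENAMES is not $lib_dir/libOpenCL_adreno.so.1 in this shell"
        status=1
    fi
    return "$status"
}
```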

The sample applications use the MobileNet v1 model, trained on the ImageNet dataset with 1000 classes, as an example classification model.

Benchmark on CPU

The benchmark_model tool and the label_image sample application are cross-compiled with the LiteRT library and installed on the target device; their source code is available in the TensorFlow GitHub repository. To benchmark on the CPU using the XNNPACK delegate with four threads:

ssh root@<IP_ADDRESS_OF_TARGET_DEVICE>

cd /etc/artifacts

benchmark_model --graph=/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite \
                --enable_op_profiling=true \
                --use_xnnpack=true \
                --num_threads=4 \
                --max_secs=300 \
                --profiling_output_csv_file=/etc/artifacts/mobilenet_v1_1.0_224_quant_xnnpack_performance.csv
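To see how the CPU result scales with thread count, the same XNNPACK benchmark can be repeated for several --num_threads values. This is a minimal bash sketch that assumes benchmark_model is on PATH, as in the command above. sweep_cpu_threads and the MODEL, OUT_DIR, and BENCHMARK_BIN overrides are hypothetical, and op profiling is left off to keep each log short.

```bash
#!/usr/bin/env bash
# sweep_cpu_threads (hypothetical helper): repeat the XNNPACK CPU benchmark once
# per thread count given as arguments, saving each run's output to its own log.
# MODEL, OUT_DIR, and BENCHMARK_BIN are optional overrides; BENCHMARK_BIN=echo
# turns this into a dry run that only records the command lines.
sweep_cpu_threads() {
    local model="${MODEL:-/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite}"
    local out_dir="${OUT_DIR:-/etc/artifacts}"
    local bin="${BENCHMARK_BIN:-benchmark_model}"
    local n
    for n in "$@"; do
        # Capture both stdout and stderr so no part of the tool's output is lost.
        "$bin" --graph="$model" \
               --use_xnnpack=true \
               --num_threads="$n" \
               > "$out_dir/xnnpack_threads_${n}.log" 2>&1 || return 1
    done
}
```

For example, sweep_cpu_threads 1 2 4 writes xnnpack_threads_1.log, xnnpack_threads_2.log, and xnnpack_threads_4.log to /etc/artifacts.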

Benchmark on GPU

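To benchmark using the GPU delegate, log in to the device and export OCL_ICD_FILENAMES again, because an exported variable lasts only for the shell session in which it was set:

ssh root@<IP_ADDRESS_OF_TARGET_DEVICE>

export OCL_ICD_FILENAMES=/usr/lib/libOpenCL_adreno.so.1

cd /etc/artifacts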

benchmark_model --graph=/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite \
                --enable_op_profiling=true \
                --use_gpu=true \
                --num_runs=100 \
                --warmup_runs=10 \
                --max_secs=300 \
                --profiling_output_csv_file=/etc/artifacts/mobilenet_v1_1.0_224_GPU_Delegate_performance.csv
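To compare the CPU and GPU delegates on equal terms, both can be run with the same --num_runs and --warmup_runs values. The bash sketch below does that using only flags from the commands above. compare_cpu_gpu, OUT_DIR, and BENCHMARK_BIN are hypothetical names, and the GPU run assumes OCL_ICD_FILENAMES is exported in the same shell.

```bash
#!/usr/bin/env bash
# compare_cpu_gpu (hypothetical helper): run one model with XNNPACK on the CPU and
# then with the GPU delegate, using the same run counts so the averages are
# comparable. Output goes to $OUT_DIR/cpu.log and $OUT_DIR/gpu.log.
compare_cpu_gpu() {
    local model="${1:-/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite}"
    local out_dir="${OUT_DIR:-/etc/artifacts}"
    local bin="${BENCHMARK_BIN:-benchmark_model}"
    # Flags shared by both runs; the run counts come from the GPU command above.
    local common=(--graph="$model" --num_runs=100 --warmup_runs=10)
    "$bin" "${common[@]}" --use_xnnpack=true --num_threads=4 > "$out_dir/cpu.log" 2>&1 || return 1
    "$bin" "${common[@]}" --use_gpu=true > "$out_dir/gpu.log" 2>&1 || return 1
}
```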

Benchmark on NPU using the QAIRT delegate

The Qualcomm AI Runtime delegate uses the Qualcomm AI Runtime API and its backends to accelerate models on the Adreno GPU and the Hexagon Tensor Processor. To use the QAIRT external delegate, ensure the following libraries are available on the device:

libQnnTFLiteDelegate.so — QNN delegate library
Libraries from the Qualcomm AI Engine Direct SDK
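A quick way to confirm these libraries are present is sketched below in bash. It assumes the libraries live in /usr/lib, which matches the paths used in the commands later in this section. check_qnn_libs and LIB_DIR are hypothetical, and the sketch does not check the skeleton libraries under skel_library_dir.

```bash
#!/usr/bin/env bash
# check_qnn_libs (hypothetical helper): report QNN libraries missing from LIB_DIR
# (default /usr/lib). libQnnTFLiteDelegate.so is always checked; pass the
# backend libraries you plan to use (for example libQnnHtp.so) as arguments.
check_qnn_libs() {
    local lib_dir="${LIB_DIR:-/usr/lib}"
    local status=0
    local lib
    for lib in libQnnTFLiteDelegate.so "$@"; do
        if [ ! -f "$lib_dir/$lib" ]; then
            echo "missing: $lib_dir/$lib"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_qnn_libs libQnnHtp.so checks the delegate library and the HTP backend library.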

To run the model on a specific backend, pass the backend name (backend_type) and the matching backend library (library_path) in the external delegate options:

libQnnGpu.so — Run the QNN delegate on the GPU
libQnnHtp.so — Run the QNN delegate on the Hexagon Tensor Processor
libQnnDsp.so — Run the QNN delegate on the DSP
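The backend_type and library_path keys are what tie the delegate to one of these libraries. The bash sketch below prints that pair for a given backend so that other keys can be appended. qnn_backend_options is hypothetical; backend_type:htp and backend_type:dsp come from the commands in this guide, while backend_type:gpu is an assumption.

```bash
#!/usr/bin/env bash
# qnn_backend_options (hypothetical helper): print the backend_type and
# library_path keys for one QNN backend, to start an --external_delegate_options
# value. htp and dsp match this guide's commands; the gpu value is an assumption.
qnn_backend_options() {
    local lib
    case "$1" in
        htp) lib=libQnnHtp.so ;;
        dsp) lib=libQnnDsp.so ;;
        gpu) lib=libQnnGpu.so ;;  # assumed backend_type value
        *)
            echo "unknown backend: $1" >&2
            return 1
            ;;
    esac
    echo "backend_type:$1;library_path:${LIB_DIR:-/usr/lib}/$lib"
}
```

For example, --external_delegate_options="$(qnn_backend_options htp);skel_library_dir:/usr/lib/rfsa/adsp" appends the skeleton directory used by the HTP command.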

To benchmark on the NPU using the QNN external delegate, run the command for your platform. QCS6490/QCS5430, IQ-9075, and QCS8275 use the HTP backend; IQ-615 uses the DSP backend.

QCS6490/QCS5430, IQ-9075, and QCS8275:

benchmark_model --graph=/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite \
                --external_delegate_path=/usr/lib/libQnnTFLiteDelegate.so \
                --external_delegate_options='backend_type:htp;library_path:/usr/lib/libQnnHtp.so;skel_library_dir:/usr/lib/rfsa/adsp;htp_precision:0;htp_performance_mode:2'

IQ-615:

benchmark_model --graph=/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite \
                --external_delegate_path=/usr/lib/libQnnTFLiteDelegate.so \
                --external_delegate_options='backend_type:dsp;library_path:/usr/lib/libQnnDsp.so;skel_library_dir:/usr/lib/dsp/adsp'

Benchmarking is currently failing on IQ-615.
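If you script the NPU benchmark across several boards, the platform-specific options above can be selected with a case statement. This bash sketch takes the platform name as an argument (detecting it automatically is out of scope), reuses the option strings from the commands above, and assumes libQnnTFLiteDelegate.so is in /usr/lib on every platform. run_npu_benchmark and BENCHMARK_BIN are hypothetical.

```bash
#!/usr/bin/env bash
# run_npu_benchmark (hypothetical helper): run the QNN delegate benchmark with
# the option string this guide gives for each platform. The caller passes the
# platform name; this sketch does not detect it. BENCHMARK_BIN=echo gives a dry run.
run_npu_benchmark() {
    local platform="$1"
    local model="${2:-/etc/artifacts/mobilenet_v1_1.0_224_quant.tflite}"
    local opts
    case "$platform" in
        QCS6490|QCS5430|IQ-9075|QCS8275)
            opts='backend_type:htp;library_path:/usr/lib/libQnnHtp.so;skel_library_dir:/usr/lib/rfsa/adsp;htp_precision:0;htp_performance_mode:2'
            ;;
        IQ-615)
            # Benchmarking on IQ-615 is reported as currently failing.
            opts='backend_type:dsp;library_path:/usr/lib/libQnnDsp.so;skel_library_dir:/usr/lib/dsp/adsp'
            ;;
        *)
            echo "unsupported platform: $platform" >&2
            return 1
            ;;
    esac
    # Assumes the delegate library is in /usr/lib on every platform.
    "${BENCHMARK_BIN:-benchmark_model}" \
        --graph="$model" \
        --external_delegate_path=/usr/lib/libQnnTFLiteDelegate.so \
        --external_delegate_options="$opts"
}
```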

The benchmark output includes the following statistics:

Delegate creation status
Average inference time on the hardware using the delegate
Memory footprint of the model execution

benchmark_model tool output showing delegate status, inference time, and memory footprint
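To pull the average inference time out of a saved log, for example output redirected from any of the commands above, a small sed filter is enough. This bash sketch assumes the log contains the upstream benchmark_model timing line, of the form "Inference timings in us: Init: ..., First inference: ..., Warmup (avg): ..., Inference (avg): ..."; check the line your build prints and adjust the pattern if it differs. avg_inference_us is a hypothetical helper.

```bash
#!/usr/bin/env bash
# avg_inference_us (hypothetical helper): print the "Inference (avg)" value, in
# microseconds, from a benchmark_model log file, or from stdin if no file is given.
# Assumes the timing line looks like:
#   Inference timings in us: Init: ..., First inference: ..., Warmup (avg): ..., Inference (avg): N
# If the log holds several such lines, only the last value is printed.
avg_inference_us() {
    sed -n 's/.*Inference timings in us:.*Inference (avg): \([0-9.e+]*\).*/\1/p' ${1:+"$1"} | tail -n 1
}
```

For example, avg_inference_us run.log, where run.log is a hypothetical file holding captured benchmark_model output, prints a single number that can be compared across delegates. It prints nothing if the timing line is absent.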
