CPU delegate
The XNNPACK delegate uses the XNNPACK library to accelerate LiteRT models on CPUs. XNNPACK is an open-source library from Google that:
- Provides an optimized implementation of neural network operators for Arm CPUs.
- Uses low-level CPU instructions, such as the Arm® Neon™ instruction set, to optimize operators for efficient execution.
GPU delegate
The open-source GPU delegate accelerates LiteRT models on vendor-specific GPUs, including the Qualcomm Adreno GPU. It uses OpenCL kernels to run neural network operations within a LiteRT model execution graph on the GPU, improving parallel-processing performance. The GPU delegate supports the following model precisions on the Adreno GPU:
- 16-bit floating-point
- 32-bit floating-point
HTP delegate
The Qualcomm AI Runtime delegate is a proprietary delegate designed for hardware acceleration on Qualcomm platforms. It is based on the external delegate interface of LiteRT and can offload part or all of a LiteRT model to specialized Qualcomm hardware, including the Adreno GPU and the NPU. This delegate improves model execution performance and power efficiency by reducing the CPU workload. It uses the existing Qualcomm AI Runtime APIs and available backends to accelerate models. For more information, see Qualcomm AI Runtime (QAIRT) SDK. The Qualcomm AI Runtime delegate supports both 32-bit floating-point and int8 precision on available hardware. You can build applications using the following interfaces:
- Qualcomm AI Runtime delegate interface
- LiteRT external delegate interface
The qtimltflite GStreamer plugin uses the QNN delegate. For more information, see Leverage external delegate.
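As a rough illustration of the external delegate route, the following Python sketch loads a delegate shared library through the TensorFlow Lite Python API (`tf.lite.experimental.load_delegate`) and passes it to the interpreter. It is a minimal sketch under stated assumptions, not the Qualcomm reference flow: the helper names `delegates_for` and `make_interpreter` are hypothetical, the precision table only restates what this page lists for the GPU and Qualcomm AI Runtime delegates, and the delegate library path and option keys must be taken from the QAIRT SDK documentation for your platform.

```python
from typing import Dict, List, Optional

# Precisions as stated on this page. The page does not list precisions
# for the XNNPACK (CPU) delegate, so it is intentionally left out.
SUPPORTED_PRECISIONS = {
    "gpu": {"fp16", "fp32"},    # GPU delegate on the Adreno GPU
    "qairt": {"fp32", "int8"},  # Qualcomm AI Runtime delegate
}


def delegates_for(precision: str) -> List[str]:
    """Return the delegates (hypothetical short names) that list `precision`."""
    return sorted(
        name for name, precisions in SUPPORTED_PRECISIONS.items()
        if precision in precisions
    )


def make_interpreter(model_path: str,
                     delegate_path: Optional[str] = None,
                     options: Optional[Dict[str, str]] = None):
    """Create an interpreter, optionally with an external delegate library.

    TensorFlow is imported lazily so the lookup helper above can be used
    on machines without it. With delegate_path=None the model runs on
    the default CPU path.
    """
    import tensorflow as tf

    delegates = []
    if delegate_path is not None:
        # load_delegate takes the path of a delegate shared library and an
        # optional dict of string options defined by that delegate.
        delegates.append(
            tf.lite.experimental.load_delegate(delegate_path, options or {})
        )
    interpreter = tf.lite.Interpreter(
        model_path=model_path, experimental_delegates=delegates
    )
    interpreter.allocate_tensors()
    return interpreter
```

For example, `delegates_for("int8")` narrows the choice to the Qualcomm AI Runtime delegate. A call such as `make_interpreter("model.tflite", "<path to the delegate .so>", {...})` would then load it; the library file name and the option keys are deliberately left as placeholders here because they are defined by the SDK, not by this sketch.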
