# Supported LiteRT runtimes

> Run LiteRT models on Qualcomm Dragonwing IoT platforms using the CPU (XNNPACK), GPU, or NPU (Qualcomm AI Runtime) delegate.

LiteRT supports hardware acceleration through delegates, which offload graph execution to specialized hardware. The following delegates are available on Qualcomm Dragonwing IoT platforms.

## CPU delegate

The XNNPACK delegate uses the XNNPACK library to accelerate LiteRT models on CPUs. XNNPACK is an open-source library from Google that:

* Provides an optimized implementation of neural network operators for Arm CPUs.
* Uses low-level CPU instructions, such as the Arm® Neon™ instruction set, to optimize operators for efficient execution.

The XNNPACK delegate supports models in both 32-bit floating-point and int8 formats. For more information, see [XNNPACK back-end for TensorFlow Lite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/xnnpack/README.md).

To run a LiteRT model using the XNNPACK delegate, see [Deploy a LiteRT model](../topic/deploy-a-litert-model#deploy-as-a-native-application).
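As a rough illustration of CPU execution from Python, the sketch below builds the arguments for a CPU-only interpreter and creates it. It assumes that the `ai_edge_litert` Python package (or the older `tflite_runtime` package) is installed and that the LiteRT build applies XNNPACK by default when no other delegate is passed. Neither assumption comes from this page, so verify both on your platform. The helper names and the thread-count default are hypothetical.

```python
import os


def cpu_interpreter_kwargs(model_path, num_threads=None):
    """Build keyword arguments for a CPU-only LiteRT interpreter."""
    if num_threads is None:
        # Assumption: one worker thread per reported CPU core is a sensible default.
        num_threads = os.cpu_count() or 1
    return {"model_path": model_path, "num_threads": max(1, int(num_threads))}


def make_cpu_interpreter(model_path, num_threads=None):
    """Create an interpreter with no explicit delegate (CPU execution)."""
    # Imported lazily so the helper above works without the runtime installed.
    try:
        from ai_edge_litert.interpreter import Interpreter
    except ImportError:
        from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(**cpu_interpreter_kwargs(model_path, num_threads))
    interpreter.allocate_tensors()
    return interpreter
```

For example, `make_cpu_interpreter("model.tflite", num_threads=4)` would return an interpreter ready for `set_tensor` and `invoke`, where `model.tflite` is a placeholder path.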

## GPU delegate

The open-source GPU delegate accelerates LiteRT models on GPUs from multiple vendors, including the Qualcomm Adreno GPU. It runs the neural network operations in a LiteRT model's execution graph as OpenCL kernels on the GPU, which improves performance for highly parallel workloads.

The GPU delegate supports the following model precisions on the Adreno GPU:

* 16-bit floating-point
* 32-bit floating-point

For more information, see [GPU delegates for LiteRT](https://www.tensorflow.org/lite/performance/gpu).

To run a LiteRT model using the GPU delegate, see [Deploy a LiteRT model](../topic/deploy-a-litert-model#deploy-as-a-native-application).
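One way to attach the GPU delegate from Python is to load it as a shared library and fall back to CPU execution if loading fails. The following is a minimal sketch, not the documented deployment procedure. The library name `libtensorflowlite_gpu_delegate.so` is an assumption about how the delegate is packaged on the target, and the helper names are hypothetical.

```python
def load_optional_delegate(load_delegate, library_path, options=None):
    """Return a delegate handle, or None if the library cannot be loaded."""
    try:
        return load_delegate(library_path, options or {})
    except (ValueError, OSError):
        # The shared library is missing or refused to initialize.
        return None


def make_gpu_interpreter(model_path,
                         library_path="libtensorflowlite_gpu_delegate.so"):
    """Create an interpreter that uses the GPU delegate when it is available."""
    # Imported lazily so the helper above works without the runtime installed.
    try:
        from ai_edge_litert.interpreter import Interpreter, load_delegate
    except ImportError:
        from tflite_runtime.interpreter import Interpreter, load_delegate

    delegate = load_optional_delegate(load_delegate, library_path)
    # Passing None leaves the interpreter on its default CPU path.
    delegates = [delegate] if delegate is not None else None
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter, delegate is not None
```

The second return value reports whether the GPU delegate was actually attached, which is useful for logging when a device silently falls back to the CPU.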

## HTP delegate

The Qualcomm AI Runtime delegate is a proprietary delegate designed for hardware acceleration on Qualcomm platforms. It is based on the [external delegate interface](https://ai.google.dev/edge/litert/performance/implementing_delegate#option_2_leverage_external_delegate) of LiteRT and can offload part or all of a LiteRT model to specialized Qualcomm hardware, including the Adreno GPU and the NPU.

This delegate improves model execution performance and power efficiency by reducing the CPU workload. It uses the existing Qualcomm AI Runtime APIs and available backends to accelerate models. For more information, see [Qualcomm AI Runtime (QAIRT) SDK](https://docs.qualcomm.com/doc/80-63442-10).

The Qualcomm AI Runtime delegate supports both 32-bit floating-point and int8 precision on available hardware.
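The precision support stated for the three delegates on this page can be summarized as a small lookup table. The sketch below is only a convenience for choosing candidates. The dictionary keys (`xnnpack`, `gpu`, `qairt`) and the function name are hypothetical labels, not identifiers from LiteRT or the Qualcomm AI Runtime SDK.

```python
# Precisions listed on this page for each delegate.
SUPPORTED_PRECISIONS = {
    "xnnpack": {"float32", "int8"},   # CPU delegate
    "gpu": {"float16", "float32"},    # GPU delegate on the Adreno GPU
    "qairt": {"float32", "int8"},     # Qualcomm AI Runtime delegate
}


def delegates_for(precision):
    """Return the delegates whose listed precisions include `precision`."""
    return sorted(
        name
        for name, precisions in SUPPORTED_PRECISIONS.items()
        if precision in precisions
    )
```

For instance, `delegates_for("float16")` returns only `["gpu"]`, because 16-bit floating-point is listed for the GPU delegate alone.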

You can build applications using the following interfaces:

* Qualcomm AI Runtime delegate interface
* LiteRT external delegate interface

Both interfaces are available in standalone LiteRT applications. If you deploy LiteRT models using the Qualcomm IM SDK, the `qtimltflite` GStreamer plugin uses the Qualcomm AI Runtime (QNN) delegate. For more information about the external delegate interface, see [Leverage external delegate](https://ai.google.dev/edge/litert/performance/implementing_delegate#option_2_leverage_external_delegate).
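A standalone Python application might reach the delegate through the LiteRT external delegate interface roughly as follows. Treat this as a sketch under stated assumptions: the library name `libQnnTFLiteDelegate.so`, the `backend_type` option key, and the values `htp` and `gpu` are not stated on this page and should be checked against the Qualcomm AI Runtime SDK documentation for your release. The helper names are hypothetical.

```python
def qairt_delegate_options(backend="htp"):
    """Build the options dictionary passed to the external delegate."""
    # Assumption: the delegate selects its backend through a "backend_type"
    # option and accepts "htp" (NPU) and "gpu" (Adreno GPU).
    allowed = {"htp", "gpu"}
    if backend not in allowed:
        raise ValueError(f"backend must be one of {sorted(allowed)}, got {backend!r}")
    return {"backend_type": backend}


def make_qairt_interpreter(model_path, backend="htp",
                           library_path="libQnnTFLiteDelegate.so"):
    """Create an interpreter that offloads supported operators to `backend`."""
    # Imported lazily so the helper above works without the runtime installed.
    try:
        from ai_edge_litert.interpreter import Interpreter, load_delegate
    except ImportError:
        from tflite_runtime.interpreter import Interpreter, load_delegate

    delegate = load_delegate(library_path,
                             options=qairt_delegate_options(backend))
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=[delegate])
    interpreter.allocate_tensors()
    return interpreter
```

Validating the backend name before loading the library surfaces typos as a clear Python error rather than a delegate initialization failure on the device.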
