# Overview

> An overview of the tools, runtimes, and frameworks for building AI applications on Qualcomm platforms, with a guided path for choosing how to prepare and run your model.

This guide covers AI application development using the tools, runtimes, and frameworks supported on Qualcomm Dragonwing IoT platforms. It is intended for developers who want to train or fine-tune models, prepare them for deployment, and build AI applications on Qualcomm hardware.

Most decisions in this guide come down to two questions:

* **How will you prepare your model?** Convert, quantize, compile, or fine-tune it for the target hardware.
* **How will you run inference?** Choose the runtime or integration path that executes the model on the device.

You can bring pretrained models from ONNX, PyTorch, TensorFlow, or LiteRT and run them efficiently across the Qualcomm Kryo™ CPU, Adreno™ GPU, and Hexagon™ NPU (HTP).

## Choose your journey

Use the flowchart below to find the path that fits your application. It walks you through whether you already have a model, how to **prepare** it (purple), and how to **run inference** (blue), then links you to the right guide. Every highlighted box is clickable.

```mermaid theme={null}
flowchart TD
    Q1{"Do you already have<br/>a trained model?"}

    Q1 -->|"No, just show me what it can do"| RunPrebuilt["Run prebuilt AI models & apps<br/>LiteRT on NPU · Qdemo UI · IM SDK sample · QDC"]
    Q1 -->|"Yes / bring my own<br/>ONNX · PyTorch · TF · LiteRT"| GenQ{"Generative AI?<br/>LLM · diffusion"}

    GenQ -->|"Yes"| GenAI["GenAI workflow<br/>prepare → run → Genie"]
    GenQ -->|"No, classic ML / vision"| Prep{"Prepare the model"}

    Prep -->|"Ready model, or BYOM in the cloud"| AIHub["Qualcomm AI Hub<br/>ready models · BYOM<br/>(compile/convert in the cloud)"]
    Prep -->|"Convert + quantize my own (local)"| QAIRT["QAIRT SDK"]
    Prep -->|"Recover quantized accuracy"| AIMET["AIMET (PTQ / QAT)"]
    Prep -->|"Fine-tune with my own data"| EI["Edge Impulse"]

    AIHub --> Runtime{"Run inference"}
    QAIRT --> Runtime
    AIMET --> Runtime
    EI --> Runtime

    Runtime -->|"Python / quick"| LiteRT["LiteRT + AI Engine Direct delegate"]
    Runtime -->|"C++ / low-level control"| QAIRTcpp["QAIRT SDK C++ APIs"]
    Runtime -->|"Camera / video + AI pipeline"| IMSDK["Qualcomm IM SDK<br/>GStreamer · zero-copy · GPU pre/post"]

    click RunPrebuilt "/Key-Documents/AI-Developer-Workflow/map/run-prebuilt-models-and-apps"
    click GenAI "/Key-Documents/AI-Developer-Workflow/map/run-on-device-genai"
    click AIHub "/Key-Documents/AI-Developer-Workflow/topic/ai-hub"
    click QAIRT "/Key-Documents/AI-Developer-Workflow/topic/qairt"
    click AIMET "/Key-Documents/AI-Developer-Workflow/topic/aimet"
    click EI "/Key-Documents/AI-Developer-Workflow/topic/edge-impulse"
    click LiteRT "/Key-Documents/AI-Developer-Workflow/topic/litert-overview"
    click QAIRTcpp "/Key-Documents/AI-Developer-Workflow/topic/develop-your-own-application-qairt-cpp"
    click IMSDK "/Key-Documents/AI-Developer-Workflow/topic/develop-your-own-application-im-sdk"

    classDef prep fill:#31017D,stroke:#31017D,color:#fff;
    classDef run fill:#3253DC,stroke:#3253DC,color:#fff;
    class AIHub,QAIRT,AIMET,EI,GenAI prep;
    class RunPrebuilt,LiteRT,QAIRTcpp,IMSDK run;
```

<Note>
  On-device generative AI availability depends on your Qualcomm Linux release. See the [GenAI workflow](../map/run-on-device-genai) page for the current support status.
</Note>
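
The branching in the flowchart can also be read as a small decision function. The following is a minimal sketch for illustration only: the returned labels are copied from the flowchart, while `choose_path`, its parameters, and the dictionary keys are hypothetical names that are not part of any Qualcomm tool.

```python
from typing import List

# Labels are taken from the flowchart above. The dictionary keys are
# hypothetical shorthand invented for this sketch.
PREPARE_TOOLS = {
    "cloud": "Qualcomm AI Hub",
    "local": "QAIRT SDK",
    "recover_accuracy": "AIMET (PTQ / QAT)",
    "fine_tune": "Edge Impulse",
}
RUNTIMES = {
    "python": "LiteRT + AI Engine Direct delegate",
    "cpp": "QAIRT SDK C++ APIs",
    "pipeline": "Qualcomm IM SDK",
}


def choose_path(has_model: bool, generative: bool = False,
                prepare: str = "cloud", run: str = "python") -> List[str]:
    """Return the guides to follow, in the order the flowchart visits them."""
    if not has_model:
        # "No, just show me what it can do"
        return ["Run prebuilt AI models & apps"]
    if generative:
        # LLM or diffusion models follow the dedicated GenAI workflow.
        return ["GenAI workflow"]
    # Classic ML / vision: pick one preparation tool, then one runtime.
    return [PREPARE_TOOLS[prepare], RUNTIMES[run]]


if __name__ == "__main__":
    print(choose_path(True, prepare="local", run="pipeline"))
```

For example, a vision model that you convert locally and run inside a camera pipeline resolves to the QAIRT SDK followed by the Qualcomm IM SDK.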

## AI architecture

The following diagram illustrates the AI application development architecture on Qualcomm platforms.

<img src="https://mintcdn.com/qualcomm-prod/Sb9VrG0-ITL9uwLF/Key-Documents/AI-Developer-Workflow/_images/ai-app-development-overview_QLI.png?fit=max&auto=format&n=Sb9VrG0-ITL9uwLF&q=85&s=21a1be3bfc4cbddef11604fc19fd7beb" alt="AI/ML developer workflow architecture" width="781" height="930" data-path="Key-Documents/AI-Developer-Workflow/_images/ai-app-development-overview_QLI.png" />

## Prepare your model

Before a model can run efficiently on Qualcomm hardware, it must be converted to an executable format and, when targeting the Hexagon NPU (HTP), quantized to a supported precision. LiteRT models are an exception: they run directly through the AI Engine Direct delegate. Choose the preparation tool that matches your starting point.

| Tool                                                                                                          | Use it to                                                                                                                                                                                      | Output                                                 |
| ------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| [Qualcomm AI Hub](../topic/ai-hub)                                                                            | Download a preoptimized model, or **bring your own model (BYOM)** and have QAIRT **compile, convert, and quantize it in the cloud** for your target chipset, with no local toolchain required. | Ready-to-run LiteRT or Qualcomm AI Engine Direct model |
| [Qualcomm AI Runtime SDK (QAIRT)](../topic/qairt)                                                             | The **local** alternative to AI Hub BYOM: convert, quantize, and compile models from TensorFlow, PyTorch, LiteRT, or ONNX yourself. Integrates AI Engine Direct and the Neural Processing SDK. | Quantized model / compiled context binary              |
| [Qualcomm AI Model Efficiency Toolkit (AIMET)](https://quic.github.io/aimet-pages/releases/latest/index.html) | Recover accuracy lost during quantization using post-training quantization (PTQ) and quantization-aware training (QAT).                                                                        | Higher-accuracy quantized model                        |
| [Edge Impulse](../topic/edge-impulse)                                                                         | Build, train, or fine-tune models from your own audio, image, and sensor data.                                                                                                                 | Trained model in your chosen format                    |
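
To make "quantized to a supported precision" more concrete, the sketch below shows generic 8-bit affine quantization in plain Python. It illustrates the general idea only and is not the scheme used by QAIRT, AI Hub, or AIMET. The function names and the choice of unsigned 8-bit values are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def quant_params(values: Sequence[float], num_bits: int = 8) -> Tuple[float, int]:
    """Derive a scale and zero point that map the value range onto integers."""
    qmax = 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values: Sequence[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1], clamping at the ends."""
    qmax = 2 ** num_bits - 1
    return [min(max(round(v / scale) + zero_point, 0), qmax) for v in values]


def dequantize(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    scale, zp = quant_params(weights)
    q = quantize(weights, scale, zp)
    print(q, dequantize(q, scale, zp))
```

Each in-range value is recovered to within half a quantization step (`scale / 2`). This rounding error is the accuracy loss that tools such as AIMET aim to reduce.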

## Run inference

After your model is prepared, choose how to execute it on the device. The runtime you pick depends on your language, model format, and whether you are building a full camera/video pipeline.

| Runtime / integration                                                                                        | Best for                                                                                                                                                                |
| ------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [LiteRT](../topic/litert-overview)                                                                           | High-performance on-device inference from Python or C++, using Qualcomm AI Engine Direct delegates.                                                                     |
| [QAIRT SDK C++ APIs](../topic/develop-your-own-application-qairt-cpp)                                        | Low-level C++ control over model execution and the inference backend.                                                                                                   |
| [Qualcomm Intelligent Multimedia SDK (IM SDK)](../topic/develop-your-own-application-im-sdk)                 | High-performance camera, video, and vision pipelines that combine capture, preprocessing, inference, and rendering, with zero-copy buffers and GPU pre/post-processing. |
| [Qualcomm® GenAI Inference Engine (Genie)](https://docs.qualcomm.com/doc/80-63442-10/topic/index_Genie.html) | Running and orchestrating on-device generative AI (LLMs, multimodal) workflows.                                                                                         |

<Note>
  Building a robotics application? The [Qualcomm® Intelligent Robotics (QIR) SDK](https://www.thundercomm.com/rubik-pi-3/en/docs/rubik-pi-3-user-manual/1.0.0-u/Application%20Development%20and%20Execution%20Guide/Robotics-Sample-Applications/Robotics%20Sample%20Applications/) adds ROS-based modules and hardware-accelerated nodes on top of these runtimes.
</Note>
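
As a rough sketch of the LiteRT path from Python, the example below tries to load a delegate and falls back to the CPU if loading fails. Several details are assumptions to verify against the LiteRT guide for your release: the `tflite_runtime` package (newer releases may ship the interpreter under a different package name), the delegate library name `libQnnTFLiteDelegate.so`, and the `backend_type` option. The helper names are hypothetical.

```python
from typing import Any, Callable, Optional

# Assumption: shared-library name of the AI Engine Direct delegate.
DELEGATE_LIBRARY = "libQnnTFLiteDelegate.so"


def load_npu_delegate(load_delegate: Callable[..., Any],
                      library: str = DELEGATE_LIBRARY,
                      backend: str = "htp") -> Optional[Any]:
    """Return a loaded delegate, or None if it cannot be loaded.

    `load_delegate` is passed in so this helper can be exercised without
    LiteRT installed. The `backend_type` option key is an assumption.
    """
    try:
        return load_delegate(library, options={"backend_type": backend})
    except (ValueError, OSError):
        # The library is missing or the delegate failed to initialize.
        return None


def make_interpreter(model_path: str) -> Any:
    """Create an interpreter that uses the delegate when it is available."""
    # Imported lazily; assumes the tflite_runtime package is installed.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegate = load_npu_delegate(load_delegate)
    delegates = [delegate] if delegate is not None else None
    return Interpreter(model_path=model_path, experimental_delegates=delegates)
```

On a device, you would call `make_interpreter` with the path to your own `.tflite` file, then call `allocate_tensors()` and `invoke()` on the returned interpreter as usual. Falling back to the CPU keeps the application working when the delegate is unavailable, at the cost of performance.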

## AI hardware

Qualcomm platforms provide the following processors for AI inference:

* **Qualcomm Kryo™ CPU**: High-performance CPU with best-in-class power efficiency.
* **Qualcomm Adreno™ GPU**: Balanced power and performance for AI workloads, accelerated with OpenCL kernels. Also used for model **pre- and post-processing** (for example, the IM SDK runs resize, color conversion, and overlay on the GPU).
* **Qualcomm Hexagon™ Tensor Processor (HTP)**: Also known as NPU/DSP/HMX. Optimized for low-power, high-performance AI **inference**. For best performance, quantize pretrained models to a supported precision.
