Run a Generative AI (GenAI) model
Run a prepared GenAI model on a Qualcomm Dragonwing IoT device using Genie, the Qualcomm AI Runtime SDK, or open-source runtimes.
Once a GenAI model is prepared and optimized for deployment, you can run the model on the target device. Qualcomm platforms offer multiple execution paths to meet different needs, ranging from high-level abstractions to low-level control.
The following table summarizes the GenAI model execution approaches.

| Execution method | Details | Key considerations |
|---|---|---|
| Generative AI Inference Extensions (Genie) | Qualcomm-provided framework designed for simplified execution of complex GenAI models such as LLMs and multimodal models. Handles orchestration of multiple binaries, memory management, and hardware acceleration across CPU, GPU, and NPU. Ideal for developers who want plug-and-play deployment with minimal integration effort. | Optimized for Qualcomm hardware, delivering low latency and power efficiency. Best for quick deployment and production-ready applications. |
| Qualcomm AI Runtime SDK (QAIRT) | Provides low-level APIs for executing QAIRT binaries directly. Offers fine-grained control for developers who need custom execution flows or profiling. Suitable for advanced use cases where performance tuning is critical. | Optimized for Qualcomm hardware, delivering low latency and power efficiency. Best for custom workflows and profiling. |
| Open-source runtimes | Models can also be executed using open-source runtimes. This approach is useful for developers who want to maintain compatibility with existing open-source ecosystems while leveraging Qualcomm optimizations. | Open-source runtimes provide portability, but may require additional tuning for Qualcomm platforms. Best for research or hybrid environments. |
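
The guidance in the execution-method table can be read as a simple decision rule. The following Python sketch is illustrative only: `ExecutionMethod`, `choose_execution_method`, and both parameters are hypothetical names introduced here, not part of Genie, QAIRT, or any Qualcomm API. The priority order (open-source compatibility first, then low-level control, then Genie as the default) is an assumption, not a documented recommendation.

```python
from enum import Enum


class ExecutionMethod(Enum):
    """The three execution methods named in the table (labels only, no SDK calls)."""

    GENIE = "Generative AI Inference Extensions (Genie)"
    QAIRT = "Qualcomm AI Runtime SDK (QAIRT)"
    OPEN_SOURCE = "Open-source runtimes"


def choose_execution_method(
    needs_low_level_control: bool = False,
    needs_open_source_compat: bool = False,
) -> ExecutionMethod:
    """Hypothetical helper that maps project needs to an execution method.

    Assumed priority: ecosystem compatibility, then fine-grained control,
    then Genie as the plug-and-play default.
    """
    if needs_open_source_compat:
        # Portability across existing open-source ecosystems matters most.
        return ExecutionMethod.OPEN_SOURCE
    if needs_low_level_control:
        # Custom execution flows or profiling call for the low-level APIs.
        return ExecutionMethod.QAIRT
    # Quick, production-oriented deployment with minimal integration effort.
    return ExecutionMethod.GENIE


if __name__ == "__main__":
    print(choose_execution_method(needs_low_level_control=True).value)
```

In practice the choice also depends on the model, the target device, and the team's tooling, so treat the helper as a reading aid for the table and not as a selection policy.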

