Run a Generative AI (GenAI) model - Qualcomm Dragonwing Documentation

Once a GenAI model is prepared and optimized for deployment, you can run the model on the target device. Qualcomm platforms offer multiple execution paths to meet different needs, ranging from high-level abstractions to low-level control. The following table summarizes the GenAI model execution approaches.

| Execution method | Details | Key considerations |
|---|---|---|
| Generative AI Inference Extensions (Genie) | A Qualcomm-provided framework designed to simplify execution of complex GenAI models such as large language models (LLMs) and multimodal models. It handles orchestration of multiple binaries, memory management, and hardware acceleration across the CPU, GPU, and NPU. Ideal for developers who want plug-and-play deployment with minimal integration effort. | Optimized for Qualcomm hardware, delivering low latency and power efficiency. Best for quick deployment and production-ready applications. |
| Qualcomm AI Runtime SDK (QAIRT) | Provides low-level APIs for executing QAIRT binaries directly. Offers fine-grained control for developers who need custom execution flows or profiling. Suitable for advanced use cases where performance tuning is critical. | Optimized for Qualcomm hardware, delivering low latency and power efficiency. Best for custom workflows and profiling. |
| Open-source runtimes | Models can also be executed using open-source runtimes. This approach suits developers who want to stay compatible with existing open-source ecosystems while still leveraging Qualcomm optimizations. | Open-source runtimes provide portability but may require additional tuning for Qualcomm platforms. Best for research or hybrid environments. |
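As a rough illustration of the Genie path, the following minimal sketch drives a Genie text-to-text run from a Python script on the target device. It assumes that the `genie-t2t-run` sample tool shipped with the QAIRT SDK is on `PATH` and accepts `-c <config JSON>` and `-p <prompt>`. Treat these flags as assumptions and confirm them with `genie-t2t-run --help` for your SDK version. The helper names `build_genie_command` and `run_genie`, and the file name `genie_config.json`, are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_genie_command(config_path: str, prompt: str,
                        binary: str = "genie-t2t-run") -> list[str]:
    """Build the argument list for a Genie text-to-text run.

    Assumption: the genie-t2t-run sample tool takes -c <config JSON>
    and -p <prompt>. Verify with `genie-t2t-run --help`.
    """
    if not prompt:
        raise ValueError("prompt must not be empty")
    return [binary, "-c", str(Path(config_path)), "-p", prompt]


def run_genie(config_path: str, prompt: str,
              binary: str = "genie-t2t-run") -> str:
    """Run the Genie CLI and return whatever it prints to stdout."""
    # Fail early with a clear error if the tool is not installed.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    result = subprocess.run(
        build_genie_command(config_path, prompt, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_genie("genie_config.json", "What is the capital of France?")` would return the tool's console output. Depending on the model, the prompt may need to be wrapped in the model's chat template. Keeping command construction separate from execution lets you check the generated arguments on a development host before you deploy to the device.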