AI Workflow Overview - Qualcomm Dragonwing Documentation

This section outlines a modular, hands-on approach to AI development using Qualcomm®-supported tools, runtimes, and frameworks. It applies whether you’re training models, deploying pre-trained networks, or building multimodal AI workflows. The document covers:

Model creation and training with Edge Impulse and Qualcomm® AI Hub
Model conversion and inference using TensorFlow, LiteRT / TensorFlow Lite, and ONNX Runtime with NPU acceleration
Running optimized AI models via context binaries (.bin) and DLC (.dlc) files using Qualcomm AI tools
Local execution of large language and vision-language models using Llama.cpp
Deployment of LLM/VLM workloads using a containerized OpenAI‑compatible API service
Workflow orchestration and multimodal AI pipelines with Qualcomm® Genie
Speech transcription, translation, and language identification using Whisper on NPU or CPU
Sample applications and vision pipelines using Qualcomm® IMSDK
Robotics and intelligent system development using Qualcomm® QIRP SDK

Each section is designed to be standalone, so you can jump directly into the tools and flows that match your project needs. The goal is to provide clear, reusable examples and practical insights for integrating AI into real-world edge applications.

Choose your journey

Use the flowchart below to find the path that fits your application. It walks you through whether you already have a model, how to prepare it, and how to run inference, then links you to the right Ubuntu workflow page. Every highlighted box is clickable.

On-device generative AI availability depends on your Dragonwing development board and model. Start with LLMs using Genie, and use Llama.cpp as a fallback where Genie model support is not available.
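The Genie-first, Llama.cpp-fallback rule above can be sketched as a small selection function. This is an illustrative sketch only: `pick_llm_runtime` and `genie_supported` are hypothetical names, and the model names used with it are placeholders, not a list of models that Genie actually supports on any board.

```python
from __future__ import annotations


def pick_llm_runtime(model_name: str, genie_supported: set[str]) -> str:
    """Prefer Genie; fall back to Llama.cpp when Genie lacks support for the model.

    `genie_supported` is a caller-supplied set (hypothetical); fill it from the
    Genie model support information for your specific Dragonwing board.
    """
    return "genie" if model_name in genie_supported else "llama.cpp"
```

For example, with a placeholder set `{"model-a"}`, `pick_llm_runtime("model-a", {"model-a"})` selects Genie, while any model outside the set falls back to Llama.cpp.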

Application Development & Execution Flow Summary

Flow	Purpose
Edge Impulse	Build and train AI models using audio, image, and other sensor data, or bring your own model in a variety of formats.
Qualcomm® AI Hub	Qualcomm® AI Hub simplifies deploying AI models for vision, audio, and speech applications to edge devices. You can optimize, validate, and deploy your own AI models on hosted Qualcomm platform devices within minutes.
Convert TensorFlow models	Quantize and convert TensorFlow/Keras models (.keras, .h5) to `.tflite` format for NPU deployment.
Run LiteRT/TFLite models	Execute `.tflite` models on the NPU (Python or C++) using AI Engine Direct delegates. Works with models from TensorFlow, AI Hub, or Edge Impulse.
ONNX	Export models to the ONNX format for cross-platform AI deployment. On Dragonwing devices, ONNX Runtime with AI Engine Direct executes them on the NPU for maximum performance.
Run Context Binaries	Context binaries (.bin) and .dlc files are used by Qualcomm AI tools such as Genie, VoiceAI ASR, and QAI AppBuilder to run optimized AI models efficiently on target hardware.
Llama.cpp	Execute large language and vision-language models locally using a C++ backend optimized for GPU execution and quantized model formats.
Qualcomm® Genie	Orchestrate AI microservices and multimodal workflows using Qualcomm’s generative AI runtime.
Whisper	Enables speech transcription, translation, and language identification on Dragonwing using NPU (VoiceAI ASR) or CPU (whisper.cpp).
Qualcomm® IMSDK	Qualcomm IMSDK is a multimedia and AI SDK for building high-performance vision pipelines on Qualcomm Linux platforms. It includes GStreamer plugins, AI runtime integration, and messaging support to accelerate robotics, surveillance, and embedded AI development.
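
If you already have a model file, the table above suggests a starting flow based on the file format. The sketch below encodes that lookup as a rough guide, not an official selection tool: `FLOW_BY_SUFFIX` and `suggest_flow` are hypothetical names, the `.onnx` suffix is an assumption (the table names the ONNX flow but no extension), and `.bin` is ambiguous in practice because other tools also use that extension.

```python
from pathlib import Path
from typing import Optional

# File suffix -> flow name from the summary table.
# ".onnx" is assumed; the table does not list an extension for the ONNX flow.
FLOW_BY_SUFFIX = {
    ".keras": "Convert TensorFlow models",
    ".h5": "Convert TensorFlow models",
    ".tflite": "Run LiteRT/TFLite models",
    ".onnx": "ONNX",
    ".bin": "Run Context Binaries",
    ".dlc": "Run Context Binaries",
}


def suggest_flow(model_path: str) -> Optional[str]:
    """Return the flow name for a model file, or None if the suffix is unknown."""
    suffix = Path(model_path).suffix.lower()  # case-insensitive match
    return FLOW_BY_SUFFIX.get(suffix)
```

A call such as `suggest_flow("model.tflite")` points to the LiteRT/TFLite flow; an unrecognized suffix returns `None`, in which case the flowchart is the better guide.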
