Skip to main content
This section outlines a modular, hands-on approach to AI development using Qualcomm®-supported tools, runtimes, and frameworks.
Whether you’re training models, deploying pre-trained networks, or building multimodal AI workflows, this guide offers a modular, hands-on approach.
The document covers:
  • Model creation and training with Edge Impulse and Qualcomm® AI Hub
  • Model conversion and inference using TensorFlow, LiteRT / TensorFlow Lite, and ONNX Runtime with NPU acceleration
  • Running optimized AI models via context binaries (.bin) and DLC (.dlc) files using Qualcomm AI tools
  • Local execution of large language and vision-language models using Llama.cpp
  • Deployment of LLM/VLM workloads using a containerized OpenAI‑compatible API service
  • Workflow orchestration and multimodal AI pipelines with Qualcomm® Genie
  • Speech transcription, translation, and language identification using Whisper on NPU or CPU
  • Sample applications and vision pipelines using Qualcomm® IMSDK
  • Robotics and intelligent system development using Qualcomm® QIRP SDK
Each section is designed to be standalone, so you can jump directly into the tools and flows that match your project needs. The goal is to provide clear, reusable examples and practical insights for integrating AI into real-world edge applications.

Choose your journey

Use the flowchart below to find the path that fits your application. It walks you through whether you already have a model, how to prepare it, and how to run inference, then links you to the right Ubuntu workflow page. Every highlighted box is clickable.
On-device generative AI availability depends on your Dragonwing development board and model. Start with LLMs using Genie, and use Llama.cpp as a fallback where Genie model support is not available.

Application Development & Execution Flow Summary

FlowPurpose
Edge ImpulseBuild and train AI models using audio, image and other sensor data - or bringing your own model in a variety of formats.
Qualcomm® AI HubQualcomm® AI Hub simplifies deploying AI models for vision, audio, and speech applications to edge devices. You can optimize, validate, and deploy your own AI models on hosted Qualcomm platform devices within minutes.
Convert TensorFlow modelsQuantize and convert TensorFlow/Keras models (.keras, .h5) to .tflite format for NPU deployment.
Run LiteRT/TFLite modelsExecute .tflite models on the NPU (Python or C++) using AI Engine Direct delegates. Works with models from TensorFlow, AI Hub, or Edge Impulse.
ONNXONNX enables cross-platform AI deployment by exporting models. On Dragonwing devices, ONNX Runtime with AI Engine Direct allows execution on the NPU for maximum performance.
Run Context BinariesContext binaries (.bin) and .dlc files are used by Qualcomm AI tools such as Genie, VoiceAI ASR, and QAI AppBuilder to run optimized AI models efficiently on target hardware.
Llama.cppExecute large language models locally using a C++ backend optimized for GPU and quantized formats.
Qualcomm® GenieOrchestrate AI microservices and multimodal workflows using Qualcomm’s generative AI runtime.
WhisperEnables speech transcription, translation, and language identification on Dragonwing using NPU (VoiceAI ASR) or CPU (whisper.cpp).
Qualcomm® IMSDKQualcomm IMSDK is a multimedia and AI SDK for building high-performance vision pipelines on Qualcomm Linux platforms.It includes GStreamer plugins, AI runtime integration, and messaging support to accelerate robotics, surveillance, and embedded AI development.