Some models from AI Hub are released as context binaries (.bin files) or as Deep Learning Container (.dlc) files. Context binaries contain the model plus hardware-specific optimizations, and can be run with Qualcomm tools that use the Qualcomm® AI Runtime SDK directly. Examples are Genie (to run LLMs) and VoiceAI ASR (to run voice transcription), but you can also run context binaries directly from Python using QAI AppBuilder. .dlc files are a portable representation that is converted to a context binary for your specific target at runtime.
Context binaries (.bin) are not portable: they are tied to both the AI Engine Direct SDK version and your hardware target.
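As a small illustration of the distinction above, the following sketch reads the format from a file's extension and returns a reminder about portability. The function name `describe_model_file` and the message wording are hypothetical and not part of any Qualcomm tool; the sketch only encodes what this page states about .bin and .dlc files.

```python
from pathlib import Path


def describe_model_file(path: str) -> str:
    """Return a short portability note based on the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".bin":
        # Context binary: built for one SDK version and one hardware target
        return "context binary: tied to the SDK version and hardware target"
    if suffix == ".dlc":
        # DLC: portable, converted to a context binary at runtime
        return "DLC: portable, converted to a context binary at runtime"
    return "unknown format"


print(describe_model_file("models/Inception-v3_w8a8.dlc"))
```

A check like this could run before loading a model, so that a .bin file copied from another board or SDK release is flagged early.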
Finding supported models
Models in context binary format can be found in a few places:
- Qualcomm AI Hub:
  - Under ‘Chipset’, select:
    - RB3 Gen 2 Vision Kit: ‘Qualcomm QCS6490 (Proxy)’
    - RUBIK Pi 3: ‘Qualcomm QCS6490 (Proxy)’
    - IQ-9075 EVK: ‘Qualcomm QCS9075 (Proxy)’
  - Under ‘Runtime’, select “Qualcomm® AI Runtime”.
- Aplux model zoo:
  - Under ‘Chipset’, select:
    - RB3 Gen 2 Vision Kit: ‘Qualcomm QCS6490’
    - RUBIK Pi 3: ‘Qualcomm QCS6490’
    - IQ-9075 EVK: ‘Qualcomm QCS9075’
Note that the NPU only supports quantized models. Floating point models (or layers) will be automatically moved back to the CPU.
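To make "quantized" concrete, here is a minimal sketch of 8-bit affine quantization, the general technique of mapping floats onto integers with a scale and a zero point. It is an illustration only: the `scale` and `zero_point` values are made up, the function names are hypothetical, and nothing here reflects how the Qualcomm tools choose or store these parameters. In the example further down this page you do not need to quantize the input yourself.

```python
def quantize(value: float, scale: float, zero_point: int) -> int:
    """Map a float to an unsigned 8-bit integer (affine quantization)."""
    q = round(value / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an 8-bit integer back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical parameters: floats in roughly [-1.0, 1.0] mapped onto 0..255
scale = 2.0 / 255
zero_point = 128

q = quantize(0.5, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

The round trip is lossy: the recovered value differs from the original by at most half a quantization step, and values outside the representable range are clamped.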
Example: Inception-v3 (Python)
Here’s how you can run an image classification model (downloaded from AI Hub) on the NPU using QAI AppBuilder. Open a terminal on your development board, or an SSH session to your development board, and:
- Build the AppBuilder wheel with QNN bindings:
# Install build dependencies
sudo apt update && sudo apt install -y yq cmake
wget https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.40.0.251030/v2.40.0.251030.zip
unzip v2.40.0.251030.zip
cd v2.40.0.251030/qairt
source bin/envsetup.sh
# Clone the repository (verified on this commit; you might be able to move to the main branch)
git clone https://github.com/quic/ai-engine-direct-helper
cd ai-engine-direct-helper
git checkout fb765f776261bd2cf55d949745eeb9e3d8278493
git submodule update --init --recursive
# Create a new venv
python3.12 -m venv .venv
source .venv/bin/activate
# Build the wheel
pip3 install setuptools
python setup.py bdist_wheel
# Deactivate the venv
deactivate
export APPBUILDER_WHEEL=$PWD/dist/qai_appbuilder-*-cp312-cp312-linux_aarch64.whl
- Now create a new folder for the application:
mkdir -p ~/context-binary-demo
cd ~/context-binary-demo
# Create a new venv
python3.12 -m venv .venv
source .venv/bin/activate
# Install the QAI AppBuilder plus some other dependencies
pip3 install $APPBUILDER_WHEEL
pip3 install numpy==2.3.3 Pillow==11.3.0
- Create a new file context_demo.py and add:
import os, urllib.request, time, numpy as np, argparse
from qai_appbuilder import (QNNContext, Runtime, LogLevel, ProfilingLevel, PerfProfile, QNNConfig)
from PIL import Image
def download_file_if_not_exists(path, url):
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        print(f"Downloading {path} from {url}...")
        urllib.request.urlretrieve(url, path)
    return path
# Paths to your model, labels and test image (downloaded automatically; Inception-v3 from https://aihub.qualcomm.com/models/inception_v3)
MODEL_PATH = download_file_if_not_exists('models/Inception-v3_w8a8.dlc', 'https://huggingface.co/qualcomm/Inception-v3/resolve/v0.41.1/Inception-v3_w8a8.dlc')
LABELS_PATH = download_file_if_not_exists('models/inception_v3_labels.txt', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/inception_v3_labels.txt')
IMAGE_PATH = download_file_if_not_exists('images/samoyed-square.jpg', 'https://cdn.edgeimpulse.com/qc-ai-docs/example-images/samoyed-square.jpg')
# Parse labels
with open(LABELS_PATH, 'r') as f:
    labels = [line for line in f.read().splitlines() if line.strip()]
# Set up the QNN config (/usr/lib => where all QNN libraries are installed)
QNNConfig.Config('/usr/lib', Runtime.HTP, LogLevel.WARN, ProfilingLevel.BASIC)
# Create a new context (name, path to the model file; here a .dlc)
ctx = QNNContext(os.path.basename(MODEL_PATH), MODEL_PATH)
# Load and preprocess image, input is scaled 0..1 (f32), no need to quantize yourself
def load_image(path, input_shape):
    # Expected input shape: [1, height, width, channels]
    _, height, width, channels = input_shape
    # expects unquantized input 0..1
    img = Image.open(path).convert("RGB").resize((width, height))
    img_np = np.array(img, dtype=np.float32)
    img_np = img_np / 255
    # add batch dim
    img_np = np.expand_dims(img_np, axis=0)
    return img_np
# Load image from disk and resize to the required model input (ctx.getInputShapes()[0] -> input shape for tensor 0)
input_data = load_image(IMAGE_PATH, ctx.getInputShapes()[0])
print('input_data', input_data.shape)
# Run inference once to warmup
f_output = ctx.Inference(input_data)[0]
# Then run 10x
start = time.perf_counter()
for i in range(0, 10):
    f_output = ctx.Inference(input_data)[0]
end = time.perf_counter()
# Image classification models in AI Hub do not include a final Softmax() layer, so apply it manually
def softmax(x, axis=-1):
    # subtract max for numerical stability
    x_max = np.max(x, axis=axis, keepdims=True)
    e_x = np.exp(x - x_max)
    return e_x / np.sum(e_x, axis=axis, keepdims=True)
# show top-5 predictions
scores = softmax(f_output[0])
top_k = scores.argsort()[-5:][::-1]
print("\nTop-5 predictions:")
for i in top_k:
    print(f"Class {labels[i]}: score={scores[i]}")
print('')
print(f'Inference took (on average): {((end - start) * 1000) / 10:.4g}ms. per image')
- Run the example:
python3 context_demo.py
# Top-5 predictions:
# Class Samoyed: score=0.9999812841415405
# Class white wolf: score=8.22735091787763e-06
# Class Great Pyrenees: score=4.002702098659938e-06
# Class Arctic fox: score=1.6263725228782278e-06
# Class Eskimo dog: score=1.3582930478150956e-06
#
# Inference took (on average): 5.931ms. per image
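The post-processing step in context_demo.py (softmax followed by a top-k selection) can be checked in isolation, without a board or a model. The sketch below reuses the same `softmax` function on made-up logits; the `logits` values and the four `labels` are hypothetical and chosen only to make the ranking easy to follow.

```python
import numpy as np


def softmax(x, axis=-1):
    # subtract max for numerical stability
    x_max = np.max(x, axis=axis, keepdims=True)
    e_x = np.exp(x - x_max)
    return e_x / np.sum(e_x, axis=axis, keepdims=True)


# Made-up raw model outputs and labels, for illustration only
logits = np.array([2.0, 1.0, 0.1, -1.0], dtype=np.float32)
labels = ["cat", "dog", "fox", "owl"]

scores = softmax(logits)

# argsort() sorts ascending, so take the last k indices and reverse them
top_k = scores.argsort()[-2:][::-1]
for i in top_k:
    print(f"Class {labels[i]}: score={scores[i]:.3f}")
```

Subtracting the maximum before exponentiating does not change the result, but it keeps `np.exp` from overflowing when the raw outputs are large.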
Great! You have now run a model in context binary format on the NPU.
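The example reports a single average over 10 runs after one warmup call. If you want to see the spread as well, one possible approach is to time each run separately. The sketch below is an assumption-laden illustration: `benchmark` is a hypothetical helper, and the workload is a stand-in so that the code runs anywhere; in context_demo.py you would pass something like `lambda: ctx.Inference(input_data)` instead.

```python
import statistics
import time


def benchmark(fn, warmup=1, runs=10):
    """Call fn() `warmup` times untimed, then `runs` times timed.

    Returns a list with the duration of each timed run in milliseconds.
    """
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return timings_ms


# Stand-in workload (hypothetical); replace with the inference call on the board
timings = benchmark(lambda: sum(range(10_000)))

print(f"mean:   {statistics.mean(timings):.4g}ms")
print(f"median: {statistics.median(timings):.4g}ms")
print(f"max:    {max(timings):.4g}ms")
```

Comparing the median with the maximum makes occasional slow runs visible, which a single average hides.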