# Run context binaries (.bin/.dlc)

Some models from [AI Hub](/ai-workflows/ai-hub) are released as context binaries (`.bin` files) or as Deep Learning Container (`.dlc`) files. Context binaries contain the model plus hardware-specific optimizations, and can be run with Qualcomm tools that directly use the Qualcomm® AI Runtime SDK, such as [Genie](/ai-workflows/genie) (to run LLMs) and [VoiceAI ASR](/ai-workflows/whisper) (to run voice transcription). You can also run context binaries directly from Python using QAI AppBuilder. `.dlc` files are a portable representation that is converted to a context binary for your specific target at runtime.

<Tip>**.bin files are not portable:** Context binaries (`.bin`) are tied to both the AI Engine Direct SDK version and your hardware target.</Tip>

## Finding supported models

Models in context binary format can be found in a few places:

* [Qualcomm AI Hub](https://aihub.qualcomm.com/models):

  1. Under 'Chipset', select:

     * RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490 (Proxy)'
     * RUBIK Pi 3: 'Qualcomm QCS6490 (Proxy)'
     * IQ-9075 EVK: 'Qualcomm QCS9075 (Proxy)'

  2. Under 'Runtime', select "Qualcomm® AI Runtime".

* [Aplux model zoo](https://aiot.aidlux.com/en/models):

  1. Under 'Chipset', select:

     * RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490'
     * RUBIK Pi 3: 'Qualcomm QCS6490'
     * IQ-9075 EVK: 'Qualcomm QCS9075'

<Warning>The NPU only supports quantized models. Floating-point models (or layers) automatically fall back to the CPU.</Warning>
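
To make "quantized" concrete: the `w8a8` suffix on model names indicates 8-bit weights and 8-bit activations. The sketch below illustrates the general idea of 8-bit affine quantization in plain Python. The helper names, `scale`, and `zero_point` are illustrative assumptions; the actual scheme and parameters of a given model are chosen by the quantization tooling, not by this code.

```python
def quantize_uint8(values, scale, zero_point):
    """Map floats to 8-bit codes: q = clamp(round(x / scale) + zero_point, 0, 255)."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize_uint8(codes, scale, zero_point):
    """Map 8-bit codes back to approximate floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical parameters: cover the range [0.0, 1.0] with 256 levels
scale = 1.0 / 255
zero_point = 0

pixels = [0.0, 0.25, 0.5, 1.0]
codes = quantize_uint8(pixels, scale, zero_point)
restored = dequantize_uint8(codes, scale, zero_point)

# The round trip is lossy, but the error stays within half a quantization step
for original, approx in zip(pixels, restored):
    print(f"{original:.4f} -> {approx:.4f}")
```

Each value is stored as one of 256 integer levels, which is why a quantized model trades a small amount of precision for integer arithmetic the NPU can run.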

## Example: Inception-v3 (Python)

Here's how you can run an image classification model (downloaded from [AI Hub](https://aihub.qualcomm.com/models/inception_v3)) on the NPU using QAI AppBuilder. Open a terminal on your development board (or an SSH session to it), then:

1. Build the AppBuilder wheel with QNN bindings:

   ```
   # Build dependency
   sudo apt update && sudo apt install -y yq cmake

   wget https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.40.0.251030/v2.40.0.251030.zip
   unzip v2.40.0.251030.zip
   cd v2.40.0.251030/qairt
   source bin/envsetup.sh

   # Clone the repository (verified on this commit; you may be able to use the main branch instead)
   git clone https://github.com/quic/ai-engine-direct-helper
   cd ai-engine-direct-helper
   git checkout fb765f776261bd2cf55d949745eeb9e3d8278493
   git submodule update --init --recursive

   # Create a new venv
   python3.12 -m venv .venv
   source .venv/bin/activate

   # Build the wheel
   pip3 install setuptools
   python setup.py bdist_wheel

   # Deactivate the venv
   deactivate

   export APPBUILDER_WHEEL=$PWD/dist/qai_appbuilder-*-cp312-cp312-linux_aarch64.whl
   ```

2. Now create a new folder for the application:

   ```
   mkdir -p ~/context-binary-demo
   cd ~/context-binary-demo

   # Create a new venv
   python3.12 -m venv .venv
   source .venv/bin/activate

   # Install the QAI AppBuilder plus some other dependencies
   pip3 install $APPBUILDER_WHEEL
   pip3 install numpy==2.3.3 Pillow==11.3.0
   ```

3. Create a new file `context_demo.py` and add:

   ```python theme={null}
   import os, urllib.request, time, numpy as np
   from qai_appbuilder import (QNNContext, Runtime, LogLevel, ProfilingLevel, PerfProfile, QNNConfig)
   from PIL import Image

   def download_file_if_not_exists(path, url):
      if not os.path.exists(path):
         os.makedirs(os.path.dirname(path), exist_ok=True)
         print(f"Downloading {path} from {url}...")
         urllib.request.urlretrieve(url, path)
      return path

   # Paths to your model, labels, and test image (downloaded automatically; Inception-v3 from https://aihub.qualcomm.com/models/inception_v3)
   MODEL_PATH = download_file_if_not_exists('models/Inception-v3_w8a8.dlc', 'https://huggingface.co/qualcomm/Inception-v3/resolve/v0.41.1/Inception-v3_w8a8.dlc')
   LABELS_PATH = download_file_if_not_exists('models/inception_v3_labels.txt', 'https://cdn.edgeimpulse.com/qc-ai-docs/models/inception_v3_labels.txt')
   IMAGE_PATH = download_file_if_not_exists('images/samoyed-square.jpg', 'https://cdn.edgeimpulse.com/qc-ai-docs/example-images/samoyed-square.jpg')

   # Parse labels
   with open(LABELS_PATH, 'r') as f:
      labels = [line for line in f.read().splitlines() if line.strip()]

   # Set up the QNN config (/usr/lib => where all QNN libraries are installed)
   QNNConfig.Config('/usr/lib', Runtime.HTP, LogLevel.WARN, ProfilingLevel.BASIC)

   # Create a new context (name, path to the .dlc or .bin file)
   ctx = QNNContext(os.path.basename(MODEL_PATH), MODEL_PATH)

   # Load and preprocess image, input is scaled 0..1 (f32), no need to quantize yourself
   def load_image(path, input_shape):
      # Expected input shape: [1, height, width, channels]
      _, height, width, channels = input_shape

      # expects unquantized input 0..1
      img = Image.open(path).convert("RGB").resize((width, height))
      img_np = np.array(img, dtype=np.float32)
      img_np = img_np / 255
      # add batch dim
      img_np = np.expand_dims(img_np, axis=0)
      return img_np

   # Load image from disk and resize to the required model input (ctx.getInputShapes()[0] -> input shape for tensor 0)
   input_data = load_image(IMAGE_PATH, ctx.getInputShapes()[0])

   print('input_data', input_data.shape)

   # Run inference once to warmup
   f_output = ctx.Inference(input_data)[0]

   # Then run 10x
   start = time.perf_counter()
   for i in range(0, 10):
      f_output = ctx.Inference(input_data)[0]
   end = time.perf_counter()

   # Image classification models in AI Hub omit the final Softmax layer, so apply it manually
   def softmax(x, axis=-1):
      # subtract max for numerical stability
      x_max = np.max(x, axis=axis, keepdims=True)
      e_x = np.exp(x - x_max)
      return e_x / np.sum(e_x, axis=axis, keepdims=True)

   # show top-5 predictions
   scores = softmax(f_output[0])
   top_k = scores.argsort()[-5:][::-1]
   print("\nTop-5 predictions:")
   for i in top_k:
      print(f"Class {labels[i]}: score={scores[i]}")

   print('')
   print(f'Inference took (on average): {((end - start) * 1000) / 10:.4g}ms. per image')
   ```

4. Run the example:

   ```
   python3 context_demo.py

   # Top-5 predictions:
   # Class Samoyed: score=0.9999812841415405
   # Class white wolf: score=8.22735091787763e-06
   # Class Great Pyrenees: score=4.002702098659938e-06
   # Class Arctic fox: score=1.6263725228782278e-06
   # Class Eskimo dog: score=1.3582930478150956e-06
   #
   # Inference took (on average): 5.931ms. per image
   ```
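
The demo reports a single average over 10 runs. If you want a bit more detail, such as the median and the spread between the fastest and slowest run, a minimal sketch could look like the following. `benchmark` is a hypothetical helper and not part of QAI AppBuilder; the stand-in workload is only there so the sketch runs on any machine.

```python
import statistics
import time


def benchmark(fn, warmup=1, runs=10):
    """Call fn() `warmup` times untimed, then `runs` times timed; return latency stats in ms."""
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000)

    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


# Stand-in workload (an assumption for illustration, not a model call)
stats = benchmark(lambda: sum(range(10_000)))
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

In `context_demo.py` you would pass `lambda: ctx.Inference(input_data)` instead of the stand-in workload.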

Great! You have now run a model in context binary format on the NPU.
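
One thing to keep in mind when you try your own images: `load_image` in the demo resizes straight to the model's input size, so a non-square photo gets stretched. A common workaround is to crop the largest centered square first. The sketch below computes that crop box; `center_crop_box` is a hypothetical helper, and whether cropping helps accuracy depends on how the model was trained.

```python
def center_crop_box(width, height):
    """Return (left, upper, right, lower) for the largest centered square in a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# A 640x480 landscape image keeps its full height and loses 80 px on each side
box = center_crop_box(640, 480)
print(box)  # (80, 0, 560, 480)
```

The tuple uses the same (left, upper, right, lower) order as Pillow's `Image.crop`, so in `load_image` you could call `img.crop(center_crop_box(*img.size))` before `resize`.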
