Use AI Hub to optimize a model - Qualcomm Dragonwing Documentation

For quick prototyping of models on Qualcomm AI hardware, AI Hub provides a way to optimize, validate, and deploy machine learning models on-device for vision, audio, and speech use cases.

Set up your environment

Set up your Python environment. Install miniconda on your host machine. When the installation finishes:
- Windows: open an Anaconda prompt from the Start menu.
- macOS/Linux: open a new shell window.
Set up a Python virtual environment for AI Hub:
```
conda activate 
```
```
conda create python=3.10 -n qai_hub 
```
```
conda activate qai_hub
```
Install git. The following command applies to Debian-based Linux hosts such as Ubuntu.
```
sudo apt-get install git
```

Install the AI Hub Python client.
```
pip3 install qai-hub
```
If you work with PyTorch models, also install the torch extra.
```
pip3 install "qai-hub[torch]"
```

Sign in to AI Hub. Go to AI Hub and sign in with your Qualcomm ID to view information about jobs you create. Once signed in, go to Account > Settings > API Token to obtain the API token used to configure your client.
Configure the client with your API token using the following command in your terminal.
```
qai-hub configure --api_token <INSERT_API_TOKEN>
```
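To confirm that the token was stored, one option is to read back the file the CLI writes. The sketch below assumes that `qai-hub configure` saves its settings to `~/.qai_hub/client.ini` under an `[api]` section with an `api_token` key; that path and layout are assumptions, so check them against your installed client. `read_api_token` is a hypothetical helper and not part of the `qai_hub` package.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

# Assumed default location of the config written by `qai-hub configure`.
DEFAULT_CONFIG = Path(os.path.expanduser("~/.qai_hub/client.ini"))


def read_api_token(config_path: Path = DEFAULT_CONFIG) -> Optional[str]:
    """Return the stored API token, or None if the file or key is missing."""
    # Disable interpolation so a '%' inside a token is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    # ConfigParser.read() skips files that do not exist and returns
    # the list of files it actually parsed.
    if not parser.read(config_path):
        return None
    token = parser.get("api", "api_token", fallback=None)
    # Treat an empty value or an unreplaced placeholder as "not configured".
    if not token or token.startswith("<"):
        return None
    return token
```

Calling `read_api_token()` with no arguments checks the assumed default location; a return value of `None` suggests the configure command still needs to be run.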

Choose an AI Hub workflow

Try a preoptimized model

Go to AI Hub Model Zoo to access preoptimized models available for Qualcomm evaluation kits.
Filter models for your EVK by selecting the matching chipset in the left pane. For example, select Qualcomm QCS6490 for the Qualcomm Dragonwing™ RB3 Gen 2, or Qualcomm QCS8300 for the Qualcomm Dragonwing™ IQ-8275 EVK.
Select a model from the filtered view to go to the model page.
On the model page, select the runtime and precision.
Select Download to download the model. The downloaded model is preoptimized and ready for deployment. See Run inference for more information.

Bring your own model

Select a pretrained model in PyTorch or ONNX format.
Submit a model for compilation or optimization to AI Hub using Python APIs. When submitting a compilation job, select a device or chipset for your EVK and the target runtime. For Qualcomm Dragonwing™ RB3 Gen 2, the LiteRT runtime is supported.
| Chipset | Runtime | CPU | GPU | HTP |
| --- | --- | --- | --- | --- |
| Qualcomm Dragonwing™ RB3 Gen 2 | LiteRT | INT8, FP16, FP32 | FP16, FP32 | INT8, INT16 |
On submission, AI Hub generates a unique ID for the job. You can use this job ID to view job details.
AI Hub optimizes the model based on your device and runtime selections.
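When choosing a precision, it can help to see which compute units accept it. The following sketch encodes the RB3 Gen 2 row of the table above as a plain dictionary; `RB3_GEN2_LITERT` and `units_supporting` are hypothetical names used only for illustration and do not come from the AI Hub client.

```python
from typing import Dict, FrozenSet, List

# Precisions per compute unit for LiteRT on the RB3 Gen 2,
# copied from the support table in this section.
RB3_GEN2_LITERT: Dict[str, FrozenSet[str]] = {
    "CPU": frozenset({"INT8", "FP16", "FP32"}),
    "GPU": frozenset({"FP16", "FP32"}),
    "HTP": frozenset({"INT8", "INT16"}),
}


def units_supporting(precision: str) -> List[str]:
    """Return the compute units that list the given precision, sorted by name."""
    wanted = precision.upper()
    return sorted(
        unit for unit, precisions in RB3_GEN2_LITERT.items() if wanted in precisions
    )
```

For example, `units_supporting("fp32")` returns the CPU and GPU entries, while `units_supporting("int16")` returns only the HTP. Going by the table, a model would need INT8 or INT16 precision to be eligible for the HTP.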
- Optionally, you can submit a job to profile or run inference on the optimized model (using Python APIs) on a real device provisioned from a device farm.
  - Profiling: Benchmarks the model on a provisioned device and provides statistics, including average inference times at the layer level, runtime configuration, etc.
  - Inference: Performs inference using an optimized model on data submitted as part of the inference job by running the model on a provisioned device.
Each submitted job is available for review in the AI Hub portal. A completed compilation job provides a downloadable link to the optimized model, which can then be deployed on a local development device such as Qualcomm Dragonwing™ RB3 Gen 2.
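Profiling results can also be condensed in code rather than read in the portal. The sketch below assumes the profile is available as a dictionary with an `execution_summary` entry holding `estimated_inference_time` in microseconds and an `execution_detail` list whose items carry a `compute_unit` field. Those key names and the unit are assumptions about the profile layout, and `summarize_profile` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict


def summarize_profile(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a profile dictionary into a latency figure and per-unit layer counts.

    The key names read here are assumptions about the profile layout.
    """
    summary = profile.get("execution_summary", {})
    layers = profile.get("execution_detail", [])
    time_us = summary.get("estimated_inference_time")
    return {
        # Assumed to be reported in microseconds; converted to milliseconds.
        "inference_ms": None if time_us is None else time_us / 1000.0,
        # Number of layers assigned to each compute unit.
        "layers_per_unit": dict(
            Counter(layer.get("compute_unit", "unknown") for layer in layers)
        ),
    }
```

With the Python client, the input would typically come from a finished profile job, for instance via `profile_job.download_profile()`. Missing keys yield `None` or an empty count instead of raising an error, so the helper degrades gracefully if the layout differs.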

The following example, taken from the AI Hub documentation, uploads a pretrained MobileNet V2 model from PyTorch to AI Hub and compiles it to an optimized LiteRT model for Qualcomm Dragonwing™ RB3 Gen 2.

```
import qai_hub as hub
import torch
from torchvision.models import mobilenet_v2
import numpy as np

# Load a pretrained MobileNet V2 and switch it to inference mode
torch_model = mobilenet_v2(pretrained=True)
torch_model.eval()

# Trace the model (required for on-device deployment)
input_shape = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
traced_torch_model = torch.jit.trace(torch_model, example_input)

# Compile and optimize the model for a specific device
compile_job = hub.submit_compile_job(
    model=traced_torch_model,
    device=hub.Device("Dragonwing RB3 Gen 2 Vision Kit"),
    input_specs=dict(image=input_shape),
    # compile_options="--target_runtime tflite",
)

# Profile the compiled model on a provisioned device
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Dragonwing RB3 Gen 2 Vision Kit"),
)

# Random input used only to exercise the inference job
sample = np.random.random((1, 3, 224, 224)).astype(np.float32)

# Run inference with the compiled model on a provisioned device
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Dragonwing RB3 Gen 2 Vision Kit"),
    inputs=dict(image=[sample]),
)

# Download the compiled model
compile_job.download_target_model(filename="/tmp/mobilenetv2.tflite")
```
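The inference job returns raw model outputs, which for an image classifier such as MobileNet V2 are class logits. As a minimal sketch of one way to interpret them, the helper below ranks the most likely classes using only the standard library. `softmax` and `top_k` are illustrative names, and the code assumes the logits have already been flattened into a plain list of floats.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return (class_index, probability) pairs for the k largest logits."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```

With the client, the logits would typically come from `inference_job.download_output_data()`, which maps each output name to a list of arrays; converting the first array's first row with `.tolist()` gives a suitable input. Because the example above feeds random data, the ranking is only meaningful once a real, preprocessed image is submitted.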

To leave the qai_hub environment when you are done, use the following command.
```
conda deactivate
```

Once the model is downloaded, it is ready for deployment. See Run inference for next steps. For more details about the AI Hub workflow and APIs, including how to profile models, see the AI Hub documentation or the AI Hub tutorial videos.