Running on a specific NPU (QCS9075) - Qualcomm Dragonwing Documentation

The Dragonwing IQ-9075 EVK contains two NPUs that can be used for running models. By default, LLM/VLM models run on NPU0, but you can run a model on NPU1 instead. To do this, open the model directory you configured during model setup, where you should see a file called htp_backend_ext_config.json.

In this example, we will look at the file ~/models/qwen3_4b_instruct_2507/htp_backend_ext_config.json:

htp_backend_ext_config.json

{
    "devices": [
        {
            "soc_model": 43,
            "dsp_arch": "v73",
            "cores": [
                {
                    "core_id": 0,
                    "perf_profile": "burst",
                    "rpc_control_latency": 100
                }
            ]
        }
    ],
    "memory": {
        "mem_type": "shared_buffer"
    },
    "context": {
        "weight_sharing_enabled": true
    }
}

To specify which NPU to run on, add a "device_id" entry to the device object, as shown below, and set it to 0 (NPU0, the default) or 1 (NPU1) before starting the container.

htp_backend_ext_config.json configured for NPU1

{
    "devices": [
        {
            "device_id": 1,
            "soc_model": 43,
            "dsp_arch": "v73",
            "cores": [
                {
                    "core_id": 0,
                    "perf_profile": "burst",
                    "rpc_control_latency": 100
                }
            ]
        }
    ],
    "memory": {
        "mem_type": "shared_buffer"
    },
    "context": {
        "weight_sharing_enabled": true
    }
}

This only works on the IQ-9075 EVK, which has two NPUs. Devices such as the RB3 Gen2 Vision Kit and the RUBIK Pi 3 have a single NPU, so this section does not apply to them.
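
If you switch between NPUs often, the edit to htp_backend_ext_config.json can be scripted. The following is a minimal Python sketch, not part of the Qualcomm tooling: the helper names with_device_id and set_npu_device are hypothetical, and the code assumes the file has the structure shown above (a top-level "devices" list). It sets "device_id" on every entry in "devices". Key order is not significant in JSON, so the new key is written at the end of the device object rather than at the top.

```python
import copy
import json
from pathlib import Path


def with_device_id(config, device_id):
    """Return a copy of the config with "device_id" set on every device entry."""
    if device_id not in (0, 1):
        raise ValueError("device_id must be 0 (NPU0) or 1 (NPU1)")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    for device in updated["devices"]:
        device["device_id"] = device_id
    return updated


def set_npu_device(config_path, device_id):
    """Rewrite the JSON file at config_path so it targets the given NPU.

    Hypothetical helper: assumes the file parses as JSON and contains
    a top-level "devices" list, as in the listings above.
    """
    path = Path(config_path).expanduser()  # expands a leading "~"
    config = json.loads(path.read_text())
    updated = with_device_id(config, device_id)
    path.write_text(json.dumps(updated, indent=4) + "\n")
```

For example, calling set_npu_device("~/models/qwen3_4b_instruct_2507/htp_backend_ext_config.json", 1) before starting the container would select NPU1, and passing 0 would switch back to the default.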
