# QRB ROS NN Inference

[`qrb_ros_nn_inference`](https://github.com/qualcomm-qrb-ros/qrb_ros_nn_inference) is a generic ROS 2 node that loads a neural-network model and runs inference on the Hexagon HTP NPU via the [Qualcomm AI Engine Direct (QNN)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html) SDK. Point it at a model file and it subscribes to an input topic and publishes raw inference results on an output topic, with no model-specific wiring required.

<Note>
  This node is the short answer to "can I run **my** model on the NPU from ROS?" Drop in a `.tflite` / `.so` / `.bin` exported from [Qualcomm AI Hub](https://aihub.qualcomm.com) (or your own QNN export), point the node at it, then publish inputs and subscribe to outputs.
</Note>

## What it is

Underneath, the node wraps `qrb_inference_manager`, a small C++ library that calls the QNN APIs and the [QNN delegate for TensorFlow Lite](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/overview.html). The ROS layer only adds a subscription, a publication, and parameter-driven backend selection.

```mermaid
flowchart LR
    M[".tflite / .so / .bin<br/>(AI Hub or custom)"] --> N
    IN["input topic<br/>(image, tensor, etc.)"] --> N["qrb_ros_nn_inference<br/>node"]
    N --> Q["qrb_inference_manager<br/>(QNN SDK + TFLite delegate)"]
    Q --> H["Hexagon HTP NPU"]
    H --> Q
    Q --> N
    N --> OUT["output topic<br/>(raw model output)"]
    style H fill:#31017D,stroke:#31017D,color:#fff
```

## Supported model formats

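| Format    | When to use                                                                                                           |
| --------- | --------------------------------------------------------------------------------------------------------------------- |
| `.tflite` | TFLite models exported from AI Hub or trained locally.                                                                |
| `.so`     | QNN model libraries: the model compiled into a shared object for the target platform.                                 |
| `.bin`    | QNN context binaries: graphs already prepared for a specific HTP target, so they skip graph preparation at load time. |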

<Warning>
  Per upstream README: `.tflite` inference is **not supported on `qrb_ros_nn_inference` 1.1.0-jazzy**. If you're on that release and need TFLite, build from source on `main` or use the hand-rolled approach in [`npu-workflows.mdx`](./npu-workflows).
</Warning>
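
The format table can be turned into a small pre-flight check that runs before the node starts. The sketch below is a hypothetical helper, not part of `qrb_ros_nn_inference`: the function name `model_format` and the labels it prints are illustrative, and it only inspects the file extension rather than the file contents.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of qrb_ros_nn_inference).
# Classifies a model path by its extension, following the table above.
# Prints a label and returns non-zero for anything the table does not list.
model_format() {
  local path="$1"
  case "${path##*.}" in          # text after the last dot in the path
    tflite) echo "tflite" ;;
    so)     echo "qnn-so" ;;
    bin)    echo "qnn-context-binary" ;;
    *)      echo "unsupported"; return 1 ;;
  esac
}
```

The match is case-sensitive and purely name-based, so a misnamed file still passes; treat it as a guard against typos in `model_path`, not as validation of the model itself.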

## Quick start

<Steps>
  <Step title="Install on Qualcomm Ubuntu">
    ```bash
    sudo add-apt-repository ppa:ubuntu-qcom-iot/qcom-ppa
    sudo add-apt-repository ppa:ubuntu-qcom-iot/qirp
    sudo apt update
    sudo apt install ros-jazzy-qrb-ros-nn-inference
    ```
  </Step>

  <Step title="Run with your model">
    ```bash
    ros2 run qrb_ros_nn_inference qrb_ros_nn_inference \
      --ros-args \
        -p model_path:=/path/to/your_model.so \
        -p backend_option:=htp
    ```

    Then publish your input on the configured input topic and subscribe to the output topic. See the [upstream API reference](https://github.com/qualcomm-qrb-ros/qrb_ros_nn_inference#-apis) for the full parameter list.
  </Step>
</Steps>
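
If you launch the node from scripts, it can help to wrap the command from the step above so that a missing model file fails early with a readable message. The following is a minimal sketch under stated assumptions: `run_inference_node` and the `DRY_RUN` variable are hypothetical names introduced here, and only the `model_path` and `backend_option` parameters come from this page.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Quick start command.
# Only model_path and backend_option are taken from this page;
# the function name and DRY_RUN are illustrative.
run_inference_node() {
  local model_path="$1"
  local backend="${2:-htp}"      # default to the backend used in the Quick start

  # Fail early instead of waiting for the node to report a bad path.
  if [ ! -f "$model_path" ]; then
    echo "model file not found: $model_path" >&2
    return 1
  fi

  # Build the argument list once so dry-run and real run stay identical.
  set -- ros2 run qrb_ros_nn_inference qrb_ros_nn_inference \
    --ros-args \
    -p "model_path:=$model_path" \
    -p "backend_option:=$backend"

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"           # print the command instead of running it
  else
    "$@"
  fi
}
```

Calling `run_inference_node /path/to/your_model.so` starts the node as in the step above; setting `DRY_RUN=1` first prints the assembled command, which is handy for checking the parameters on a machine without ROS installed.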

## Why this helps

| Alternative                                                          | Short take                                                                                           |
| -------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| Hand-rolled TFLite node — see [`npu-workflows.mdx`](./npu-workflows) | Maximum control; you own preprocessing, delegate loading, topic wiring. Useful for learning.         |
| CPU-only TFLite / ONNX node                                          | Works everywhere, but no NPU — defeats the point of Qualcomm hardware.                               |
| `qrb_ros_samples` packaged pipelines                                 | Model-specific wrappers (object detection, segmentation, etc.); less flexible than a generic loader. |

Compared with these alternatives, `qrb_ros_nn_inference` is a generic loader: it runs any `.tflite` / `.so` / `.bin` model on the Hexagon HTP NPU via QNN without model-specific code. For evaluators, that means you drop in your own model, configure the node, and get inference results on a ROS topic. Pre- and post-processing remain up to you (or use [`qrb_ros_tensor_process`](https://github.com/qualcomm-qrb-ros/qrb_ros_tensor_process) for YOLO-shaped tensors).

## Related

* [`npu-workflows.mdx`](./npu-workflows) — hand-rolled depth-estimation pipeline using the QNN TFLite delegate directly. Useful if you want to understand what `qrb_ros_nn_inference` automates.
* [`qrb-ros-samples.mdx`](./qrb-ros-samples) — model-specific reference pipelines built around this node.
* Upstream: [`qualcomm-qrb-ros/qrb_ros_nn_inference`](https://github.com/qualcomm-qrb-ros/qrb_ros_nn_inference).