qrb_ros_nn_inference is a generic ROS 2 node that loads a neural-network model and runs inference on the Hexagon HTP NPU via the Qualcomm AI Engine Direct (QNN) SDK. You point it at a model file, it subscribes to an input topic, and publishes raw inference results on an output topic — no model-specific wiring required.
This is the “can I run my model on the NPU from ROS?” answer in one node. Drop in a .tflite / .so / .bin exported from Qualcomm AI Hub (or your own QNN export), point the node at it, and publish / subscribe.

What it is

Underneath, the node wraps qrb_inference_manager — a small C++ library that calls the QNN APIs and the QNN delegate for TensorFlow Lite. The ROS layer just adds a subscription, a publication, and parameter-driven backend selection.

Supported model formats

| Format | When to use |
| --- | --- |
| .tflite | TFLite models exported from AI Hub or trained locally. |
| .so | Pre-compiled QNN binaries (best HTP performance, locked to a target). |
| .bin | QNN context binaries. |

Per upstream README: .tflite inference is not supported on qrb_ros_nn_inference 1.1.0-jazzy. If you’re on that release and need TFLite, build from source on main or use the hand-rolled approach in npu-workflows.mdx.
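
As a quick sanity check before launching, the format table above can be turned into a small shell helper. This is only a sketch: `model_kind` is a hypothetical name, not part of qrb_ros_nn_inference, and it looks at the file extension only, not the file contents.

```shell
# Hypothetical helper: classify a model file by extension, following the
# "Supported model formats" table. Not provided by the package.
model_kind() {
  case "$1" in
    # Note: .tflite is subject to the 1.1.0-jazzy caveat described above.
    *.tflite) echo "TFLite model" ;;
    *.so)     echo "pre-compiled QNN binary" ;;
    *.bin)    echo "QNN context binary" ;;
    *)        echo "unsupported model format: $1" >&2; return 1 ;;
  esac
}
```

For example, `model_kind /path/to/your_model.so` prints the matching row from the table, and any other extension returns a non-zero status so a launch script can stop early.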

Quick start

1. Install on Qualcomm Ubuntu

sudo add-apt-repository ppa:ubuntu-qcom-iot/qcom-ppa
sudo add-apt-repository ppa:ubuntu-qcom-iot/qirp
sudo apt update
sudo apt install ros-jazzy-qrb-ros-nn-inference

2. Run with your model

ros2 run qrb_ros_nn_inference qrb_ros_nn_inference \
  --ros-args \
    -p model_path:=/path/to/your_model.so \
    -p backend_option:=htp
Then publish your input on the configured input topic and subscribe to the output topic. See the upstream API reference for the full parameter list.
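
If you start the node from scripts, it can help to fail fast when the model path is wrong. The sketch below wraps the run command from step 2. `run_nn_inference` and the `DRY_RUN` variable are hypothetical names introduced here for illustration; the `ros2 run` arguments are the ones shown in the quick start, with `htp` assumed as the default backend.

```shell
# Hypothetical wrapper (not part of the package): check that the model file
# is readable, then start the node with the quick-start arguments.
run_nn_inference() {
  model_path="$1"
  backend="${2:-htp}"   # default matches the quick-start example

  if [ ! -r "$model_path" ]; then
    echo "model file not readable: $model_path" >&2
    return 1
  fi

  # Rebuild the positional parameters as the command to run.
  set -- ros2 run qrb_ros_nn_inference qrb_ros_nn_inference \
    --ros-args \
    -p "model_path:=$model_path" \
    -p "backend_option:=$backend"

  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of executing it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 run_nn_inference /path/to/your_model.so` prints the command without starting the node, which is a convenient way to check the arguments on a machine without ROS installed.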

Why this helps

| Alternative | Short take |
| --- | --- |
| Hand-rolled TFLite node (see npu-workflows.mdx) | Maximum control; you own preprocessing, delegate loading, topic wiring. Useful for learning. |
| CPU-only TFLite / ONNX node | Works everywhere, but no NPU, which defeats the point of Qualcomm hardware. |
| qrb_ros_samples packaged pipelines | Model-specific wrappers (object detection, segmentation, etc.); less flexible than a generic loader. |

A generic ROS 2 node that loads a .tflite / .so / .bin model on the Hexagon HTP NPU via QNN. For evaluators: drop in your own model, configure the node, and you’re publishing inference results on a ROS topic. Pre/post-processing is up to you (or use qrb_ros_tensor_process for YOLO-shaped tensors).