Depth Estimation on the NPU - Qualcomm Dragonwing Documentation

This page is a hand-rolled version of sample_depth_estimation from the qrb_ros_samples catalog. Instead of installing the packaged sample, you build the ROS 2 node yourself — subscribing to a camera topic, running a quantized TFLite model on the Hexagon HTP NPU via the Qualcomm QNN delegate, and publishing a colorized depth image plus the raw inverse-depth map. The goal is to show end-to-end how the pieces fit so you can build your own nodes for any model.

Why build your own instead of using the sample? The sample catalog is a great starting point, but you’ll eventually hit a model or pipeline shape it doesn’t cover. This page walks through the same pattern the samples use — QNN delegate loading, preprocessing, inference, postprocess, publish — wired against stock sensor_msgs / cv_bridge and a model from Qualcomm AI Hub. Once you’ve seen it once, you can swap MiDaS v2 for any AI Hub model and reuse the same scaffolding. Both this page and sample_depth_estimation target the same Hexagon HTP NPU — this one just exposes every wire.

Where each stage runs

Stage	Where it runs	Notes
Camera capture	ISP	Camera hardware block.
Color convert (YUYV → BGR)	CPU	Done inside `v4l2_camera`. Can be offloaded to GPU via IM SDK GStreamer plugins.
Preprocess (resize, normalize)	CPU	`cv2.resize` + NumPy in `midas_tflite.py`.
Inference	NPU (Hexagon HTP)	via the QNN TFLite delegate (`libQnnTFLiteDelegate.so`, backend `htp`). This is the Qualcomm differentiator.
Postprocess (colorize)	CPU	`cv2.applyColorMap`.
Publish	CPU	`rclpy` + `cv_bridge`.

How this differs from a stock ROS 2 TFLite node

Stock TFLite runs on CPU (or OpenCL GPU at best). The Hexagon HTP NPU is only reachable through the Qualcomm QNN delegate or the QNN SDK, which is what this pipeline loads.
Every node boundary is a memcpy. cv_bridge + v4l2_camera allocate and copy the full frame on each hop. For a hardware‑to‑hardware pipeline (camera ISP → NPU) that copy is avoidable — see qrb_ros_transport for DMA‑buf fd passing.
NVIDIA Isaac ROS and Intel OpenVINO packages target different silicon (NVIDIA GPUs and Intel CPUs, GPUs, and VPUs, respectively) and won’t run on Qualcomm hardware.
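
To put a number on the per-hop copy cost, here is a rough sketch of the arithmetic. The 640x480 bgr8 frame matches the camera settings used later on this page; the 30 fps rate and the two-hop count are assumptions chosen only for illustration.

```python
def frame_bytes(width: int, height: int, channels: int = 3,
                bytes_per_channel: int = 1) -> int:
    """Size in bytes of one tightly packed image frame."""
    return width * height * channels * bytes_per_channel


def copy_bandwidth_mb_s(width: int, height: int, fps: float, hops: int,
                        channels: int = 3) -> float:
    """Approximate memcpy traffic in MB/s if every hop copies the full frame once."""
    return frame_bytes(width, height, channels) * fps * hops / 1e6


if __name__ == "__main__":
    print(frame_bytes(640, 480))                          # 921600 bytes per frame
    print(copy_bandwidth_mb_s(640, 480, fps=30, hops=2))  # 55.296 MB/s
```

At this resolution the copies are cheap, but the cost grows linearly with pixel count, frame rate, and the number of node boundaries.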

Prerequisites: Complete the Software Setup and TurtleBot3 workflows before starting.

Pre- and post-processing in this workflow (image decoding, resizing, color conversion, and visualization) runs on the CPU. For GPU‑accelerated pre- and post-processing, use the GStreamer plugins in IM SDK.

Install prerequisites

Install the Python TFLite runtime, verify the QNN TFLite delegate, and install the camera driver.

TFLite runtime

pip install ai-edge-litert

If ai-edge-litert is unavailable for your platform, fall back to tflite-runtime:

pip install tflite-runtime

QNN TFLite delegate — confirm the shared library is present:

ls /usr/lib/libQnnTFLiteDelegate.so
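
If you prefer to check both prerequisites from Python, something like the following sketch works. The helper name check_prerequisites is hypothetical; the module names are the three runtimes the wrapper on this page tries to import, and the path is the default the node loads from.

```python
import importlib.util
import os

DELEGATE_PATH = "/usr/lib/libQnnTFLiteDelegate.so"  # default path used by the node


def check_prerequisites(delegate_path: str = DELEGATE_PATH) -> dict:
    """Report which TFLite runtimes are importable and whether the delegate exists."""
    runtimes = ("ai_edge_litert", "tflite_runtime", "tensorflow")
    # find_spec only locates a top-level module; it does not import it.
    found = [name for name in runtimes
             if importlib.util.find_spec(name) is not None]
    return {"runtimes": found, "delegate_present": os.path.isfile(delegate_path)}


if __name__ == "__main__":
    report = check_prerequisites()
    print("TFLite runtimes importable:", report["runtimes"] or "none")
    print("QNN delegate present:", report["delegate_present"])
```

This only confirms that the files are in place. Whether the delegate actually loads on the HTP backend is reported by the node at startup.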

If the file is missing, install the Qualcomm AI SDK or QNN runtime package for your device before continuing. The library must be at /usr/lib/libQnnTFLiteDelegate.so (the default path the node loads from).

Camera driver

sudo apt install -y ros-jazzy-v4l2-camera

Get the model

All models on Qualcomm AI Hub are compiled and validated for your specific target device before you download them.

Go to https://aihub.qualcomm.com/models/midas.
Select Export and choose your target device — IQ‑8275 EVK for IQ8, or IQ‑9075 EVK for IQ9.
Select TFLite as the runtime and INT8 quantization (w8a8).
Download the exported .tflite file — it will be named midas-midas-v2-w8a8.tflite.

After downloading, confirm the filename ends in .tflite. If AI Hub only shows an ONNX export option for your device, TFLite is not available for that combination — see the AI Workflows section for how to run ONNX models on the NPU instead.
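
A filename check alone won't catch a truncated or mislabeled download. As a slightly stronger check, the sketch below also looks for the FlatBuffers file identifier TFL3 that TFLite models carry at byte offset 4. The function name looks_like_tflite is hypothetical, and the check does not prove the model is valid or NPU-compatible.

```python
from pathlib import Path

TFLITE_FILE_ID = b"TFL3"  # FlatBuffers file identifier, stored at byte offset 4


def looks_like_tflite(path) -> bool:
    """True if the file has a .tflite suffix and the TFLite file identifier."""
    p = Path(path)
    if p.suffix != ".tflite" or not p.is_file():
        return False
    with p.open("rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == TFLITE_FILE_ID


if __name__ == "__main__":
    model = Path.home() / "Downloads" / "midas-midas-v2-w8a8.tflite"
    print(model, "->", looks_like_tflite(model))
```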

Scaffold the package

ros2 pkg create generates all the boilerplate — package.xml, setup.cfg, setup.py, the ament resource marker, and __init__.py.

Navigate to your workspace root before running these commands. The scaffolding commands run from src/, so paths such as midas_depth_ros/launch are relative to src/; the rosdep and colcon commands later run from the workspace root.

The package name must be the first positional argument after ros2 pkg create — place it immediately after create, before any flags. Putting it after --dependencies causes the CLI to treat it as another dependency, and the command will fail with no package created.

cd src
ros2 pkg create midas_depth_ros \
  --build-type ament_python \
  --dependencies rclpy sensor_msgs std_msgs cv_bridge
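
The argument-order rule is easier to remember once you see why it happens. The sketch below is a hypothetical stand-in for the real CLI, assuming an argparse-style parser in which --dependencies greedily accepts one or more values; it is not the actual ros2 pkg create implementation.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: one required positional plus a greedy list option.
    parser = argparse.ArgumentParser(prog="pkg-create-sketch")
    parser.add_argument("package_name")
    parser.add_argument("--build-type", default="ament_cmake")
    parser.add_argument("--dependencies", nargs="+", default=[])
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    with contextlib.redirect_stderr(io.StringIO()):  # hide the usage message
        try:
            return build_parser().parse_args(argv)
        except SystemExit:
            return None


if __name__ == "__main__":
    good = try_parse(["midas_depth_ros", "--build-type", "ament_python",
                      "--dependencies", "rclpy", "cv_bridge"])
    bad = try_parse(["--build-type", "ament_python",
                     "--dependencies", "rclpy", "cv_bridge", "midas_depth_ros"])
    print(good.package_name, good.dependencies)
    print(bad)  # None: the name was consumed as a dependency
```

In the second call the list option swallows the package name, so the required positional is never filled and parsing fails.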

Create the remaining directories and move the model into place:

mkdir -p midas_depth_ros/launch
mkdir -p midas_depth_ros/config
mkdir -p midas_depth_ros/models

mv ~/Downloads/midas-midas-v2-w8a8.tflite midas_depth_ros/models/

Update package.xml

ros2 pkg create added the ROS dependencies already. Append these two Python system dependencies inside the <package> block:

<exec_depend>python3-numpy</exec_depend>
<exec_depend>python3-opencv</exec_depend>

After adding these dependencies, re‑run rosdep install to resolve them:

cd ..
rosdep install --from-paths src --ignore-src -r -y

Replace setup.py

The generated setup.py needs updated data_files to install the launch file, config, and model.

from setuptools import setup
from glob import glob

package_name = 'midas_depth_ros'

setup(
    name=package_name,
    version='0.1.0',
    packages=[package_name],
    data_files=[
        ('share/ament_index/resource_index/packages', ['resource/' + package_name]),
        ('share/' + package_name, ['package.xml']),
        ('share/' + package_name + '/launch', glob('launch/*.py')),
        ('share/' + package_name + '/config', glob('config/*.yaml')),
        ('share/' + package_name + '/models', glob('models/*.tflite')),
    ],
    install_requires=['setuptools'],
    zip_safe=True,
    maintainer='maintainer',
    maintainer_email='you@example.com',
    description='MiDaS monocular depth estimation (TFLite) for ROS 2 on the Hexagon HTP NPU.',
    entry_points={
        'console_scripts': [
            'midas_depth_node = midas_depth_ros.midas_depth_node:main',
        ],
    },
)
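
One thing to watch: glob returns an empty list when a pattern matches nothing, so a missing model, launch file, or config file does not fail the build and only surfaces at runtime. The sketch below evaluates the same three patterns against a package directory so you can see what would be installed. The helper name installable_files is hypothetical, and the path in the example assumes you run it from the workspace root.

```python
from glob import glob
import os

# Same patterns as the data_files entries in setup.py, relative to the package root.
PATTERNS = {
    "launch": "launch/*.py",
    "config": "config/*.yaml",
    "models": "models/*.tflite",
}


def installable_files(package_root: str) -> dict:
    """Map each data_files group to the files its glob currently matches."""
    return {
        group: sorted(glob(os.path.join(package_root, pattern)))
        for group, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    for group, files in installable_files("src/midas_depth_ros").items():
        status = ", ".join(os.path.basename(f) for f in files) or "NOTHING MATCHED"
        print(f"{group:7s} {status}")
```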

Write the source files

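The package has two source files: a TFLite wrapper class that handles delegate loading, preprocessing, inference, and visualization; and a ROS 2 node that wires the camera subscription, runs inference, and publishes results. Both files go in the Python module directory that ros2 pkg create generated (src/midas_depth_ros/midas_depth_ros/); the paths in the labels below are relative to the package root, src/midas_depth_ros/.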

midas_depth_ros/midas_tflite.py

TFLite wrapper that handles QNN delegate loading, input preprocessing, inference, and depth colorization:

"""MiDaS v2 (w8a8) TFLite runner for monocular depth estimation.

Input  : [1, H, W, 3] uint8 or float32 (auto-detected from interpreter)
Output : [1, H, W] or [1, H, W, 1] depth (inverse-depth, higher = closer)
"""
import os
import numpy as np
import cv2

try:
    from tflite_runtime.interpreter import Interpreter, load_delegate
except ImportError:
    try:
        from tensorflow.lite.python.interpreter import Interpreter  # type: ignore
        try:
            from tensorflow.lite.python.interpreter import load_delegate  # type: ignore
        except ImportError:
            load_delegate = None
    except ImportError:
        from ai_edge_litert.interpreter import Interpreter  # type: ignore
        try:
            from ai_edge_litert.interpreter import load_delegate  # type: ignore
        except ImportError:
            load_delegate = None


class MidasTFLite:
    def __init__(self, model_path,
                 use_qnn_delegate=False, qnn_delegate_path=None, qnn_backend='htp'):
        if not os.path.isfile(model_path):
            raise FileNotFoundError(model_path)

        delegates = []
        self.delegate_active = False
        self.delegate_error = None
        if use_qnn_delegate:
            if load_delegate is None:
                self.delegate_error = 'load_delegate unavailable in this TFLite runtime'
            elif not qnn_delegate_path or not os.path.isfile(qnn_delegate_path):
                self.delegate_error = f'QNN delegate .so not found at {qnn_delegate_path}'
            else:
                try:
                    opts = {'backend_type': qnn_backend}
                    delegates = [load_delegate(qnn_delegate_path, options=opts)]
                    self.delegate_active = True
                except Exception as e:
                    self.delegate_error = f'load_delegate failed: {e}'

        self.interp = Interpreter(model_path=model_path,
                                  experimental_delegates=delegates or None)
        self.interp.allocate_tensors()
        self.inp = self.interp.get_input_details()[0]
        self.out = self.interp.get_output_details()[0]

        shape = self.inp['shape']  # [1, H, W, 3]
        self.in_h = int(shape[1])
        self.in_w = int(shape[2])

    @staticmethod
    def _dequant(arr, detail):
        q = detail.get('quantization', (0.0, 0))
        scale, zp = q[0], q[1]
        if scale and scale > 0:
            return (arr.astype(np.float32) - zp) * scale
        return arr.astype(np.float32)

    def infer(self, bgr):
        """Run inference on a BGR image; return a float32 HxW inverse-depth map
        in the ORIGINAL image resolution."""
        h, w = bgr.shape[:2]
        rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
        resized = cv2.resize(rgb, (self.in_w, self.in_h),
                             interpolation=cv2.INTER_CUBIC)

        dtype = self.inp['dtype']
        if dtype == np.uint8:
            x = resized.astype(np.uint8)
        else:
            x = resized.astype(np.float32) / 255.0
            mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
            std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)
            x = (x - mean) / std

        x = np.expand_dims(x, 0).astype(dtype)

        self.interp.set_tensor(self.inp['index'], x)
        self.interp.invoke()
        raw = self.interp.get_tensor(self.out['index'])
        depth = self._dequant(raw, self.out)

        depth = np.squeeze(depth)  # -> (H, W)
        if depth.ndim != 2:
            depth = depth.reshape(self.in_h, self.in_w)

        return cv2.resize(depth, (w, h), interpolation=cv2.INTER_CUBIC)

    @staticmethod
    def colorize(depth, colormap=cv2.COLORMAP_INFERNO):
        """Normalize to 0..255 and apply a colormap -> BGR uint8 for display."""
        d = depth.astype(np.float32)
        dmin, dmax = float(np.min(d)), float(np.max(d))
        if dmax - dmin < 1e-6:
            norm = np.zeros_like(d, dtype=np.uint8)
        else:
            norm = ((d - dmin) / (dmax - dmin) * 255.0).astype(np.uint8)
        return cv2.applyColorMap(norm, colormap)
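
The two numeric steps in the wrapper, affine dequantization in _dequant and min-max scaling in colorize, are easy to check by hand. Below is a plain-Python sketch of the same arithmetic on a few values. The scale and zero point are illustrative, not taken from the model, and the quantize helper is an addition: infer only special-cases uint8 inputs, so a model with an int8 input tensor would likely need a step like it.

```python
def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Affine dequantization, the same formula _dequant applies per element."""
    return (q - zero_point) * scale


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    """Inverse mapping, clamped to the integer range (uint8 by default)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def to_display_range(values):
    """Min-max scale floats to 0..255 integers, truncating like astype(np.uint8)."""
    lo, hi = min(values), max(values)
    if hi - lo < 1e-6:          # flat input: avoid dividing by ~0
        return [0 for _ in values]
    return [int((v - lo) / (hi - lo) * 255.0) for v in values]


if __name__ == "__main__":
    scale, zero_point = 0.5, 10              # illustrative values only
    depths = [dequantize(q, scale, zero_point) for q in (12, 14, 16)]
    print(depths)                            # [1.0, 2.0, 3.0]
    print(to_display_range(depths))          # [0, 127, 255]
    print(quantize(2.0, scale, zero_point))  # 14
```

Because colorize rescales every frame to its own minimum and maximum, colors in the visualization are not comparable across frames; use the raw topic when you need consistent values.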

midas_depth_ros/midas_depth_node.py

ROS 2 node that wires the camera subscription, inference, and topic publishing together:

"""ROS 2 node: MiDaS monocular depth estimation from a mono RGB camera."""
import os
import time

import numpy as np
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from ament_index_python.packages import get_package_share_directory

from .midas_tflite import MidasTFLite


class MidasDepthNode(Node):
    def __init__(self):
        super().__init__('midas_depth_node')
        self._declare_params()
        self.bridge = CvBridge()

        model_path = self.get_parameter('model_path').value
        if not model_path:
            model_path = os.path.join(
                get_package_share_directory('midas_depth_ros'),
                'models', 'midas-midas-v2-w8a8.tflite')

        self.midas = MidasTFLite(
            model_path=model_path,
            use_qnn_delegate=bool(self.get_parameter('use_qnn_delegate').value),
            qnn_delegate_path=str(self.get_parameter('qnn_delegate_path').value),
            qnn_backend=str(self.get_parameter('qnn_backend').value),
        )
        self.get_logger().info(
            f'MiDaS loaded: {model_path}  input={self.midas.in_w}x{self.midas.in_h}')
        want_npu = bool(self.get_parameter('use_qnn_delegate').value)
        if want_npu and self.midas.delegate_active:
            self.get_logger().info(
                f"✅ QNN delegate ACTIVE on '{self.get_parameter('qnn_backend').value}' "
                f"backend — inference runs on the NPU.")
        elif want_npu:
            self.get_logger().error(
                f'❌ QNN delegate requested but NOT active — running on CPU. '
                f'Reason: {self.midas.delegate_error}')
        else:
            self.get_logger().warn('QNN delegate disabled — running on CPU.')

        qos = 10
        self.sub = self.create_subscription(
            Image, self.get_parameter('image_topic').value, self.on_image, qos)

        self.pub_vis = self.create_publisher(
            Image, self.get_parameter('depth_image_topic').value, qos)
        self.pub_raw = self.create_publisher(
            Image, self.get_parameter('depth_raw_topic').value, qos)

        self._perf_interval = float(self.get_parameter('perf_log_interval_sec').value)
        self._perf_reset()

    def _declare_params(self):
        defaults = {
            'image_topic':           '/image_raw',
            'depth_image_topic':     '/midas/depth_image',
            'depth_raw_topic':       '/midas/depth',
            'model_path':            '',
            'use_qnn_delegate':      False,
            'qnn_delegate_path':     '/usr/lib/libQnnTFLiteDelegate.so',
            'qnn_backend':           'htp',
            'colormap':              'inferno',
            'perf_log_interval_sec': 2.0,
        }
        for name, value in defaults.items():
            self.declare_parameter(name, value)

    _COLORMAPS = {
        'inferno': 14, 'magma': 13, 'viridis': 16, 'plasma': 15,
        'jet': 2, 'turbo': 20, 'hot': 11, 'bone': 1,
    }

    def _perf_reset(self):
        self._perf_window_start = time.monotonic()
        self._perf_frames       = 0
        self._perf_total_ms     = 0.0
        self._perf_max_ms       = 0.0

    def _perf_maybe_log(self, frame_ms: float):
        self._perf_frames   += 1
        self._perf_total_ms += frame_ms
        if frame_ms > self._perf_max_ms:
            self._perf_max_ms = frame_ms
        elapsed = time.monotonic() - self._perf_window_start
        if elapsed < self._perf_interval:
            return
        fps    = self._perf_frames / elapsed if elapsed > 0 else 0.0
        avg_ms = self._perf_total_ms / self._perf_frames if self._perf_frames else 0.0
        self.get_logger().info(
            f'[perf] {fps:5.2f} Hz  avg {avg_ms:6.2f} ms  max {self._perf_max_ms:6.2f} ms  '
            f'(window {self._perf_frames} frames / {elapsed:.1f}s)')
        self._perf_reset()

    def on_image(self, msg: Image):
        t0 = time.monotonic()
        try:
            bgr = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        except Exception as e:
            self.get_logger().error(f'cv_bridge failed: {e}')
            return

        depth = self.midas.infer(bgr)

        cmap_name = str(self.get_parameter('colormap').value).lower()
        cmap      = self._COLORMAPS.get(cmap_name, 14)
        vis       = MidasTFLite.colorize(depth, colormap=cmap)

        vis_msg        = self.bridge.cv2_to_imgmsg(vis, encoding='bgr8')
        vis_msg.header = msg.header
        self.pub_vis.publish(vis_msg)

        raw_msg        = self.bridge.cv2_to_imgmsg(depth.astype(np.float32), encoding='32FC1')
        raw_msg.header = msg.header
        self.pub_raw.publish(raw_msg)

        self._perf_maybe_log((time.monotonic() - t0) * 1000.0)


def main():
    rclpy.init()
    node = MidasDepthNode()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

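Write the launch and config files

launch/midas_depth.launch.py

Launch file that starts the node and passes it the parameters in config/params.yaml:
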
from launch import LaunchDescription
from launch_ros.actions import Node
from ament_index_python.packages import get_package_share_directory
import os


def generate_launch_description():
    pkg_share = get_package_share_directory('midas_depth_ros')
    params    = os.path.join(pkg_share, 'config', 'params.yaml')

    return LaunchDescription([
        Node(
            package='midas_depth_ros',
            executable='midas_depth_node',
            name='midas_depth_node',
            output='screen',
            parameters=[params],
        )
    ])

Key parameters:

Parameter	Options	Notes
`use_qnn_delegate`	`true` / `false`	`true` runs on the HTP NPU; `false` (the node's built-in default) runs on the CPU
`qnn_backend`	`htp`, `gpu`, `cpu`	`htp` targets the Hexagon NPU
`colormap`	`inferno`, `magma`, `viridis`, `plasma`, `jet`, `turbo`, `hot`, `bone`	Colormap applied to the published depth visualization
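
The launch file reads config/params.yaml, but this page does not list that file. As a stopgap, the sketch below prints one in the standard ROS 2 layout (node name, then ros__parameters). The parameter names and values come from the defaults declared in midas_depth_node.py, except use_qnn_delegate, which is set to true here on the assumption that you want NPU inference. The script itself and its helper names are hypothetical.

```python
NODE_NAME = "midas_depth_node"  # must match name= in the launch file

PARAMS = {
    "image_topic": "/image_raw",
    "use_qnn_delegate": True,   # the node's built-in default is False (CPU)
    "qnn_delegate_path": "/usr/lib/libQnnTFLiteDelegate.so",
    "qnn_backend": "htp",
    "colormap": "inferno",
    "perf_log_interval_sec": 2.0,
}


def _yaml_scalar(value) -> str:
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"  # single-quoted YAML string


def render_params(node_name: str = NODE_NAME, params: dict = PARAMS) -> str:
    """Render a ROS 2 parameter file for one node as YAML text."""
    lines = [f"{node_name}:", "  ros__parameters:"]
    lines += [f"    {key}: {_yaml_scalar(val)}" for key, val in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_params(), end="")
```

Redirect the output to src/midas_depth_ros/config/params.yaml before building. model_path is left out on purpose so the node falls back to the model installed in the package share directory.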

Build

colcon build --packages-select midas_depth_ros --symlink-install
source install/setup.bash

Run

Start the camera first if it is not already running:

ros2 run v4l2_camera v4l2_camera_node \
  --ros-args -p video_device:=/dev/video0 -p pixel_format:=YUYV \
  -p image_size:="[640,480]" -p camera_frame_id:=camera_link

Then launch the inference node:

ros2 launch midas_depth_ros midas_depth.launch.py

On startup, with use_qnn_delegate set to true, the node logs whether the NPU delegate loaded successfully:

✅ QNN delegate ACTIVE on 'htp' backend — inference runs on the NPU.

If the delegate fails to load, the node falls back to the CPU and logs the reason:

❌ QNN delegate requested but NOT active — running on CPU. Reason: ...

Topics

Direction	Topic	Type	Notes
sub	`/image_raw`	`sensor_msgs/Image` bgr8	Camera input from `v4l2_camera`
pub	`/midas/depth_image`	`sensor_msgs/Image` bgr8	Colorized inverse-depth for RViz
pub	`/midas/depth`	`sensor_msgs/Image` 32FC1	Raw inverse-depth (higher = closer)
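
If you consume /midas/depth from another node, you can read individual pixels straight from the message fields. The sketch below assumes only the standard sensor_msgs/Image layout (encoding, width, height, step, is_bigendian, data); the function name inverse_depth_at is hypothetical, and the example uses a stand-in object rather than a real message so it runs without ROS.

```python
import struct
from types import SimpleNamespace


def inverse_depth_at(msg, u: int, v: int) -> float:
    """Read one float32 pixel (column u, row v) from a 32FC1 sensor_msgs/Image."""
    if msg.encoding != "32FC1":
        raise ValueError(f"expected 32FC1, got {msg.encoding}")
    if not (0 <= u < msg.width and 0 <= v < msg.height):
        raise IndexError("pixel outside image")
    fmt = ">f" if msg.is_bigendian else "<f"
    offset = v * msg.step + u * 4      # step = bytes per row, 4 bytes per float32
    return struct.unpack_from(fmt, msg.data, offset)[0]


if __name__ == "__main__":
    # Stand-in for a received message: 2x2 image, little-endian, no row padding.
    pixels = [0.5, 1.0, 2.0, 4.0]
    fake = SimpleNamespace(encoding="32FC1", width=2, height=2, step=8,
                           is_bigendian=0, data=struct.pack("<4f", *pixels))
    print(inverse_depth_at(fake, u=1, v=1))   # 4.0
```

The values are relative inverse depth, not meters: larger means closer, and the scale is not calibrated to a physical unit.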

Visualizing in RViz

Add an Image display and set the topic to /midas/depth_image. The colorized output maps closer objects to brighter values with the default inferno colormap.

Next steps

Adapt this scaffolding to another model. Swap the MiDaS export from the Get the model section for any TFLite model from Qualcomm AI Hub and adjust preprocessing in midas_tflite.py. The delegate loading, topic wiring, and launch/config files carry over unchanged.
Want to avoid the per-frame CPU copy between the camera and this node? See qrb_ros_transport for zero-copy DMA-buf passing.
Prefer the packaged version of this pipeline? sample_depth_estimation in qrb_ros_samples ships the same pipeline pre-wired — use it when you want to run depth estimation without building a node yourself.
