# Depth Estimation on the NPU

> Build a ROS 2 depth estimation node that runs a quantized TFLite model on the Hexagon HTP NPU via the QNN delegate.

This page is a hand-rolled version of [`sample_depth_estimation`](https://github.com/qualcomm-qrb-ros/qrb_ros_samples/tree/main/ai_vision/sample_depth_estimation) from the [`qrb_ros_samples`](./qrb-ros-samples) catalog. Instead of installing the packaged sample, you build the ROS 2 node yourself — subscribing to a camera topic, running a quantized TFLite model on the Hexagon HTP NPU via the Qualcomm QNN delegate, and publishing a colorized depth image plus the raw inverse-depth map. The goal is to show end-to-end how the pieces fit so you can build your own nodes for any model.

<Info>
  **Why build your own instead of using the sample?** The sample catalog is a great starting point, but you'll eventually hit a model or pipeline shape it doesn't cover. This page walks through the same pattern the samples use (QNN delegate loading, preprocessing, inference, postprocessing, publishing), wired against stock `sensor_msgs` / `cv_bridge` and a model from [Qualcomm AI Hub](https://aihub.qualcomm.com). After one pass through it, you can swap MiDaS v2 for another AI Hub model and reuse the same scaffolding. Both this page and [`sample_depth_estimation`](https://github.com/qualcomm-qrb-ros/qrb_ros_samples/tree/main/ai_vision/sample_depth_estimation) target the same Hexagon HTP NPU; this one just exposes every wire.
</Info>

## Where each stage runs

```mermaid theme={null}
flowchart LR
    C["Camera<br/><i>ISP</i>"] --> V["v4l2_camera<br/><i>CPU</i>"]
    V -->|"/image_raw"| PRE["cv_bridge + resize<br/><i>CPU</i>"]
    PRE --> NPU["QNN TFLite<br/>delegate (HTP)<br/><b>NPU</b>"]
    NPU --> POST["colorize + cv_bridge<br/><i>CPU</i>"]
    POST -->|"/midas/depth_image"| OUT1["RViz"]
    POST -->|"/midas/depth"| OUT2["downstream nodes"]
    style NPU fill:#31017D,stroke:#31017D,color:#fff
    style C fill:#e6d9f5
```

| Stage                          | Where it runs         | Notes                                                                                                        |
| ------------------------------ | --------------------- | ------------------------------------------------------------------------------------------------------------ |
| Camera capture                 | **ISP**               | Camera hardware block.                                                                                       |
| Color convert (YUYV → BGR)     | **CPU**               | Done inside `v4l2_camera`. Can be offloaded to GPU via [IM SDK](/ai-workflows/im-sdk) GStreamer plugins.     |
| Preprocess (resize, normalize) | **CPU**               | `cv2.resize` + NumPy in `midas_tflite.py`.                                                                   |
| **Inference**                  | **NPU (Hexagon HTP)** | via the QNN TFLite delegate (`libQnnTFLiteDelegate.so`, backend `htp`). This is the Qualcomm differentiator. |
| Postprocess (colorize)         | **CPU**               | `cv2.applyColorMap`.                                                                                         |
| Publish                        | **CPU**               | `rclpy` + `cv_bridge`.                                                                                       |

## How this differs from a stock ROS 2 TFLite node

* **Stock TFLite runs on CPU (or OpenCL GPU at best).** The Hexagon HTP NPU is only reachable through the Qualcomm **QNN delegate** or the QNN SDK, which is what this pipeline loads.
* **Every node boundary is a memcpy.** `cv_bridge` + `v4l2_camera` allocate and copy the full frame on each hop. For a hardware‑to‑hardware pipeline (camera ISP → NPU) that copy is avoidable — see [`qrb_ros_transport`](./qrb-ros-transport) for DMA‑buf fd passing.
* **NVIDIA Isaac ROS / Intel OpenVINO packages target different silicon** (NVIDIA GPUs / Intel CPUs, GPUs, and NPUs) and cannot drive the Hexagon NPU.
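
To put the per-hop copy cost in perspective, the arithmetic below estimates the memory traffic for the 640×480 stream used later on this page. The 30 fps rate and the figure of two full-frame copies per frame are illustrative assumptions, not measurements.

```python
def copy_bandwidth_mb_s(width, height, channels, fps, copies_per_frame):
    """Estimate memory traffic (MB/s) caused by copying full frames."""
    bytes_per_frame = width * height * channels  # assumes 8-bit channels
    return bytes_per_frame * fps * copies_per_frame / 1e6


# 640x480 BGR frames; 30 fps and 2 copies per frame are assumed values.
print(f"{copy_bandwidth_mb_s(640, 480, 3, 30, 2):.1f} MB/s")  # 55.3 MB/s
```

The absolute number is small for one VGA stream, but it scales linearly with resolution, frame rate, and the number of node boundaries.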

**Prerequisites:** Complete the [Software Setup](./software-setup) and [TurtleBot3](./turtlebot3) workflows before starting.

<Note>
  Pre- and postprocessing in this workflow (image decoding, resizing, color conversion, and visualization) runs on the CPU. For GPU-accelerated pre- and postprocessing, use the GStreamer plugins in [IM SDK](/ai-workflows/im-sdk).
</Note>

<Steps>
  <Step title="Install prerequisites">
    Install the Python TFLite runtime, verify the QNN TFLite delegate, and install the camera driver.

    **TFLite runtime**

    ```bash theme={null}
    pip install ai-edge-litert
    ```

    If `ai-edge-litert` is unavailable for your platform, fall back to `tflite-runtime`:

    ```bash theme={null}
    pip install tflite-runtime
    ```

    **QNN TFLite delegate** — confirm the shared library is present:

    ```bash theme={null}
    ls /usr/lib/libQnnTFLiteDelegate.so
    ```

    If the file is missing, install the Qualcomm AI SDK or QNN runtime package for your device before continuing. The node loads the library from `/usr/lib/libQnnTFLiteDelegate.so` by default; if it is installed elsewhere, point the `qnn_delegate_path` parameter at it.
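
    If you are unsure where the library ended up, a small search like the one below can locate it so you can pass the result as `qnn_delegate_path`. This is only a sketch: `find_qnn_delegate` is a hypothetical helper, and the extra directories in `CANDIDATE_DIRS` are guesses rather than documented install locations. Only `/usr/lib` comes from this page.

    ```python
    import os

    DELEGATE_NAME = "libQnnTFLiteDelegate.so"
    # Only /usr/lib is documented on this page; the other entries are assumptions.
    CANDIDATE_DIRS = ("/usr/lib", "/usr/local/lib", "/usr/lib/aarch64-linux-gnu")


    def find_qnn_delegate(dirs=CANDIDATE_DIRS, name=DELEGATE_NAME):
        """Return the first existing path to the delegate library, or None."""
        for d in dirs:
            path = os.path.join(d, name)
            if os.path.isfile(path):
                return path
        return None


    if __name__ == "__main__":
        print(find_qnn_delegate() or "QNN TFLite delegate not found")
    ```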

    **Camera driver**

    ```bash theme={null}
    sudo apt install -y ros-jazzy-v4l2-camera
    ```
  </Step>

  <Step title="Get the model">
    All models on Qualcomm AI Hub are compiled and validated for your specific target device before you download them.

    1. Go to [https://aihub.qualcomm.com/models/midas](https://aihub.qualcomm.com/models/midas).
    2. Select **Export** and choose your target device — **IQ‑8275 EVK** for IQ8, or **IQ‑9075 EVK** for IQ9.
    3. Select **TFLite** as the runtime and **INT8** quantization (`w8a8`).
    4. Download the exported `.tflite` file — it will be named `midas-midas-v2-w8a8.tflite`.

    <Warning>
      After downloading, confirm the filename ends in `.tflite`. If AI Hub only shows an ONNX export option for your device, TFLite is not available for that combination — see the [AI Workflows](/ai-workflows/lite-rt) section for how to run ONNX models on the NPU instead.
    </Warning>
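
    Before wiring the export into a node, it can help to check what the model actually expects. The hypothetical helper `summarize_io` below formats the input and output tensor details of any object that exposes the TFLite `Interpreter` methods `get_input_details()` and `get_output_details()`. It reads the `name`, `shape`, `dtype`, and `quantization` entries of each detail dictionary.

    ```python
    def summarize_io(interp):
        """Return one line per input/output tensor of a TFLite-style interpreter."""
        lines = []
        for kind, details in (("input", interp.get_input_details()),
                              ("output", interp.get_output_details())):
            for d in details:
                scale, zero_point = d.get("quantization", (0.0, 0))
                dtype = getattr(d["dtype"], "__name__", str(d["dtype"]))
                shape = [int(s) for s in d["shape"]]
                lines.append(
                    f"{kind}: {d['name']} shape={shape} "
                    f"dtype={dtype} scale={scale} zero_point={zero_point}")
        return lines


    # Usage on the device (requires the TFLite runtime from step 1):
    #   from ai_edge_litert.interpreter import Interpreter
    #   interp = Interpreter(model_path="midas-midas-v2-w8a8.tflite")
    #   interp.allocate_tensors()
    #   print("\n".join(summarize_io(interp)))
    ```

    The input `dtype` reported here decides which preprocessing branch the wrapper in the later step takes (`uint8` pass-through versus float normalization).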
  </Step>

  <Step title="Scaffold the package">
    `ros2 pkg create` generates all the boilerplate — `package.xml`, `setup.cfg`, `setup.py`, the ament resource marker, and `__init__.py`.

    <Note>
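      Start from your workspace root. Shell commands in the remaining steps continue from the directory the previous command left you in, and file names in accordion titles and code tabs (for example `launch/midas_depth.launch.py`) are relative to the package directory `src/midas_depth_ros`.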
    </Note>

    <Note>
      The package name must be the first positional argument after `ros2 pkg create` — place it immediately after `create`, before any flags. Putting it after `--dependencies` causes the CLI to treat it as another dependency, and the command will fail with no package created.
    </Note>

    ```bash theme={null}
    cd src
    ros2 pkg create midas_depth_ros \
      --build-type ament_python \
      --dependencies rclpy sensor_msgs std_msgs cv_bridge
    ```

    Create the remaining directories and move the model into place:

    ```bash theme={null}
    mkdir -p midas_depth_ros/launch
    mkdir -p midas_depth_ros/config
    mkdir -p midas_depth_ros/models

    mv ~/Downloads/midas-midas-v2-w8a8.tflite midas_depth_ros/models/
    ```
  </Step>

  <Step title="Update package.xml">
    `ros2 pkg create` added the ROS dependencies already. Append these two Python system dependencies inside the `<package>` block:

    ```xml theme={null}
    <exec_depend>python3-numpy</exec_depend>
    <exec_depend>python3-opencv</exec_depend>
    ```

    After adding these dependencies, re‑run `rosdep install` to resolve them:

    ```bash theme={null}
    cd ..
    rosdep install --from-paths src --ignore-src -r -y
    ```
  </Step>

  <Step title="Replace setup.py">
    The generated `setup.py` needs updated `data_files` to install the launch file, config, and model.

    ```python theme={null}
    from setuptools import setup
    from glob import glob

    package_name = 'midas_depth_ros'

    setup(
        name=package_name,
        version='0.1.0',
        packages=[package_name],
        data_files=[
            ('share/ament_index/resource_index/packages', ['resource/' + package_name]),
            ('share/' + package_name, ['package.xml']),
            ('share/' + package_name + '/launch', glob('launch/*.py')),
            ('share/' + package_name + '/config', glob('config/*.yaml')),
            ('share/' + package_name + '/models', glob('models/*.tflite')),
        ],
        install_requires=['setuptools'],
        zip_safe=True,
        maintainer='maintainer',
        maintainer_email='you@example.com',
        description='MiDaS monocular depth estimation (TFLite) for ROS 2 on the Hexagon HTP NPU.',
        entry_points={
            'console_scripts': [
                'midas_depth_node = midas_depth_ros.midas_depth_node:main',
            ],
        },
    )
    ```
  </Step>

  <Step title="Write the source files">
    The package has two source files: a TFLite wrapper class that handles delegate loading, preprocessing, inference, and visualization; and a ROS 2 node that wires the camera subscription, runs inference, and publishes results.

    <AccordionGroup>
      <Accordion title="midas_depth_ros/midas_tflite.py" icon="file-code">
        TFLite wrapper that handles QNN delegate loading, input preprocessing, inference, and depth colorization:

        ```python theme={null}
        """MiDaS v2 (w8a8) TFLite runner for monocular depth estimation.

        Input  : [1, H, W, 3] uint8 or float32 (auto-detected from interpreter)
        Output : [1, H, W] or [1, H, W, 1] depth (inverse-depth, higher = closer)
        """
        import os
        import numpy as np
        import cv2

        try:
            from tflite_runtime.interpreter import Interpreter, load_delegate
        except ImportError:
            try:
                from tensorflow.lite.python.interpreter import Interpreter  # type: ignore
                try:
                    from tensorflow.lite.python.interpreter import load_delegate  # type: ignore
                except ImportError:
                    load_delegate = None
            except ImportError:
                from ai_edge_litert.interpreter import Interpreter  # type: ignore
                try:
                    from ai_edge_litert.interpreter import load_delegate  # type: ignore
                except ImportError:
                    load_delegate = None


        class MidasTFLite:
            def __init__(self, model_path,
                         use_qnn_delegate=False, qnn_delegate_path=None, qnn_backend='htp'):
                if not os.path.isfile(model_path):
                    raise FileNotFoundError(model_path)

                delegates = []
                self.delegate_active = False
                self.delegate_error = None
                if use_qnn_delegate:
                    if load_delegate is None:
                        self.delegate_error = 'load_delegate unavailable in this TFLite runtime'
                    elif not qnn_delegate_path or not os.path.isfile(qnn_delegate_path):
                        self.delegate_error = f'QNN delegate .so not found at {qnn_delegate_path}'
                    else:
                        try:
                            opts = {'backend_type': qnn_backend}
                            delegates = [load_delegate(qnn_delegate_path, options=opts)]
                            self.delegate_active = True
                        except Exception as e:
                            self.delegate_error = f'load_delegate failed: {e}'

                self.interp = Interpreter(model_path=model_path,
                                          experimental_delegates=delegates or None)
                self.interp.allocate_tensors()
                self.inp = self.interp.get_input_details()[0]
                self.out = self.interp.get_output_details()[0]

                shape = self.inp['shape']  # [1, H, W, 3]
                self.in_h = int(shape[1])
                self.in_w = int(shape[2])

            @staticmethod
            def _dequant(arr, detail):
                q = detail.get('quantization', (0.0, 0))
                scale, zp = q[0], q[1]
                if scale and scale > 0:
                    return (arr.astype(np.float32) - zp) * scale
                return arr.astype(np.float32)

            def infer(self, bgr):
                """Run inference on a BGR image; return a float32 HxW inverse-depth map
                in the ORIGINAL image resolution."""
                h, w = bgr.shape[:2]
                rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
                resized = cv2.resize(rgb, (self.in_w, self.in_h),
                                     interpolation=cv2.INTER_CUBIC)

                dtype = self.inp['dtype']
                if dtype == np.uint8:
                    x = resized.astype(np.uint8)
                else:
                    x = resized.astype(np.float32) / 255.0
                    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
                    std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)
                    x = (x - mean) / std

                x = np.expand_dims(x, 0).astype(dtype)

                self.interp.set_tensor(self.inp['index'], x)
                self.interp.invoke()
                raw = self.interp.get_tensor(self.out['index'])
                depth = self._dequant(raw, self.out)

                depth = np.squeeze(depth)  # -> (H, W)
                if depth.ndim != 2:
                    depth = depth.reshape(self.in_h, self.in_w)

                return cv2.resize(depth, (w, h), interpolation=cv2.INTER_CUBIC)

            @staticmethod
            def colorize(depth, colormap=cv2.COLORMAP_INFERNO):
                """Normalize to 0..255 and apply a colormap -> BGR uint8 for display."""
                d = depth.astype(np.float32)
                dmin, dmax = float(np.min(d)), float(np.max(d))
                if dmax - dmin < 1e-6:
                    norm = np.zeros_like(d, dtype=np.uint8)
                else:
                    norm = ((d - dmin) / (dmax - dmin) * 255.0).astype(np.uint8)
                return cv2.applyColorMap(norm, colormap)
        ```
      </Accordion>

      <Accordion title="midas_depth_ros/midas_depth_node.py" icon="file-code">
        ROS 2 node that wires the camera subscription, inference, and topic publishing together:

        ```python theme={null}
        """ROS 2 node: MiDaS monocular depth estimation from a mono RGB camera."""
        import os
        import time

        import numpy as np
        import rclpy
        from rclpy.node import Node
        from sensor_msgs.msg import Image
        from cv_bridge import CvBridge
        from ament_index_python.packages import get_package_share_directory

        from .midas_tflite import MidasTFLite


        class MidasDepthNode(Node):
            def __init__(self):
                super().__init__('midas_depth_node')
                self._declare_params()
                self.bridge = CvBridge()

                model_path = self.get_parameter('model_path').value
                if not model_path:
                    model_path = os.path.join(
                        get_package_share_directory('midas_depth_ros'),
                        'models', 'midas-midas-v2-w8a8.tflite')

                self.midas = MidasTFLite(
                    model_path=model_path,
                    use_qnn_delegate=bool(self.get_parameter('use_qnn_delegate').value),
                    qnn_delegate_path=str(self.get_parameter('qnn_delegate_path').value),
                    qnn_backend=str(self.get_parameter('qnn_backend').value),
                )
                self.get_logger().info(
                    f'MiDaS loaded: {model_path}  input={self.midas.in_w}x{self.midas.in_h}')
                want_npu = bool(self.get_parameter('use_qnn_delegate').value)
                if want_npu and self.midas.delegate_active:
                    self.get_logger().info(
                        f"✅ QNN delegate ACTIVE on '{self.get_parameter('qnn_backend').value}' "
                        f"backend — inference runs on the NPU.")
                elif want_npu:
                    self.get_logger().error(
                        f'❌ QNN delegate requested but NOT active — running on CPU. '
                        f'Reason: {self.midas.delegate_error}')
                else:
                    self.get_logger().warn('QNN delegate disabled — running on CPU.')

                qos = 10
                self.sub = self.create_subscription(
                    Image, self.get_parameter('image_topic').value, self.on_image, qos)

                self.pub_vis = self.create_publisher(
                    Image, self.get_parameter('depth_image_topic').value, qos)
                self.pub_raw = self.create_publisher(
                    Image, self.get_parameter('depth_raw_topic').value, qos)

                self._perf_interval = float(self.get_parameter('perf_log_interval_sec').value)
                self._perf_reset()

            def _declare_params(self):
                defaults = {
                    'image_topic':           '/image_raw',
                    'depth_image_topic':     '/midas/depth_image',
                    'depth_raw_topic':       '/midas/depth',
                    'model_path':            '',
                    'use_qnn_delegate':      False,
                    'qnn_delegate_path':     '/usr/lib/libQnnTFLiteDelegate.so',
                    'qnn_backend':           'htp',
                    'colormap':              'inferno',
                    'perf_log_interval_sec': 2.0,
                }
                for name, value in defaults.items():
                    self.declare_parameter(name, value)

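            # OpenCV colormap IDs (the cv2.COLORMAP_* constants), hard-coded so
            # this module does not need to import cv2.
            _COLORMAPS = {
                'inferno': 14, 'magma': 13, 'viridis': 16, 'plasma': 15,
                'jet': 2, 'turbo': 20, 'hot': 11, 'bone': 1,
            }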

            def _perf_reset(self):
                self._perf_window_start = time.monotonic()
                self._perf_frames       = 0
                self._perf_total_ms     = 0.0
                self._perf_max_ms       = 0.0

            def _perf_maybe_log(self, frame_ms: float):
                self._perf_frames   += 1
                self._perf_total_ms += frame_ms
                if frame_ms > self._perf_max_ms:
                    self._perf_max_ms = frame_ms
                elapsed = time.monotonic() - self._perf_window_start
                if elapsed < self._perf_interval:
                    return
                fps    = self._perf_frames / elapsed if elapsed > 0 else 0.0
                avg_ms = self._perf_total_ms / self._perf_frames if self._perf_frames else 0.0
                self.get_logger().info(
                    f'[perf] {fps:5.2f} Hz  avg {avg_ms:6.2f} ms  max {self._perf_max_ms:6.2f} ms  '
                    f'(window {self._perf_frames} frames / {elapsed:.1f}s)')
                self._perf_reset()

            def on_image(self, msg: Image):
                t0 = time.monotonic()
                try:
                    bgr = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
                except Exception as e:
                    self.get_logger().error(f'cv_bridge failed: {e}')
                    return

                depth = self.midas.infer(bgr)

                cmap_name = str(self.get_parameter('colormap').value).lower()
                cmap      = self._COLORMAPS.get(cmap_name, 14)
                vis       = MidasTFLite.colorize(depth, colormap=cmap)

                vis_msg        = self.bridge.cv2_to_imgmsg(vis, encoding='bgr8')
                vis_msg.header = msg.header
                self.pub_vis.publish(vis_msg)

                raw_msg        = self.bridge.cv2_to_imgmsg(depth.astype(np.float32), encoding='32FC1')
                raw_msg.header = msg.header
                self.pub_raw.publish(raw_msg)

                self._perf_maybe_log((time.monotonic() - t0) * 1000.0)


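        def main():
            rclpy.init()
            node = MidasDepthNode()
            try:
                rclpy.spin(node)
            except KeyboardInterrupt:
                pass  # Ctrl-C is the normal way to stop the node
            finally:
                node.destroy_node()
                # The SIGINT handler may already have shut the context down.
                if rclpy.ok():
                    rclpy.shutdown()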


        if __name__ == '__main__':
            main()
        ```
      </Accordion>
    </AccordionGroup>
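
    `_dequant` in the wrapper applies the affine mapping `real = (q - zero_point) * scale` to the quantized output tensor. The worked example below repeats that arithmetic in plain Python. The parameters `scale=0.25` and `zero_point=128` are made up to illustrate the calculation; read the real values from `get_output_details()` for your model.

    ```python
    def dequantize(values, scale, zero_point):
        """Map quantized integers back to real values: (q - zero_point) * scale."""
        if not scale or scale <= 0:
            # No quantization parameters: pass the values through as floats.
            return [float(v) for v in values]
        return [(v - zero_point) * scale for v in values]


    # Illustrative parameters, not taken from the MiDaS export.
    print(dequantize([128, 132, 255], scale=0.25, zero_point=128))  # [0.0, 1.0, 31.75]
    ```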
  </Step>

  <Step title="Write the launch and config files">
    <CodeGroup>
      ```python launch/midas_depth.launch.py theme={null}
      from launch import LaunchDescription
      from launch_ros.actions import Node
      from ament_index_python.packages import get_package_share_directory
      import os


      def generate_launch_description():
          pkg_share = get_package_share_directory('midas_depth_ros')
          params    = os.path.join(pkg_share, 'config', 'params.yaml')

          return LaunchDescription([
              Node(
                  package='midas_depth_ros',
                  executable='midas_depth_node',
                  name='midas_depth_node',
                  output='screen',
                  parameters=[params],
              )
          ])
      ```

      ```yaml config/params.yaml theme={null}
      midas_depth_node:
        ros__parameters:
          image_topic: /image_raw
          depth_image_topic: /midas/depth_image
          depth_raw_topic: /midas/depth
          model_path: ''
          use_qnn_delegate: true
          qnn_delegate_path: /usr/lib/libQnnTFLiteDelegate.so
          qnn_backend: htp
          colormap: inferno
          perf_log_interval_sec: 2.0
      ```
    </CodeGroup>

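    Key parameters. The launch file loads `config/params.yaml`, which sets `use_qnn_delegate: true`; the node's built-in default is `false`, so starting it without the parameter file runs inference on the CPU.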

    | Parameter          | Options                                                                | Notes                                                 |
    | ------------------ | ---------------------------------------------------------------------- | ----------------------------------------------------- |
    | `use_qnn_delegate` | `true` / `false`                                                       | `true` runs on the HTP NPU; `false` falls back to CPU |
    | `qnn_backend`      | `htp`, `gpu`, `cpu`                                                    | `htp` targets the Hexagon NPU                         |
    | `colormap`         | `inferno`, `magma`, `viridis`, `plasma`, `jet`, `turbo`, `hot`, `bone` | Colormap applied to the published depth visualization |
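
    The node does not reject unknown values: an unrecognized `colormap`, for instance, silently falls back to `inferno`. A small pre-flight check can catch typos before launch. The sketch below is a hypothetical helper, not part of the package; it assumes you have already loaded the `ros__parameters` block into a plain dictionary.

    ```python
    VALID_BACKENDS = {"htp", "gpu", "cpu"}
    VALID_COLORMAPS = {"inferno", "magma", "viridis", "plasma",
                       "jet", "turbo", "hot", "bone"}


    def check_params(params):
        """Return a list of human-readable problems found in a parameter dict."""
        problems = []
        if not isinstance(params.get("use_qnn_delegate"), bool):
            problems.append("use_qnn_delegate must be true or false")
        if params.get("qnn_backend") not in VALID_BACKENDS:
            problems.append(f"unknown qnn_backend: {params.get('qnn_backend')!r}")
        if str(params.get("colormap", "")).lower() not in VALID_COLORMAPS:
            problems.append(f"unknown colormap: {params.get('colormap')!r}")
        return problems


    # A misspelled colormap is reported; an empty list means no problems found.
    print(check_params({"use_qnn_delegate": True, "qnn_backend": "htp",
                        "colormap": "infernoo"}))
    ```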
  </Step>

  <Step title="Build">
    ```bash theme={null}
    colcon build --packages-select midas_depth_ros --symlink-install
    source install/setup.bash
    ```
  </Step>

  <Step title="Run">
    Start the camera first if it is not already running:

    ```bash theme={null}
    ros2 run v4l2_camera v4l2_camera_node \
      --ros-args -p video_device:=/dev/video0 -p pixel_format:=YUYV \
      -p image_size:="[640,480]" -p camera_frame_id:=camera_link
    ```

    Then launch the inference node:

    ```bash theme={null}
    ros2 launch midas_depth_ros midas_depth.launch.py
    ```

    On startup the node logs whether the NPU delegate loaded successfully:

    ```
    ✅ QNN delegate ACTIVE on 'htp' backend — inference runs on the NPU.
    ```

    If the delegate fails to load, the node falls back to the CPU and logs the reason:

    ```
    ❌ QNN delegate requested but NOT active — running on CPU. Reason: ...
    ```
  </Step>
</Steps>
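
While running, the node also prints a `[perf]` line every `perf_log_interval_sec` seconds. To compare an NPU run against a CPU run (toggle `use_qnn_delegate`), a small parser such as the sketch below can pull the numbers out of captured logs. `parse_perf` is a hypothetical helper, and the sample line is fabricated to match the format string in `midas_depth_node.py`; its numbers are placeholders, not benchmark results.

```python
import re

# Mirrors the f-string used by _perf_maybe_log in midas_depth_node.py.
PERF_RE = re.compile(
    r"\[perf\]\s+(?P<hz>[\d.]+) Hz\s+avg\s+(?P<avg>[\d.]+) ms\s+"
    r"max\s+(?P<max>[\d.]+) ms\s+"
    r"\(window (?P<frames>\d+) frames / (?P<secs>[\d.]+)s\)")


def parse_perf(line):
    """Return the perf fields of one log line as a dict, or None if absent."""
    m = PERF_RE.search(line)
    if m is None:
        return None
    return {"hz": float(m["hz"]), "avg_ms": float(m["avg"]),
            "max_ms": float(m["max"]), "frames": int(m["frames"]),
            "window_s": float(m["secs"])}


# Placeholder numbers, for illustration only.
sample = "[perf] 12.50 Hz  avg  20.00 ms  max  35.25 ms  (window 25 frames / 2.0s)"
print(parse_perf(sample))
```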

## Topics

| Direction | Topic                | Type                      | Notes                               |
| --------- | -------------------- | ------------------------- | ----------------------------------- |
| sub       | `/image_raw`         | `sensor_msgs/Image` bgr8  | Camera input from `v4l2_camera`     |
| pub       | `/midas/depth_image` | `sensor_msgs/Image` bgr8  | Colorized inverse-depth for RViz    |
| pub       | `/midas/depth`       | `sensor_msgs/Image` 32FC1 | Raw inverse-depth (higher = closer) |
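
A downstream node would normally read `/midas/depth` through `cv_bridge` (`imgmsg_to_cv2(msg, desired_encoding='32FC1')`). To show what the payload contains, the sketch below decodes a `32FC1` buffer by hand with the standard library. `decode_32fc1` is a hypothetical helper; its arguments mirror the `sensor_msgs/Image` fields `data`, `height`, `width`, `step`, and `is_bigendian`.

```python
import struct


def decode_32fc1(data, height, width, step, is_bigendian=False):
    """Decode a sensor_msgs/Image 32FC1 payload into rows of Python floats.

    `step` is the row stride in bytes and may exceed width * 4 (row padding).
    """
    fmt = (">" if is_bigendian else "<") + f"{width}f"
    rows = []
    for r in range(height):
        # unpack_from reads one row of float32 values at the row's byte offset.
        rows.append(list(struct.unpack_from(fmt, data, r * step)))
    return rows


# A 2x2 test image packed as little-endian float32.
payload = struct.pack("<4f", 0.5, 1.0, 2.0, 4.0)
print(decode_32fc1(payload, height=2, width=2, step=8))  # [[0.5, 1.0], [2.0, 4.0]]
```

The values are relative inverse depth, so they are useful for ordering pixels by closeness but not as metric distances.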

## Visualizing in RViz

Add an **Image** display and set the topic to `/midas/depth_image`. The colorized output maps closer objects to brighter values with the default `inferno` colormap.
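
The brightness mapping comes from the min-max normalization in `colorize`: each frame is stretched to 0..255 before the colormap is applied, so colors are relative to that frame rather than tied to an absolute distance. A pure-Python sketch of the same arithmetic, using made-up depth values:

```python
def normalize_to_u8(values):
    """Min-max scale a flat list of floats to integers in 0..255."""
    lo, hi = min(values), max(values)
    if hi - lo < 1e-6:
        # Flat frame: avoid dividing by zero, as colorize() does.
        return [0 for _ in values]
    return [int((v - lo) / (hi - lo) * 255.0) for v in values]


# Illustrative inverse-depth values (higher = closer).
print(normalize_to_u8([2.0, 4.0, 10.0]))  # [0, 63, 255]
```

Because the range is recomputed per frame, the same object can change color as nearer or farther objects enter the view.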

## Next steps

* **Adapt this scaffolding to another model.** Swap the MiDaS export in [Step 2](#get-the-model) for any TFLite model from [Qualcomm AI Hub](https://aihub.qualcomm.com) and adjust preprocessing in `midas_tflite.py`. The delegate loading, topic wiring, and launch/config files carry over unchanged.
* **Want to avoid the per-frame CPU copy between the camera and this node?** See [`qrb_ros_transport`](./qrb-ros-transport) for zero-copy DMA-buf passing.
* **Prefer the packaged version of this pipeline?** [`sample_depth_estimation`](https://github.com/qualcomm-qrb-ros/qrb_ros_samples/tree/main/ai_vision/sample_depth_estimation) in [`qrb_ros_samples`](./qrb-ros-samples) ships the same pipeline pre-wired — use it when you want to run depth estimation without building a node yourself.
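
When adapting the preprocessing for another model, the part that usually changes is the normalization. One way to keep it swappable is a small config object, sketched below in plain Python on a single pixel. The `Normalization` class and its field names are hypothetical; the ImageNet mean and standard deviation are the values already used by the float path in `midas_tflite.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Normalization:
    """Per-channel normalization applied after scaling pixels to 0..1."""
    mean: tuple = (0.0, 0.0, 0.0)
    std: tuple = (1.0, 1.0, 1.0)

    def apply(self, rgb):
        """Normalize one 8-bit RGB pixel; NumPy broadcasting does this per frame."""
        return tuple((c / 255.0 - m) / s
                     for c, m, s in zip(rgb, self.mean, self.std))


# Values used by the float path in midas_tflite.py.
IMAGENET = Normalization(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
# A model that only expects 0..1 inputs would use the defaults.
UNIT_RANGE = Normalization()

print(UNIT_RANGE.apply((255, 255, 255)))  # (1.0, 1.0, 1.0)
```

Which normalization a given export expects depends on the model, so check its AI Hub page or its input tensor details before picking one.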
