qtimlsnpe is only available in the qcom-multimedia-proprietary-image.
For more information on QLI images, refer to the Qualcomm Linux release.

Overview

qtimlsnpe executes neural network models using Qualcomm’s Snapdragon Neural Processing Engine (SNPE). Models are packaged as DLC files and, once loaded, the runtime exposes the model’s input and output signature (tensor count, shapes, and element types). SNPE provides multiple execution targets, including CPU, DSP (Hexagon), GPU (Adreno), and AIP, so the same model can be deployed across hardware with different performance, latency, and power characteristics.

qtimlsnpe surfaces SNPE’s key runtime controls as simple tunables: a delegate selector (CPU/DSP/GPU/AIP) to choose the preferred target, performance profiles to trade power for throughput or latency, optional profiling levels for runtime diagnostics, and an execution priority hint. These settings do not change model accuracy by themselves; they help match runtime behavior to your deployment goals.

Inputs and outputs flow as neural-network/tensors. The element derives exact tensor shapes and types from the DLC at runtime, ensuring downstream components receive tensors that reflect the model’s declared outputs. When needed by downstream algorithms, outputs can be requested as FLOAT32 even if the model is quantized, enabling dequantization without changing the model artifact.

Key Responsibilities

qtimlsnpe is responsible for:
  • loading and executing an SNPE DLC model on CPU, DSP (Hexagon), GPU (Adreno), or AIP
  • accepting preformatted input tensors from upstream elements
  • producing output tensors that match the model output signature
  • negotiating tensor data types and dimensions with adjacent pipeline elements
  • propagating tensor metadata required by downstream elements
  • managing buffers through SNPE user buffer mode to reduce unnecessary memory copies
  • exposing runtime controls including delegate, performance-profile, profiling-level, and priority
  • supporting explicit output filtering by named layers or tensors (order-preserving)
In practice, qtimlsnpe serves as the inference stage in the pipeline, while tensor preparation and result interpretation are handled externally.

Example Pipeline

1. Download Required Files

| File | Download | Save as |
| --- | --- | --- |
| YOLOX model | Qualcomm AI Hub model | yolox_w8a8.dlc |
| YOLO labels | Yolov8 Labels | yolov8.json |
| Input video | Input Video | Draw_1080p_180s_30FPS.mp4 |
2. Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media}"
scp yolox_w8a8.dlc <user>@<device-ip>:$HOME/models/
scp yolov8.json <user>@<device-ip>:$HOME/labels/
scp Draw_1080p_180s_30FPS.mp4 <user>@<device-ip>:$HOME/media/
3. Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>
4. Set environment variables

Run the following commands on your device:
export MODEL_NAME=yolox_w8a8.dlc
export LABELS_NAME=yolov8.json
export SRC_VIDEO_NAME=Draw_1080p_180s_30FPS.mp4
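
Before launching the pipeline, it can help to confirm that the three files are where the exported variables point. The following is a minimal sketch and not part of the plugin: the function name check_assets is hypothetical, and it assumes the models/labels/media directory layout created in the copy step.

```shell
# Hypothetical helper: verify that the model, labels, and video exist under a base directory.
# Usage: check_assets <base-dir>   (the base directory is $HOME in the steps above)
check_assets() {
  base="$1"
  missing=0
  for f in "models/$MODEL_NAME" "labels/$LABELS_NAME" "media/$SRC_VIDEO_NAME"; do
    if [ ! -f "$base/$f" ]; then
      # Report every missing file on stderr instead of stopping at the first one.
      echo "missing: $base/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when all files are present, 1 otherwise.
  return "$missing"
}
```

For example, `check_assets "$HOME" && echo "all files present"` prints the confirmation only when all three files exist.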
5. Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! video/x-raw,format=NV12 ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp tensors="<boxes,scores,class_idx>" model=$HOME/models/$MODEL_NAME ! queue ! qtimlpostprocess results=10 module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 70.0}" ! video/x-raw,format=BGRA,width=640,height=640 ! queue ! mixer.

Plugin Hierarchy

GObject -> GstObject -> GstElement -> GstBaseTransform -> GstMLSnpe

Pad Templates

sink

Capabilities
neural-network/tensors
type: { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT16, FLOAT32 }
Availability: Always
Direction: sink

src

Capabilities
neural-network/tensors
type: { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT16, FLOAT32 }
Availability: Always
Direction: source
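
To check the pad templates and properties on a particular image, the standard gst-inspect-1.0 tool can be used (`gst-inspect-1.0 qtimlsnpe` prints the hierarchy, pads, and properties of an installed element). The sketch below wraps the existence check; the function name have_element is hypothetical, and it assumes gst-inspect-1.0 is on the PATH of the device.

```shell
# Hypothetical helper: succeed only if the named GStreamer element is installed.
# Usage: have_element <element-name>
have_element() {
  if ! command -v gst-inspect-1.0 >/dev/null 2>&1; then
    echo "gst-inspect-1.0 not found on PATH" >&2
    return 2
  fi
  # --exists makes gst-inspect-1.0 report availability through its exit status.
  gst-inspect-1.0 --exists "$1"
}
```

A typical use is `have_element qtimlsnpe && gst-inspect-1.0 qtimlsnpe`, which prints the element details only when the plugin is present.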

Element Properties

model
Path to the SNPE DLC model file.

Type: String
Default: NULL
Flags: readable/writable, construct
delegate
Delegate the graph execution to a runtime backend.

Type: Enum
Default: DEFAULT_PROP_DELEGATE
Range:
    (0): none - CPU execution (fallback always available)
    (1): dsp - DSP execution
    (2): gpu - GPU execution
    (3): aip - AIP execution
Flags: readable/writable
performance-profile
Request a performance profile.

Type: Enum
Default: DEFAULT_PROP_PERF_PROFILE
Range:
    (0): default - Default performance
    (1): balanced - Balanced performance and power
    (2): high-performance - Maximum performance
    (3): power-saver - Lower power usage
    (4): system-settings - System defined behavior
    (5): sustained-high-performance - Sustained performance mode
    (6): burst - Short bursts of high performance
    (7): low-power-saver - Aggressive power saving
    (8): high-power-saver - Moderate power saving
    (9): low-balanced - Lower balanced mode
Flags: readable/writable
profiling-level
Set profiling level for runtime statistics.

Type: Enum
Default: DEFAULT_PROP_PROFILING_LEVEL
Range:
    (0): off - No profiling
    (1): basic - Minimal profiling
    (2): moderate - Medium level profiling
    (3): detailed - Full profiling
Flags: readable/writable
priority
Execution priority hint for SNPE runtime.

Type: Enum
Default: DEFAULT_PROP_EXEC_PRIORITY
Range:
    (0): normal - Default priority
    (1): high - Higher priority execution
    (2): low - Lower priority execution
Flags: readable/writable
layers
List of output layer names.

Type: Array of String
Default: []
Note: Mutually exclusive with tensors
Flags: readable/writable
tensors
List of output tensor names. Outputs follow defined order.

Type: Array of String
Default: []
Note: Mutually exclusive with layers
Flags: readable/writable
Layers vs tensors: Set only one of these. If your model exposes named output tensors, prefer tensors for precise ordering. If both are set sequentially, the last one written takes effect (the other is cleared).
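
The properties above can be combined into a single element description. The following is a minimal sketch under stated assumptions: the function name snpe_element and its argument order are hypothetical, only the delegate value is validated, and only tensors is emitted because tensors and layers are mutually exclusive.

```shell
# Hypothetical helper: print a qtimlsnpe element description for a gst-launch-1.0 pipeline.
# Usage: snpe_element <model-path> <delegate> <performance-profile> [tensor-names-csv]
snpe_element() {
  model="$1"; delegate="$2"; profile="$3"; tensors="$4"
  # Accept only the delegate values listed in the property table.
  case "$delegate" in
    none|dsp|gpu|aip) ;;
    *) echo "unknown delegate: $delegate" >&2; return 1 ;;
  esac
  desc="qtimlsnpe model=$model delegate=$delegate performance-profile=$profile"
  # Emit tensors only when names were given; layers is deliberately never set here.
  if [ -n "$tensors" ]; then
    desc="$desc tensors=\"<$tensors>\""
  fi
  echo "$desc"
}
```

For example, `snpe_element /root/models/yolox_w8a8.dlc dsp burst boxes,scores,class_idx` prints a description that can be pasted in place of the qtimlsnpe stage in a pipeline.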

Input and Output Behavior

Input Tensors

qtimlsnpe exposes a single sink pad, but it supports both single-input and batch-input models. For batch-input models, all required tensors are delivered through the same sink pad as a tensor set in a single GstBuffer.

Input tensors must be fully prepared before they reach qtimlsnpe. Expected tensor layout, shape, data type, and batch size are determined by:
  • the SNPE DLC model input signature
  • caps negotiation with upstream elements
Typical upstream elements include:
  • qtimlvconverter for scaling, color conversion, normalization, and quantization (if required).
qtimlsnpe does not modify, reshape, batch, or reinterpret incoming tensors. It maps input tensor blocks into SNPE user buffers and passes them to the SNPE runtime as received.

Output Tensors

qtimlsnpe exposes a single source pad and produces output tensors that follow the model’s declared output signature. This single-pad design does not limit the element to a single output. Models with batch output tensors are fully supported, and all outputs are emitted together on the source pad.

Supported output behavior includes:
  • single-tensor and batch-tensor outputs
  • arbitrary tensor shapes and ranks, including batch and depth dimensions
  • both quantized and floating-point tensor types
  • selective emission of output tensors using the layers or tensors property
  • FLOAT32 dequantization: if the model’s native output type is not FLOAT32, output caps will include a type list [FLOAT32, native] to enable downstream negotiation for dequantization without changing the model artifact
The generated output tensors are intended for downstream post-processing stages, which are responsible for decoding model-specific results such as classification outputs, detection results, segmentation masks, landmark data, and other structured inference outputs.
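
Requesting dequantized output can be sketched as a caps filter placed directly after qtimlsnpe. This is an illustration under assumptions rather than a verified recipe: the caps string is built from the pad template (media type neural-network/tensors with a type field), the helper name snpe_with_output_type is hypothetical, and whether an explicit filter is needed depends on what the downstream element negotiates.

```shell
# Hypothetical helper: print a pipeline fragment that asks qtimlsnpe for a given output type.
# Assumes the src caps expose a "type" field, as listed in the pad template.
# Usage: snpe_with_output_type <model-path> [type]   (type defaults to FLOAT32)
snpe_with_output_type() {
  model="$1"
  out_type="${2:-FLOAT32}"
  echo "qtimlsnpe model=$model ! neural-network/tensors,type=$out_type"
}
```

Calling `snpe_with_output_type "$HOME/models/$MODEL_NAME"` prints a fragment with the FLOAT32 filter, which can stand in for the qtimlsnpe stage when the post-processing element expects floating-point tensors.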

Delegates

A SNPE delegate defines the execution hardware used to run a model. Delegates allow qtimlsnpe to offload inference from the default CPU runtime to a hardware accelerator such as the DSP (NPU), GPU, or AIP. The delegate is selected through the delegate property, which accepts none (CPU), dsp, gpu, or aip.

DSP

Runs the model on the AI accelerator (NPU).
  • Use case: Preferred backend where available. Best performance and power efficiency for quantized models.

GPU

Runs supported operations through the SNPE GPU backend.
  • Use case: Floating-point models and workloads that benefit from GPU parallelism.

AIP

Runs supported operations through the SNPE AIP backend.
  • Use case: Hybrid acceleration combining DSP, CPU, and other hardware blocks.
  • Best for complex models with mixed operator support, where pure DSP execution may fall back frequently.
  • Useful when targeting maximum throughput with balanced power efficiency.
  • Recommended for production pipelines where model partitioning across accelerators is beneficial.

CPU

Runs the model on the default SNPE CPU backend.
  • Use case: Fallback backend when other accelerators (DSP/GPU/AIP) are unavailable or unsupported.
  • Ideal for debugging, validation, and functional correctness testing.
  • Useful for small models or low-throughput workloads.
  • Works with all model types (quantized and floating point) without operator support limitations.
  • Preferred when deterministic performance and ease of deployment matter more than efficiency.
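
The guidance in this section can be condensed into a simple selection rule. The sketch below is only a starting point: the function name pick_delegate and the model-kind labels quantized and float are hypothetical, and it deliberately ignores AIP and per-platform availability.

```shell
# Hypothetical helper: map a model kind to a delegate value for qtimlsnpe.
# Usage: pick_delegate <quantized|float|other>
pick_delegate() {
  case "$1" in
    quantized) echo dsp ;;   # DSP: preferred for quantized models
    float)     echo gpu ;;   # GPU: suited to floating-point models
    *)         echo none ;;  # CPU: fallback that is always available
  esac
}
```

For example, `qtimlsnpe delegate=$(pick_delegate quantized) model=...` selects the DSP for a w8a8 model such as the YOLOX file used on this page.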

Profiling Level

Enables SNPE diagnostics collection. Available levels: off, basic, moderate, and detailed.

Runtime Memory Behavior and GAP Handling

qtimlsnpe operates within the memory model of the SNPE runtime. The element uses DMA buffers via GstMLBufferPool to minimize memory copies and maintain zero-copy transport where possible.

SNPE Memory Model

SNPE uses runtime-managed memory to allocate:
  • input tensors
  • intermediate activation tensors
  • output tensors
The element discovers input/output tensor metadata (count, shape, type) at model load time and configures buffer pools accordingly.

GAP Buffer Handling

qtimlsnpe is GAP-aware and correctly handles input buffers marked with GST_BUFFER_FLAG_GAP. When a GAP buffer is received, the element skips inference and forwards the buffer downstream. This preserves timing and synchronization while explicitly indicating that no valid inference input is available for that timestamp. GAP buffers commonly appear in conditional AI pipelines, such as cascaded workflows where later inference stages run only when earlier stages produce valid regions of interest.

Use cases

Single-Stage AI Inference on a Video File (HTP):

1. Download Required Files

| File | Download | Save as |
| --- | --- | --- |
| YOLOX model | Qualcomm AI Hub model | yolox_w8a8.dlc |
| YOLO labels | Yolov8 Labels | yolov8.json |
| Input video | Input Video | Draw_1080p_180s_30FPS.mp4 |
2. Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media}"
scp yolox_w8a8.dlc <user>@<device-ip>:$HOME/models/
scp yolov8.json <user>@<device-ip>:$HOME/labels/
scp Draw_1080p_180s_30FPS.mp4 <user>@<device-ip>:$HOME/media/
3. Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>
4. Set environment variables

Run the following commands on your device:
export MODEL_NAME=yolox_w8a8.dlc
export LABELS_NAME=yolov8.json
export SRC_VIDEO_NAME=Draw_1080p_180s_30FPS.mp4
5. Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! video/x-raw,format=NV12 ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp tensors="<boxes,scores,class_idx>" model=$HOME/models/$MODEL_NAME ! queue ! qtimlpostprocess results=10 module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 70.0}" ! video/x-raw,format=BGRA,width=640,height=640 ! queue ! mixer.

Single-Stage AI Inference on a Video File (GPU):

1. Download Required Files

| File | Download | Save as |
| --- | --- | --- |
| Inception model | Qualcomm AI Hub model | inception_v3_float.dlc |
| MobileNet labels | Mobilenet Labels | mobilenet.json |
| Input video | Input Video | Animals_000_1080p_180s_30FPS.mp4 |
2. Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media}"
scp inception_v3_float.dlc <user>@<device-ip>:$HOME/models/
scp mobilenet.json <user>@<device-ip>:$HOME/labels/
scp Animals_000_1080p_180s_30FPS.mp4 <user>@<device-ip>:$HOME/media/
3. Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>
4. Set environment variables

Run the following commands on your device:
export MODEL_NAME=inception_v3_float.dlc
export LABELS_NAME=mobilenet.json
export SRC_VIDEO_NAME=Animals_000_1080p_180s_30FPS.mp4
5. Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! video/x-raw,format=NV12 ! qtimlvconverter ! queue ! qtimlsnpe delegate=gpu tensors="<class_logits>" model=$HOME/models/$MODEL_NAME ! queue ! qtimlpostprocess results=1 module=mobilenet labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! video/x-raw,format=BGRA,width=640,height=640 ! queue ! mixer.