qtimlsnpe is only available in qcom-multimedia-proprietary-image.
For more information on QLI images, refer to Qualcomm Linux release.

Overview

qtimlsnpe executes neural network models using Qualcomm’s Snapdragon Neural Processing Engine (SNPE). Models are packaged as DLC files and, once loaded, the runtime exposes the model’s input and output signature (tensor count, shapes, and element types). SNPE provides multiple execution targets, including CPU, DSP (Hexagon), GPU (Adreno), and AIP, so the same model can be deployed across hardware with different performance, latency, and power characteristics.

qtimlsnpe surfaces SNPE’s key runtime controls as simple tunables: a delegate selector (CPU/DSP/GPU/AIP) to choose the preferred target, performance profiles to trade power for throughput or latency, optional profiling levels for runtime diagnostics, and an execution priority hint. These settings do not change model accuracy by themselves; they help match runtime behavior to your deployment goals.

Inputs and outputs flow as neural-network/tensors. The element derives exact tensor shapes and types from the DLC at runtime, so downstream components receive tensors that reflect the model’s declared outputs. When downstream algorithms need it, outputs can be requested as FLOAT32 even if the model is quantized, which enables dequantization without changing the model artifact.

Key Responsibilities

qtimlsnpe is responsible for:

loading and executing an SNPE DLC model on CPU, DSP (Hexagon), GPU (Adreno), or AIP
accepting preformatted input tensors from upstream elements
producing output tensors that match the model output signature
negotiating tensor data types and dimensions with adjacent pipeline elements
propagating tensor metadata required by downstream elements
managing buffers through SNPE user buffer mode to reduce unnecessary memory copies
exposing runtime controls including delegate, performance-profile, profiling-level, and priority
supporting explicit output filtering by named layers or tensors (order-preserving)

In practice, qtimlsnpe serves as the inference stage in the pipeline, while tensor preparation and result interpretation are handled externally.

Example Pipeline

Download Required Files

File	Download	Save as
YOLOX model	Qualcomm AI Hub model	`yolox_w8a8.dlc`
YOLO labels	YOLOv8 labels	`yolov8.json`
Input video	Input video	`Draw_1080p_180s_30FPS.mp4`

Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media}"
scp yolox_w8a8.dlc <user>@<device-ip>:$HOME/models/
scp yolov8.json <user>@<device-ip>:$HOME/labels/
scp Draw_1080p_180s_30FPS.mp4 <user>@<device-ip>:$HOME/media/

Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

Set environment variables

Run the following commands on your device:

export MODEL_NAME=yolox_w8a8.dlc
export LABELS_NAME=yolov8.json
export SRC_VIDEO_NAME=Draw_1080p_180s_30FPS.mp4

Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! video/x-raw,format=NV12 ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp tensors="<boxes,scores,class_idx>" model=$HOME/models/$MODEL_NAME ! queue ! qtimlpostprocess results=10 module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 70.0}" ! video/x-raw,format=BGRA,width=640,height=640 ! queue ! mixer.

Plugin Hierarchy

GObject -> GstObject -> GstElement -> GstBaseTransform -> GstMLSnpe

Pad Templates

sink

Capabilities
`neural-network/tensors`	`type: { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT16, FLOAT32 }`

Availability: Always
Direction: sink

src

Capabilities
`neural-network/tensors`	`type: { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT16, FLOAT32 }`

Availability: Always
Direction: source

Element Properties

Property	Description
`model`	Path to the SNPE DLC model file. Type: String Default: NULL Flags: readable/writable, construct
`delegate`	Delegate the graph execution to a runtime backend. Type: Enum Default: DEFAULT_PROP_DELEGATE Range: (0): none - CPU execution (fallback always available) (1): dsp - DSP execution (2): gpu - GPU execution (3): aip - AIP execution Flags: readable/writable
`performance-profile`	Request a performance profile. Type: Enum Default: DEFAULT_PROP_PERF_PROFILE Range: (0): default - Default performance (1): balanced - Balanced performance and power (2): high-performance - Maximum performance (3): power-saver - Lower power usage (4): system-settings - System defined behavior (5): sustained-high-performance - Sustained performance mode (6): burst - Short bursts of high performance (7): low-power-saver - Aggressive power saving (8): high-power-saver - Moderate power saving (9): low-balanced - Lower balanced mode Flags: readable/writable
`profiling-level`	Set profiling level for runtime statistics. Type: Enum Default: DEFAULT_PROP_PROFILING_LEVEL Range: (0): off - No profiling (1): basic - Minimal profiling (2): moderate - Medium level profiling (3): detailed - Full profiling Flags: readable/writable
`priority`	Execution priority hint for SNPE runtime. Type: Enum Default: DEFAULT_PROP_EXEC_PRIORITY Range: (0): normal - Default priority (1): high - Higher priority execution (2): low - Lower priority execution Flags: readable/writable
`layers`	List of output layer names. Type: Array of String Default: [] Note: Mutually exclusive with `tensors` Flags: readable/writable
`tensors`	List of output tensor names. Outputs follow defined order. Type: Array of String Default: [] Note: Mutually exclusive with `layers` Flags: readable/writable

Layers vs tensors: Set only one of these. If your model exposes named output tensors, prefer tensors for precise ordering. If both are set sequentially, the last one written takes effect (the other is cleared).
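
The "set only one" rule above can be checked before a pipeline is launched. The following is a minimal sketch, assuming a hypothetical helper named snpe_output_filter; the helper is illustrative only and is not shipped with the plugin. It accepts exactly one filter kind, keeps the caller's name order, and prints the property fragment in the same array syntax the pipelines in this document use.

```shell
#!/bin/sh
# Hypothetical helper (not part of qtimlsnpe): prints the output-filter
# property for a gst-launch-1.0 command line.
# Usage: snpe_output_filter <layers|tensors> <name> [<name> ...]
snpe_output_filter() {
    if [ "$#" -lt 2 ]; then
        echo "usage: snpe_output_filter <layers|tensors> <name>..." >&2
        return 1
    fi
    kind=$1
    shift
    # Only one filter kind can be chosen, so both can never be set together.
    case "$kind" in
        layers|tensors) ;;
        *)
            echo "error: expected 'layers' or 'tensors', got '$kind'" >&2
            return 1 ;;
    esac
    # Join the names with commas, preserving the order given by the caller.
    list=$1
    shift
    for name in "$@"; do
        list="$list,$name"
    done
    printf '%s=<%s>\n' "$kind" "$list"
}

# Prints: tensors=<boxes,scores,class_idx>
snpe_output_filter tensors boxes scores class_idx
```

On a gst-launch-1.0 command line the result would be passed as one quoted argument, for example qtimlsnpe "$(snpe_output_filter tensors boxes scores class_idx)", so the shell does not interpret the angle brackets.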

Input and Output Behavior

Input Tensors

qtimlsnpe exposes a single sink pad, but it supports both single-input and batch-input models. For batch-input models, all required tensors are delivered through the same sink pad as a tensor set in a single GstBuffer. Input tensors must be fully prepared before they reach qtimlsnpe. The expected tensor layout, shape, data type, and batch size are determined by:

the SNPE DLC model input signature
caps negotiation with upstream elements

Typical upstream elements include:

qtimlvconverter for scaling, color conversion, normalization, and quantization (if required).

qtimlsnpe does not modify, reshape, batch, or reinterpret incoming tensors. It maps input tensor blocks into SNPE user buffers and passes them to the SNPE runtime as received.

Output Tensors

qtimlsnpe exposes a single source pad and produces output tensors that follow the model’s declared output signature. This single-pad design does not limit the element to a single output. Models with batch output tensors are fully supported, and all outputs are emitted together on the source pad. Supported output behavior includes:

single-tensor and batch-tensor outputs
arbitrary tensor shapes and ranks, including batch and depth dimensions
both quantized and floating-point tensor types
selective emission of output tensors using the layers or tensors property
FLOAT32 dequantization: if the model’s native output type is not FLOAT32, output caps will include a type list [FLOAT32, native] to enable downstream negotiation for dequantization without changing the model artifact

The generated output tensors are intended for downstream post-processing stages, which are responsible for decoding model-specific results such as classification outputs, detection results, segmentation masks, landmark data, and other structured inference outputs.
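
One way to request FLOAT32 output from a quantized model is to place a caps filter directly after qtimlsnpe. The sketch below assumes that the caps field is named type, as listed in the pad templates, and that a filter of the form neural-network/tensors,type=FLOAT32 is accepted during negotiation; verify both on the target with gst-inspect-1.0 qtimlsnpe. The helper name snpe_float32_stage is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints an inference stage whose output caps are
# pinned to FLOAT32. The caps string is an assumption based on the pad
# templates (media type neural-network/tensors, field "type").
# Usage: snpe_float32_stage <path-to-dlc>
snpe_float32_stage() {
    model_path=$1
    printf 'qtimlsnpe delegate=dsp model=%s ! neural-network/tensors,type=FLOAT32\n' "$model_path"
}

# Prints the stage for the quantized YOLOX model used in this document.
snpe_float32_stage /root/models/yolox_w8a8.dlc
```

The printed fragment is meant to be spliced into a pipeline unquoted, so that the shell splits it into separate arguments; this only works for model paths without spaces.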

Delegates

An SNPE delegate defines the execution hardware used to run a model. Delegates allow qtimlsnpe to offload inference from the default CPU runtime to an optimized hardware accelerator, such as the NPU, GPU, or AIP. The delegate is selected through the delegate property, which accepts none (CPU), dsp, gpu, or aip.

DSP

Runs the model on the AI accelerator (NPU).

Use case: Preferred backend where available. Best performance and power efficiency for quantized models.

GPU

Runs supported operations through the SNPE GPU backend.

Use case: Floating-point models and workloads that benefit from GPU parallelism.

AIP

Runs supported operations through the SNPE AIP backend.

Use case: Hybrid acceleration combining DSP, CPU, and other hardware blocks.

Best for complex models with mixed operator support, where pure DSP execution may fall back frequently
Useful when targeting maximum throughput with balanced power efficiency
Recommended for production pipelines where model partitioning across accelerators is beneficial

CPU

Runs the model on the default SNPE CPU backend.

Use case: Fallback backend when other accelerators (DSP/GPU/AIP) are not available or unsupported.

Ideal for debugging, validation, and functional correctness testing
Useful for small models or low-throughput workloads
Works with all model types (quantized and floating-point) without operator support limitations
Preferred when deterministic performance and ease of deployment matter more than efficiency
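
The use-case notes above can be condensed into a starting-point rule for choosing a delegate value. This is a minimal sketch, assuming a hypothetical helper named snpe_pick_delegate and two illustrative model categories (quantized and float); it does not probe the hardware, and the right choice still depends on operator support for the specific model.

```shell
#!/bin/sh
# Hypothetical helper: map a model's numeric format to a delegate value
# from the delegate property (none, dsp, gpu, aip).
# Usage: snpe_pick_delegate <quantized|float|other>
snpe_pick_delegate() {
    case "$1" in
        quantized) echo dsp ;;   # quantized models: DSP is the preferred backend
        float)     echo gpu ;;   # floating-point models: GPU backend
        *)         echo none ;;  # anything else: CPU fallback
    esac
}

# Prints: qtimlsnpe delegate=dsp
echo "qtimlsnpe delegate=$(snpe_pick_delegate quantized)"
```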

Profiling Level

Enables SNPE diagnostics collection. Available levels, in increasing order of detail: off, basic, moderate, detailed.
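
The level is set with the profiling-level property on the element. The sketch below assumes a hypothetical helper named snpe_profiling_arg that rejects values outside the four levels listed in the property table before they reach the command line; it does not show where SNPE reports the collected statistics, which this document does not specify.

```shell
#!/bin/sh
# Hypothetical helper: validate a profiling level and print the matching
# property assignment for a gst-launch-1.0 command line.
# Usage: snpe_profiling_arg <off|basic|moderate|detailed>
snpe_profiling_arg() {
    case "$1" in
        off|basic|moderate|detailed)
            printf 'profiling-level=%s\n' "$1" ;;
        *)
            echo "error: unknown profiling level '$1'" >&2
            return 1 ;;
    esac
}

# Prints: profiling-level=basic
snpe_profiling_arg basic
```

The output can be appended to the element's property list, for example qtimlsnpe delegate=dsp $(snpe_profiling_arg basic) model=... in a pipeline.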

Runtime Memory Behavior and GAP Handling

qtimlsnpe operates within the memory model of the SNPE runtime. The element uses DMA buffers via GstMLBufferPool to minimize memory copies and maintain zero-copy transport where possible.

SNPE Memory Model

SNPE uses runtime-managed memory to allocate:

input tensors
intermediate activation tensors
output tensors

The element discovers input/output tensor metadata (count, shape, type) at model load time and configures buffer pools accordingly.

GAP Buffer Handling

qtimlsnpe is GAP-aware and correctly handles input buffers marked with GST_BUFFER_FLAG_GAP. When a GAP buffer is received, the element skips inference and forwards the buffer downstream. This preserves timing and synchronization while explicitly indicating that no valid inference input is available for that timestamp. GAP buffers commonly appear in conditional AI pipelines, such as cascaded workflows where later inference stages run only when earlier stages produce valid regions of interest.

Use cases

Single-Stage AI Inference on Live Camera Stream (HTP):

Download Required Files

File	Download	Save as
YOLOX model	Qualcomm AI Hub model	`yolox_w8a8.dlc`
YOLO labels	YOLOv8 labels	`yolov8.json`
Input video	Input video	`Draw_1080p_180s_30FPS.mp4`

Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media}"
scp yolox_w8a8.dlc <user>@<device-ip>:$HOME/models/
scp yolov8.json <user>@<device-ip>:$HOME/labels/
scp Draw_1080p_180s_30FPS.mp4 <user>@<device-ip>:$HOME/media/

Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

Set environment variables

Run the following commands on your device:

export MODEL_NAME=yolox_w8a8.dlc
export LABELS_NAME=yolov8.json
export SRC_VIDEO_NAME=Draw_1080p_180s_30FPS.mp4

Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! video/x-raw,format=NV12 ! qtimlvconverter ! queue ! qtimlsnpe delegate=dsp tensors="<boxes,scores,class_idx>" model=$HOME/models/$MODEL_NAME ! queue ! qtimlpostprocess results=10 module=yolov8 labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 70.0}" ! video/x-raw,format=BGRA,width=640,height=640 ! queue ! mixer.

Single-Stage AI Inference on Live Camera Stream (GPU):

Download Required Files

File	Download	Save as
Inception model	Qualcomm AI Hub model	`inception_v3_float.dlc`
MobileNet labels	Mobilenet Labels	`mobilenet.json`
Input video	Input Video	`Animals_000_1080p_180s_30FPS.mp4`

Copy files to device

# Replace $HOME with the appropriate device path before running the commands.
# For QLI:    /root
# For Ubuntu: /home/ubuntu
# Modify this based on your platform and ensure files are copied to the correct location on the device.
# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip> "mkdir -p $HOME/{models,labels,media}"
scp inception_v3_float.dlc <user>@<device-ip>:$HOME/models/
scp mobilenet.json <user>@<device-ip>:$HOME/labels/
scp Animals_000_1080p_180s_30FPS.mp4 <user>@<device-ip>:$HOME/media/

Connect to device

# Run from your host machine — replace <user> and <device-ip>
ssh <user>@<device-ip>

Set environment variables

Run the following commands on your device:

export MODEL_NAME=inception_v3_float.dlc
export LABELS_NAME=mobilenet.json
export SRC_VIDEO_NAME=Animals_000_1080p_180s_30FPS.mp4

Run the pipeline

gst-launch-1.0 -e --gst-debug=2 \
filesrc location=$HOME/media/$SRC_VIDEO_NAME ! qtdemux ! queue ! h264parse ! v4l2h264dec capture-io-mode=4 output-io-mode=4 ! queue ! tee name=split \
split. ! queue ! qtivcomposer name=mixer sink_1::dimensions="<1920,1080>" ! queue ! waylandsink fullscreen=true sync=false \
split. ! queue ! video/x-raw,format=NV12 ! qtimlvconverter ! queue ! qtimlsnpe delegate=gpu tensors="<class_logits>" model=$HOME/models/$MODEL_NAME ! queue ! qtimlpostprocess results=1 module=mobilenet labels=$HOME/labels/$LABELS_NAME settings="{\"confidence\": 51.0}" ! video/x-raw,format=BGRA,width=640,height=640 ! queue ! mixer.
