qtimlsnpe is only available in qcom-multimedia-proprietary-image. For more information on QLI images, refer to Qualcomm Linux release.
Overview
qtimlsnpe executes neural network models using Qualcomm’s Snapdragon Neural Processing Engine (SNPE). Models are packaged as DLC files and, once loaded, the runtime exposes the model’s input and output signature (tensor count, shapes, and element types). SNPE provides multiple execution targets, including CPU, NSP, GPU (Adreno), and AIP, so the same model can be deployed across hardware with different performance, latency, and power characteristics.

qtimlsnpe surfaces SNPE’s key runtime controls as simple tunables: a delegate selector (CPU/DSP/GPU/AIP) to choose the preferred target, performance profiles to trade power for throughput or latency, optional profiling levels for runtime diagnostics, and an execution priority hint. These settings do not change model accuracy by themselves; they help match runtime behavior to your deployment goals.

Inputs and outputs flow as neural-network/tensors. The element derives exact tensor shapes and types from the DLC at runtime, ensuring downstream components receive tensors that reflect the model’s declared outputs. When needed by downstream algorithms, outputs can be requested as FLOAT32 even if the model is quantized, enabling dequantization without changing the model artifact.
Key Responsibilities
qtimlsnpe is responsible for:
- loading and executing an SNPE DLC model on CPU, DSP (Hexagon), GPU (Adreno), or AIP
- accepting preformatted input tensors from upstream elements
- producing output tensors that match the model output signature
- negotiating tensor data types and dimensions with adjacent pipeline elements
- propagating tensor metadata required by downstream elements
- managing buffers through SNPE user buffer mode to reduce unnecessary memory copies
- exposing runtime controls including delegate, performance-profile, profiling-level, and priority
- supporting explicit output filtering by named layers or tensors (order-preserving)
Example Pipeline
Download Required Files
| File | Download | Save as |
|---|---|---|
| YOLOX model | Qualcomm AI hub model | yolox_w8a8.dlc |
| YOLO labels | Yolov8 Labels | yolov8.json |
| Input video | Input Video | Draw_1080p_180s_30FPS.mp4 |
Plugin Hierarchy
GObject -> GstObject -> GstElement -> GstBaseTransform -> GstMLSnpe
Pad Templates
sink
| Capabilities | |
|---|---|
| neural-network/tensors | type: { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT16, FLOAT32 } |
Direction: sink
src
| Capabilities | |
|---|---|
| neural-network/tensors | type: { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT16, FLOAT32 } |
Direction: source
Element Properties
| Property | Description |
|---|---|
| model | Path to the SNPE DLC model file. Type: String. Default: NULL. Flags: readable/writable, construct. |
| delegate | Delegate the graph execution to a runtime backend. Type: Enum. Default: DEFAULT_PROP_DELEGATE. Range: (0) none: CPU execution (fallback always available); (1) dsp: DSP execution; (2) gpu: GPU execution; (3) aip: AIP execution. Flags: readable/writable. |
| performance-profile | Request a performance profile. Type: Enum. Default: DEFAULT_PROP_PERF_PROFILE. Range: (0) default: Default performance; (1) balanced: Balanced performance and power; (2) high-performance: Maximum performance; (3) power-saver: Lower power usage; (4) system-settings: System defined behavior; (5) sustained-high-performance: Sustained performance mode; (6) burst: Short bursts of high performance; (7) low-power-saver: Aggressive power saving; (8) high-power-saver: Moderate power saving; (9) low-balanced: Lower balanced mode. Flags: readable/writable. |
| profiling-level | Set profiling level for runtime statistics. Type: Enum. Default: DEFAULT_PROP_PROFILING_LEVEL. Range: (0) off: No profiling; (1) basic: Minimal profiling; (2) moderate: Medium level profiling; (3) detailed: Full profiling. Flags: readable/writable. |
| priority | Execution priority hint for SNPE runtime. Type: Enum. Default: DEFAULT_PROP_EXEC_PRIORITY. Range: (0) normal: Default priority; (1) high: Higher priority execution; (2) low: Lower priority execution. Flags: readable/writable. |
| layers | List of output layer names. Type: Array of String. Default: []. Note: Mutually exclusive with tensors. Flags: readable/writable. |
| tensors | List of output tensor names. Outputs follow defined order. Type: Array of String. Default: []. Note: Mutually exclusive with layers. Flags: readable/writable. |
Layers vs tensors: Set only one of these. If your model exposes named output tensors, prefer tensors for precise ordering. If both are set sequentially, the last one written takes effect (the other is cleared).
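As an illustration of how these properties fit together, the following sketch builds a gst-launch style description of a single qtimlsnpe element and checks each enum value against the nicks listed in the table above. The helper names (`snpe_element`, `_check`) are hypothetical, and the `<a, b>` array serialization for layers and tensors is an assumption based on the usual GStreamer array syntax; verify it against your release before relying on it.

```python
# Enum nicks copied from the Element Properties table.
DELEGATES = ("none", "dsp", "gpu", "aip")
PERF_PROFILES = (
    "default", "balanced", "high-performance", "power-saver",
    "system-settings", "sustained-high-performance", "burst",
    "low-power-saver", "high-power-saver", "low-balanced",
)
PROFILING_LEVELS = ("off", "basic", "moderate", "detailed")
PRIORITIES = ("normal", "high", "low")


def _check(name, value, allowed):
    """Return 'name=value' after confirming value is a documented nick."""
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return f"{name}={value}"


def snpe_element(model, delegate="none", performance_profile=None,
                 profiling_level=None, priority=None,
                 layers=None, tensors=None):
    """Build a textual description of one qtimlsnpe element (hypothetical helper)."""
    # The documentation says to set only one of layers/tensors, so reject both.
    if layers and tensors:
        raise ValueError("set only one of layers or tensors")

    parts = ["qtimlsnpe", f"model={model}",
             _check("delegate", delegate, DELEGATES)]
    if performance_profile is not None:
        parts.append(_check("performance-profile", performance_profile, PERF_PROFILES))
    if profiling_level is not None:
        parts.append(_check("profiling-level", profiling_level, PROFILING_LEVELS))
    if priority is not None:
        parts.append(_check("priority", priority, PRIORITIES))

    # Assumed serialization of an "Array of String" property: "<a, b>".
    for name, names in (("layers", layers), ("tensors", tensors)):
        if names:
            joined = ", ".join(names)
            parts.append(f'{name}="<{joined}>"')
    return " ".join(parts)
```

For example, `snpe_element("yolox_w8a8.dlc", delegate="dsp", performance_profile="burst")` yields a fragment that can be placed between an upstream qtimlvconverter and a downstream post-processing element.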
Input and Output Behavior
Input Tensors
qtimlsnpe exposes a single sink pad, but it supports both single-input and batch-input models. For batch-input models, all required tensors are delivered through the same sink pad as a tensor set in a single GstBuffer. Input tensors must be fully prepared before they reach qtimlsnpe. Expected tensor layout, shape, data type, and batch size are determined by:
- the SNPE DLC model input signature
- caps negotiation with upstream elements
- qtimlvconverter for scaling, color conversion, normalization, and quantization (if required).
Output Tensors
qtimlsnpe exposes a single source pad and produces output tensors that follow the model’s declared output signature. This single-pad design does not limit the element to a single output. Models with batch output tensors are fully supported, and all outputs are emitted together on the source pad. Supported output behavior includes:
- single-tensor and batch-tensor outputs
- arbitrary tensor shapes and ranks, including batch and depth dimensions
- both quantized and floating-point tensor types
- selective emission of output tensors using the layers or tensors property
- FLOAT32 dequantization: if the model’s native output type is not FLOAT32, output caps will include a type list [FLOAT32, native] to enable downstream negotiation for dequantization without changing the model artifact
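The FLOAT32 dequantization rule can be sketched as a small function that computes the type list advertised on the source pad, plus a deliberately simplified stand-in for negotiation. Both function names are hypothetical, and real caps negotiation in GStreamer involves more than picking the first common type; this only illustrates the `[FLOAT32, native]` ordering described above.

```python
# Tensor types listed in the Pad Templates section.
TENSOR_TYPES = {
    "INT8", "UINT8", "INT16", "UINT16", "INT32", "UINT32",
    "INT64", "UINT64", "FLOAT16", "FLOAT32",
}


def output_type_list(native):
    """Types offered on the source pad for a model whose native output type is `native`."""
    if native not in TENSOR_TYPES:
        raise ValueError(f"unknown tensor type: {native!r}")
    # A FLOAT32 model needs no dequantization option; otherwise FLOAT32 is
    # offered first, followed by the model's native type.
    return ["FLOAT32"] if native == "FLOAT32" else ["FLOAT32", native]


def negotiate(native, downstream_accepts):
    """Simplified negotiation: first offered type that downstream accepts, or None."""
    for tensor_type in output_type_list(native):
        if tensor_type in downstream_accepts:
            return tensor_type
    return None
```

Under this model, a UINT8 model feeding a downstream element that only accepts FLOAT32 negotiates FLOAT32, so the element dequantizes; a downstream element that only accepts UINT8 receives the native output.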
Delegates
An SNPE delegate defines the execution hardware used to run a model. Delegates allow qtimlsnpe to offload inference from the default CPU runtime to an optimized hardware accelerator, such as the NPU, GPU, or AIP.
qtimlsnpe supports multiple delegate options. The delegate is selected through the delegate property (none, dsp, gpu, or aip), as listed under Element Properties.
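The use-case notes for each delegate in this section can be condensed into a rough selection heuristic. The sketch below is only one possible reading of that guidance: `pick_delegate` is a hypothetical helper, the `available` set is an input you would have to determine yourself (this page does not describe how to query it), and AIP is never chosen automatically because its trade-offs depend on how the model partitions.

```python
def pick_delegate(quantized, available):
    """Suggest a value for the delegate property (hypothetical heuristic).

    quantized: True for quantized models, False for floating-point models.
    available: set of delegate nicks usable on the target, e.g. {"dsp", "gpu"}.
    """
    # DSP is described as the preferred target for quantized models,
    # GPU as the target for floating-point models.
    preferred = "dsp" if quantized else "gpu"
    if preferred in available:
        return preferred
    # "none" means CPU execution, which is always available as a fallback.
    return "none"
```

For a quantized model such as yolox_w8a8.dlc on a device with a usable DSP, this returns `dsp`; if the preferred accelerator is missing, it falls back to CPU execution.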
DSP
Runs the model on the AI accelerator (NPU).
- Use case: Preferred backend where available. Best performance and power efficiency for quantized models.
GPU
Runs supported operations through the SNPE GPU backend.
- Use case: Floating-point models and workloads that benefit from GPU parallelism.
AIP
Runs supported operations through the SNPE AIP backend. Use case:
- Hybrid acceleration combining DSP, CPU, and other hardware blocks
- Best for complex models with mixed operator support where pure DSP may fall back frequently
- Useful when targeting maximum throughput with balanced power efficiency
- Recommended for production pipelines where model partitioning across accelerators is beneficial
CPU
Runs the model on the default SNPE CPU backend. Use case:
- Fallback backend when other accelerators (DSP/GPU/AIP) are unavailable or unsupported
- Ideal for debugging, validation, and functional correctness testing
- Useful for small models or low-throughput workloads
- Works with all model types (quantized and floating point) without operator support limitations
- Preferred when deterministic performance and ease of deployment matter more than efficiency
Profiling Level
Enables SNPE diagnostics collection. Available levels: off, basic, moderate, detailed.
Runtime Memory Behavior and GAP Handling
qtimlsnpe operates within the memory model of the SNPE runtime. The element uses DMA buffers via GstMLBufferPool to minimize memory copies and maintain zero-copy transport where possible.
SNPE Memory Model
SNPE uses runtime-managed memory to allocate:
- input tensors
- intermediate activation tensors
- output tensors
GAP Buffer Handling
qtimlsnpe is GAP-aware and correctly handles input buffers marked with GST_BUFFER_FLAG_GAP.
When a GAP buffer is received, the element skips inference and forwards the buffer downstream. This preserves timing and synchronization while explicitly indicating that no valid inference input is available for that timestamp.
GAP buffers commonly appear in conditional AI pipelines, such as cascaded workflows where later inference stages run only when earlier stages produce valid regions of interest.
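The GAP behavior described above can be modeled in a few lines. This is a plain-Python simulation, not the element's implementation: `Buffer` stands in for a GstBuffer, its `gap` field for GST_BUFFER_FLAG_GAP, and `infer` for the SNPE execution call; all three are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """Minimal stand-in for a GstBuffer carrying tensors."""
    pts: int                       # presentation timestamp
    gap: bool = False              # stands in for GST_BUFFER_FLAG_GAP
    tensors: list = field(default_factory=list)


def transform(buf, infer):
    """Process one buffer the way the GAP handling section describes."""
    if buf.gap:
        # No valid inference input for this timestamp: skip inference and
        # forward the buffer so downstream timing stays intact.
        return buf
    # Normal path: run inference and emit a new buffer with the same timestamp.
    return Buffer(pts=buf.pts, tensors=infer(buf.tensors))
```

Feeding a stream in which some buffers are marked as GAP through `transform` calls `infer` only for the non-GAP buffers, while every timestamp still appears on the output in order.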
Use cases
Single-Stage AI Inference on Live Camera Stream (HTP):
Download Required Files
| File | Download | Save as |
|---|---|---|
| YOLOX model | Qualcomm AI hub model | yolox_w8a8.dlc |
| YOLO labels | Yolov8 Labels | yolov8.json |
| Input video | Input Video | Draw_1080p_180s_30FPS.mp4 |
Single-Stage AI Inference on Live Camera Stream (GPU):
Download Required Files
| File | Download | Save as |
|---|---|---|
| Inception model | Qualcomm AI Hub model | inception_v3_float.dlc |
| MobileNet labels | Mobilenet Labels | mobilenet.json |
| Input video | Input Video | Animals_000_1080p_180s_30FPS.mp4 |

