This section covers the QIM SDK AI sample applications, which demonstrate vision, audio, and multi-model inference on Qualcomm platforms using LiteRT models accelerated on Qualcomm AI hardware. The following table lists the available GStreamer C/C++ AI applications and their platform support.
ApplicationSource codeDescriptionInput sourcesQCS6490IQ-8275IQ-9075IQ-615
Image classificationgst-ai-classificationClassification on streams from a file source or RTSP.Camera, file, RTSP, USB×
Object detectiongst-ai-object-detectionObject detection on streams from a file source or RTSP.Camera, file, RTSP, USB×
Pose detectiongst-ai-pose-detectionPose detection on streams from a file source or RTSP.File, RTSP, USB×
Image segmentationgst-ai-segmentationImage segmentation on streams from a file source or RTSP.File, RTSP×
Daisy chain detection + classificationgst-ai-daisychain-detection-classificationCascaded object detection and classification.File, RTSP, USB×
Daisy chain detection + posegst-ai-daisychain-detection-poseCascaded object detection and pose detection.File, RTSP, USB×
Monodepthgst-ai-monodepthMonocular depth estimation from file or RTSP.File, RTSP×
Face detectiongst-ai-face-detectionFace detection from file or RTSP.File, RTSP×
Audio classificationgst-ai-audio-classificationAudio event classification from microphone or file.Audio, file×
Metadata parsinggst-ai-metadata-parser-exampleParse ML metadata and count people from file or RTSP.File, RTSP×
AI USB cameragst-ai-usb-camera-appUSB camera streaming with optional object detection.USB×
AI event encodergst-ai-event-encoderEncode video only when a person is detected.File, RTSP×

Prerequisites

Some of the steps in the prerequisites will be removed in future releases once the necessary fixes are mainlined.
1

Set up Wi-Fi

Connect to the Wireless Access Point (Wi-Fi Router):
nmcli dev wifi connect <WiFi-SSID> password <WiFi-password>
Check the connection and device status:
nmcli -p device
Log in to the target device
Locate the IP address of the device for your network connection type, using the UART console on the Linux host.
For Ethernet:
ip address show eth2
For Wi-Fi:
ip address show wlp1s0
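If you only need the address itself, for example to paste into the ssh command, the `-o` (one record per line) and `-4` (IPv4 only) options of the iproute2 `ip` command make the output easy to filter. The following is a minimal sketch; the helper name `ipv4_from_ip_output` is hypothetical, and it assumes you want the first IPv4 address on the interface.

```shell
# Print the first IPv4 address (without the /prefix length) from
# `ip -o -4 address show <interface>` output read on stdin.
ipv4_from_ip_output() {
  # In one-line mode, field 3 is the family ("inet") and field 4 is ADDRESS/PREFIX.
  awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Usage on the target, with the interface names used above:
#   ip -o -4 address show eth2   | ipv4_from_ip_output
#   ip -o -4 address show wlp1s0 | ipv4_from_ip_output
```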
Use the IP address from the Linux host to establish an SSH connection to the device:
ssh root@<ip-address>
Example:
ssh root@192.168.0.222
Connect to the SSH shell using the following password:
oelinux123
2

Download Models and Artifacts

On the target device, obtain the download_artifacts.sh script, set executable permissions, and run it to download the model, media, and label files:
cd /tmp/
curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/download_artifacts.sh
chmod +x download_artifacts.sh
./download_artifacts.sh
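After the script finishes, a quick existence check catches an interrupted download before you launch an application. This sketch assumes the artifacts end up under /etc/models, /etc/labels, and /etc/media, which are the paths the sample configurations in this section reference; adjust the list to the files your configuration actually uses. The function name `check_artifacts` is hypothetical.

```shell
# Report expected artifact files that are missing or empty.
# Returns 0 only if every file given as an argument exists and is non-empty.
check_artifacts() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage on the target (paths taken from the sample configs in this section):
#   check_artifacts /etc/models/yolox_quantized.tflite /etc/labels/yolox.json \
#                   /etc/media/video.mp4 && echo "artifacts OK"
```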
3

Enable qticamsrc

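In the terminal of the target device, run the following commands to enable the qticamsrc plugin on Config #2. The commands write the value camx to the VendorDtbOverlays EFI variable, print the variable to confirm it, and then reboot the device: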
echo -n "camx" > /var/data
efivar -n 882f8c2b-9646-435f-8de5-f208ff80c1bd-VendorDtbOverlays -w -f /var/data
efivar -n 882f8c2b-9646-435f-8de5-f208ff80c1bd-VendorDtbOverlays -p
sync
reboot
4

Enable libcamera

For the Dragonwing RB3 Gen 2 Development Kit, enable libcamera as follows:
The libcamera plugin supports only the IMX577 camera sensor. Connect the IMX577 sensor before enabling libcamera.
In the terminal of the target device, enable the bootloader mode using the following command:
reboot bootloader
Once the device enters bootloader mode, flash the Vision Kit CDT file (cdt_vision_kit.bin) from the extracted folder. You can obtain cdt_vision_kit.bin from the Qualcomm multimedia proprietary image at images/rb3gen2-core-kit/qcom-multimedia-proprietary-image-rb3gen2-core-kit.
fastboot flash cdt cdt_vision_kit.bin
Reboot the device:
fastboot reboot
5

Enable Audio and GPU Delegate

In the terminal of the target device, run the following commands to enable audio:
systemctl stop pipewire wireplumber pipewire.socket pipewire-manager.socket
chmod 777 /dev/dma_heap/system
adsprpcd audiopd &
systemctl start pipewire wireplumber
wpctl status
To set the default sink and source devices, get the device IDs from the wpctl status output and run the following command for each:
wpctl set-default <device ID>
In the terminal of the target device, run the following commands to enable the GPU delegate and backend:
ln -sf /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so
export OCL_ICD_FILENAMES=/usr/lib/libOpenCL_adreno.so.1
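Because `export` only affects the current shell session, the variable must be set in the same shell that launches the applications. The sketch below checks that `OCL_ICD_FILENAMES` is set and points to an existing file, and that the `libOpenCL.so` link created above resolves. The function name and the `OPENCL_LINK` override (used only to make the check testable) are hypothetical.

```shell
# Sanity-check the GPU delegate environment configured above.
check_gpu_delegate_env() {
  if [ -z "${OCL_ICD_FILENAMES:-}" ]; then
    echo "OCL_ICD_FILENAMES is not set in this shell" >&2
    return 1
  fi
  if [ ! -e "$OCL_ICD_FILENAMES" ]; then
    echo "OCL_ICD_FILENAMES points to a missing file: $OCL_ICD_FILENAMES" >&2
    return 1
  fi
  # -e follows symlinks, so a dangling libOpenCL.so link also fails here.
  link=${OPENCL_LINK:-/usr/lib/libOpenCL.so}
  if [ ! -e "$link" ]; then
    echo "missing or dangling link: $link" >&2
    return 1
  fi
  echo "GPU delegate environment looks OK"
}

# Usage on the target, in the same shell that ran the export:
#   check_gpu_delegate_env
```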

AI Vision Applications

Object Detection

The gst-ai-object-detection application detects objects in images and videos. The use cases run YOLOv5, YOLOv8, and YOLOX models on the Qualcomm AI hardware accelerator. The following figure shows the pipeline, which receives input from a live camera feed, a file, a USB camera, or an RTSP stream, preprocesses it, and runs inference on the AI hardware. The results are displayed on the screen, saved as an encoded MP4 file, or streamed over an RTSP server. For information about the plugins used in the pipeline, see Pipeline flow.
Figure: Pipeline diagram for gst-ai-object-detection
When the software image includes the qticamsrc plugin, the camera framework uses it by default. If absent, the framework switches to libcamera instead. Since Config #1 lacks support for qticamsrc, the system defaults to libcamera.
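To see which camera source applies on a given image, you can ask GStreamer whether the qticamsrc element is registered. The sketch below mirrors the rule just described using `gst-inspect-1.0`, which exits with a non-zero status when an element is not found, and reports `libcamera` when the `libcamerasrc` element is present instead. The application's own selection logic may differ; the function name and the `GST_INSPECT` override (for testing) are hypothetical.

```shell
# Report which camera source element is available on this image.
camera_backend() {
  inspect=${GST_INSPECT:-gst-inspect-1.0}
  if $inspect qticamsrc >/dev/null 2>&1; then
    echo qticamsrc
  elif $inspect libcamerasrc >/dev/null 2>&1; then
    echo libcamera
  else
    echo none
  fi
}

# Usage on the target:
#   camera_backend
```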

Input and Output Capabilities

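| Config | File src | RTSP | USB camera | MIPI camera | IMX577 camera | File output | Display | RTSP output |
|---|---|---|---|---|---|---|---|---|
| Config #1 | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Config #2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |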

Sample Model and Label Files

| Runtime | Model file | Label file |
|---|---|---|
| Qualcomm Neural Processing SDK | yolonas.dlc | yolonas.json |
| LiteRT | yolov8_det_quantized.tflite / yolox_quantized.tflite | yolox.json |
| Qualcomm AI Engine Direct | yolov8_det_quantized.bin | yolov8.json |

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-object-detection --config-file=/etc/configs/config_detection.json
The sample application reads its input parameters from the /etc/configs/config_detection.json file.
To display all available options:
gst-ai-object-detection -h
To stop the use case, press CTRL + C.
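For unattended test runs, you can stop the application after a fixed time with the same signal that CTRL + C sends (SIGINT), so it shuts down the same way as an interactive stop. This sketch assumes GNU coreutils `timeout`, which exits with status 124 when the time limit is reached; BusyBox `timeout` may report a different status. The function name `run_for` is hypothetical.

```shell
# Run a command for at most SECONDS, then send SIGINT (as CTRL + C does).
# Returns 0 if the command was stopped by the time limit, otherwise its own status.
run_for() {
  secs=$1
  shift
  timeout -s INT "$secs" "$@"
  rc=$?
  # GNU coreutils timeout returns 124 when the command timed out.
  if [ "$rc" -eq 124 ]; then
    return 0
  fi
  return "$rc"
}

# Usage on the target:
#   run_for 30 gst-ai-object-detection --config-file=/etc/configs/config_detection.json
```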

Configurations

The gst-ai-object-detection application uses the /etc/configs/config_detection.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<path-to-input-video>",
  "ml-framework": "<snpe, tflite, or qnn>",
  "yolo-model-type": "<yolov8, yolonas, yolov5, or yolox>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "threshold": <post-processing threshold, integer from 1 to 100>,
  "runtime": "<dsp, gpu, or cpu>",
  "output-type": "<waylandsink, filesink, or rtspsink>",
  "snpe-tensors": "<model output tensor name>"
}
For USB camera input, set the video-format, resolution, and framerate parameters in the config file to match the camera capabilities; see Configure USB camera.
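To find values that the camera supports, one option is `v4l2-ctl --list-formats-ext` from v4l-utils, assuming that tool is installed on your image. Its output is verbose, so the sketch below condenses it to one `FOURCC WIDTHxHEIGHT FPS` line per mode. It only handles cameras that report discrete sizes and intervals, the device node /dev/video0 in the usage line is an assumption, and the function name is hypothetical. Map the FOURCC to the video-format value described in Configure USB camera.

```shell
# Condense `v4l2-ctl --list-formats-ext` output (stdin) into
# "FOURCC WIDTHxHEIGHT FPS" lines, one per discrete mode.
summarize_usb_modes() {
  awk -F"'" '
    /\[[0-9]+]:/ { fmt = $2 }                                # format header, FOURCC between quotes
    /Size: Discrete/ { n = split($0, w, " "); size = w[n] }  # last word is WIDTHxHEIGHT
    /Interval: Discrete/ {
      fps = $0
      sub(/.*\(/, "", fps)      # keep the text after "("
      sub(/ fps\).*/, "", fps)  # drop " fps)" and anything after it
      print fmt, size, fps
    }
  '
}

# Usage on the target:
#   v4l2-ctl -d /dev/video0 --list-formats-ext | summarize_usb_modes
```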
The snpe-tensors field applies only to the SNPE runtime. To retrieve the output tensor names for a DLC model, open the model in Netron.
When using DLC models from the AI Hub, the snpe-tensors field is optional.
Camera source, LiteRT model, DSP runtime
{
  "camera": 0,
  "ml-framework": "tflite",
  "yolo-model-type": "yolox",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": 40,
  "runtime": "dsp",
  "output-type": "waylandsink"
}
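If you maintain several configurations, generating them from a script avoids hand-editing mistakes such as a missing comma. The sketch below writes a file-source, LiteRT YOLOX configuration using the sample model and label paths from this section and the /etc/media/video.mp4 sample video used in the other examples, and it validates runtime and threshold against the values listed in Config JSON Field Description. The function name is hypothetical, and only the fields shown in the sample configuration are written.

```shell
# Write a file-source object detection config for the LiteRT YOLOX sample model.
# Arguments: OUTPUT_PATH RUNTIME(cpu|gpu|dsp) THRESHOLD(1-100)
write_detection_config() {
  out=$1 runtime=$2 threshold=$3
  case "$runtime" in
    cpu|gpu|dsp) ;;
    *) echo "runtime must be cpu, gpu, or dsp" >&2; return 1 ;;
  esac
  # Reject empty, non-numeric, or zero-prefixed values before the range check.
  case "$threshold" in
    ''|*[!0-9]*|0*) echo "threshold must be an integer from 1 to 100" >&2; return 1 ;;
  esac
  if [ "$threshold" -gt 100 ]; then
    echo "threshold must be an integer from 1 to 100" >&2
    return 1
  fi
  cat > "$out" <<EOF
{
  "file-path": "/etc/media/video.mp4",
  "ml-framework": "tflite",
  "yolo-model-type": "yolox",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": $threshold,
  "runtime": "$runtime",
  "output-type": "waylandsink"
}
EOF
}

# Usage on the target:
#   write_detection_config /tmp/config_detection.json dsp 40
#   gst-ai-object-detection --config-file=/tmp/config_detection.json
```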

Expected Output

Detected objects with bounding boxes and labels are overlaid on the video and displayed on the local display.
Figure: Expected output of gst-ai-object-detection

Pipeline Flow

The following plugins are used in the object detection pipeline:
qticamsrc
• Captures the live stream from the camera.
• Uses tee to split the stream for inferencing.
filesrc
• Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.
• Uses tee to split the stream for inferencing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
v4l2src
• Captures the live stream from the USB camera.
• Uses tee to split the stream for inferencing.
h264parse
• Parses the H.264 video bitstream.
v4l2h264dec
• Hardware-decodes H.264 video to raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream, which is used for inference in the later stages of the pipeline.
qtimlsnpe, qtimltflite, qtimlqnn
1. After the inference runtime receives the tensor stream on its sink pad, it runs inference using the provided model.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
Handles inference results from any object detection model.
1. Applies a threshold to the chosen number of results.
2. Loads the YOLO (YOLOv5, YOLOv8, or YOLO-NAS) module.
3. Produces video frames containing only the bounding boxes, which can be overlaid on the objects.
4. Sends these frames to the sink pad of qtivcomposer.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
filesink
• Receives the video stream on its sink pad and saves it as an H.264-encoded MP4 file.
qtirtspbin
1. Serves as a network sink.
2. Transmits UDP packets to the network.

Config JSON Field Description

ml-framework
Supported ML frameworks:
   • snpe (Qualcomm Neural Processing SDK)
   • tflite (LiteRT)
   • qnn (Qualcomm AI Engine Direct)
yolo-model-type
Supported YOLO architectures:
   • yolov8
   • yolonas
   • yolov5
   • yolox
runtime
Hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source
Supported input sources:
   • camera (0=primary, 1=secondary)
   • file-path
   • rtsp-ip-port
   • usb-camera (set enable-usb-camera to TRUE)
output-ip-address
Output RTSP server IP address.
port
Output RTSP server port.
output-type
Supported output sinks:
   • waylandsink (display)
   • filesink (MP4 file)
   • rtspsink (RTSP stream)
snpe-tensors
Output tensor names (SNPE runtime only), for example ["output-tensor-name", "output-tensor-name"].
USB camera video-format and resolution
1. Set video-format to a pixel format that the USB camera supports; see Configure USB camera.
2. Use the following resolution and frame rate fields:
   • width
   • height
   • framerate
output-file
Output filename. The default output file is output_object_detection.mp4.

Known issues

Green tint is observed on the display with libcamera.

Image Classification

The gst-ai-classification application identifies the subject in an image. The use cases are implemented with Qualcomm Neural Processing SDK, LiteRT, and Qualcomm AI Engine Direct models. The pipeline receives a video stream from a camera, a file, a USB camera, or RTSP, preprocesses it, and runs inference on the AI hardware. The results are displayed on the screen, saved as an encoded MP4 file, or streamed over an RTSP server. For information about the plugins used in the pipeline, see Pipeline flow.
Figure: Pipeline diagram for gst-ai-classification
When the software image includes the qticamsrc plugin, the camera framework uses it by default. If absent, the framework switches to libcamera instead. Since Config #1 lacks support for qticamsrc, the system defaults to libcamera.

Input and Output Capabilities

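| Config | File src | RTSP | USB camera | MIPI camera | IMX577 camera | File output | Display | RTSP output |
|---|---|---|---|---|---|---|---|---|
| Config #1 | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Config #2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |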

Sample Model and Label Files

| Runtime | Model file | Label file |
|---|---|---|
| Qualcomm Neural Processing SDK | inceptionv3.dlc | classification.json |
| LiteRT | inception_v3_quantized.tflite | classification.json |
| Qualcomm AI Engine Direct | inception_v3_quantized.bin | classification.json |

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-classification --config-file=/etc/configs/config_classification.json
The sample application reads its input parameters from the /etc/configs/config_classification.json file.
To display all available options:
gst-ai-classification -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-classification application uses the /etc/configs/config_classification.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<path-to-input-video>",
  "ml-framework": "<snpe, tflite, or qnn>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "threshold": <post-processing threshold, integer from 1 to 100>,
  "runtime": "<dsp, gpu, or cpu>",
  "output-type": "<waylandsink, filesink, or rtspsink>"
}
For USB camera input, set the video-format, resolution, and framerate parameters in the config file to match the camera capabilities; see Configure USB camera.
Camera source, LiteRT model, DSP runtime
{
  "camera": 0,
  "ml-framework": "tflite",
  "model": "/etc/models/inception_v3_quantized.tflite",
  "labels": "/etc/labels/classification.json",
  "threshold": 40,
  "runtime": "dsp",
  "output-type": "waylandsink"
}
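To try a different runtime without retyping the file, you can edit a single field in place. The sketch below rewrites one string field in a flat configuration such as the one above, where each key sits on its own line, and keeps a .bak copy. It is a text substitution with sed, not a JSON parser, so values containing |, &, or double quotes are not supported, and the function name is hypothetical. If you change ml-framework, also change model to the matching file from the Sample Model and Label Files table.

```shell
# Set a top-level string field in a flat, one-key-per-line JSON config.
# Arguments: CONFIG_FILE KEY NEW_VALUE. The original is kept as CONFIG_FILE.bak.
set_json_string_field() {
  file=$1 key=$2 value=$3
  if ! grep -q "\"$key\"[[:space:]]*:[[:space:]]*\"" "$file"; then
    echo "string field \"$key\" not found in $file" >&2
    return 1
  fi
  cp "$file" "$file.bak" || return 1
  # Replace only the quoted value that follows "KEY":
  sed "s|\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]*\"|\"$key\": \"$value\"|" \
    "$file.bak" > "$file"
}

# Usage on the target:
#   set_json_string_field /etc/configs/config_classification.json runtime gpu
```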

Expected Output

The classified object label and confidence score are overlaid on the video and displayed on the local display.
Figure: Expected output of gst-ai-classification

Pipeline Flow

The following plugins are used in the classification pipeline:
qticamsrc
• Captures the live stream from the camera.
• Uses tee to split the stream for inferencing.
filesrc
• Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.
• Uses tee to split the stream for inferencing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
v4l2src
• Captures the live stream from the USB camera.
• Uses tee to split the stream for inferencing.
h264parse
• Parses the H.264 video bitstream.
v4l2h264dec
• Hardware-decodes H.264 video to raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream on its source pad, which is used for inference in the later stages of the pipeline.
qtimlsnpe, qtimltflite, qtimlqnn
1. After the inference runtime receives the tensor stream on its sink pad, it runs inference using the provided model.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
Handles inference results from any classification model.
1. Applies a threshold to the chosen number of results.
2. Loads the MobileNet-softmax postprocessing module.
3. Produces results as video frames with classification labels.
4. Sends these frames to the sink pad of qtivcomposer.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
filesink
• Receives the video stream on its sink pad and saves it as an H.264-encoded MP4 file.
qtirtspbin
1. Serves as a network sink.
2. Transmits UDP packets to the network.

Config JSON Field Description

ml-framework
Supported ML frameworks:
   • snpe (Qualcomm Neural Processing SDK)
   • tflite (LiteRT)
   • qnn (Qualcomm AI Engine Direct)
runtime
Hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source
Supported input sources:
   • camera (0=primary, 1=secondary)
   • file-path
   • rtsp-ip-port
   • usb-camera (set enable-usb-camera to TRUE)
output-ip-address
Output RTSP server IP address.
port
Output RTSP server port.
output-type
Supported output sinks:
   • waylandsink (display)
   • filesink (MP4 file)
   • rtspsink (RTSP stream)
USB camera video-format and resolution
1. Set video-format to a pixel format that the USB camera supports; see Configure USB camera.
2. Use the following resolution and frame rate fields:
   • width
   • height
   • framerate
output-file
Output filename. The default output file is output_classification.mp4.

Known Issues

Green tint is observed on the display with libcamera.

Face Detection

The gst-ai-face-detection application collects video input from a camera, a file, or an RTSP stream and uses Qualcomm AI Engine Direct or LiteRT face detection models to produce a preview with the AI model output overlaid on the HDMI display. The following figure shows the pipeline, which receives the input, preprocesses it, runs inference on the AI hardware, and displays the results on the screen. For information about the plugins used in the pipeline, see Pipeline flow.
Figure: Pipeline diagram for gst-ai-face-detection

Input and Output Capabilities

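| Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output |
|---|---|---|---|---|---|---|---|
| Config #1 | Yes | Yes | No | No | No | Yes | No |
| Config #2 | Yes | Yes | No | Yes | No | Yes | No |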

Sample Model and Label Files

| Runtime | Model file | Label file |
|---|---|---|
| LiteRT | face_det_lite_quantized.tflite | face_detection.json |
| Qualcomm AI Engine Direct | face_det_lite_quantized.bin | face_detection.json |

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-face-detection --config-file=/etc/configs/config_face_detection.json
The sample application reads its input parameters from the /etc/configs/config_face_detection.json file.
To display all available options:
gst-ai-face-detection -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-face-detection application uses the /etc/configs/config_face_detection.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<input-video-path>",
  "ml-framework": "<tflite or qnn framework>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "threshold": <post-processing threshold, integer from 1 to 100>,
  "runtime": "<cpu, gpu, or dsp>"
}
File source, LiteRT model, DSP runtime
{
  "file-path": "/etc/media/video.mp4",
  "ml-framework": "tflite",
  "model":"/etc/models/face_det_lite_quantized.tflite",
  "labels": "/etc/labels/face_detection.json",
  "threshold": 51,
  "runtime": "dsp"
}
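A typo in the model or labels path is a common cause of start-up failures. The sketch below reads those two paths from a flat configuration like the one above with sed and checks that each file exists. It assumes each key sits on its own line and the value contains no double quotes; the function names are hypothetical.

```shell
# Print the value of a string field from a flat, one-key-per-line JSON config.
# Arguments: CONFIG_FILE KEY
json_string_field() {
  sed -n "s|^[[:space:]]*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*|\1|p" "$1" | head -n 1
}

# Check that the model and labels files named in a config exist.
check_config_paths() {
  rc=0
  for key in model labels; do
    path=$(json_string_field "$1" "$key")
    if [ -z "$path" ]; then
      echo "$key: not set in $1" >&2
      rc=1
    elif [ ! -f "$path" ]; then
      echo "$key: file not found: $path" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the target:
#   check_config_paths /etc/configs/config_face_detection.json
```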

Pipeline Flow

The following plugins are used in the face detection pipeline:
qticamsrc
• Captures the live stream from the camera.
• Uses tee to split the stream for inferencing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
h264parse
• Parses the H.264 video bitstream.
v4l2h264dec
• Hardware-decodes H.264 video to raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream, which is used for inference in the later stages of the pipeline.
qtimltflite, qtimlqnn
1. After the inference runtime receives the tensor stream on its sink pad, it runs inference using the provided model.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
1. Handles inference results from any face detection model.
2. Applies a threshold to the chosen number of results.
qtimetamux
• Receives the string-based postprocessing output and multiplexes it with the video frames.
qtivoverlay
1. Receives the multiplexed stream.
2. Overlays the bounding boxes on the stream.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

ml-framework
Supported ML frameworks:
   • tflite (LiteRT)
   • qnn (Qualcomm AI Engine Direct)
runtime
Supported hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source
Supported input sources:
   • file-path
   • rtsp-ip-port
   • camera

Known issues

Detection accuracy may decrease when human faces are far from the camera.

Semantic Segmentation

The gst-ai-segmentation application divides an image into meaningful segments and assigns a label to each segment based on the similarity of its attributes. The application uses the Qualcomm Neural Processing SDK, Qualcomm AI Engine Direct, and LiteRT runtimes for image segmentation. The following figure shows the pipeline, which receives input from a live camera feed, a file, or an RTSP stream, preprocesses the video data, runs inference on the AI hardware, and displays the segmented output on the screen. For information about the plugins used in the pipeline, see Pipeline flow.
Figure: Pipeline diagram for gst-ai-segmentation

Input and Output Capabilities

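| Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output |
|---|---|---|---|---|---|---|---|
| Config #1 | Yes | Yes | No | No | No | Yes | No |
| Config #2 | Yes | Yes | No | Yes | No | Yes | No |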

Sample Model and Label Files

| Runtime | Model file | Label file |
|---|---|---|
| Qualcomm Neural Processing SDK | deeplabv3_plus_mobilenet.dlc | deeplabv3_resnet50.json |
| LiteRT | deeplabv3_plus_mobilenet_quantized.tflite | deeplabv3_resnet50.json |
| Qualcomm AI Engine Direct | deeplabv3_plus_mobilenet_quantized.bin | deeplabv3_resnet50.json |

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-segmentation --config-file=/etc/configs/config_segmentation.json
The sample application reads its input parameters from the /etc/configs/config_segmentation.json file.
To display all available options:
gst-ai-segmentation -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-segmentation application uses the /etc/configs/config_segmentation.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<input-video-path>",
  "ml-framework": "<snpe, tflite, or qnn framework>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "runtime": "<dsp, gpu, or cpu runtime>"
}
File source, LiteRT model, DSP runtime
{
 "file-path": "/etc/media/video.mp4",
 "ml-framework": "tflite",
 "model": "/etc/models/deeplabv3_plus_mobilenet_quantized.tflite",
 "labels": "/etc/labels/deeplabv3_resnet50.json",
 "runtime": "dsp"
}
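The configuration files must remain valid JSON after you edit them, and a missing comma or a leftover placeholder is easy to introduce. The sketch below checks syntax with `python3 -m json.tool` or `jq empty`, whichever is installed. Neither tool is guaranteed to be present on the target image, so you may need to run the check on the Linux host; the function name is hypothetical.

```shell
# Return 0 if the file parses as JSON, using python3 or jq (whichever exists).
json_ok() {
  if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$1" >/dev/null
  elif command -v jq >/dev/null 2>&1; then
    jq empty "$1"
  else
    echo "neither python3 nor jq is available to check $1" >&2
    return 2
  fi
}

# Usage:
#   json_ok /etc/configs/config_segmentation.json && echo "valid JSON"
```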

Expected Output

The segmented data is displayed on the local display.
Figure: Expected output of gst-ai-segmentation

Pipeline Flow

The following plugins are used in the segmentation pipeline:
qticamsrc
• Captures the live stream from the camera.
• Uses tee to split the stream for concurrent display and ML inference.
filesrc
• Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.
• Uses tee to split the stream for processing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for processing.
h264parse
• Parses the H.264 video bitstream to ensure downstream elements can handle the payload.
v4l2h264dec
• Hardware-accelerated decoder that converts H.264 video into raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data. This preprocessing is done when the model expects floating-point values as input:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream on its source pad, which is used for inference in the later stages of the pipeline.
qtimlsnpe, qtimltflite, qtimlqnn
1. After the inference runtime receives the tensor stream on its sink pad, it runs inference using the provided model.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
• Converts the inference tensors received on its sink pad into video formats that multimedia plugins use for further processing.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

ml-framework
Supported ML frameworks:
   • snpe (Qualcomm Neural Processing SDK)
   • tflite (LiteRT)
   • qnn (Qualcomm AI Engine Direct)
runtime
Supported hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source
Supported input sources:
   • file-path
   • rtsp-ip-port
   • camera

Pose Detection

The gst-ai-pose-detection application detects the body pose of the subject in an image or video. The use case processes input streams from a camera, a file, a USB camera, or an RTSP source and uses LiteRT and Qualcomm AI Engine Direct models for pose detection. The following figure shows the pipeline, which receives the input, preprocesses it, runs inference on the AI hardware, and generates the output for real-time visualization of human poses. The results are displayed on the screen, saved as an encoded MP4 file, or streamed over an RTSP server. For information about the plugins used in the pipeline, see Pipeline flow.
Figure: Pipeline diagram for gst-ai-pose-detection

Input and Output Capabilities

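| Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output |
|---|---|---|---|---|---|---|---|
| Config #1 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Config #2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes |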

Sample Model and Label Files

| Runtime | Model file | Label files |
|---|---|---|
| LiteRT | hrnet_pose_quantized.tflite | hrnet_pose.json, hrnet_settings.json |

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-pose-detection --config-file=/etc/configs/config_pose.json
The sample application reads its input parameters from the /etc/configs/config_pose.json file.
To display all available options:
gst-ai-pose-detection -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-pose-detection application uses the /etc/configs/config_pose.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<input-video-path>",
  "ml-framework": "<tflite or qnn framework>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "pose-settings-path": "<path-to-hrnet_settings-file>",
  "output-type": "waylandsink or filesink or rtspsink",
  "runtime": "<cpu, gpu or dsp runtime>"
}
For USB camera input, set the video-format, resolution, and framerate parameters in the config file to match the camera capabilities; see Configure USB camera.
To change the threshold, you must configure the confidence value in the hrnet_settings.json file.
File source, LiteRT model, DSP runtime
{
  "file-path": "/etc/media/video.mp4",
  "ml-framework": "tflite",
  "model": "/etc/models/hrnet_pose_quantized.tflite",
  "labels": "/etc/labels/hrnet_pose.json",
  "pose-settings-path":"/etc/labels/hrnet_settings.json",
  "runtime": "dsp",
  "output-type": "waylandsink"
}

Expected Output

The displayed output shows the detected poses.
Figure: Expected output of gst-ai-pose-detection

Pipeline Flow

The following plugins are used in the pose detection pipeline:
qticamsrc
• Captures the live stream from the camera.
• Uses tee to split the stream for concurrent display and ML inference.
filesrc
• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for processing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for processing.
v4l2src
• Captures the live stream from the USB camera.
• Uses tee to split the stream for processing.
h264parse
• Parses the H.264 video bitstream to ensure downstream elements can handle the payload.
v4l2h264dec
• Hardware-accelerated decoder that converts H.264 video into raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream, which is used for inference in the later stages of the pipeline.
qtimltflite, qtimlqnn
• Uses the HRNet model for pose detection.
• Uses the external delegate to execute the model on the Hexagon Tensor Processor.
• After the inference runtime receives the tensor stream on its sink pad, it does the following:
   • Runs the inference.
   • Produces a tensor stream containing the inference results on its source pad.
   • Manages the inference results from the pose detection model.
qtimlpostprocess
• Applies a threshold to the chosen number of results.
• Loads the corresponding module for each pose detection model.
In this use case, qtimlpostprocess does the following:
1. Loads the HRNet module.
2. Produces results as video frames with drawn poses.
3. Sends the results to the sink pad of qtivcomposer for further processing or display.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
filesink
• Receives the video stream on its sink pad and saves it as an H.264-encoded MP4 file.
qtirtspbin
1. Serves as a network sink.
2. Transmits UDP packets to the network.

Config JSON Field Description

ml-framework
Supported ML frameworks:
   • tflite (LiteRT)
   • qnn (Qualcomm AI Engine Direct)
runtime
Hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source
Supported input sources:
   • camera (0=primary, 1=secondary)
   • file-path
   • rtsp-ip-port
   • usb-camera (set enable-usb-camera to TRUE)
output-ip-address
Output RTSP server IP address.
port
Output RTSP server port.
output-type
Supported output sinks:
   • waylandsink (display)
   • filesink (MP4 file)
   • rtspsink (RTSP stream)
USB camera video-format and resolution
1. Set video-format to a pixel format that the USB camera supports; see Configure USB camera.
2. Use the following resolution and frame rate fields:
   • width
   • height
   • framerate
enable-usb-camera
Set to TRUE or FALSE.
output-file
Output filename. The default output file is output_pose.

Known Issues

  • Detection accuracy may decrease when objects are far from the camera.
  • The application identifies the pose of only one person, even when multiple people are present in the frame.
For better accuracy and detection results, use the gst-ai-daisychain-detection-pose application.

Mono Depth

The gst-ai-monodepth application estimates the depth of a source feed from a live camera stream, a file, or an RTSP stream. The following figure shows the pipeline, which captures the feed from the source, preprocesses it, and runs inference on the AI hardware. For information about the plugins used in the pipeline, see Pipeline flow.
[Figure: Pipeline diagram. Application: gst-ai-monodepth]

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #1 | Yes | Yes | No | No | No | Yes | No
Config #2 | Yes | Yes | No | Yes | No | Yes | No

Sample Model and Label Files

Runtime | Model file | Label file
Qualcomm Neural Processing SDK | midasv2.dlc | monodepth.json
LiteRT | midas_quantized.tflite | monodepth.json
Qualcomm AI Engine Direct | midas_quantized.bin | monodepth.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-monodepth --config-file=/etc/configs/config_monodepth.json
The sample application uses the /etc/configs/config_monodepth.json file to read the input parameters. To display all available options:
gst-ai-monodepth -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-monodepth application uses the /etc/configs/config_monodepth.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<input video path>",
  "ml-framework": "<snpe, tflite, or qnn framework>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "runtime": "<dsp, gpu, or cpu runtime>"
}
File source, LiteRT model, DSP runtime
{
  "file-path": "/etc/media/video.mp4",
  "ml-framework": "tflite",
  "model": "/etc/models/midas_quantized.tflite",
  "labels": "/etc/labels/monodepth.json",
  "runtime": "dsp"
}
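The model file depends on the ml-framework you choose. As a minimal sketch, the following Python helper (hypothetical, not part of the SDK) builds a monodepth config from the Sample Model and Label Files table, assuming the artifacts are in /etc/models and /etc/labels as in the example above.

```python
import json
from typing import Dict

# Model file per ml-framework, from the Sample Model and Label Files table.
MODELS = {
    "snpe": "midasv2.dlc",
    "tflite": "midas_quantized.tflite",
    "qnn": "midas_quantized.bin",
}
RUNTIMES = {"cpu", "gpu", "dsp"}


def monodepth_config(framework: str, runtime: str, file_path: str) -> Dict[str, str]:
    """Build a gst-ai-monodepth config dict (hypothetical helper).

    Assumes models live in /etc/models and labels in /etc/labels.
    """
    if framework not in MODELS:
        raise ValueError(f"unsupported ml-framework: {framework}")
    if runtime not in RUNTIMES:
        raise ValueError(f"unsupported runtime: {runtime}")
    return {
        "file-path": file_path,
        "ml-framework": framework,
        "model": f"/etc/models/{MODELS[framework]}",
        "labels": "/etc/labels/monodepth.json",
        "runtime": runtime,
    }


if __name__ == "__main__":
    cfg = monodepth_config("tflite", "dsp", "/etc/media/video.mp4")
    # Writing /etc/configs/config_monodepth.json usually needs root; print instead.
    print(json.dumps(cfg, indent=2))
```

With "tflite" and "dsp", the helper reproduces the example config shown above.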

Expected Output

The overlaid model output stream is shown side by side with the live feed.
[Figure: Expected output]

Pipeline Flow

The following table lists the plugins used in the monodepth pipeline:
Plugin | Description
qticamsrc
• Captures the live stream from camera.
• Uses tee to split the stream for concurrent display and ML inference.
filesrc
• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for processing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for processing.
h264parse
• Parses the H.264 video bitstream to ensure downstream elements can handle the payload.
v4l2h264dec
• Hardware-accelerated decoder that converts H.264 video into raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data. This preprocessing is done when the model expects floating-point values as input:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream on its source pad.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimlsnpe
qtimltflite
qtimlqnn
• Uses the Midasv2 model for monodepth calculation.
1. The inference runtime receives the tensor stream on its sink pad.
2. The runtime executes the inference.
3. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
• Converts the inference tensors received on its sink pad into video formats that multimedia plugins use for further processing.
qtivtransform
• Converts the buffers on its source pad to formats compatible with composition on waylandsink.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

Field | Values / Description
ml-framework | Supported ML frameworks:
   • snpe (Qualcomm Neural Processing SDK)
   • tflite (LiteRT)
   • qnn (Qualcomm AI Engine Direct)
runtime | Supported hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source | Supported input sources:
   • file-path
   • rtsp-ip-port
   • camera (0=primary, 1=secondary)

Super Resolution

The gst-ai-superresolution application generates high-resolution video frames from low-resolution input. The following figures show the pipeline, which receives a video stream from a file source as input, processes it through the super resolution module using LiteRT, and displays the output. For information about the plugins used in the pipeline, see Pipeline flow.
[Figures: Pipeline diagrams. Application: gst-ai-superresolution]

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #2 | Yes | No | No | No | Yes | Yes | No

Sample Model Files

Runtime | Model file
LiteRT | quicksrnetsmall_quantized.tflite

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-superresolution --config-file=/etc/configs/config-superresolution.json
The sample application uses the /etc/configs/config-superresolution.json file to read the input parameters. To display all available options:
gst-ai-superresolution -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-superresolution application uses the /etc/configs/config-superresolution.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "input-file-path": "<input-video-path>",
  "model": "<path-to-model-file>",
  "output-file-path": "<path-to-output-video>"
}
The video super‑resolution application requires an input video resolution of 128 × 128.
File source, LiteRT model, DSP runtime
{
  "input-file-path": "/etc/media/video.mp4",
  "model": "/etc/models/quicksrnetsmall_quantized.tflite"
}

Expected Output

The output is displayed on an HDMI monitor.
[Figure: Expected output]

Pipeline Flow

The following table lists the plugins used in the superresolution pipeline:
Plugin | Description
filesrc
• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for processing.
h264parse
• Parses the H.264 video bitstream to ensure downstream elements can handle the payload.
v4l2h264dec
• Hardware-accelerated decoder that converts H.264 video into raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data. This preprocessing is done when the model expects floating-point values as input:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream on its source pad.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite
• Runs on LiteRT and uses the quicksrnetsmall_quantized model for super resolution.
1. The inference runtime receives the tensor stream on its sink pad.
2. The runtime executes the inference.
3. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
• Handles inference results from any super resolution model.
1. Loads the SRNet module.
2. Produces results as high-resolution video frames.
3. Sends the processed frames to the sink pad of qtivcomposer.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

Field | Values / Description
Input source | input-file-path: The path to the input video file.
model | The path to the super resolution model.
Output source | Configuration for the output destination:
   • output-file-path: The path to the output video file.
   • If output-file-path is not provided, the display output is automatically enabled.
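The output rule above, together with the 128 × 128 input requirement, can be expressed as a small check. This is a hypothetical sketch of the documented behavior, not the application's code; treating an empty output-file-path the same as a missing one is an assumption.

```python
from typing import Dict

# Input resolution the super resolution application requires (width, height).
REQUIRED_INPUT = (128, 128)


def superres_output(cfg: Dict[str, str]) -> str:
    """Return "file" if output-file-path is set, otherwise "display".

    Mirrors the documented rule; an empty string is treated as not provided
    (assumption).
    """
    return "file" if cfg.get("output-file-path") else "display"


def input_resolution_ok(width: int, height: int) -> bool:
    """Check a video's resolution against the required 128 x 128 input."""
    return (width, height) == REQUIRED_INPUT


if __name__ == "__main__":
    example = {
        "input-file-path": "/etc/media/video.mp4",
        "model": "/etc/models/quicksrnetsmall_quantized.tflite",
    }
    print(superres_output(example))  # display: the example sets no output-file-path
```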

Known Issues

File source output fails to render correctly.

AI Event Encoder

The gst-ai-event-encoder application receives a live video stream from a camera, file, or RTSP source. When a person enters the video frame, the application preprocesses the video, runs inference on the AI hardware, and encodes the video. Encoding stops 5 seconds after the person leaves the frame and restarts when anyone enters the frame again. The following figures show the event detection and recording pipelines for the event encoder application. For information about the plugins used in the pipeline flow, see Pipeline flow.
[Figures: Pipeline diagrams. Application: gst-ai-event-encoder]

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #1 | Yes | Yes | No | No | No | Yes | No
Config #2 | Yes | Yes | No | Yes | No | Yes | No

Sample Model and Label Files

Runtime | Model file | Label file
LiteRT | yolox_quantized.tflite | yolox.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-event-encoder --config-file=/etc/configs/config-event-encoder.json
The sample application uses the /etc/configs/config-event-encoder.json file to read the input parameters. To display all available options:
gst-ai-event-encoder -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-event-encoder application uses the /etc/configs/config-event-encoder.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<path to video file>",
  "model": "<path to model file>",
  "labels": "<path to label file>",
  "threshold": <integer between 1 and 100>,
  "runtime": "<cpu, gpu, or dsp runtime>"
}
File source, LiteRT model, DSP runtime
{
  "file-path": "/etc/media/video.mp4",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": 40,
  "runtime": "dsp"
}

Expected Output

Each recording is saved as an MP4 file in the /etc/media folder, named output-1.mp4, output-2.mp4, and so on.
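To reason about when recordings start and stop, the following Python sketch models the described behavior: a new output-N.mp4 file starts when a person appears, and recording stops once no person has been seen for 5 seconds. The EventRecorder class is hypothetical and is not taken from the application source; it assumes one update per processed frame with a timestamp in seconds.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HOLD_OFF_S = 5.0  # recording stops this long after the last person is seen


@dataclass
class EventRecorder:
    """Hypothetical model of the event encoder's start/stop behavior."""

    output_dir: str = "/etc/media"
    recording: bool = False
    last_seen: Optional[float] = None  # time of the last frame with a person
    files: List[str] = field(default_factory=list)

    def update(self, t: float, person_present: bool) -> None:
        """Process one frame observed at time t (seconds)."""
        if person_present:
            self.last_seen = t
            if not self.recording:
                # A person entered: start the next output-N.mp4 segment.
                self.files.append(f"{self.output_dir}/output-{len(self.files) + 1}.mp4")
                self.recording = True
        elif (self.recording and self.last_seen is not None
              and t - self.last_seen >= HOLD_OFF_S):
            # Nobody seen for HOLD_OFF_S seconds: close the current segment.
            self.recording = False


if __name__ == "__main__":
    rec = EventRecorder()
    timeline = [(0.0, True), (2.0, True), (3.0, False), (7.0, False), (10.0, True)]
    for t, present in timeline:
        rec.update(t, present)
    print(rec.files)
```

In this timeline, a person is visible at 0 s and 2 s, nobody is visible at 3 s and 7 s, and a person returns at 10 s. Recording stops at 7 s, five seconds after the last sighting, so the run produces two files: output-1.mp4 and output-2.mp4.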

Pipeline Flow

The following table lists the plugins used in the event encoder pipeline:
Plugin | Description
qticamsrc
• Captures the live stream from camera.
• Uses tee to split the stream for concurrent display and ML inference.
filesrc
• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for processing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for processing.
h264parse
• Parses the H.264 video bitstream to ensure downstream elements can handle the payload.
v4l2h264dec
• Hardware-accelerated decoder that converts H.264 video into raw frames.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite
1. After the inference runtime receives the tensor stream on its sink pad, it executes the inference.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
Instance 1 (Detection Overlay):
   • Applies a threshold to the chosen number of results.
   • Loads the YOLOv8 module.
   • Produces video frames with only bounding boxes for object overlay.
   • Sends processed frames to the sink pad of qtivcomposer.

Instance 2 (Metadata Generation):
   • Produces output in text format (bounding box coordinates and labels).
   • Connects to an appsink plugin where metadata is read, parsed, and logged.
   • Uses bounding box information to count the number of humans in each frame.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

Field | Values / Description
runtime | Supported hardware runtimes:
   • cpu
   • gpu
   • dsp
Input source | Supported input sources:
   • file-path
   • rtsp-ip-port
   • camera (0=primary, 1=secondary)

Known Issues

On the QCS6490 device, FPS fluctuates between 22 and 29 when using file input.

Metadata Parser

The gst-ai-metadata-parser-example application receives a live video stream from a camera, file, or RTSP source and passes the stream to the YOLO models for object detection and preview. The overlaid AI model output, including labels and bounding boxes, is shown on an HDMI display. The extracted metadata is logged to the console and used to count the number of people in the frame. The following figure shows the pipeline for metadata parsing. For information about the plugins used in the pipeline flow, see Pipeline flow.
[Figure: Pipeline diagram. Application: gst-ai-metadata-parser-example]

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #1 | Yes | Yes | No | No | No | Yes | No
Config #2 | Yes | Yes | No | Yes | No | Yes | No

Sample Model and Label Files

Runtime | Model file | Label file
LiteRT | yolox_quantized.tflite | yolox.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-metadata-parser-example --config-file=/etc/configs/config-metadata-parser-example.json
To view the bounding box information along with the human count, run the following command before running the application:
export GST_DEBUG=4
The sample application uses the /etc/configs/config-metadata-parser-example.json file to read the input parameters. To display all available options:
gst-ai-metadata-parser-example -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-metadata-parser-example application uses the /etc/configs/config-metadata-parser-example.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<path to video file>",
  "model": "<path to model file>",
  "labels": "<path to label file>",
  "threshold": <integer between 1 and 100>,
  "runtime": "<cpu, gpu, or dsp runtime>"
}
File source, LiteRT model, DSP runtime
{
  "file-path": "/etc/media/video.mp4",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": 40,
  "runtime": "dsp"
}

Expected Output

[Figure: Expected output]

Pipeline Flow

The following table lists the plugins used in the metadata parser pipeline:
Plugin | Description
qticamsrc
• Captures the live stream from camera.
• Uses tee to split the stream into two for inferencing and composing.
filesrc
• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream into two for inferencing and composing.
rtspsrc
• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream into two for inferencing and composing.
h264parse
• Parses the H.264 video.
v4l2h264dec
• Decodes the video.
qtimlvconverter
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream on its source pad. The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite
• After the inference runtime receives the tensor stream on its sink pad, it runs the inference.
• Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess
Instance 1 (Object Detection):
   • Applies a threshold to the chosen number of results.
   • Loads the YOLOv8 module.
   • Produces video frames with only bounding boxes that can be overlaid on objects.
   • Sends these processed frames to the sink pad of qtivcomposer.

Instance 2 (Human Counting):
   • Produces the output in a text format (bounding box coordinates and labels).
   • Sends this output to the appsink plugin, where the metadata is read, parsed, and logged.
   • Uses the bounding box information to count the number of humans in each frame.
qtivcomposer
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
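The human-counting branch applies the threshold and then counts detections. The exact text format that qtimlpostprocess emits is not shown here, so the following Python sketch assumes the metadata has already been parsed into (label, confidence, box) records. It also assumes that confidence is a percentage and that humans carry the label "person"; check both assumptions against your label file and the logged output.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One parsed detection (hypothetical record; not the SDK's metadata type)."""

    label: str
    confidence: float  # assumed to be a percentage, 0-100
    box: Tuple[float, float, float, float]  # assumed (x, y, width, height)


def count_people(detections: List[Detection], threshold: int,
                 person_label: str = "person") -> int:
    """Count detections labeled as people whose confidence meets the threshold."""
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be an integer between 1 and 100")
    return sum(1 for d in detections
               if d.label == person_label and d.confidence >= threshold)


if __name__ == "__main__":
    frame = [
        Detection("person", 82.0, (10, 20, 50, 120)),
        Detection("person", 35.0, (200, 40, 45, 110)),
        Detection("car", 90.0, (300, 100, 150, 80)),
    ]
    print(count_people(frame, threshold=40))  # 1
```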

Config JSON Field Description

Field | Values / Description
runtime | Use one of the following runtimes:
   • cpu
   • gpu
   • dsp
Input source | Use one of the following input sources:
   • camera: Primary (0) or secondary (1).
   • file-path: The path to the video file.
   • rtsp-ip-port: The address of the RTSP stream, in the form rtsp://<ip>:<port>/<stream>
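The rtsp-ip-port value is a URL in the form rtsp://<ip>:<port>/<stream>. A minimal Python helper for composing it might look like the following. The helper is hypothetical, and the address and stream name in the example are placeholders.

```python
from urllib.parse import urlparse


def rtsp_ip_port(ip: str, port: int, stream: str) -> str:
    """Compose an rtsp-ip-port value: rtsp://<ip>:<port>/<stream>."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    url = f"rtsp://{ip}:{port}/{stream.lstrip('/')}"
    parsed = urlparse(url)
    # Round-trip check: the parsed host and port must match what was passed in.
    if parsed.hostname != ip or parsed.port != port:
        raise ValueError(f"not a valid host for an RTSP URL: {ip!r}")
    return url


if __name__ == "__main__":
    # Placeholder address and stream name; replace them with your RTSP source.
    print(rtsp_ip_port("192.168.1.10", 8554, "live"))
```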

AI USB Camera

The gst-ai-usb-camera-app application streams video from a USB webcam connected to the Qualcomm EVK. The webcam should be accessible as a /dev/videoX device. You can preview the output on Wayland, encode it to a video file, or live stream it over RTSP. To also perform object detection and preview the results, set enable-object-detection to TRUE. The following figures show the pipelines, which process the input from the USB camera to generate the various outputs. For information about the plugins used in these pipelines, see Pipeline flow.
[Figures: Pipeline diagrams. Application: gst-ai-usb-camera-app]

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #1 | No | No | Yes | No | Yes | Yes | Yes
Config #2 | No | No | Yes | No | Yes | Yes | Yes

Sample Model and Label Files

Runtime | Model files | Label files
Qualcomm Neural Processing SDK | yolonas.dlc, yolov5.dlc, yolov8.dlc | yolonas.json, yolov5.json, yolov8.json
LiteRT | yolov8_det_quantized.tflite, yolonas_quantized.tflite, yolov5.tflite, yolox_quantized.tflite | yolov8.json, yolonas.json, yolov5.json, yolox.json
Qualcomm AI Engine Direct | yolov8_det_quantized.bin | yolov8.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-usb-camera-app --config-file=/etc/configs/config-usb-camera-app.json
The sample application uses the /etc/configs/config-usb-camera-app.json file to read the input parameters. To display all available options:
gst-ai-usb-camera-app -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-usb-camera-app application uses the /etc/configs/config-usb-camera-app.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "width": "<Supported USB camera width>",
  "height": "<Supported USB camera Height>",
  "framerate": "<Supported USB camera fps>",
  "video-format": "<Supported USB camera video format like yuy2, mjpeg, nv12>",
  "output": "<output type like waylandsink, filesink or rtspsink>",
  "ip-address": "<Device IP address in case of rtsp streaming>",
  "port": "<Device port num in case of rtsp streaming>",
  "enable-object-detection": "<to enable the object-detection>",
  "file-path": "<input video path>",
  "ml-framework": "<snpe, or tflite or qnn framework>",
  "yolo-model-type": "<yolov8 or yolonas or yolov5 or yolox>",
  "model": "<Model Path>",
  "labels": "<Label Path>",
  "threshold": "<Post-processing threshold, integer value from 1-100>",
  "runtime": "<dsp, cpu or gpu runtime>",
  "snpe-tensors": "<Model output tensor name>"
}
For USB camera input, set the video-format, resolution, and framerate parameters in the config file to match the camera capabilities.
You can run the Yolo-NAS-Quantized.tflite model by setting the yolo-model-type field to yolov8.
The snpe-tensors field applies only to the SNPE runtime. To retrieve the output tensor names for a DLC model, open the model in Netron.
When using DLC models from the AI Hub, the snpe-tensors field is optional.
If the USB camera isn’t detected on the target device, download the required firmware. See Download PCIe to USB controller firmware.
USB camera input, LiteRT, YOLOX model, and DSP runtime
{
  "width": 1920,
  "height": 1080,
  "framerate": 30,
  "output":"waylandsink",
  "video-format":"yuy2",
  "model":"/etc/models/yolox_quantized.tflite",
  "labels":"/etc/labels/yolox.json",
  "output-file":"/etc/media/output.mp4",
  "ip-address":"127.0.0.1",
  "port":"8900",
  "enable-object-detection": "TRUE",
  "ml-framework": "tflite",
  "yolo-model-type": "yolox",
  "threshold": 75,
  "runtime": "dsp"
}

Expected Output

[Figure: Expected output]

Pipeline flow

The following table lists the plugins used in the AI USB camera pipelines:
Pipeline | Description
Dump the USB camera stream to a filesink
   • The USB camera captures the live camera stream.
   • qtivtransform transforms the stream data.
   • capsfilter is applied to enforce constraints on the raw video data.
   • filesink is used to dump the data into a file.
Video encoding
   • The USB camera captures the live camera stream.
   • qtivtransform transforms the stream data.
   • capsfilter is applied to enforce constraints on the raw video data.
   • v4l2h264enc is used to encode the video using the H.264 format.
   • h264parse is used to parse the video.
   • mp4mux is used to multiplex the video into an MP4 container.
   • filesink is used to write the video to a file.
RTSP streaming
   • The USB camera captures the live camera stream.
   • qtivtransform transforms the stream data.
   • capsfilter is applied to enforce constraints on the raw video data.
   • v4l2h264enc is used to encode the video using the H.264 format.
   • h264parse is used to parse the video.
   • qtirtspbin is used to serve the encoded stream over RTSP.
USB camera and object detection on RTSP
   • The USB camera captures the live camera stream.
   • capsfilter is applied to enforce constraints on the raw video data.
   • tee is used to split the stream for inferencing.
   • qtivtransform transforms the stream data.
   • qtimlvconverter performs preprocessing and converts the video stream to a tensor stream, which is used for inferencing.
   • qtimlsnpe, qtimltflite, or qtimlqnn run the inference on the stream.
   • qtimlpostprocess handles the inference results from any object detection model and produces video frames.
   • qtivcomposer composes the video frames and shares them with qtirtspbin.
   • qtirtspbin streams the composed video over RTSP.
USB camera and object detection on Wayland
   • The USB camera captures the live camera stream.
   • capsfilter is applied to enforce constraints on the raw video data.
   • tee is used to split the stream for inferencing.
   • qtivtransform transforms the stream data.
   • qtimlvconverter performs preprocessing and converts the video stream to a tensor stream, which is used for inferencing.
   • qtimlsnpe, qtimltflite, or qtimlqnn run the inference on the stream.
   • qtimlpostprocess handles the inference results from any object detection model and produces video frames.
   • qtivcomposer composes the video frames and shares them with waylandsink.
   • waylandsink submits the composed video stream to Weston, which renders it on the local display.
Object detection using USB camera and file encode
   • The USB camera captures the live camera stream.
   • capsfilter is applied to enforce constraints on the raw video data.
   • tee is used to split the stream for inferencing.
   • qtivtransform transforms the stream data.
   • qtimlvconverter performs preprocessing and converts the video stream to a tensor stream, which is used for inferencing.
   • qtimlsnpe, qtimltflite, or qtimlqnn run the inference on the stream.
   • qtimlpostprocess handles the inference results from any object detection model and produces video frames.
   • qtivcomposer composes the video frames and shares them with filesink.
   • filesink writes the composed video stream to a file.

Config JSON Field Description

Field | Values / Description
ml-framework | Use one of the following frameworks:
   • snpe: Qualcomm Neural Processing SDK
   • tflite: LiteRT
   • qnn: Qualcomm AI Engine Direct
yolo-model-type | Use one of the following model types:
   • yolov5
   • yolov8
   • yolonas
   • yolox
runtime | Use one of the following runtimes:
   • cpu
   • gpu
   • dsp
output | Use one of the following output types:
   • filesink
   • waylandsink
   • rtspsink
enable-object-detection | TRUE or FALSE
snpe-tensors | ["output-tensor-name","output-tensor-name"]
USB camera video-format and resolution | Use one of the following video formats:
   • nv12
   • yuy2
   • mjpeg
Use the following resolution parameters:
   • width: Input USB camera source resolution width.
   • height: Input USB camera source resolution height.
   • framerate: Input USB camera source framerate.
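Several of these fields depend on each other. For example, ip-address and port only matter for rtspsink output, and the model settings only matter when object detection is enabled. The following Python sketch checks a USB camera config against the values listed in the table above. The function name and the dependency rules are assumptions inferred from the placeholders in the config template; they are not behavior confirmed by the application.

```python
from typing import Dict, List, Optional

FORMATS = {"nv12", "yuy2", "mjpeg"}
OUTPUTS = {"filesink", "waylandsink", "rtspsink"}
FRAMEWORKS = {"snpe", "tflite", "qnn"}
YOLO_TYPES = {"yolov5", "yolov8", "yolonas", "yolox"}
RUNTIMES = {"cpu", "gpu", "dsp"}


def _as_int(value: object) -> Optional[int]:
    """Return value as int (accepts 1920 or "1920"), or None if not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def check_usb_camera_config(cfg: Dict[str, object]) -> List[str]:
    """Hypothetical pre-flight check for config-usb-camera-app.json."""
    errors: List[str] = []
    if cfg.get("video-format") not in FORMATS:
        errors.append(f"video-format must be one of {sorted(FORMATS)}")
    for key in ("width", "height", "framerate"):
        value = _as_int(cfg.get(key))
        if value is None or value <= 0:
            errors.append(f"{key} must be a positive integer")
    output = cfg.get("output")
    if output not in OUTPUTS:
        errors.append(f"output must be one of {sorted(OUTPUTS)}")
    # Assumption: ip-address and port are only needed for RTSP output.
    if output == "rtspsink" and not (cfg.get("ip-address") and cfg.get("port")):
        errors.append("rtspsink output needs ip-address and port")
    detection = cfg.get("enable-object-detection", "FALSE")
    if detection not in ("TRUE", "FALSE"):
        errors.append("enable-object-detection must be TRUE or FALSE")
    if detection == "TRUE":
        if cfg.get("ml-framework") not in FRAMEWORKS:
            errors.append(f"ml-framework must be one of {sorted(FRAMEWORKS)}")
        if cfg.get("yolo-model-type") not in YOLO_TYPES:
            errors.append(f"yolo-model-type must be one of {sorted(YOLO_TYPES)}")
        if cfg.get("runtime") not in RUNTIMES:
            errors.append(f"runtime must be one of {sorted(RUNTIMES)}")
        threshold = _as_int(cfg.get("threshold"))
        if threshold is None or not 1 <= threshold <= 100:
            errors.append("threshold must be an integer from 1 to 100")
    return errors
```

The example config from the Configurations section passes this check unchanged, because it uses yuy2, waylandsink, the yolox model type, and a threshold of 75.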

Face Recognition

The gst-ai-face-recognition application collects live video input from a camera or an RTSP stream and uses it for face detection, facial landmarking, and face recognition. It uses the face_det_lite_quantized model for face detection, the facemap_3dmm_quantized model for facial landmarking, and the face_attrib_net_quantized model for face recognition. The overlaid AI model output is previewed on the HDMI display.
This application isn’t supported in Config #1 for the QLI 2.0 RC3 release because CPU runtime is not supported.
The following figure shows the pipeline, which receives the input, preprocesses it, runs inference on the AI hardware, and displays the results on the screen. For information about the plugins used in the pipeline flow, see Pipeline flow.
[Figure: Pipeline diagram. Application: gst-ai-face-recognition]

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #2 | No | Yes | No | Yes | No | Yes | No

Sample Model and Label Files

Runtime | Model files | Label files
Qualcomm AI Engine Direct & LiteRT | face_det_lite_quantized.tflite, facemap_3dmm_quantized.tflite, face_attrib_net_quantized.tflite, face_det_lite_quantized.bin, facemap_3dmm_quantized.bin, face_attrib_net_quantized.bin | face_detection.json, face_recognition_settings.json, face_recognition.json, facemap_3dmm_settings.json

Register a face for facial recognition

Before running the gst-ai-face-recognition application, you can register a face for secure verification and authentication:
1
Ensure that you complete the Prerequisites.
2
To register a face, run the following pipeline with gst-pipeline-app in the target device shell:
gst-pipeline-app -e \
qtimlvconverter name=stage_01_preproc mode=image-batch-non-cumulative \
qtimltflite name=stage_01_inference model=/etc/models/face_det_lite_quantized.tflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
qtimlpostprocess name=stage_01_postproc settings="{\"confidence\": 40.0}" results=4 module=qfd labels=/etc/labels/face_detection.json \
qtimlvconverter name=stage_03_preproc mode=roi-batch-cumulative \
qtimltflite name=stage_03_inference model=/etc/models/face_attrib_net_quantized.tflite delegate=external external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp;" \
qticamsrc video_0::type=video name=camsrc ! video/x-raw,format=NV12,width=1920,height=1080 ! queue ! waylandsink fullscreen=true sync=false \
camsrc.image_1 ! video/x-raw,width=1920,height=1080 ! qtivtransform ! video/x-raw,format=NV12 ! tee name=t_split_1 \
t_split_1. ! queue ! metamux_1. \
t_split_1. ! queue ! stage_01_preproc. stage_01_preproc. ! queue ! stage_01_inference. stage_01_inference. ! queue ! \
stage_01_postproc. stage_01_postproc. ! text/x-raw ! queue ! metamux_1. \
qtimetamux name=metamux_1 ! queue ! tee name=t_split_3 \
t_split_3. ! queue ! stage_03_preproc. stage_03_preproc. ! queue ! stage_03_inference. stage_03_inference. ! queue ! \
multifilesink location=/etc/data/tensor_%d.bin sync=true async=false enable-last-sample=false
3
To prepare for capturing a facial image, do the following:
1
Select the following options from the list. Choose the number corresponding to the option:
  • PLAYING: Move the pipeline to the Playing state.
  • Plugin Mode > camsrc > capture-image: Capture the image using the camera source.
2
Using the live preview on the display, face the camera and ensure that the camera is pointed straight and there is only one person in the frame.
3
In the terminal, enter 1 for the following values:
  • GstImageCaptureMode for arg0.
  • guint for arg1.
4
To capture all sides of your face, select capture-image and do the following for each side:
1
Left and right: Turn your head left by 40° while keeping the landmarks visible, then repeat steps 3 and 4. Turn your head right (by 40°) and repeat.
2
Up and down: Raise your head by 30° while keeping the landmarks visible, then repeat steps 3 and 4. Lower your head (by 30°) and repeat.
5
To stop the pipeline, select (b) Back and then (q) Quit.
After running the pipeline, five individual tensor bins are created (tensor_0.bin to tensor_4.bin) with facial properties recorded for each side of the face.
6
The tensor bins are stored in /etc/data/ on the target device. To pull the bins from the target device to the Linux host computer, run the following commands:
scp root@<IP-Address>:/etc/data/tensor_0.bin .
scp root@<IP-Address>:/etc/data/tensor_1.bin .
scp root@<IP-Address>:/etc/data/tensor_2.bin .
scp root@<IP-Address>:/etc/data/tensor_3.bin .
scp root@<IP-Address>:/etc/data/tensor_4.bin .
7
To merge the tensor bins, which hold the facial properties for each side of the face, into a single face database file, download the facedb.py script and run it on the Linux host computer in the same directory as the tensor bins.
1
Download the facedb.py script:
curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/facedb.py
2
Run the script. <Name of the person> is case-sensitive and must be written identically each time, so use the same name consistently.
python3 ./facedb.py "<Name of the person>" 512 32 tensor_0.bin tensor_1.bin tensor_2.bin tensor_3.bin tensor_4.bin
3
A face.bin binary is created.
8
Push face.bin to the /etc/data directory on the target device and rename it to face0.bin:
scp face.bin root@<ip address of target device>:/etc/data/face0.bin
9
To generate the face_recognition.json file and register the new person in the database, use the following reference label file, which registers two people:
[
  {"id": 0, "color": "0x00FF00FF", "label": "<Name of Person>"},
  {"id": 1, "color": "0xFFFF00FF", "label": "<Name of Person>"}
]
Set each id field to the entry's position in the list, starting at 0. To register more faces, add one entry per person, each on a new line, in face_recognition.json.
10
To generate the face_recognition_settings.json file, use the following reference settings file:
{
  "confidence": 51.0,
  "databases": [
    {"id": 0, "database": "/etc/data/face0.bin"},
    {"id": 1, "database": "/etc/data/face1.bin"}
  ]
}
11
Push the updated face_recognition.json and face_recognition_settings.json files to the /etc/labels directory on the target device:
scp face_recognition.json root@<ip address of target device>:/etc/labels
scp face_recognition_settings.json root@<ip address of target device>:/etc/labels
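With more than a few registered people, writing face_recognition.json and face_recognition_settings.json by hand is error-prone. The following Python sketch generates both files from a list of names. It uses the structure of the reference files above and assumes the face databases are pushed as /etc/data/face0.bin, face1.bin, and so on. Colors after the first two are arbitrary choices, and reading them as 0xRRGGBBAA is an assumption. The names in the example are hypothetical; each name must match the name passed to facedb.py exactly.

```python
import json
from typing import Dict, List, Tuple

# The first two colors come from the reference label file; the rest are arbitrary.
PALETTE = ["0x00FF00FF", "0xFFFF00FF", "0x00FFFFFF", "0xFF00FFFF"]


def face_recognition_files(names: List[str], confidence: float = 51.0,
                           data_dir: str = "/etc/data") -> Tuple[list, Dict]:
    """Build the contents of face_recognition.json and face_recognition_settings.json.

    Entry i in both files refers to the same person, whose database is
    <data_dir>/face<i>.bin (naming assumed from the reference files).
    """
    labels, databases = [], []
    for idx, name in enumerate(names):
        labels.append({"id": idx, "color": PALETTE[idx % len(PALETTE)], "label": name})
        databases.append({"id": idx, "database": f"{data_dir}/face{idx}.bin"})
    return labels, {"confidence": confidence, "databases": databases}


if __name__ == "__main__":
    # Hypothetical names; each must match the name given to facedb.py.
    labels, settings = face_recognition_files(["Alice", "Bob"])
    with open("face_recognition.json", "w", encoding="utf-8") as f:
        json.dump(labels, f, indent=2)
    with open("face_recognition_settings.json", "w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)
```

Copy the two generated files to the target device with the scp commands above.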

Run the application on the target device

The following commands provide the default model and label paths. If you have a different folder structure, replace the default paths in the command-line parameters. See Sample model and label files.
1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-face-recognition --config-file=/etc/configs/config-face-recognition.json
The sample application uses the /etc/configs/config-face-recognition.json file to read the input parameters. To display all available options:
gst-ai-face-recognition -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-face-recognition application uses the /etc/configs/config-face-recognition.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "ml-framework": "<qnn or tflite>",
  "face-detection-model": "<path for face detection model file>",
  "face-landmark-model": "<path for face landmark model file>",
  "face-recognition-model": "<path for face recognition model file>",
  "face-detection-labels": "<path for face detection labels file>",
  "face-recognition-labels": "<path for face recognition labels file>",
  "face-recognition-settings": "<Path of face recognition settings>",
  "facemap-3dmm-settings": "<Path of facemap-3dmm settings>"
}
Camera source, LiteRT, and DSP runtime
{
  "ml-framework":"tflite",
  "face-detection-model":"/etc/models/face_det_lite_quantized.tflite",
  "face-landmark-model":"/etc/models/facemap_3dmm_quantized.tflite",
  "face-recognition-model":"/etc/models/face_attrib_net_quantized.tflite",
  "face-detection-labels": "/etc/labels/face_detection.json",
  "face-recognition-labels": "/etc/labels/face_recognition.json",
  "face-recognition-settings": "/etc/labels/face_recognition_settings.json",
  "facemap-3dmm-settings": "/etc/labels/facemap_3dmm_settings.json"
}
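You can check the configuration file offline before you run the application. The following Python sketch parses the file and reports missing keys or an unexpected ml-framework value. The key list comes from the template above, and check_config is a hypothetical helper, not part of the SDK. Typographic (curly) quotes, which some editors insert, make the file invalid JSON, and the sketch reports them as a parse error.

```python
import json

# Keys taken from the configuration template above.
REQUIRED_KEYS = [
    "ml-framework",
    "face-detection-model",
    "face-landmark-model",
    "face-recognition-model",
    "face-detection-labels",
    "face-recognition-labels",
    "face-recognition-settings",
    "facemap-3dmm-settings",
]
FRAMEWORKS = {"tflite", "qnn"}


def check_config(text):
    """Return a list of problems found in a face recognition config string."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as err:
        # Curly quotes such as “ ” are a common cause of this error.
        return [f"invalid JSON: {err}"]
    if not isinstance(cfg, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if cfg.get("ml-framework") not in FRAMEWORKS:
        problems.append("ml-framework must be 'tflite' or 'qnn'")
    return problems
```

On the target device, you might run it as print(check_config(open("/etc/configs/config-face-recognition.json").read())).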

Expected output

Pipeline Diagram

Pipeline Flow

The following table lists the plugins used in the face recognition pipeline:
PluginDescription
qticamsrc• Captures the live stream from camera.
• Uses tee to split the stream for inferencing.
filesrc• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for inferencing.
rtspsrc• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
h264parse• Parses the H.264 video bitstream.
v4l2h264dec• Hardware-decodes H.264 video to raw frames.
qtimlvconverter1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite1. After the inference runtime receives the tensor stream on its sink pad, it runs the inference.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess (Pose-estimation)• Uses the lite-3dmm module to perform facial pose recognition.
qtimlpostprocess (Classification)• Receives the stream from qtimetamux and uses the qfr module to classify the face.
qtimetamux1. Receives the output of the face detection models from qtimlpostprocess and multiplexes it.
2. Receives the output of facial pose from qtimlpostprocess and multiplexes it.
tee• Splits the stream for inferencing.
qtivoverlay1. Receives the multiplexed stream.
2. Overlays the bounding boxes on the stream.
waylandsink1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

FieldValues / Description
ml-frameworkUse one of the following frameworks:
- tflite – LiteRT
- qnn – Qualcomm AI Engine Direct
Models and labelsSee Sample model and label files
face-detection-modelThe path to the face detection model
face-landmark-modelThe path to the face landmark model
face-recognition-modelThe path to the face recognition model
face-detection-labelsThe path to the face detection labels
face-recognition-labelsThe path to the face recognition labels
face-recognition-settingsThe path to the face recognition settings file
facemap-3dmm-settingsThe path to the facemap-3dmm settings file

Image segmentation using Python with container

The application allows you to perform image segmentation using the Qualcomm Neural Processing SDK with Python bindings, all from within a Docker container.
This application isn’t supported in the QLI 2.0 RC3 release.

Set up the host container

To set up the host container, do the following on your Linux host computer with Docker:
1
Ensure that you complete the Prerequisites.
3
Download and extract Qualcomm Neural Processing SDK:
qpm-cli --login <username>
4
Download the scripts and model attachments to run the sample model.
  1. Download the Dockerfile and scripts and prepare the directory for storing the image.
    git clone https://git.codelinaro.org/clo/le/sdk-tools.git -b imsdk-tools.lnx.1.0
    
      cd sdk-tools/snpe-container-python
    
      mkdir images
    
    The snpe-container-python directory contains the Dockerfile and scripts. Run all the commands from this directory.
  2. Copy the test image to a new folder called inputs and rename it to input_image.jpg.
    mkdir inputs
    
    cp <path of test image> ./inputs
    
    cd inputs
    
    mv <test_image_name> input_image.jpg
    
  3. Set up the host device for cross compilation:
    sudo groupadd docker
    
    sudo usermod -aG docker $USER
    
    newgrp docker
    
    Until the host device is rebooted, run these commands again in every new console that you use to run Docker.
    sudo apt-get install qemu-user-static qemu-system-arm
    
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
    
    docker buildx rm builder
    
    docker buildx create --name builder --driver docker-container --use
    
    docker buildx inspect --bootstrap
    
  4. Prepare to build the Docker image by populating the <path-to-sdk-tools>/snpe-container-python/targets/config.json file. The following inputs are mandatory:
    • SNPE_version
    • Base_Image
    • Target_platform: Use the following field value for each SoC:
      • For QCS6490 – qcm6490
      • For Dragonwing IQ-8275 – qcs8300
      • For Dragonwing IQ-9075 – qcs9100
    • URL (ensure that this path isn’t the same as your current directory)
    The following code shows an updated sample config.json file. To prevent build failures, remove any comments from the file before using it.
    {
      "SNPE_version":"2.41.0.251128",
      "Base_Image": "ubuntu:22.04",
      "Target_platform": "qcm6490",
    
      "Additional_tag_container": "",
      "Additional_tag_image": "",
      "URL": "<path-to-sdk-tools>/snpe-container-python/images",
    
      "DeviceID" : "null"
    }
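To avoid typos in config.json, you can generate the file instead of editing it by hand. The following minimal Python sketch assumes the field names from the sample above. The function make_docker_config and the example URL path are hypothetical. Because json.dumps never emits comments, the output needs no cleanup before use.

```python
import json
import os

# Target_platform values per SoC, as listed in the steps above.
TARGET_PLATFORMS = {
    "QCS6490": "qcm6490",
    "IQ-8275": "qcs8300",
    "IQ-9075": "qcs9100",
}


def make_docker_config(soc, snpe_version, url, base_image="ubuntu:22.04",
                       tag_image="", tag_container=""):
    """Build a config.json dict for the snpe-container-python scripts (hypothetical)."""
    if soc not in TARGET_PLATFORMS:
        raise ValueError(f"unsupported SoC: {soc}")
    # The steps above require that URL not point at the current directory.
    if os.path.abspath(url) == os.getcwd():
        raise ValueError("URL must not be the current directory")
    return {
        "SNPE_version": snpe_version,
        "Base_Image": base_image,
        "Target_platform": TARGET_PLATFORMS[soc],
        "Additional_tag_container": tag_container,
        "Additional_tag_image": tag_image,
        "URL": url,
        "DeviceID": "null",
    }


config = make_docker_config(
    "IQ-8275", "2.41.0.251128",
    "/home/user/sdk-tools/snpe-container-python/images",  # hypothetical path
)
print(json.dumps(config, indent=2))
```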
    
5
Build the Docker image:
sudo apt install jq
cd <path to sdk-tools>/snpe-container-python
source scripts/host/docker_env_setup.sh
qml-docker-build-image targets/config.json
Save the Docker image:
qml-docker-device-save-image targets/config.json
The Docker image is compressed, and the TAR file is saved in the directory specified in the URL field of config.json:
  • If Additional_tag_image is empty, the compressed image is stored as qml.tar.
  • If Additional_tag_image is populated, the compressed image is stored as qml-<field value>.tar.
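The TAR file name matters in the next steps, which copy and load qml.tar. The following small Python sketch implements the naming rule described above. The helpers image_tar_name and scp_command are hypothetical, and the example IP address is a placeholder.

```python
def image_tar_name(additional_tag_image=""):
    """Return the TAR name produced by qml-docker-device-save-image per the rule above."""
    if additional_tag_image:
        return f"qml-{additional_tag_image}.tar"
    return "qml.tar"


def scp_command(url_dir, target_ip, additional_tag_image=""):
    """Build the scp argument list that pushes the saved image to /opt on the target."""
    tar = image_tar_name(additional_tag_image)
    return ["scp", f"{url_dir}/{tar}", f"root@{target_ip}:/opt"]


print(image_tar_name())  # qml.tar
```

If you set Additional_tag_image, use the same file name in the docker load command on the target device.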
6
Push the Docker image to the target device:
scp <path specified in URL field>/qml.tar root@<IP address of target device>:/opt
7
Restart Docker and load the image on the target device:
ssh root@<ip of target>
mount -o remount,rw /usr
systemctl restart docker
docker load < /opt/qml.tar
8
Run the Docker image on the target device:
docker run -it -d \
  --device=/dev/fastrpc-cdsp-secure \
  --device /dev/kgsl-3d0 \
  --device /dev/dma_heap/system \
  --device /dev/dma_heap/qcom,system \
  --volume /usr/lib/libCB.so.1:/usr/lib/libCB.so.1 \
  --volume /usr/lib/libOpenCL.so.1:/usr/lib/libOpenCL.so.1 \
  --volume /usr/lib/libOpenCL_adreno.so.1:/usr/lib/libOpenCL_adreno.so.1 \
  --volume /usr/lib/libcdsprpc.so:/usr/lib/libcdsprpc.so \
  --volume /usr/lib/libdmabufheap.so.0:/usr/lib/libdmabufheap.so.0 \
  --volume /usr/lib/libglib-2.0.so.0:/usr/lib/libglib-2.0.so.0 \
  --volume /usr/lib/libgsl.so.1:/usr/lib/libgsl.so.1 \
  --volume /usr/lib/libgthread-2.0.so.0:/usr/lib/libgthread-2.0.so.0 \
  --volume /usr/lib/libllvm-qcom.so.1:/usr/lib/libllvm-qcom.so.1 \
  --volume /usr/lib/libpcre.so.1:/usr/lib/libpcre.so.1 \
  --volume /usr/lib/libvmmem.so.0:/usr/lib/libvmmem.so.0 \
  --volume /usr/lib/libatomic.so.1:/usr/lib/libatomic.so.1 \
  --hostname qml --name qml qml
A container named qml should now be running. To check it, run the following command:
docker ps
9
Push input_image.jpg to the target device and copy it into the container:
scp <path to inputs_directory>/input_image.jpg root@<IP address of target device>:/opt/
ssh root@<ip of target>
docker cp /opt/input_image.jpg qml:/opt/
10
Download the deeplabv3_resnet50.dlc model on the target device and copy it to the container:
cd /etc/models/
wget https://github.com/quic/sample-apps-for-qualcomm-linux/releases/download/GA1.7-rel/deeplabv3_resnet50.dlc
docker cp /etc/models/deeplabv3_resnet50.dlc qml:/opt/

Run the application on the target device

1
Run the Qualcomm Neural Processing SDK model using Python bindings:
docker exec qml python3 /mnt/qml/src/python/snpe/test_snpe/snpe_segmentation_app.py -d /opt/deeplabv3_resnet50.dlc -i /opt/input_image.jpg -r dsp -o /mnt/qml/output/ -b USERBUFFER_FLOAT -p /usr/lib/libSNPE.so
  • The output image is saved in the container at /opt/.
  • The output from the DLC model (RAW file) is saved at /mnt/qml/output/.
2
Copy the output from the container to the target device:
docker cp qml:/opt/output.jpg /opt/
3
To pull the output image from the target device to the host, run the following command on your Linux host computer:
scp root@<IP address of target device>:/opt/output.jpg ./

Expected Result

Pipeline Diagram

AI Audio Applications

Audio Classification

The gst-ai-audio-classification application shows audio classification using input from either a file source or a microphone. It displays both the classification results and a video preview. The following figure shows the pipeline, which gets the input from a file or a microphone, preprocesses it, and runs inference on AI hardware. The results are displayed on the screen. For information about the plugins used in the pipeline flow, see Pipeline flow.
Pipeline Diagram
Application: gst-ai-audio-classification

Sample Model and Label Files

Runtime | Model file | Label file
LiteRT | yamnet.tflite | yamnet.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json
The sample application uses the /etc/configs/config-audio-classification.json file to read the input parameters. To display all available options:
gst-ai-audio-classification -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-audio-classification application uses the /etc/configs/config-audio-classification.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "file-path": "<path to video file>",
  "model": "<path to model file>",
  "labels": "<path to label file>",
  "threshold": <integer between 1 and 100>,
  "runtime": "<cpu or gpu>",
  "codec": "<mp3 or flac>"
}
File source, LiteRT model, CPU runtime
{
  "file-path": "/etc/media/video-mp3.mp4",
  "model": "/etc/models/yamnet.tflite",
  "labels": "/etc/labels/yamnet.json",
  "runtime": "cpu",
  "threshold": 20,
  "codec": "mp3"
}
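The following minimal Python sketch checks an audio classification config against the field descriptions on this page. It requires threshold to be an integer from 1 to 100, runtime to be cpu or gpu, and codec to be mp3 or flac (mp3 when omitted). Treating threshold, model, labels, and runtime as required is an assumption, and check_audio_config is a hypothetical helper.

```python
import json

RUNTIMES = {"cpu", "gpu"}
CODECS = {"mp3", "flac"}


def check_audio_config(cfg):
    """Return a list of problems in an audio classification config dict."""
    problems = [f"missing key: {key}" for key in ("model", "labels") if key not in cfg]
    threshold = cfg.get("threshold")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (not isinstance(threshold, int) or isinstance(threshold, bool)
            or not 1 <= threshold <= 100):
        problems.append("threshold must be an integer between 1 and 100")
    if cfg.get("runtime") not in RUNTIMES:
        problems.append("runtime must be 'cpu' or 'gpu'")
    # codec defaults to mp3 when omitted, per the field description.
    if cfg.get("codec", "mp3") not in CODECS:
        problems.append("codec must be 'mp3' or 'flac'")
    return problems


sample = json.loads("""{
  "file-path": "/etc/media/video-mp3.mp4",
  "model": "/etc/models/yamnet.tflite",
  "labels": "/etc/labels/yamnet.json",
  "runtime": "cpu",
  "threshold": 20,
  "codec": "mp3"
}""")
print(check_audio_config(sample))  # []
```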

Expected Output

The output video and classified audio are played on the screen.

Pipeline Flow

The following table lists the plugins used in the audio classification pipeline:
PluginDescription
filesrc• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for processing.
h264parse• Parses the H.264 video.
v4l2h264dec• Decodes the video bitstream into raw frames.
mpegaudioparse / flacparse• Parses the audio bitstream (MP3 or FLAC) to ensure downstream elements can handle the payload.
mpg123audiodec / flacdec• Decodes the compressed audio (MP3 or FLAC) into raw audio buffers.
audioconvert• Converts raw audio buffers between various possible formats to ensure compatibility.
audioresample• Resamples the audio buffers to the sample rate required by the model.
pulsesrc• Reads the live audio stream from the microphone.
audiobuffersplit• Splits the incoming audio buffers into equal-sized chunks for consistent processing.
qtimlaconverter1. Receives the audio stream on its sink pad.
2. Performs preprocessing on the audio stream data.
3. Converts the stream to a tensor stream for inferencing in the later stages of the pipeline.
qtimltflite1. Receives the tensor stream on its sink pad.
2. Performs inferencing using the YAMNet model.
3. Produces a tensor stream with the results on its source pad.
qtimlpostprocess• Uses the yamnet module to handle audio classification inference results:
   • Applies a threshold to the chosen number of results.
   • Creates a text overlay for the identified audio classes.
qtivcomposer• Combines the text overlay for classification results and the video preview into a single composed frame.
waylandsink1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
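audiobuffersplit splits incoming audio into equal-sized chunks so that each inference sees the same amount of audio. The following Python sketch illustrates only that chunking idea. It assumes 16 kHz mono samples and a 15,600-sample window, which matches the public YAMNet TFLite release but is not stated for this pipeline. The function split_into_chunks is a hypothetical helper.

```python
def split_into_chunks(samples, chunk_size):
    """Split a flat list of audio samples into equal-size chunks.

    Trailing samples that don't fill a whole chunk are returned
    separately, so every chunk passed on has the same length.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full = len(samples) // chunk_size * chunk_size
    chunks = [samples[i:i + chunk_size] for i in range(0, full, chunk_size)]
    return chunks, samples[full:]


SAMPLE_RATE = 16_000  # assumed model sample rate (Hz)
WINDOW = 15_600       # assumed samples per inference (about 0.975 s)

two_seconds = [0.0] * (SAMPLE_RATE * 2)  # 2 s of silence
chunks, rest = split_into_chunks(two_seconds, WINDOW)
print(len(chunks), len(rest))  # 2 800
```

In a streaming pipeline, the leftover samples would typically be carried over to the next incoming buffer rather than discarded.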

Config JSON Field Description

FieldValues / Description
runtimeUse one of the following runtimes:
   • cpu
   • gpu
Input sourceUse one of the following input sources:
   • file-path: The path to the video file.
   • Microphone
threshold=<integer>Use any integer between 1 and 100.
codecThe audio codec of the input video:
   • MP3 (default)
   • FLAC

AI Multi-Model Applications

Daisychain Detection + Classification

The gst-ai-daisychain-detection-classification application allows you to perform cascaded object detection and classification with a camera, file source, or RTSP stream. The use case involves detecting objects and classifying the detected objects. The following figures show the pipeline workflow, which captures the video stream from the source, preprocesses it, and runs inference using AI hardware. The results are either displayed on the screen, saved as an encoded MP4 file, or streamed over the RTSP server. For information about the plugins used in this pipeline, see Pipeline flow.
Pipeline Diagram
Application: gst-ai-daisychain-detection-classification

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #1 | Yes | Yes | Yes | No | Yes | Yes | Yes
Config #2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes

Sample Model and Label Files

Runtime | Model files | Label files
LiteRT | detection: yolox_quantized.tflite, classification: inception_v3_quantized.tflite | detection: yolox.json, classification: classification.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-daisychain-detection-classification --config-file=/etc/configs/config_daisychain_detection_classification.json
The sample application uses the /etc/configs/config_daisychain_detection_classification.json file to read the input parameters. To display all available options:
gst-ai-daisychain-detection-classification -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-daisychain-detection-classification application uses the /etc/configs/config_daisychain_detection_classification.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "input-file": "<input-video-path>",
  "detection-model": "<path-to-detection model>",
  "detection-labels": "<path-to-detection-labels>",
  "classification-model": "<path-to-classification-model>",
  "classification-labels": "<path-to-classification-labels>",
  "detection-runtime": "<Can be dsp or cpu or gpu>",
  "classification-runtime": "<Can be dsp or cpu or gpu>"
}
For USB camera input, set the video-format, resolution, and framerate parameters in the config file to match the camera capabilities. See Configure USB camera.
If you observe a drop in performance, you can use the YOLOv8 LiteRT model. For YOLOv8 export instructions, see Prerequisites.
File source, LiteRT model, DSP runtime
{
 "input-file": "/etc/media/video.mp4",
 "detection-model": "/etc/models/yolox_quantized.tflite",
 "detection-labels": "/etc/labels/yolox.json",
 "classification-model": "/etc/models/inception_v3_quantized.tflite",
 "classification-labels": "/etc/labels/classification.json",
 "detection-runtime": "dsp",
 "classification-runtime": "dsp"
}

Expected Output

The cropped video frames are overlaid on the original frame and displayed on the local display.
The classification models trained on the ImageNet dataset don’t contain the person class.

Pipeline Flow

The following table lists the plugins used in the daisychain detection and classification pipeline:
PluginDescription
qticamsrc• Captures the live stream from camera.
• Uses tee to split the stream for inferencing.
filesrc• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for inferencing.
rtspsrc• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
v4l2src• Captures the live stream from USB camera.
• Uses tee to split the stream for inferencing.
h264parse• Parses the H.264 video bitstream.
v4l2h264dec• Hardware-decodes H.264 video to raw frames.
qtimetamux• Multiplexes the stream.
qtivsplit• Crops the full frame into smaller frames based on the detected bounding boxes (maximum 4).
qtimlvconverter1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite1. After the inference runtime receives the tensor stream on its sink pad, it runs the inference.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess (Detection)• Handles inference results from any object detection model:
   • Applies a threshold to the chosen number of results.
   • Loads the YOLOv8 module.
   • Produces video frames with only bounding boxes that can be overlaid on objects.
   • Produces video frames with only bounding boxes that can be cropped.
qtimlpostprocess (Classification)• Processes results on the cropped frame:
   • Applies the threshold to the chosen number of results on the cropped frame.
   • Loads the MobileNet-softmax module.
   • Produces results as video frames with classification labels.
   • Sends them to the sink pad of qtivcomposer.
qtivcomposer1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
filesink• Receives the video stream on its sink pad and saves it as an H.264-encoded MP4 file.
qtirtspbin1. Serves as a network sink.
2. Transmits UDP packets to the network.
waylandsink1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
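qtivsplit crops at most four regions per frame from the detection results. The following Python sketch shows one way to pick and clamp those regions. Choosing the highest-scoring boxes first is an assumption, because this page states only the four-box limit. Detection and select_crops are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: int        # left edge in pixels
    y: int        # top edge in pixels
    w: int        # box width in pixels
    h: int        # box height in pixels
    score: float  # detection confidence


MAX_CROPS = 4  # qtivsplit crops at most 4 regions, per the table above


def select_crops(detections, frame_w, frame_h, max_crops=MAX_CROPS):
    """Pick the highest-scoring boxes and clamp each one to the frame.

    Returns (x, y, w, h) tuples; boxes entirely outside the frame are skipped.
    """
    ranked = sorted(detections, key=lambda d: d.score, reverse=True)
    crops = []
    for det in ranked:
        x0, y0 = max(det.x, 0), max(det.y, 0)
        x1 = min(det.x + det.w, frame_w)
        y1 = min(det.y + det.h, frame_h)
        if x1 <= x0 or y1 <= y0:
            continue  # nothing left to crop after clamping
        crops.append((x0, y0, x1 - x0, y1 - y0))
        if len(crops) == max_crops:
            break
    return crops
```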

Config JSON Field Description

FieldValues / Description
Input sourceSupported input sources:
   • input-file: The path to the video file.
   • rtsp-ip-port: The address of the RTSP stream in rtsp://<ip>:<port>/<stream> format.
Models and labelsSupported model and label paths:
   • detection-model: The path to the detection model file.
   • detection-labels: The path to the detection label file.
   • classification-model: The path to the classification model file.
   • classification-labels: The path to the classification label file.
output-typeUse one of the following output sinks:
   • waylandsink: To display output via the Weston compositor.
   • filesink: To store output in a local file.
   • rtspsink: To stream output to a network server.
USB camera video-format and resolution1. Use one of the following video-format options:
   • nv12
   • yuy2
   • mjpeg
2. Use the following resolution parameters:
   • width: Input USB camera source resolution width.
   • height: Input USB camera source resolution height.
   • framerate: Input USB camera source framerate.
output-file• Output filename. The default output file is output_detection.mp4.
output-ip-address and portNetwork configuration for RTSP output:
   • output-ip-address: Output server IP address.
   • port: Output server port.
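The rtsp-ip-port field expects an address in rtsp://<ip>:<port>/<stream> format. The following minimal Python sketch uses the standard urllib.parse module to validate such an address before you put it in the config. The function parse_rtsp_address and the example address are hypothetical.

```python
from urllib.parse import urlsplit


def parse_rtsp_address(address):
    """Split an rtsp://<ip>:<port>/<stream> address into (ip, port, stream).

    Raises ValueError if the scheme, host, port, or stream name is missing.
    """
    parts = urlsplit(address)
    if parts.scheme != "rtsp":
        raise ValueError("address must start with rtsp://")
    if not parts.hostname:
        raise ValueError("missing IP address")
    if parts.port is None:  # .port itself raises ValueError for a non-numeric port
        raise ValueError("missing port")
    stream = parts.path.lstrip("/")
    if not stream:
        raise ValueError("missing stream name")
    return parts.hostname, parts.port, stream


print(parse_rtsp_address("rtsp://192.168.1.20:8554/live"))  # ('192.168.1.20', 8554, 'live')
```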

Daisychain Detection + Pose

The gst-ai-daisychain-detection-pose application allows you to perform cascaded object detection and pose detection with a camera, file source, or RTSP stream. The use case involves detecting objects and estimating the body poses of the subjects in an image or a video. The following figure shows the application workflow, which receives the source, preprocesses it, and runs inference on AI hardware. The results are either displayed on the screen, saved as an encoded MP4 file, or streamed over the RTSP server. For information about the plugins used in the pipeline flow, see Pipeline flow.
Pipeline Diagram
Application: gst-ai-daisychain-detection-pose

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #1 | Yes | Yes | Yes | No | Yes | Yes | Yes
Config #2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes

Sample Model and Label Files

Runtime | Model files | Label files
LiteRT | detection: yolox_quantized.tflite, pose: hrnet_pose_quantized.tflite | detection: yolox.json, pose: hrnet_pose.json, hrnet_settings.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-daisychain-detection-pose --config-file=/etc/configs/config-daisychain-detection-pose.json
The sample application uses the /etc/configs/config-daisychain-detection-pose.json file to read the input parameters. To display all available options:
gst-ai-daisychain-detection-pose -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-daisychain-detection-pose application uses the /etc/configs/config-daisychain-detection-pose.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "input-file": "<input-video-path>",
  "rtsp-ip-port": "<RTSP-IP-port-address>",
  "detection-model": "<path-to-detection-model>",
  "detection-labels": "<path-to-detection-labels>",
  "pose-model": "<path-to-pose-model>",
  "pose-labels": "<path-to-pose-labels>",
  "pose-settings-path": "<path-to-pose-settings>",
  "detection-runtime": "<dsp, gpu, or cpu; runs the detection model on the given runtime>",
  "pose-runtime": "<dsp, gpu, or cpu; runs the pose model on the given runtime>",
  "output-file": "<output-file-path>"
}
For QCS6490, if input-file and rtsp-ip-port are not present in the configuration file, the camera input is selected.
For USB camera input, set the video-format, resolution, and framerate parameters in the config file to match the camera capabilities. See Configure USB camera.
If you observe a drop in performance, you can use the YOLOv8 LiteRT model. For YOLOv8 export instructions, see Prerequisites.
File source, LiteRT model, DSP runtime
{
 "input-file": "/etc/media/video.mp4",
 "pose-runtime":"dsp",
 "detection-runtime":"dsp",
 "detection-model": "/etc/models/yolox_quantized.tflite",
 "detection-labels": "/etc/labels/yolox.json",
 "pose-model": "/etc/models/hrnet_pose_quantized.tflite",
 "pose-labels": "/etc/labels/hrnet_pose.json",
 "pose-settings-path":"/etc/labels/hrnet_settings.json"
}
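On QCS6490, the application falls back to the camera when neither input-file nor rtsp-ip-port is set. The following Python sketch encodes that rule. The precedence of input-file over rtsp-ip-port and the handling of enable-usb-camera are assumptions. select_input is a hypothetical helper that only predicts the choice and doesn't run the application.

```python
def select_input(cfg, soc="QCS6490"):
    """Predict which input source a detection + pose config selects.

    Assumed order: input-file, then rtsp-ip-port, then USB camera,
    then the camera fallback documented for QCS6490.
    """
    if cfg.get("input-file"):
        return "file"
    if cfg.get("rtsp-ip-port"):
        return "rtsp"
    if cfg.get("enable-usb-camera") is True:
        return "usb-camera"
    if soc == "QCS6490":
        return "camera"
    raise ValueError("no input source configured")


print(select_input({"input-file": "/etc/media/video.mp4"}))  # file
```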

Expected Output

The cropped video frames are overlaid on the original frame and displayed on the local display.

Pipeline Flow

The following table lists the plugins used in the daisychain detection and pose pipeline:
PluginDescription
qticamsrc• Captures the live stream from camera.
• Uses tee to split the stream for inferencing.
filesrc• Captures the video stream using filesrc, followed by qtdemux for demultiplexing.
• Uses tee to split the stream for inferencing.
rtspsrc• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
v4l2src• Captures the live stream from USB camera.
• Uses tee to split the stream for inferencing.
h264parse• Parses the H.264 video bitstream.
v4l2h264dec• Hardware-decodes H.264 video to raw frames.
qtimetamux• Multiplexes the stream.
qtivsplit• Crops the full frame into smaller frames based on the detected bounding boxes (maximum 4).
qtimlvconverter1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite1. After the inference runtime receives the tensor stream on its sink pad, it runs the inference.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess (Detection)• Handles inference results from any object detection model:
   • Applies a threshold to the chosen number of results.
   • Loads the YOLOv8 module.
   • Produces video frames with only bounding boxes that can be overlaid on objects.
   • Produces video frames with only bounding boxes that can be cropped.
qtimlpostprocess (Pose)• Applies a threshold to the chosen number of results.
• Loads corresponding modules for various pose detection models.
In this specific use case, qtimlpostprocess does the following:
1. Loads the HRNet module.
2. Produces results in the form of video frames with drawn poses.
3. Sends the results to the sink pad of qtivcomposer for further processing or display.
qtivcomposer1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
filesink• Receives the video stream on its sink pad and saves it as an H.264-encoded MP4 file.
qtirtspbin1. Serves as a network sink.
2. Transmits UDP packets to the network.
waylandsink1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.

Config JSON Field Description

FieldValues / Description
Input sourceUse one of the following input sources:
   • input-file: The path to the video file.
   • rtsp-ip-port: The address of the RTSP stream in rtsp://<ip>:<port>/<stream> format.
   • enable-usb-camera: Set to TRUE or FALSE to enable/disable USB camera input.
Models and labelsSupported model and label paths:
   • detection-model: The path to the detection model file.
   • detection-labels: The path to the detection label file.
   • pose-model: The path to the pose model file.
   • pose-labels: The path to the pose label file.
   • pose-settings-path: The path to the pose settings file.
Output sourceoutput-file: The directory path to save the output file.
Note: The display is not enabled if this field is left empty.
USB camera video-format and resolution1. Use one of the following video-format options:
   • nv12
   • yuy2
   • mjpeg
2. Use the following resolution fields:
   • width: Input USB camera source resolution width.
   • height: Input USB camera source resolution height.
   • framerate: Input USB camera source framerate.
detection-runtime and pose-runtimeHardware runtime configuration:
   • Takes cpu, gpu, or dsp as input.
   • Executes the respective model on the specified runtime for optimized inference.

Multistream Inference

The gst-ai-multistream-inference application shows AI inference (object detection and classification) on up to 32 input streams from camera, file, or RTSP sources. The following figure shows the pipeline, which receives several input streams, preprocesses them, runs AI inference, and merges the results into a single video output. The maximum number of input streams supported on each SoC, as verified at 1080p and 720p, is as follows:
  • QCS6490: 8
  • Dragonwing IQ-8275: 16
  • Dragonwing IQ-9075: 32
This application isn’t supported in Config #1 for the QLI 2.0 RC3 release because the CPU runtime is not supported.
The output is displayed on an HDMI display, saved as an H.264-encoded MP4 file, or converted into an RTSP stream.
For information about the plugins used in this pipeline, see Pipeline flow.
Pipeline Diagram
Application: gst-ai-multistream-inference

Input and Output Capabilities

Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output
Config #2 | Yes | Yes | No | Yes | Yes | Yes | Yes

Sample Model and Label Files

Runtime | Model file | Label file
LiteRT | detection: yolox_quantized.tflite, classification: inception_v3_quantized.tflite | detection: yolox.json, classification: classification.json

Run the application on the target device

1

Download artifacts

Ensure that you complete the Prerequisites. This downloads all required artifacts to the target device.
2

Run the application

gst-ai-multistream-inference --config-file=/etc/configs/config-multistream-inference.json
The sample application uses the /etc/configs/config-multistream-inference.json file to read the input parameters. To display all available options:
gst-ai-multistream-inference -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-multistream-inference application uses the /etc/configs/config-multistream-inference.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "input-file-path": ["<input file 1>", "<input file 2>"],
  "input-rtsp-path": ["<rtsp stream 1>", "<rtsp stream 2 >"],
  "input-type": "<h264 or h265>",
  "model": "<path to model file>",
  "labels": "<path to label file>",
  "output-display": "<0 or 1>",
  "output-file-path": "<output file path>",
  "output-ip-address": "<ip address of test kit>",
  "output-port-number": "<port number over which rtsp stream can be listened>",
  "use-case": "<0 or 1>"
}
If you observe a drop in performance, you can use the YOLOv8 LiteRT model. For YOLOv8 export instructions, see Prerequisites.
Object Detection on 8 H.264 file inputs, LiteRT model, DSP runtime
{
  "input-file-path":
    [
      "/etc/media/video.mp4",
      "/etc/media/video.mp4",
      "/etc/media/video.mp4",
      "/etc/media/video.mp4",
      "/etc/media/video.mp4",
      "/etc/media/video.mp4",
      "/etc/media/video.mp4",
      "/etc/media/video.mp4"
    ],
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "input-type": "h264",
  "output-display": 1,
  "use-case": 0
}
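Listing the same file many times by hand is tedious, and it's easy to exceed the verified stream count for your SoC. The following Python sketch builds the configuration and enforces the per-SoC maximums listed above. The function make_multistream_config is hypothetical. Reading use-case 0 as object detection is inferred from the sample above.

```python
import json

# Verified maximum input streams per SoC, from the list above.
MAX_STREAMS = {"QCS6490": 8, "IQ-8275": 16, "IQ-9075": 32}


def make_multistream_config(soc, files, model, labels, input_type="h264",
                            display=True, use_case=0):
    """Build a multistream config, refusing more inputs than the SoC supports."""
    limit = MAX_STREAMS[soc]
    if not 1 <= len(files) <= limit:
        raise ValueError(f"{soc} supports 1 to {limit} streams, got {len(files)}")
    if input_type not in ("h264", "h265"):
        raise ValueError("input-type must be 'h264' or 'h265'")
    return {
        "input-file-path": list(files),
        "model": model,
        "labels": labels,
        "input-type": input_type,
        "output-display": 1 if display else 0,
        "use-case": use_case,  # 0 = object detection in the sample above
    }


cfg = make_multistream_config(
    "QCS6490", ["/etc/media/video.mp4"] * 8,
    "/etc/models/yolox_quantized.tflite", "/etc/labels/yolox.json",
)
print(json.dumps(cfg, indent=2))
```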

Expected Output

Pipeline Diagram

Pipeline Flow

The following table lists the plugins used in the multistream inference pipeline:
PluginDescription
qticamsrc• Captures the live stream from camera.
• Uses tee to split the stream for inferencing.
filesrc• Captures the video stream using filesrc, followed by qtdemux, which demultiplexes the stream.
• Uses tee to split the stream for inferencing.
rtspsrc• Captures the RTSP stream using rtspsrc, followed by rtph264depay for video extraction.
• Uses tee to split the stream for inferencing.
h264parse• Parses the H.264 video.
v4l2h264dec• Decodes the video.
qtimlvconverter1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data. This preprocessing is done when the model expects floating-point values as input:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed video stream to a tensor stream on its source pad.
The tensor stream is used for inferencing in the later stages of the pipeline.
qtimltflite1. After the inference runtime receives the tensor stream on its sink pad, it runs the inference.
2. Produces a tensor stream with the inference results on its source pad.
qtimlpostprocessHandles the inference results from any object detection, classification, pose detection, and segmentation model.

Detection use case:
   • Applies a threshold to the chosen number of results.
   • Loads the YOLOv8 module.
   • Produces video frames with only bounding boxes that can be overlaid on objects, sending them to the sink pad of qtivcomposer.

Classification use case:
   • Applies the threshold to the chosen number of results.
   • Loads the MobileNet-softmax module.
   • Produces results as video frames with classification labels, sending them to the sink pad of qtivcomposer.
qtivcomposer1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink1. waylandsink submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
filesink1. Receives the video stream on its sink pad.
2. Saves the stream as a H.264-encoded MP4 file.
qtirtspbin1. Serves as a network sink.
2. Transmits UDP packets to the network.
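The threshold step in qtimlpostprocess can be illustrated with the following minimal Python sketch. It does not reproduce the plugin's implementation. The Detection record, the 0.5 threshold, and the max_results cap are hypothetical values chosen for illustration. They are not qtimlpostprocess properties or defaults.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    # Hypothetical result record: class label, confidence score, box as (x, y, width, height).
    label: str
    score: float
    box: Tuple[int, int, int, int]


def filter_results(detections: List[Detection], threshold: float = 0.5,
                   max_results: int = 5) -> List[Detection]:
    """Drop results below the threshold, then keep the highest-scoring max_results."""
    kept = [d for d in detections if d.score >= threshold]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_results]


if __name__ == "__main__":
    sample = [
        Detection("person", 0.91, (10, 20, 50, 120)),
        Detection("dog", 0.42, (200, 80, 60, 40)),
        Detection("person", 0.77, (300, 30, 45, 110)),
    ]
    # Only the two person detections pass a 0.5 threshold.
    for det in filter_results(sample, threshold=0.5, max_results=2):
        print(det.label, det.score, det.box)
```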

Config JSON Field Description

Input source: Use one of the following input sources:
   • num-camera: The number of camera inputs.
   • camera: The input camera when num-camera is 1.
   • input-file-path: The path to the input video file.
   • input-rtsp-path: The address of the RTSP stream, in the format rtsp://<ip>:<port>/<stream>.
input-type: The video encoding type for file and RTSP input:
   • H.264
   • H.265
Output: Use one of the following outputs:
   • output-file-path: The path of the output file.
   • output-ip-address: The IP address of the device on which the RTSP stream can be played.
   • output-port-number: The port number of the device on which the RTSP stream can be played.
   • output-display: The connected display device for preview. Set to 1 to enable this option.
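Before you launch the application, you can sanity-check the configuration file against the field descriptions above. The following Python sketch is a hypothetical helper and is not part of the QIM SDK. It makes these assumptions:
- It uses the key names from the table.
- It treats num-camera as the camera input.
- It accepts either a string or a JSON array for the path fields.
- It requires output-ip-address and output-port-number to be set together.

```python
import json
import re
import sys
from typing import Any, Dict, List

# Input-source keys from the field description; camera input is selected through num-camera.
INPUT_KEYS = ("num-camera", "input-file-path", "input-rtsp-path")
# Expected RTSP address shape: rtsp://<ip>:<port>/<stream>
RTSP_PATTERN = re.compile(r"^rtsp://[^:/\s]+:\d+/\S+$")


def as_list(value: Any) -> List[Any]:
    # Assumption: a path field may be a single string or a JSON array.
    return value if isinstance(value, list) else [value]


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    present = [k for k in INPUT_KEYS if k in cfg]
    if len(present) != 1:
        problems.append(f"expected exactly one input source, found {present}")
    for url in as_list(cfg.get("input-rtsp-path", [])):
        if not RTSP_PATTERN.match(str(url)):
            problems.append(f"bad RTSP address: {url}")
    # Assumption: RTSP output needs both the IP address and the port number.
    if ("output-ip-address" in cfg) != ("output-port-number" in cfg):
        problems.append("output-ip-address and output-port-number must be set together")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        issues = check_config(json.load(f))
    print("\n".join(issues) or "no problems found")
```

Pass the path of the configuration file as the first argument. The script prints any problems it finds.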

Known Issues

Low FPS and frame drops are observed during inference on Dragonwing IQ-8275, Dragonwing IQ-9075, and QCS6490.

Multi-Stream Batch Inference

The gst-ai-multistream-batch-inference application demonstrates batched AI inference (object detection and segmentation) on up to 24 input streams from video files. The following figure shows the pipeline. It receives several input streams, preprocesses them, and runs AI inference. It then combines the streams with the inference results and merges them into a single video output. The maximum number of input streams supported on each SoC is as follows:
  • QCS6490: 8
  • Dragonwing IQ-8275: 4
  • Dragonwing IQ-9075: 4
The output is either displayed on an HDMI display or saved as an H.264-encoded MP4 file. For information about the plugins used in this pipeline, see Pipeline flow.
This application isn’t supported in Config #1 for the QLI 2.0 RC3 release because CPU runtime is not supported.
Pipeline Diagram Application: gst-ai-multistream-batch-inference

Input and Output Capabilities

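| Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Config #2 | Yes | No | No | No | Yes | Yes | Yes |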

Sample Model and Label Files

| Runtime | Task | Model file | Label file |
| --- | --- | --- | --- |
| LiteRT | Segmentation | deeplabv3_plus_mobilenet_quantized.tflite | deeplabv3_resnet50.json |
| LiteRT | Detection | yolov8_det_quantized.tflite | yolov8.json |
| Qualcomm AI Engine Direct | Segmentation | deeplabv3_plus_mobilenet_quantized.bin | deeplabv3_resnet50.json |
| Qualcomm AI Engine Direct | Detection | yolov8_det_quantized.bin | yolov8.json |
| Qualcomm Neural Processing SDK | Segmentation | deeplabv3_plus_mobilenet_quantized.dlc | deeplabv3_resnet50.json |
| Qualcomm Neural Processing SDK | Detection | yolov8_det_quantized.dlc | yolov8.json |
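The mlframework value in the configuration file must match the model file type listed for each runtime in the table above. The following Python sketch checks that pairing. The mapping comes from the table: LiteRT (tflite) uses .tflite, Qualcomm AI Engine Direct (qnn) uses .bin, and Qualcomm Neural Processing SDK (snpe) uses .dlc. The function name is hypothetical.

```python
from pathlib import Path

# mlframework value -> expected model file extension, per the Sample Model and Label Files table.
MODEL_EXTENSIONS = {"tflite": ".tflite", "qnn": ".bin", "snpe": ".dlc"}


def model_matches_framework(mlframework: str, model_path: str) -> bool:
    """Return True when the model file extension matches the selected runtime."""
    expected = MODEL_EXTENSIONS.get(mlframework)
    if expected is None:
        raise ValueError(f"unknown mlframework: {mlframework!r}")
    return Path(model_path).suffix == expected


if __name__ == "__main__":
    for framework, model in [("tflite", "/etc/models/yolov8_det_quantized.tflite"),
                             ("qnn", "/etc/models/yolov8_det_quantized.bin"),
                             ("snpe", "/etc/models/yolov8_det_quantized.tflite")]:
        print(framework, model, model_matches_framework(framework, model))
```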

Run the application on the target device

1

Download artifacts

Complete the Prerequisites. These steps download all required artifacts to the target device.
2

Run the application

gst-ai-multistream-batch-inference --config-file=/etc/configs/config-multistream-batch-inference.json
The sample application uses the /etc/configs/config-multistream-batch-inference.json file to read the input parameters. To display all available options:
gst-ai-multistream-batch-inference -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-multistream-batch-inference application uses the /etc/configs/config-multistream-batch-inference.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "output-type": "<wayland or filesink>",
  "out-file": "<path to output file if output-type is filesink>",
  "pipeline-info": [
    {
      "id": "<batch id, from 0 to 5>",
      "input-type": "<input source type, such as file>",
      "input-file-path": [
        {
          "stream-0": "<input file path>",
          "stream-1": "<input file path>"
        }
      ],
      "mlframework": "<tflite, snpe, or qnn>",
      "model-path": "<path-to-model-file>",
      "labels-path": "<path-to-label-file>",
      "post-process-plugin": "qtimlpostprocess"
    }
  ]
}
To process 16 or 24 streams, add more batch entries to the pipeline-info array. Give each added batch an id from 0 to 5.
File source, LiteRT model, DSP runtime
{
  "output-type":"wayland",
  "pipeline-info":[
    {
      "id":0,
      "input-type":"file",
      "input-file-path":[
        {
            "stream-0":"/etc/media/video.mp4",
            "stream-1":"/etc/media/video.mp4",
            "stream-2":"/etc/media/video.mp4",
            "stream-3":"/etc/media/video.mp4"
        }
      ],
      "mlframework":"tflite",
      "model-path":"/etc/models/yolov8_det_quantized.tflite",
      "labels-path":"/etc/labels/yolov8.json",
      "post-process-plugin": "qtimlpostprocess"
    }
  ]
}
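Writing pipeline-info entries by hand for 16 or 24 streams is repetitive. The following Python sketch generates them from a list of input files. It makes three assumptions that the documentation does not state as requirements:
- Each batch holds four streams, as in the example above.
- Stream keys restart at stream-0 inside each batch.
- Every batch uses the same model.

build_batch_config is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List

MAX_BATCHES = 6  # batch ids run from 0 to 5


def build_batch_config(paths: List[str], model_path: str, labels_path: str,
                       batch_size: int = 4, mlframework: str = "tflite") -> Dict[str, Any]:
    """Split input files into batches and emit one pipeline-info entry per batch."""
    if not paths:
        raise ValueError("at least one input file is required")
    batches = [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]
    if len(batches) > MAX_BATCHES:
        raise ValueError(f"{len(batches)} batches needed, but ids only run from 0 to 5")
    pipeline_info = []
    for batch_id, batch in enumerate(batches):
        # Assumption: stream keys restart at stream-0 inside each batch.
        streams = {f"stream-{i}": path for i, path in enumerate(batch)}
        pipeline_info.append({
            "id": batch_id,
            "input-type": "file",
            "input-file-path": [streams],
            "mlframework": mlframework,
            "model-path": model_path,
            "labels-path": labels_path,
            "post-process-plugin": "qtimlpostprocess",
        })
    return {"output-type": "wayland", "pipeline-info": pipeline_info}


if __name__ == "__main__":
    cfg = build_batch_config(["/etc/media/video.mp4"] * 8,
                             "/etc/models/yolov8_det_quantized.tflite",
                             "/etc/labels/yolov8.json")
    print(json.dumps(cfg, indent=2))
```

You could redirect the printed JSON to a file and pass that file to --config-file.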

Expected Output

Pipeline Diagram

Pipeline Flow

The following table lists the plugins used in the multistream batch inference pipeline:
filesrc:
1. Captures the video stream using filesrc.
2. qtdemux demultiplexes the stream.
3. Uses tee to split the stream for inferencing.
h264parse: Parses the H.264 video.
v4l2h264dec: Decodes the video.
qtibatch:
1. Reads input from the streams on its sink pads.
2. Batches the streams for preprocessing.
qtimlvconverter:
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data when the model expects floating-point values as input:
   • Color conversion
   • Scaling (up or down)
   • Normalization
3. Converts the preprocessed stream to a tensor stream on its source pad. Later stages of the pipeline use this tensor stream for inferencing.
qtimltflite:
1. Runs inference on the tensor stream that it receives on its sink pad.
2. Produces a tensor stream with the inference results on its source pad.
qtimldemux:
1. Demultiplexes the batched output.
2. Splits the output into results that correspond to the input streams.
qtimlpostprocess: Converts the inference tensors received on its sink pad into video formats that the multimedia plugins can use for further processing.
qtivcomposer:
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink:
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
filesink: Saves the video stream received on its sink pad as an H.264-encoded MP4 file.
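Conceptually, qtibatch groups one frame from each input stream so that a single inference call covers all of them. qtimldemux then routes each result back to its stream. The following Python sketch models only that data flow with plain lists. It does not use GStreamer, infer is a stand-in for the model, and batch_and_demux is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def batch_and_demux(streams: Dict[str, Sequence[Frame]],
                    infer: Callable[[List[Frame]], List[Result]]) -> Dict[str, List[Result]]:
    """Run one inference call per time step over all streams and route results back by stream."""
    names = list(streams)
    # Simplification: stop at the shortest stream instead of handling uneven lengths.
    steps = min((len(frames) for frames in streams.values()), default=0)
    results: Dict[str, List[Result]] = {name: [] for name in names}
    for t in range(steps):
        batch = [streams[name][t] for name in names]  # batching: frame t from every stream
        outputs = infer(batch)                         # one inference call for the whole batch
        if len(outputs) != len(batch):
            raise ValueError("infer must return one result per batched frame")
        for name, output in zip(names, outputs):       # demuxing: result i belongs to stream i
            results[name].append(output)
    return results


if __name__ == "__main__":
    fake_infer = lambda batch: [f"result({frame})" for frame in batch]
    print(batch_and_demux({"stream-0": ["f0", "f1"], "stream-1": ["g0", "g1"]}, fake_infer))
```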

Config JSON Field Description

output-type: Use one of the following output types:
  • wayland: Displays the output on Weston.
  • filesink: Encodes the output into a video file.
out-file: The path to save the output file when output-type is filesink.
pipeline-info: Provides the pipeline information for each batch:
  1. id: The batch ID, from 0 to 5.
  2. input-type: The input source type, such as file.
  3. input-file-path: The array of input file paths.
mlframework: Use one of the following frameworks:
  • tflite
  • qnn
  • snpe
model-path: The path to the model file.
labels-path: The path to the labels file.
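The following Python sketch checks a parsed configuration against the fields above. It is a hypothetical helper, not part of the application. It assumes that each batch id appears only once, based on the statement that each added batch takes an id from 0 to 5.

```python
from typing import Any, Dict, List

OUTPUT_TYPES = {"wayland", "filesink"}
ML_FRAMEWORKS = {"tflite", "qnn", "snpe"}


def check_batch_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    output_type = cfg.get("output-type")
    if output_type not in OUTPUT_TYPES:
        problems.append(f"output-type must be wayland or filesink, got {output_type!r}")
    if output_type == "filesink" and not cfg.get("out-file"):
        problems.append("out-file is required when output-type is filesink")
    batches = cfg.get("pipeline-info") or []
    if not batches:
        problems.append("pipeline-info must contain at least one batch")
    seen_ids = set()
    for entry in batches:
        batch_id = entry.get("id")
        if not isinstance(batch_id, int) or not 0 <= batch_id <= 5:
            problems.append(f"id must be an integer from 0 to 5, got {batch_id!r}")
        elif batch_id in seen_ids:
            # Assumption: each added batch gets its own id.
            problems.append(f"duplicate id {batch_id}")
        else:
            seen_ids.add(batch_id)
        if entry.get("mlframework") not in ML_FRAMEWORKS:
            problems.append(f"batch {batch_id!r}: mlframework must be tflite, qnn, or snpe")
        for key in ("model-path", "labels-path"):
            if not entry.get(key):
                problems.append(f"batch {batch_id!r}: {key} is missing")
    return problems
```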

Known Issues

A segmentation fault is observed on Dragonwing IQ-8275 and Dragonwing IQ-9075 with a batch-8 stream that uses two batch-4 models.

Multi input/output object detection

The gst-ai-multi-input-output-object-detection application performs object detection on multiple input streams from different sources, such as a camera, a file, or an RTSP network. It sends the results to one or more outputs. The use case implements a LiteRT object detection model (yolov5.tflite). The following figure shows the pipeline workflow, which captures video streams for inferencing from sources such as a camera, a file, or an RTSP stream. For information about the plugins used in the pipeline, see Pipeline flow.
This application isn’t supported in Config #1 for the QLI 2.0 RC3 release because CPU runtime is not supported.
Pipeline Diagram Application: gst-ai-multi-input-output-object-detection

Input and Output Capabilities

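| Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Config #2 | Yes | Yes | No | Yes | Yes | Yes | Yes |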

Sample Model and Label Files

| Runtime | Model file | Label file |
| --- | --- | --- |
| LiteRT | yolov5.tflite | yolov5.json |

Prerequisites

Update the following commands to match the Python version on your Linux host computer.
  • Create the Python 3.8 virtual environment:
sudo apt-get install python3.8 
sudo apt-get install python3.8-venv 
python3.8 -m venv py3.8 
source py3.8/bin/activate 
  • Generate the yolov5.tflite model:
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
python -m pip install -r requirements.txt tensorflow-cpu
python export.py --weights yolov5m.pt --img 320 --include tflite --int8 --data data/coco128.yaml
  • In the terminal of the host computer, run the following command to push the model to the target device:
scp yolov5m-int8.tflite root@<ip address of the device>:/etc/models/yolov5.tflite
If any model isn’t available after downloading the script file, you can download the model from IoT – Qualcomm AI Hub.
  • In the terminal of the host computer, run the following command to push the model files to the target device:
scp <model_filename> root@<ip address of target device>:/etc/models

Run the application on the target device

1

Download artifacts

Complete the Prerequisites. These steps download all required artifacts to the target device.
2

Run the application

Sign in to the target device over SSH and copy the YOLOX label file to yolov5.json:
cp /etc/labels/yolox.json /etc/labels/yolov5.json
Run the application:
gst-ai-multi-input-output-object-detection --config-file=/etc/configs/config-multi-input-output-object-detection.json
The sample application uses the /etc/configs/config-multi-input-output-object-detection.json file to read the input parameters. To display all available options:
gst-ai-multi-input-output-object-detection -h
To stop the use case, press CTRL + C.
  • After you finish running the application, pull the output file from the target device to the host computer:
scp root@<ip address of target device>:/etc/media/out.mp4 <destination directory>

Configurations

The gst-ai-multi-input-output-object-detection application uses the /etc/configs/config-multi-input-output-object-detection.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "num-camera": "<number-of-cameras>",
  "camera-id": "<specific-camera-id>",
  "input-file-path": "<JSON-array-of-input-file>",
  "input-rtsp-path": "<JSON-array-of-RTSP-URLs>",
  "model": "<path-to-model-file>",
  "labels": "<path-to-label-file>",
  "output-file-path": "<path-to-output-file>",
  "output-ip-address": "<ip-address-of-output>",
  "output-port-number": "<port number over which rtsp stream can be listened>",
  "output-display": "<true or false>"
}
Ensure that the total number of input streams from the camera, RTSP, and file source doesn’t exceed 6.
For QCS6490, if input-file-path and input-rtsp-path are not present in the configuration file, the camera input is selected.
File source, LiteRT model, DSP runtime
{
  "input-file-path":
      [
          "/etc/media/video1.mp4",
          "/etc/media/video2.mp4"
      ],
  "model": "/etc/models/yolov5.tflite",
  "labels": "/etc/labels/yolov5.json",
  "output-display": true,
  "output-file-path": "/etc/media/output.mp4",
  "output-ip-address": "127.0.0.1",
  "output-port-number": "8554"
}
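The limit of six combined input streams is easy to exceed when you mix cameras, files, and RTSP URLs. The following Python sketch counts the configured streams. The helper names are hypothetical. The sketch assumes that each camera counts as one stream and that the path fields are JSON arrays, as in the template.

```python
from typing import Any, Dict

MAX_INPUT_STREAMS = 6  # combined limit for camera, RTSP, and file inputs


def _count(value: Any) -> int:
    # The template uses JSON arrays; a single string is counted as one entry.
    if value is None:
        return 0
    return 1 if isinstance(value, str) else len(value)


def count_input_streams(cfg: Dict[str, Any]) -> int:
    cameras = int(cfg.get("num-camera", 0))  # num-camera may be a number or a numeric string
    return cameras + _count(cfg.get("input-file-path")) + _count(cfg.get("input-rtsp-path"))


def check_stream_limit(cfg: Dict[str, Any]) -> None:
    """Raise ValueError if the configuration breaks the documented input limits."""
    if "num-camera" in cfg and int(cfg["num-camera"]) not in (1, 2):
        raise ValueError("num-camera must be 1 or 2")
    total = count_input_streams(cfg)
    if total > MAX_INPUT_STREAMS:
        raise ValueError(f"{total} input streams configured; the limit is {MAX_INPUT_STREAMS}")


if __name__ == "__main__":
    example = {"input-file-path": ["/etc/media/video1.mp4", "/etc/media/video2.mp4"]}
    check_stream_limit(example)
    print(count_input_streams(example), "input streams")
```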

Expected Output

Depending on the configured outputs, the results are displayed on an HDMI screen, saved as an H.264-encoded MP4 file, or streamed over RTSP.
Pipeline Diagram

Pipeline Flow

The following table lists the plugins used in the multi input/output object detection pipeline:
qticamsrc:
• Captures the live stream from the camera.
• Uses tee to split the stream for inferencing.
filesrc:
• Captures the video stream using filesrc.
• qtdemux then demultiplexes the stream.
• Uses tee to split the stream for inferencing.
rtspsrc:
• Captures the RTSP stream using rtspsrc.
• rtph264depay then extracts the video.
• Uses tee to split the stream for inferencing.
h264parse: Parses the H.264 video.
v4l2h264dec: Decodes the video.
qtimlvconverter:
• Receives the video stream on its sink pad.
• Performs the following preprocessing on the stream data when the model expects floating-point values as input:
  • Color conversion
  • Scaling (up or down)
  • Normalization
• Converts the preprocessed stream to a tensor stream on its source pad. Later stages of the pipeline use this tensor stream for inferencing.
qtimltflite:
• Runs on LiteRT and uses the yolov5.tflite model for object detection.
• Runs inference on the tensor stream that it receives on its sink pad.
• Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess: Converts the inference tensors that it receives on its sink pad into video formats that the multimedia plugins can process later.
qtivcomposer:
• Composes frames with contents from its sink pads.
• Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink:
• Submits the video stream received on its sink pad to the Wayland compositor.
• The compositor renders the video stream on a local display.
filesink: Saves the video stream that it receives on its sink pad as an H.264-encoded MP4 file.
qtirtspbin:
• Serves as a network sink.
• Transmits UDP packets to the network.

Config JSON Field Description

Input source: Use one or more of the following input sources:
  • num-camera: The number of camera inputs, either 1 or 2.
  • camera-id: The ID of the camera, either 0 or 1.
  • input-file-path: A JSON array of input video file paths.
  • input-rtsp-path: A JSON array of RTSP stream addresses, each in the format rtsp://<ip>:<port>/<stream>.
Models and labels:
  • model: The path to the model file.
  • labels: The path to the label file.
Output: Use one or more of the following outputs:
  • output-file-path: The path of the output file.
  • output-ip-address: The IP address of the device on which the RTSP stream can be played.
  • output-port-number: The port number of the device on which the RTSP stream can be played.
  • output-display: Set to true to preview the output on the connected display.

Known Issues

A drop in fps is observed when the application runs at 1080p resolution on QCS6490.

Parallel Inferencing

The gst-ai-parallel-inference application performs object detection, object classification, pose detection, and image segmentation on an input stream from a camera, a file, or an RTSP network. The use cases use LiteRT models for object detection, image segmentation, classification, and pose detection. The following figure shows the pipeline. It receives an input stream from a camera, a file, or an RTSP source, runs inference for the four use cases in parallel, and displays the results side by side on the screen.
This application isn’t supported in Config #1 for the QLI 2.0 RC3 release because CPU runtime is not supported.
For information about the plugins used in this pipeline, see Pipeline flow.
Pipeline Diagram Application: gst-ai-parallel-inference

Input and Output Capabilities

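| Config | File src | RTSP | USB camera | MIPI camera | File output | Display | RTSP output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Config #2 | Yes | Yes | No | Yes | No | Yes | No |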

Sample Model and Label Files

| Runtime | Task | Model file | Label file |
| --- | --- | --- | --- |
| LiteRT | Detection | yolox_quantized.tflite | yolox.json |
| LiteRT | Classification | inception_v3_quantized.tflite | classification.json |
| LiteRT | Segmentation | deeplabv3_plus_mobilenet_quantized.tflite | deeplabv3_resnet50.json |
| LiteRT | Pose | hrnet_pose_quantized.tflite | hrnet_pose.json, hrnet_settings.json |

Run the application on the target device

1

Download artifacts

Complete the Prerequisites. These steps download all required artifacts to the target device.
2

Run the application

gst-ai-parallel-inference --config-file=/etc/configs/config-parallel-inference.json
The sample application uses the /etc/configs/config-parallel-inference.json file to read the input parameters. To display all available options:
gst-ai-parallel-inference -h
To stop the use case, press CTRL + C.

Configurations

The gst-ai-parallel-inference application uses the /etc/configs/config-parallel-inference.json file. Update its properties to match your model, input stream, and output. See Config JSON Field Description for all fields.
{
  "camera": "<camera-id>",
  "file-path": "<input-video-path>",
  "rtsp-ip-port": "<RTSP-IP-Port-address>",
  "detection-model": "<path-to-detection model>",
  "detection-labels": "<path-to-detection-labels>",
  "pose-model": "<path-to-pose-model>",
  "pose-labels": "<path-to-pose-label>",
  "pose-settings-path": "<path-to-pose-settings-file>",
  "segmentation-model": "<path-to-segmentation-model>",
  "segmentation-labels": "<path-to-segmentation-labels>",
  "classification-model": "<path-to-classification-model>",
  "classification-labels": "<path-to-classification-labels>"
}
For QCS6490, if file-path and rtsp-ip-port are not present in the configuration file, then the camera input is selected.
File source, LiteRT model, DSP runtime
{
    "file-path": "/etc/media/video.mp4",
    "detection-model": "/etc/models/yolox_quantized.tflite",
    "detection-labels": "/etc/labels/yolox.json",
    "pose-model": "/etc/models/hrnet_pose_quantized.tflite",
    "pose-labels": "/etc/labels/hrnet_pose.json",
    "pose-settings-path": "/etc/labels/hrnet_settings.json",
    "segmentation-model": "/etc/models/deeplabv3_plus_mobilenet_quantized.tflite",
    "segmentation-labels": "/etc/labels/deeplabv3_resnet50.json",
    "classification-model": "/etc/models/inception_v3_quantized.tflite",
    "classification-labels": "/etc/labels/classification.json"
}
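The following Python sketch mirrors the QCS6490 input-selection rule described above and lists any missing model or label keys. It is an illustration, not the application's code. Two behaviors are assumptions: it rejects a configuration that sets both file-path and rtsp-ip-port, and it treats every model and label key as required. The helper names are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: all four use cases run, so every model and label key must be set.
REQUIRED_KEYS = (
    "detection-model", "detection-labels",
    "pose-model", "pose-labels", "pose-settings-path",
    "segmentation-model", "segmentation-labels",
    "classification-model", "classification-labels",
)


def select_input(cfg: Dict[str, Any]) -> str:
    """Return 'file', 'rtsp', or 'camera' following the documented fallback to camera."""
    has_file = "file-path" in cfg
    has_rtsp = "rtsp-ip-port" in cfg
    if has_file and has_rtsp:
        # Assumption: treat two configured sources as an error rather than guess a priority.
        raise ValueError("set only one of file-path or rtsp-ip-port")
    if has_file:
        return "file"
    if has_rtsp:
        return "rtsp"
    return "camera"  # neither key present: the camera input is selected


def missing_keys(cfg: Dict[str, Any]) -> List[str]:
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]
```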

Expected Output

After the four parallel inferences run, the results are displayed side by side on the screen.
Pipeline Diagram

Pipeline Flow

The following table lists the plugins used in the parallel inference pipeline:
qticamsrc:
• Captures the live stream from the camera.
• Uses tee to split the stream for inferencing.
filesrc:
• Captures the video stream using filesrc.
• qtdemux then demultiplexes the stream.
• Uses tee to split the stream for inferencing.
rtspsrc:
• Captures the RTSP stream using rtspsrc.
• rtph264depay then extracts the video.
• Uses tee to split the stream for inferencing.
h264parse: Parses the H.264 video.
v4l2h264dec: Decodes the video.
qtimlvconverter:
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data when the model expects floating-point values as input:
  • Color conversion
  • Scaling (up or down)
  • Normalization
3. Converts the preprocessed stream to a tensor stream on its source pad. Later stages of the pipeline use this tensor stream for inferencing.
qtimltflite:
• Runs inference on the tensor stream that it receives on its sink pad.
• Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess (detection):
a. Receives the inference tensors from the object detection model.
b. Converts the inference tensors on its sink pad into formats, such as video or text, that the multimedia plugins can process later.
c. Applies a threshold to the chosen number of results.
d. Loads the corresponding modules for detection models.
In this use case, qtimlpostprocess does the following:
  • Loads the YOLOv8 submodule.
  • Produces results as text structures.
  • Sends them to the sink pad of qtimetamux.
qtimlpostprocess (classification):
a. Receives the inference tensors from the classification model.
b. Converts the inference tensors on its sink pad into formats, such as video or text, that the multimedia plugins can process later.
c. Applies a threshold to the chosen number of results.
d. Loads the corresponding modules for classification models.
In this use case, qtimlpostprocess does the following:
  • Loads the submodule of the model.
  • Produces results as video frames with classification labels.
  • Sends them to the sink pad of qtivcomposer.
qtimlpostprocess (segmentation):
a. Receives the inference tensors on its sink pad.
b. Converts the inference tensors into video formats that the multimedia plugins can process later.
c. Produces the semantic segmentation for the frame.
d. Loads the corresponding modules for the segmentation models.
In this use case, qtimlpostprocess does the following:
  • Loads the deeplab-argmax submodule.
  • Produces video frames with segmentation masks.
  • Sends them to the sink pad of qtivcomposer.
qtimlpostprocess (pose):
a. Receives the inference tensors on its sink pad.
b. Converts the inference tensors into video formats that the multimedia plugins can process later.
c. Applies a threshold to the chosen number of results.
d. Loads the corresponding modules for various pose estimation models.
In this use case, qtimlpostprocess does the following:
  • Loads the HRNet module.
  • Produces results as video frames with poses drawn.
  • Sends them to the sink pad of qtivcomposer.
qtivcomposer:
1. Composes frames with contents from its sink pads.
2. Pushes the GStreamer buffers containing these composed frames to its source pad.
waylandsink:
1. Submits the video stream received on its sink pad to Weston.
2. Weston renders the video stream on a local display.
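A per-pixel argmax helps explain the deeplab-argmax step: each pixel takes the class with the highest score, and the class index is then mapped to an overlay color. The following Python sketch shows that general technique on nested lists. It is not the submodule's source, and the palette is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Color = Tuple[int, int, int]


def argmax_mask(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """scores[y][x][c] is the score of class c at pixel (x, y); return the best class per pixel."""
    # max() returns the first index among equal scores, so ties go to the lower class index.
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row] for row in scores]


def colorize(mask: List[List[int]], palette: Dict[int, Color],
             fallback: Color = (0, 0, 0)) -> List[List[Color]]:
    """Map each class index to an overlay color; classes without a palette entry get the fallback."""
    return [[palette.get(cls, fallback) for cls in row] for row in mask]


if __name__ == "__main__":
    scores = [[[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]],
              [[0.3, 0.3, 0.4], [0.5, 0.5, 0.0]]]
    mask = argmax_mask(scores)
    print(mask)
    print(colorize(mask, {1: (255, 0, 0), 2: (0, 255, 0)}))
```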

Config JSON Field Description

Input source: Use one of the following input sources:
  • camera: Primary (0) or secondary (1).
  • file-path: The path to the input video file.
  • rtsp-ip-port: The address of the RTSP stream, in the format rtsp://<ip>:<port>/<stream>.
Models and labels:
  • detection-model: The path to the detection model.
  • detection-labels: The path to the detection labels.
  • pose-model: The path to the pose model.
  • pose-labels: The path to the pose labels.
  • pose-settings-path: The path to the pose settings file.
  • segmentation-model: The path to the segmentation model.
  • segmentation-labels: The path to the segmentation labels.
  • classification-model: The path to the classification model.
  • classification-labels: The path to the classification labels.

Known Issues

  • Pose detection identifies the pose of only one person, even when several people are in the frame.
  • The Inception V3 model has no person class because it is trained on the ImageNet dataset and supports only image classification.

Hardware benchmarking application

The hardware benchmarking application monitors device hardware usage while a defined set of sample applications runs. It captures metrics such as CPU, GPU, and NPU usage and device thermals. These metrics show resource usage and thermal throttling, which helps you tune your AI use cases to your requirements. The following figure shows the pipeline, which processes the input from a set of USB cameras to generate various outputs.
This application isn’t supported in Config #1 for the QLI 2.0 RC3 release because CPU runtime is not supported.
Pipeline Diagram
For information about the plugins used in this pipeline, see Pipeline flow.

Sample Model and Label Files

| Runtime | Model file | Label file |
| --- | --- | --- |
| LiteRT | inception_v3_quantized.tflite | classification.json |
| LiteRT | deeplabv3_plus_mobilenet_quantized.tflite | deeplabv3_resnet50.json |
| LiteRT | hrnet_pose_quantized.tflite | hrnet_pose.json |
| LiteRT | midas_quantized.tflite | monodepth.json |
| LiteRT | yolox_quantized.tflite | yolox.json |

Set up the target device

1
To access the target device from your Linux host computer, set up SSH. For instructions, see Sign in using SSH.
If SSH is already set up, you can skip this step.
2
Use the HDMI port to connect the display to the device. For instructions, see Set up HDMI display. If you face issues with the display, see Troubleshoot display issues.
3
Connect two USB cameras and a mouse to the target device. If you face any issues with camera or mouse connectivity, update the USB firmware. For more information, see FAQs.
4
Install the Qualcomm® Profiler on the Linux host computer. For installation instructions, see Qualcomm Profiler. After connecting the device to the PC, run InstallerLE from the following locations:
  • For Linux:
cd /opt/qcom/Shared/QualcommProfiler/API/target-le
./InstallerLE
  • For Windows:
cd "C:\Program Files (x86)\Qualcomm\Shared\QualcommProfiler\API\target-le"
.\InstallerLE.exe

Run the application on the target device

1
Clone the repository for the demo application and push it to the target device:
git clone https://github.com/Avnet/QCS6490-Vision-AI-Demo.git
cd QCS6490-Vision-AI-Demo
git checkout QLI_2.0
scp -r ../QCS6490-Vision-AI-Demo root@<ip address of target device>:/opt
2
Sign in to the target device over SSH and run the script to set up the resources for hardware benchmarking application:
cd /opt/QCS6490-Vision-AI-Demo && bash install.sh
3
Start the application:
bash launch_visionai_with_env.sh
Pipeline Diagram
4
Select the preferred sample applications from the Camera 1 and Camera 2 drop-down lists. The system thermal and hardware usage details appear at the bottom of the screen. You can run different sample applications to check the output and understand the hardware utilization.
  • Example 1: Choose the Camera option from the Camera 1 and Camera 2 drop-down lists to observe the preview streams on the screen.
  • Example 2: Choose any sample application from the Camera 1 and Camera 2 drop-down lists to observe the AI inferencing camera streams on the screen.
Pipeline Diagram
For more information and features of the application, select the Info icon.
5
Select the Exit icon to close the application.

Pipeline Flow

The following table lists the plugins used in the hardware benchmarking pipeline:
v4l2src:
• Captures the live stream from the USB camera.
• Uses tee to split the stream for inferencing.
qtimlvconverter:
1. Receives the video stream on its sink pad.
2. Performs the following preprocessing on the stream data when the model expects floating-point values as input:
  • Color conversion
  • Scaling (up or down)
  • Normalization
3. Converts the preprocessed stream to a tensor stream on its source pad. Later stages of the pipeline use this tensor stream for inferencing.
qtimltflite, qtimlsnpe, or qtimlqnn:
• Runs inference on the tensor stream that it receives on its sink pad.
• Produces a tensor stream with the inference results on its source pad.
qtimlpostprocess (detection):
a. Receives the inference tensors from the object detection model.
b. Converts the inference tensors on its sink pad into formats, such as video or text, that the multimedia plugins can process later.
c. Applies a threshold to the chosen number of results.
d. Loads the corresponding modules for detection models.
In this use case, qtimlpostprocess does the following:
  • Loads the YOLO (YOLOv5, YOLOv8, YOLOX, or YOLO-NAS) submodule.
  • Produces results as text structures.
  • Sends them to the sink pad of qtimetamux.
qtimlpostprocess (classification):
a. Receives the inference tensors from the classification model.
b. Converts the inference tensors on its sink pad into formats, such as video or text, that the multimedia plugins can process later.
c. Applies a threshold to the chosen number of results.
d. Loads the corresponding modules for classification models.
In this use case, qtimlpostprocess does the following:
  • Loads the submodule of the model.
  • Produces results as video frames with classification labels.
  • Sends them to the sink pad of qtivcomposer.
qtimlpostprocess (segmentation):
a. Receives the inference tensors on its sink pad.
b. Converts the inference tensors into video formats that the multimedia plugins can process later.
c. Produces the semantic segmentation for the frame.
d. Loads the corresponding modules for the segmentation models.
In this use case, qtimlpostprocess does the following:
  • Loads the deeplab-argmax submodule.
  • Produces video frames with segmentation masks.
  • Sends them to the sink pad of qtivcomposer.
qtimlpostprocess (pose):
a. Receives the inference tensors on its sink pad.
b. Converts the inference tensors into video formats that the multimedia plugins can process later.
c. Applies a threshold to the chosen number of results.
d. Loads the corresponding modules for various pose estimation models.
In this use case, qtimlpostprocess does the following:
  • Loads the HRNet module.
  • Produces results as video frames with poses drawn.
  • Sends them to the sink pad of qtivcomposer.
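HRNet-style pose models output one heatmap per keypoint. A common, simple way to decode them has three steps:
1. Take the location of each heatmap's maximum.
2. Drop keypoints whose peak score is below a threshold.
3. Scale the heatmap cell position to image pixels.

The following Python sketch shows that technique. The 0.3 threshold is a hypothetical value. The actual HRNet module may decode differently, for example by refining positions with sub-pixel offsets.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]
Keypoint = Tuple[float, float, float]  # (x, y, score) in image pixels


def decode_keypoints(heatmaps: List[Heatmap], img_w: int, img_h: int,
                     threshold: float = 0.3) -> List[Optional[Keypoint]]:
    """Return one keypoint per heatmap, or None when its peak score is below the threshold."""
    keypoints: List[Optional[Keypoint]] = []
    for hm in heatmaps:
        hm_h, hm_w = len(hm), len(hm[0])
        # Location of the highest score; ties resolve to the first cell in row-major order.
        best_y, best_x = max(((y, x) for y in range(hm_h) for x in range(hm_w)),
                             key=lambda yx: hm[yx[0]][yx[1]])
        score = hm[best_y][best_x]
        if score < threshold:
            keypoints.append(None)
            continue
        # Simple scaling from heatmap cells to image pixels (no sub-pixel refinement).
        keypoints.append((best_x * img_w / hm_w, best_y * img_h / hm_h, score))
    return keypoints


if __name__ == "__main__":
    nose = [[0.1, 0.2], [0.9, 0.0]]
    hidden_wrist = [[0.1, 0.1], [0.1, 0.1]]
    print(decode_keypoints([nose, hidden_wrist], img_w=100, img_h=100))
```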

Known Issues

The following known issues are observed:
  • The device occasionally crashes unexpectedly. If this happens, restart the device.
  • GPU usage may be shown as 0 because of Qualcomm Profiler limitations on the platform.
  • Two USB cameras that operate in the YUYV color space may not work simultaneously. To check whether your camera uses YUYV, see Prerequisite: Obtain image format and size.
  • The CPU and DDR thermal readings are fixed at 35 ℃ on Dragonwing IQ-8275 and Dragonwing IQ-9075.

Troubleshooting

If any model isn’t available after downloading the script file, you can download the model manually from IoT — Qualcomm AI Hub and push it to the target device:
scp <model filename> root@<ip addr of the target device>:/etc/models
For example:
scp mobilenet_v2_quantized.tflite root@<ip addr of the target device>:/etc/models
Remount the file system with read/write permissions:
For Qualcomm Linux:
mount -o remount,rw /usr
For Ubuntu Server:
mount -o remount,rw /
For Ubuntu Server, copy the model files to the user home folder and then use sudo to copy them to the /etc/models directory:
scp <model filename> ubuntu@<ip addr of the target device>:/home/ubuntu
ssh ubuntu@<ip addr of the target device>
sudo cp /home/ubuntu/<model filename> /etc/models
If you cannot locate the qticamsrc plugin, ensure that the camera server is running and clear the GStreamer cache using the following commands:
ps -ef | grep cam-server 
rm ~/.cache/gstreamer-1.0/registry.aarch64.bin 
To enable basic GStreamer logging, run the following before launching the application:
export GST_DEBUG=2 
To increase verbosity for specific plugins, use a comma-separated list with log levels (1–9):
export GST_DEBUG=3,qticamsrc:5,qtimlvconverter:5,qtimltflite:5 
To redirect logs to a file for offline analysis:
export GST_DEBUG=3 
export GST_DEBUG_FILE=/tmp/gst_classification_debug.log
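If you launch the applications from a script, you can build the same GST_DEBUG and GST_DEBUG_FILE settings programmatically. The following Python sketch is a hypothetical helper. It relies only on the standard GST_DEBUG syntax: a default level, followed by comma-separated category:level pairs.

```python
import os
from typing import Dict, Mapping, Optional


def gst_debug_value(default_level: int,
                    categories: Optional[Mapping[str, int]] = None) -> str:
    """Build a GST_DEBUG string such as '3,qticamsrc:5,qtimltflite:5'."""
    levels = [default_level, *(categories or {}).values()]
    if any(not 0 <= level <= 9 for level in levels):
        raise ValueError("GStreamer debug levels run from 0 (none) to 9 (memdump)")
    parts = [str(default_level)]
    parts += [f"{name}:{level}" for name, level in (categories or {}).items()]
    return ",".join(parts)


def debug_env(default_level: int,
              categories: Optional[Mapping[str, int]] = None,
              log_file: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment and add GST_DEBUG, plus GST_DEBUG_FILE if a log file is given."""
    env = dict(os.environ)
    env["GST_DEBUG"] = gst_debug_value(default_level, categories)
    if log_file:
        env["GST_DEBUG_FILE"] = log_file
    return env


if __name__ == "__main__":
    env = debug_env(3, {"qticamsrc": 5, "qtimlvconverter": 5, "qtimltflite": 5},
                    log_file="/tmp/gst_classification_debug.log")
    print("GST_DEBUG=" + env["GST_DEBUG"])
    print("GST_DEBUG_FILE=" + env["GST_DEBUG_FILE"])
```

You could then pass the environment to a launched application, for example `subprocess.run(["gst-ai-parallel-inference", "--config-file=/etc/configs/config-parallel-inference.json"], env=env)`.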