So… GStreamer pipelines?
The IM SDK is built on top of GStreamer. GStreamer is a multimedia framework that lets you describe a processing pipeline for video or audio, and it takes care of running each step in order. In “normal Python” you might write OpenCV code that grabs a frame from a webcam, resizes and crops it, calls into an inference function, draws bounding boxes on the result, and then outputs or displays the frame again, with each step running on the CPU unless you explicitly wire up GPU/NPU APIs yourself. With GStreamer + IM SDK, you declare that same sequence in a pipeline string, and the framework streams frames through the chain for you.

What IM SDK adds on Qualcomm hardware is the ability for those steps to be transparently accelerated: resize/crop and drawing bounding boxes can run on the GPU, inference can run on the NPU, and whole chains of operations (e.g. crop → resize → NN inference) can execute without ever yielding back to the CPU (zero-copy). From your application you only need to configure the pipeline; the underlying framework handles frame-by-frame scheduling, synchronization, and accelerator offload.

The IM SDK provides the special GStreamer plugins that make this possible. For example, qtivtransform offloads color conversion, cropping, and resizing to the GPU, while qtimltflite handles inference on the NPU. This way, the same high-level pipeline you’d write with standard GStreamer can now run almost entirely on dedicated accelerators, giving you real-time throughput with minimal CPU load.
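To make "declaring the sequence in a pipeline string" concrete, here is a minimal sketch that uses only stock GStreamer elements (v4l2src, videoconvert, videoscale, appsink). The device path /dev/video0 and the helper names build_pipeline and try_parse are assumptions for illustration and are not taken from the tutorial's example files. Cropping is left out for brevity.

```python
def build_pipeline(device="/dev/video0", size=224):
    """Chain stock GStreamer elements into one pipeline description string."""
    stages = [
        f"v4l2src device={device}",  # grab frames from a V4L2 camera (assumed device path)
        "videoconvert",              # pixel format conversion, on the CPU
        "videoscale",                # resizing, on the CPU
        f"video/x-raw,format=RGB,width={size},height={size}",  # caps filter: requested output
        "appsink name=frame",        # hand the frames to the application
    ]
    # Elements are linked left to right with "!".
    return " ! ".join(stages)


def try_parse(description):
    """Return a Gst.Pipeline if PyGObject and GStreamer are installed, else None."""
    try:
        import gi
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst
    except (ImportError, ValueError):
        return None
    Gst.init(None)
    # parse_launch only builds the pipeline; it does not start it.
    return Gst.parse_launch(description)


print(build_pipeline())
```

The string has the same shape whether the stages run on the CPU or on an accelerator; only the element names in the chain change.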
Setting up GStreamer and the IM SDK
Alright, let’s go build some applications using the IM SDK.
- Install GStreamer, the IM SDK and some extra dependencies we’ll need in this example. Open the terminal on your development board, or an SSH session to your development board, and run:
- Get the Python examples, extract them, create a venv, and install their dependencies:
- You’ll need a camera: either built-in (like on the RB3 Gen 2 Vision Kit) or a USB webcam.
  - If you’re on the RB3 Gen 2 Vision Kit, and want to use the built-in camera:
  - If you want to use a USB webcam:
    - Find out the device ID:
    - Set the environment variable (we’ll use this in our examples):
Example 1: Resizing and cropping on GPU vs. CPU
Let’s show how much faster working on the GPU can be compared to the CPU. If you have a neural network that expects a 224x224 RGB input, you’ll need to preprocess your data: first, grab the frame from the webcam (e.g. native resolution is 1920x1080), then crop it to a 1:1 aspect ratio (e.g. crop to 1080x1080), then resize to the desired resolution (224x224), and finally create a NumPy array from the pixels.
- Create a new file ex1.py, and add:
- Let’s launch the Python script. This pipeline runs on the CPU (using vanilla GStreamer components):
Here you see the resize/crop takes 22ms (measured on IQ9 with USB camera).
- Now let’s make this run on the GPU instead… Replace:
  With:
  Here is the complete file ex1_imsdk.py:
- Run this again:
  🚀 You’ve now sped up the crop/resize operation from ~22ms to ~6ms by changing just two lines of code!
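The crop geometry used in this example (a 1080x1080 center crop out of a wide frame, then a resize to 224x224) can be worked out with a few lines of arithmetic. The sketch below assumes a 1920x1080 source and expresses the result with the stock videocrop and videoscale elements. The helper names are hypothetical, and this is not the content of ex1.py.

```python
def center_crop_margins(width, height):
    """Pixels to remove on each side to get a centered square crop."""
    side = min(width, height)
    left = (width - side) // 2
    right = width - side - left   # absorbs the odd pixel, if any
    top = (height - side) // 2
    bottom = height - side - top
    return left, right, top, bottom


def cpu_crop_resize_stages(width, height, size=224):
    """Pipeline fragment: square center crop, then resize (CPU elements)."""
    left, right, top, bottom = center_crop_margins(width, height)
    return (
        f"videocrop left={left} right={right} top={top} bottom={bottom} ! "
        f"videoscale ! video/x-raw,width={size},height={size}"
    )


print(center_crop_margins(1920, 1080))   # (420, 420, 0, 0)
print(cpu_crop_resize_stages(1920, 1080))
```

For a 1920x1080 frame this removes 420 pixels from the left and right and nothing from the top and bottom, which leaves the 1080x1080 square that is then scaled down.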
Example 2: Tee’ing streams and multiple outputs
In the pipeline above you’ve seen a few elements that will be relevant when interacting with your own code:
- Identity elements (e.g. identity name=frame_ready_webcam silent=false). These can be used to debug timing in a pipeline. The timestamp at which each one fires is saved, and then returned at the end of the pipeline in the marks element (k/v pair, key is the identity name, value is the timestamp).
- Appsink elements (e.g. appsink name=frame). These are used to send data from a GStreamer pipeline to your application. Here the element before the appsink is a video/x-raw,format=RGB,width=224,height=224 caps filter, so we’ll send a 224x224 RGB array to Python. You receive these in the frames_by_sink element (k/v pair, key is the appsink name, value is the data).

In this example we tee the stream after identity name=frame_ready_webcam, sending one part to a new appsink and the other part through the resize/crop pipeline.
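A rough sketch of the tee layout described above, using only stock GStreamer elements (tee, queue, videoconvert, videoscale, identity, appsink). The appsink name frame_original, the identity name resize_done, and the source string are made up for illustration. The elapsed_ms helper assumes the marks arrive as a plain dict of identity name to timestamp in seconds; the helper code that ships with the examples may use a different unit or structure.

```python
def build_tee_pipeline(source="v4l2src device=/dev/video0", size=224):
    """Split one camera stream into an original-size branch and a resized branch."""
    parts = [
        # Shared head: camera, timing mark, then the tee that fans out.
        f"{source} ! identity name=frame_ready_webcam silent=false ! tee name=t",
        # Branch 1: original resolution straight to the application.
        "t. ! queue ! videoconvert ! video/x-raw,format=RGB ! appsink name=frame_original",
        # Branch 2: resized frame, with a second timing mark before the appsink.
        f"t. ! queue ! videoconvert ! videoscale ! "
        f"video/x-raw,format=RGB,width={size},height={size} ! "
        f"identity name=resize_done silent=false ! appsink name=frame",
    ]
    return " ".join(parts)


def elapsed_ms(marks, start, end):
    """Milliseconds between two identity marks (assumes timestamps in seconds)."""
    return (marks[end] - marks[start]) * 1000.0


print(build_tee_pipeline())
print(elapsed_ms({"frame_ready_webcam": 1.000, "resize_done": 1.022},
                 "frame_ready_webcam", "resize_done"))
```

Each branch after a tee normally gets its own queue so that a slow branch does not stall the other one.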
- Create a new file ex2.py and add:
- Run this Python script:
  (The out/ directory has the last processed frames in both original and resized resolutions)
Example 3: Run a neural network
Now that we have images streaming from the webcam in the correct resolution, let’s add a neural network to the mix.
Note: The following workflows only work on Ubuntu Server edition, not on Ubuntu Desktop OS.
3.1: Neural network and compositing in Python
- First we’ll do a “normal” implementation, where we take the resized frame from the IM SDK pipeline, and then use LiteRT to run the model (on the NPU). Afterwards we’ll draw the top prediction on the image and write it to disk. Create a new file ex3_from_python.py and add:
- Now run this application:
  Not bad at all, but let’s see if we can do better…
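For reference, the "top prediction" step in a Python-side implementation boils down to ranking the model's output scores and looking up the matching labels. The following is a minimal sketch with made-up scores and labels; it is not the code from ex3_from_python.py, and top_k is a hypothetical helper name.

```python
import heapq


def top_k(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [(labels[index], score) for index, score in ranked]


# Toy output vector of a 3-class classifier (illustrative values only).
print(top_k([0.1, 0.7, 0.2], ["cat", "dog", "bird"], k=2))
# [('dog', 0.7), ('bird', 0.2)]
```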

3.2: Running the neural network with IM SDK
Let’s move the neural network inference to the IM SDK. You do this through three plugins:
- qtimlvconverter: converts the frame into an input tensor.
- qtimltflite: runs a neural network (in LiteRT format). If you send these results over an appsink you’ll get the exact same tensor back as earlier (you just didn’t need to hit the CPU to invoke the inference engine).
- An element like qtimlpostprocess to interpret the output. Here this plugin is used for image classification use cases (like the SqueezeNet model we use) with a (1, n) output shape. This plugin outputs either text (with the predictions), or an overlay (to draw onto the original image).
  Note: This element has a particular label format (see below).
- Create a new file ex3_nn_imsdk.py and add:
The pipeline converts from RGB to NV12 format here (after qtivtransform), because qtimltflite requires a tightly packed buffer and the RGB output uses row-stride padding. These issues can be very hard to debug. Adding GST_DEBUG=3 before your command (e.g. GST_DEBUG=3 python3 ex3_nn_imsdk.py) and feeding the pipeline and the error into an LLM like ChatGPT can sometimes help you troubleshoot.
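To illustrate what row-stride padding means in the note above: a padded buffer stores each image row in a fixed number of bytes (the stride) that can be larger than width × bytes per pixel, so the rows are not back to back in memory. The sketch below is generic; the 256-byte alignment is an assumption chosen for the example and is not a documented value for this hardware.

```python
def aligned_stride(width, bytes_per_pixel, alignment):
    """Bytes per row after rounding the row size up to a multiple of alignment."""
    row_bytes = width * bytes_per_pixel
    return (row_bytes + alignment - 1) // alignment * alignment


def strip_row_padding(buf, width, height, bytes_per_pixel, stride):
    """Copy a stride-padded image into a tightly packed bytes object."""
    row_bytes = width * bytes_per_pixel
    if stride < row_bytes:
        raise ValueError("stride is smaller than one row of pixels")
    if len(buf) < stride * (height - 1) + row_bytes:
        raise ValueError("buffer is too small for the given geometry")
    # Keep the first row_bytes of every stride-sized row, drop the padding.
    return b"".join(buf[y * stride : y * stride + row_bytes] for y in range(height))


# A 224-pixel RGB row is 672 bytes; with an assumed 256-byte alignment it occupies 768.
print(aligned_stride(224, 3, 256))  # 768

# Tiny 2x2 RGB image with 2 padding bytes per row (stride 8 instead of 6).
padded = b"AAAAAA__" + b"BBBBBB__"
print(strip_row_padding(padded, width=2, height=2, bytes_per_pixel=3, stride=8))
# b'AAAAAABBBBBB'
```

A consumer that assumes tight packing would read the padding bytes as pixel data, which is why a mismatch tends to show up as skewed or garbage input rather than a clear error.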
module=mobilenet-softmax: This is used by qtimlpostprocess for classification models whose output is a FLOAT32 1×N logits vector. It applies Softmax to normalize logits into probabilities.
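As a reminder of what softmax does to a 1×N logits vector, here is a small standalone sketch in plain Python. It shows the general formula only and makes no claim about how qtimlpostprocess implements it internally.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # largest logit gets the largest probability
print(softmax([1000.0, 1000.0]))     # [0.5, 0.5]
```

Softmax preserves the ranking of the logits, so the top class is the same before and after; what changes is that the values become comparable as probabilities.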
- Now run this application:
  OK! The model now runs on the NPU inside the IM SDK pipeline. If you’d rather have the top 5 outputs (like we did in 3.1), you can tee the stream after the qtimltflite element and send the raw output tensor back to the application as well.
3.3: Overlays
To mimic the output in 3.1 we also want to draw an overlay. Let’s first demonstrate that with a static overlay image.
- Download a semi-transparent image (source):
- Create a new file ex3_overlay.py and add:
- Run this application:

3.4: Combining neural network with overlay
You’ve now seen how to run a neural network as part of an IM SDK pipeline, and how to draw overlays. Let’s combine these into a single pipeline, where we overlay the prediction onto the image, all without ever touching the CPU.
- Create a new file ex3_from_imsdk.py and add:
- Run this application:
  Great! This whole pipeline now runs in the IM SDK. You can find the output file in out/imsdk-webcam-nn-overlay.mp4.
Troubleshooting
Pipeline does not yield anything
If you don’t see any output, add GST_DEBUG=3 before your command (e.g. GST_DEBUG=3 python3 ex3_nn_imsdk.py) to see more detailed debug info.
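If you launch the examples from another Python script rather than from the shell, the same variable can be passed through the child process environment. This is a small sketch; env_with_gst_debug is a hypothetical helper name, and the script name in the usage comment is just one of the examples above.

```python
import os


def env_with_gst_debug(level=3, base_env=None):
    """Copy an environment and set GST_DEBUG to the requested level."""
    env = dict(os.environ if base_env is None else base_env)
    env["GST_DEBUG"] = str(level)  # GStreamer reads this when it initializes
    return env


# Usage (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "ex3_nn_imsdk.py"], env=env_with_gst_debug(3))
print(env_with_gst_debug(3, {"PATH": "/usr/bin"}))
```

Higher levels produce far more output, so level 3 is a reasonable first step before going further.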

