Before you begin: QIM SDK must be installed — see the Installation Guide.

The 5 basic stages of vision AI pipelines

The diagram below shows the five stages of a typical vision AI pipeline.
⚡ Highlighted steps are hardware-accelerated by QIM SDK.

Data Source

Capture frames from the selected media source in a format and at a rate that the downstream stages can consume.

AI Preprocessing

Prepare raw frames for the AI model by resizing, reformatting, and normalizing pixel data into the tensor format the model expects. QIM SDK accelerates this step with the qtimlvconverter plugin, which performs color space conversion and tensor layout transformations directly on the GPU — eliminating CPU bottlenecks before inference.
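To make the preprocessing step concrete, here is a minimal CPU-side sketch in plain Python of the transformations described above: resize, normalize, and arrange pixels into a batched tensor. It does not use or reflect the qtimlvconverter API. The function names, the nearest-neighbor resize, the NHWC layout, and the default mean and standard deviation values are all assumptions chosen for illustration; the real values depend on the model.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # one 8-bit RGB pixel
Frame = Sequence[Sequence[Pixel]]     # rows of pixels (height x width)


def resize_nearest(frame: Frame, out_w: int, out_h: int) -> List[List[Pixel]]:
    """Resize a frame with nearest-neighbor sampling (hypothetical helper)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(
    frame: Frame,
    mean: Sequence[float] = (0.0, 0.0, 0.0),      # assumed per-channel mean
    std: Sequence[float] = (255.0, 255.0, 255.0),  # assumed per-channel scale
) -> List[List[List[float]]]:
    """Convert 8-bit pixels to floats: (value - mean) / std per channel."""
    return [
        [[(px[c] - mean[c]) / std[c] for c in range(3)] for px in row]
        for row in frame
    ]


def preprocess(frame: Frame, out_w: int, out_h: int) -> List[List[List[List[float]]]]:
    """Return a tensor with assumed NHWC shape [1, out_h, out_w, 3]."""
    resized = resize_nearest(frame, out_w, out_h)
    return [normalize(resized)]  # wrap in a list to add the batch dimension


# Usage: a synthetic 4x4 frame reduced to a 2x2 input tensor.
frame = [[(x * 10, y * 10, 255) for x in range(4)] for y in range(4)]
tensor = preprocess(frame, out_w=2, out_h=2)
```

On a target device this work would run inside the accelerated plugin rather than in Python; the sketch only shows what "resizing, reformatting, and normalizing" means in terms of data.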

AI Inference

Run the prepared tensors through the AI model to generate predictions. QIM SDK accelerates inference with the qtimltflite and qtimlqnn plugins, which delegate computation to the on-device NPU or HTP — delivering low-latency, high-throughput inference without taxing the CPU.

Post-Processing

Decode the model’s output tensors into actionable results — such as bounding boxes, class labels, confidence scores, or segmentation masks. QIM SDK’s qtimlpostprocess plugin interprets model outputs and formats them as metadata, which can then be rendered onto video frames using the GPU-accelerated qtivoverlay plugin.
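As an illustration of what decoding output tensors involves, the sketch below turns raw detection outputs into labeled bounding boxes in plain Python. It is not the qtimlpostprocess implementation. It assumes an SSD-style model that emits normalized `[ymin, xmin, ymax, xmax]` boxes, class indices, and scores; the `Detection` type, the `decode_detections` function, and the 0.5 threshold are hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def decode_detections(
    boxes: Sequence[Sequence[float]],  # assumed normalized [ymin, xmin, ymax, xmax]
    class_ids: Sequence[int],
    scores: Sequence[float],
    labels: Sequence[str],
    frame_w: int,
    frame_h: int,
    threshold: float = 0.5,            # assumed confidence cutoff
) -> List[Detection]:
    """Keep detections at or above the threshold and scale boxes to pixels."""
    results = []
    for (ymin, xmin, ymax, xmax), class_id, score in zip(boxes, class_ids, scores):
        if score < threshold:
            continue  # drop low-confidence predictions
        box = (
            round(xmin * frame_w),
            round(ymin * frame_h),
            round((xmax - xmin) * frame_w),
            round((ymax - ymin) * frame_h),
        )
        results.append(Detection(labels[int(class_id)], float(score), box))
    # Highest-confidence detections first.
    return sorted(results, key=lambda d: d.score, reverse=True)


# Usage with made-up model outputs for a 640x480 frame.
detections = decode_detections(
    boxes=[(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)],
    class_ids=[1, 0],
    scores=[0.9, 0.3],
    labels=["person", "car"],
    frame_w=640,
    frame_h=480,
)
```

Other model families (classification, segmentation, pose) produce differently shaped outputs and need their own decoding logic.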

Using the AI Metadata

Use the structured AI metadata output to drive downstream workflows. Common applications include:

Overlay on video — draw bounding boxes, labels, or masks directly on frames for real-time visualization
Stream enrichment — embed metadata into RTSP or WebRTC streams for remote monitoring
Cloud/edge messaging — publish to MQTT, Kafka, or other backends for storage, alerting, or further analysis
Automated actions — trigger events or control external systems based on detection results
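The messaging and automated-action cases above can be sketched in a few lines of Python. This example assumes detections arrive as simple dictionaries with `label` and `score` keys; that schema, the `build_message` and `should_alert` helpers, and the threshold are hypothetical and do not describe the metadata format QIM SDK actually emits. Publishing the resulting string to MQTT or Kafka would require a client library and is left out.

```python
import json
from typing import Dict, List

DetectionDict = Dict[str, object]  # assumed schema: {"label": str, "score": float}


def build_message(frame_id: int, detections: List[DetectionDict]) -> str:
    """Serialize per-frame metadata to a JSON string suitable for a message payload."""
    return json.dumps({"frame_id": frame_id, "detections": detections}, sort_keys=True)


def should_alert(
    detections: List[DetectionDict],
    watch_label: str = "person",  # assumed class of interest
    min_score: float = 0.6,       # assumed alert threshold
) -> bool:
    """Return True if any detection matches the watched label with enough confidence."""
    return any(
        d["label"] == watch_label and float(d["score"]) >= min_score
        for d in detections
    )


# Usage with made-up metadata for one frame.
frame_detections = [
    {"label": "car", "score": 0.91},
    {"label": "person", "score": 0.72},
]
payload = build_message(frame_id=42, detections=frame_detections)
alert = should_alert(frame_detections)
```

Keeping serialization and decision logic separate from the transport makes it straightforward to swap the backend without touching the rule that triggers an action.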
