Before you begin: QIM SDK must be installed — see the Installation Guide.
The five basic stages of a vision AI pipeline
The diagram below shows the five stages of a typical vision AI pipeline. ⚡ Highlighted steps are hardware-accelerated by QIM SDK.
Data Source
Capture frames from the selected media source, such as a camera, a video file, or a network stream, in a format the downstream stages can consume.
AI Preprocessing
Prepare raw frames for the AI model by resizing, reformatting, and normalizing pixel data into the tensor format the model expects. QIM SDK accelerates this step with the qtimlvconverter plugin, which performs color space conversion and tensor layout transformations directly on the GPU — eliminating CPU bottlenecks before inference.
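To make the preprocessing step concrete, here is a minimal CPU-only sketch in plain Python of what "resizing and normalizing into a tensor" means. It is not how qtimlvconverter is implemented. The function names, nearest-neighbour sampling, the HWC layout, and the [-1, 1] normalization range are all assumptions chosen for illustration; the actual layout and range depend on the model.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel, 0-255 per channel
Frame = List[List[Pixel]]      # rows of pixels (height x width)


def resize_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Resize a frame with nearest-neighbour sampling (illustrative only)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_tensor(frame: Frame, mean: float = 127.5, scale: float = 127.5):
    """Map 8-bit channels to floats in [-1, 1], keeping HWC layout (assumed)."""
    return [[[(c - mean) / scale for c in px] for px in row] for row in frame]


# A 4x4 test frame: left half black, right half white.
frame = [[(0, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]

# Downscale to the (hypothetical) 2x2 model input size, then normalize.
tensor = to_tensor(resize_nearest(frame, 2, 2))
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # height, width, channels
```

Doing this per pixel on the CPU is the kind of work the text describes moving to the GPU.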
AI Inference
Run the prepared tensors through the AI model to generate predictions. QIM SDK accelerates inference with the qtimltflite and qtimlqnn plugins, which delegate computation to the on-device NPU or HTP — delivering low-latency, high-throughput inference without taxing the CPU.
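The sketch below shows one way the preprocessing and inference stages might be chained, assuming the plugins are linked GStreamer-style with `!` between elements. Only the element names come from the text above. The `build_pipeline` helper, the `runtime` keys, and the `<source>` and `<sink>` placeholders are hypothetical, and element properties (such as the model path) are deliberately omitted, so the resulting string is a template rather than a runnable pipeline.

```python
from typing import List

# Element names taken from the text above; the dictionary keys are
# hypothetical labels used only in this sketch.
INFERENCE_ELEMENTS = {"tflite": "qtimltflite", "qnn": "qtimlqnn"}


def build_pipeline(source: str, runtime: str, sink: str) -> str:
    """Join source, preprocessing, inference, and sink with ' ! '."""
    if runtime not in INFERENCE_ELEMENTS:
        raise ValueError(f"unknown runtime: {runtime!r}")
    stages: List[str] = [
        source,                       # placeholder for the data source
        "qtimlvconverter",            # AI preprocessing
        INFERENCE_ELEMENTS[runtime],  # AI inference
        sink,                         # placeholder for downstream stages
    ]
    return " ! ".join(stages)


description = build_pipeline("<source>", "tflite", "<sink>")
print(description)
```

Keeping the inference element as a parameter reflects the text's point that either plugin can fill the same stage.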
Post-Processing
Decode the model’s output tensors into actionable results — such as bounding boxes, class labels, confidence scores, or segmentation masks. QIM SDK’s qtimlpostprocess plugin interprets model outputs and formats them as metadata, which can then be rendered onto video frames using the GPU-accelerated qtivoverlay plugin.
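As an illustration of what decoding involves, the following sketch turns raw detection rows into labeled boxes. It does not reflect the qtimlpostprocess implementation. The row layout `[x_min, y_min, x_max, y_max, score, class_id]` with normalized coordinates, the `Detection` type, and the 0.5 threshold are assumptions; real models differ in output format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def decode(rows: Sequence[Sequence[float]], labels: Sequence[str],
           frame_w: int, frame_h: int, threshold: float = 0.5) -> List[Detection]:
    """Drop low-confidence rows and scale normalized boxes to pixels."""
    results = []
    for x0, y0, x1, y1, score, class_id in rows:
        if score < threshold:
            continue  # discard weak predictions
        box = (round(x0 * frame_w), round(y0 * frame_h),
               round((x1 - x0) * frame_w), round((y1 - y0) * frame_h))
        results.append(Detection(labels[int(class_id)], score, box))
    return results


# Two hypothetical output rows: one confident "car", one weak "person".
raw = [
    [0.25, 0.25, 0.75, 0.75, 0.9, 1],
    [0.0, 0.0, 0.5, 0.5, 0.2, 0],
]
detections = decode(raw, ["person", "car"], frame_w=640, frame_h=480)
print(detections)
```

The resulting list is the kind of structured metadata that an overlay stage could draw onto the frame.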
Using the AI Metadata
Use the structured AI metadata output to drive downstream workflows. Common applications include:
- Overlay on video — draw bounding boxes, labels, or masks directly on frames for real-time visualization
- Stream enrichment — embed metadata into RTSP or WebRTC streams for remote monitoring
- Cloud/edge messaging — publish to MQTT, Kafka, or other backends for storage, alerting, or further analysis
- Automated actions — trigger events or control external systems based on detection results
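
The messaging and automated-action cases above can be sketched as follows, using only the Python standard library. The metadata shape (`label`, `confidence`, `box`), the message fields, and the alert rule are hypothetical; the actual metadata format produced by QIM SDK is not specified here, and publishing to a real broker would need a client library that this sketch leaves out.

```python
import json
from typing import Dict, List


def to_message(frame_id: int, detections: List[Dict]) -> str:
    """Serialize one frame's detections as a JSON payload for a message bus."""
    return json.dumps({"frame": frame_id, "detections": detections}, sort_keys=True)


def should_alert(detections: List[Dict], label: str = "person",
                 min_confidence: float = 0.8) -> bool:
    """Return True if any detection matches the label with enough confidence."""
    return any(
        d["label"] == label and d["confidence"] >= min_confidence
        for d in detections
    )


# Hypothetical metadata for a single frame.
detections = [{"label": "person", "confidence": 0.91, "box": [160, 120, 320, 240]}]

payload = to_message(42, detections)  # would be handed to an MQTT or Kafka client
alert = should_alert(detections)      # would gate an external action
print(payload)
print(alert)
```

Separating serialization from the alert rule keeps the same metadata usable for both storage and real-time triggers.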

