AIMET is a software toolkit for quantizing and compressing trained ML models. It improves the runtime performance of deep learning models and reduces their compute load and memory footprint for deployment on edge devices.

[Figure: AIMET quantization and compression workflow]

AIMET supports post-training quantization and fine-tuning techniques to minimize accuracy loss during quantization and compression. It supports models from the ONNX and PyTorch frameworks. For details on using AIMET to improve model accuracy, see the AIMET documentation.
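To make the idea of quantization concrete, the following is a minimal sketch of uniform 8-bit affine quantization in plain Python. It is not AIMET's API or implementation. The function name `quantize_dequantize` and the sample values are hypothetical and chosen only for illustration. The sketch maps floating-point values to integers in the range 0 to 255 and back, which shows where the accuracy loss mentioned above comes from.

```python
def quantize_dequantize(values, num_bits=8):
    """Simulate uniform affine quantization of a list of floats.

    Illustrative helper only; this is not part of AIMET.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Step size between adjacent integer levels; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer level that represents the real value 0.0.
    zero_point = round(-lo / scale)
    # Round to the nearest level and clamp to the valid integer range.
    quantized = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    # Map the integers back to floats to see the rounding error.
    dequantized = [(q - zero_point) * scale for q in quantized]
    return quantized, dequantized, scale


values = [-1.0, -0.5, 0.0, 0.25, 1.0]
quantized, dequantized, scale = quantize_dequantize(values)
max_error = max(abs(v - d) for v, d in zip(values, dequantized))
print(f"scale={scale:.6f}, max round-trip error={max_error:.6f}")
```

Each value is stored in 8 bits instead of the 32 bits of a single-precision float, and the round-trip error of any in-range value is at most about half of `scale`. Post-training quantization and fine-tuning techniques aim to keep the effect of this error on model accuracy small.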