- 32-bit floating-point precision
- 16-bit floating-point precision
- uint8/int8 precision (quantizing models)
| Conversion method | Description |
|---|---|
| Python APIs | Converts, optimizes, and quantizes models to LiteRT format |
| Command-line interface (CLI) tool | Converts models to LiteRT format; suitable for basic conversion only |
Convert models using Python APIs
The following table lists the Python APIs that TensorFlow provides to convert a TensorFlow SavedModel or a Keras model to LiteRT:
TensorFlow Python APIs to convert models

| API | Description |
|---|---|
| tf.lite.TFLiteConverter.from_saved_model() (recommended) | Converts a TensorFlow SavedModel |
| tf.lite.TFLiteConverter.from_keras_model() | Converts a Keras model |
Convert a TensorFlow SavedModel using the Python API
To convert a TensorFlow model in SavedModel format to LiteRT, use tf.lite.TFLiteConverter.from_saved_model(). The converted LiteRT model is not quantized and its data is in 32-bit floating-point precision.
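A minimal sketch of the SavedModel conversion follows. The helper name convert_saved_model and its arguments are hypothetical; the TensorFlow calls used are tf.lite.TFLiteConverter.from_saved_model() and convert(). TensorFlow is imported inside the function only so that the definition loads on machines where it is not installed.

```python
from pathlib import Path


def convert_saved_model(saved_model_dir: str, output_path: str) -> int:
    """Convert a SavedModel directory to a float32 LiteRT file.

    Returns the number of bytes written. The function name and its
    arguments are illustrative and not part of TensorFlow.
    """
    import tensorflow as tf  # requires the TensorFlow pip package

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()  # serialized LiteRT model as bytes
    return Path(output_path).write_bytes(tflite_model)
```

For instance, convert_saved_model("path/to/saved_model", "model.tflite") would write the converted file; both paths are placeholders.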
Convert a Keras model using the Python API
To convert a Keras model to LiteRT, use tf.lite.TFLiteConverter.from_keras_model(). The converted LiteRT model is not quantized and its data is in 32-bit floating-point precision.
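A minimal sketch of the Keras conversion follows, assuming the model is already loaded in memory. The helper name convert_keras_model is hypothetical; the TensorFlow calls used are tf.lite.TFLiteConverter.from_keras_model() and convert().

```python
def convert_keras_model(keras_model) -> bytes:
    """Convert an in-memory Keras model to LiteRT bytes (float32, not quantized).

    The function name is illustrative and not part of TensorFlow.
    """
    import tensorflow as tf  # requires the TensorFlow pip package

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    return converter.convert()  # serialized LiteRT model as bytes
```

The returned bytes can be written to a .tflite file with ordinary binary file I/O.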
Quantize models
When you convert models to LiteRT format using the Python APIs, you can also quantize them. Quantization reduces the size and computational requirements of models by converting high-precision values (such as 32-bit floating-point numbers) into lower-precision formats (such as 8-bit integers).
Quantization in neural network models involves the following steps:
- Quantize weights and biases: These are already part of the trained model and can be quantized without additional input data. This is a static step.
- Quantize activation layers: The ranges for activation layer outputs depend on the input data during forward propagation. A set of sample inputs, known as a calibration or representative dataset, is required to identify the minimum and maximum ranges.
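The arithmetic behind these steps can be illustrated in plain Python. This is a sketch of the common affine scheme, real_value = scale * (quantized_value - zero_point), not the converter's actual implementation; the function names and the sample values are hypothetical.

```python
def quantization_params(min_val, max_val, qmin=-128, qmax=127):
    """Derive (scale, zero_point) for an observed float range."""
    # Extend the range to include 0.0 so that zero maps to an exact integer.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0  # degenerate range: every observed value is 0.0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = round(qmin - min_val / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an 8-bit integer, clamping values outside the range."""
    q = round(value / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an 8-bit integer back to an approximate float."""
    return scale * (q - zero_point)


# Calibration: the minimum and maximum come from sample activations.
calibration_samples = [0.0, 0.4, 1.0, 2.55]  # made-up values
scale, zero_point = quantization_params(min(calibration_samples),
                                        max(calibration_samples))
q = quantize(1.0, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Within the calibrated range, the round-trip error is at most half of scale. Values outside the range are clamped, which is why the calibration data should be representative of real inputs.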
Quantize models using dynamic range quantization
In dynamic range quantization, weights and biases are statically quantized from floating-point to 8-bit integer precision. Activation layer ranges remain in 32-bit floating-point precision.
To reduce latency during inference, dynamic-range operators:
- Quantize activations based on their ranges to 8-bit integer precision.
- Perform computations with 8-bit weights and activations.
Dynamic range quantization statically quantizes only the weights, so it does not require a calibration dataset.
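A minimal sketch of dynamic range quantization follows. The helper name convert_dynamic_range is hypothetical. It relies on the converter behavior that setting optimizations to tf.lite.Optimize.DEFAULT without a representative dataset selects dynamic range quantization.

```python
def convert_dynamic_range(saved_model_dir: str) -> bytes:
    """Convert a SavedModel to LiteRT with dynamic range quantization.

    The function name is illustrative and not part of TensorFlow.
    """
    import tensorflow as tf  # requires the TensorFlow pip package

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # No representative dataset is set, so only the weights are
    # quantized statically; activations are handled at inference time.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()
```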
Quantize models using full integer quantization
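As a sketch of the workflow this section describes, the following helper calibrates with a representative dataset and restricts the converter to int8 built-in operators. The helper name convert_full_integer and the calibration_samples argument are hypothetical, and the sketch assumes a model with a single float32 input whose samples already include the batch dimension.

```python
def convert_full_integer(saved_model_dir: str, calibration_samples) -> bytes:
    """Convert a SavedModel to a full integer quantized LiteRT model.

    calibration_samples is an iterable of array-like inputs, each
    shaped like one model input batch. Names are illustrative only.
    """
    import numpy as np
    import tensorflow as tf  # requires the TensorFlow pip package

    def representative_dataset():
        # Yield one list per sample, with one entry per model input.
        for sample in calibration_samples:
            yield [np.asarray(sample, dtype=np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Fail conversion if an operator has no int8 implementation.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    return converter.convert()
```

Calibration works best with a few hundred samples drawn from real input data rather than random values, because the observed ranges determine the quantization parameters.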
In full integer quantization, a representative dataset is used to quantize the activation layers within the model. This produces a model more suitable for fixed-point integer hardware, such as the Hexagon Tensor Processor on Qualcomm development kits.
To produce a full integer quantized LiteRT model, the converter is given the representative dataset, and target_spec.supported_ops in the converter is set to tf.lite.OpsSet.TFLITE_BUILTINS_INT8.
Convert models using the tflite_convert command
You can use the tflite_convert CLI tool included in the TensorFlow pip package for offline conversions with TensorFlow v2.x and later.
The tflite_convert command is suitable for basic conversion only. For post-training integer quantization, use the Python APIs.
The tflite_convert command requires the --output_file flag and either --saved_model_dir or --keras_model_file. Run tflite_convert --help for the full list of options.
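The flag requirements can be sketched as a small Python wrapper that assembles the command line and invokes it. The helper names are hypothetical; only the three flags named above are used, and tflite_convert is assumed to be on the PATH.

```python
import subprocess


def build_tflite_convert_command(output_file, saved_model_dir=None,
                                 keras_model_file=None):
    """Return the argument list for a basic tflite_convert call.

    Exactly one of saved_model_dir or keras_model_file must be given.
    """
    if (saved_model_dir is None) == (keras_model_file is None):
        raise ValueError(
            "Pass exactly one of saved_model_dir or keras_model_file")
    if saved_model_dir is not None:
        source_flag = f"--saved_model_dir={saved_model_dir}"
    else:
        source_flag = f"--keras_model_file={keras_model_file}"
    return ["tflite_convert", source_flag, f"--output_file={output_file}"]


def run_tflite_convert(output_file, **source):
    """Run tflite_convert and raise CalledProcessError if it fails."""
    command = build_tflite_convert_command(output_file, **source)
    subprocess.run(command, check=True)
```

For instance, run_tflite_convert("model.tflite", saved_model_dir="path/to/saved_model") would run the conversion; both paths are placeholders.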

