Export a TensorFlow model to LiteRT - Qualcomm Dragonwing Documentation

You can convert TensorFlow models to LiteRT format and optimize them for on-device inference. For more information about LiteRT model conversion, see Model conversion overview. LiteRT model conversion supports the following output precisions:

32-bit floating-point precision
16-bit floating-point precision
uint8/int8 precision (quantizing models)

The following table lists the conversion methods available in the TensorFlow framework.

TensorFlow model conversion methods

| Conversion method | Description |
| --- | --- |
| Python APIs | Converts, optimizes, and quantizes models to LiteRT format |
| Command-line interface (CLI) tool | Converts models to LiteRT format; suitable for basic conversion only |

The Python APIs offer more flexibility to convert, optimize, and quantize models to suit your requirements.

Convert models using Python APIs

The following table lists the Python APIs that TensorFlow provides to convert a TensorFlow SavedModel or a Keras model to LiteRT.

TensorFlow Python APIs to convert models

| API | Description |
| --- | --- |
| `tf.lite.TFLiteConverter.from_saved_model()` (recommended) | Converts a TensorFlow SavedModel |
| `tf.lite.TFLiteConverter.from_keras_model()` | Converts a Keras model |

Convert a TensorFlow SavedModel using the Python API

The following example converts a TensorFlow model in SavedModel format to LiteRT:

import tensorflow as tf

# Convert the model
saved_model_dir = "/path/to/tf/model/in/saved_model/format"
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Save the model
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

The converted LiteRT model is not quantized and its data is in 32-bit floating-point precision.
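To check that a converted model runs and produces the expected results, you can load it with the interpreter that ships with TensorFlow (`tf.lite.Interpreter`; recent TensorFlow releases point to the standalone LiteRT package for this instead). The sketch below is self-contained: it saves a hypothetical toy model (`AffineModel`, which computes 2x + 1) as a SavedModel in a temporary directory, converts it with the same calls as the example above, and runs one inference. The model and input values are illustrative assumptions, not part of any real workflow.

```python
import tempfile

import numpy as np
import tensorflow as tf


class AffineModel(tf.Module):
    """Hypothetical toy model computing y = 2x + 1."""

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return {"y": 2.0 * x + 1.0}


# Save the toy model in SavedModel format to a temporary directory
saved_model_dir = tempfile.mkdtemp()
module = AffineModel()
tf.saved_model.save(module, saved_model_dir, signatures=module.__call__)

# Convert the model (same calls as in the example above)
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Load the converted model and run a single inference
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

x = np.array([[0.0, 1.0, 2.0, 3.0]], dtype=np.float32)
interpreter.set_tensor(input_details["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details["index"])
print(input_details["dtype"], y)
```

Because the model is not quantized, the input tensor is float32 and the output should match the original TensorFlow computation up to floating-point rounding.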

Convert a Keras model using the Python API

The following example converts a Keras model to LiteRT:

import numpy as np
import tensorflow as tf

# Create a model using high-level tf.keras.* APIs
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(units=1),
    tf.keras.layers.Dense(units=16, activation='relu'),
    tf.keras.layers.Dense(units=1)
])

# Compile and train the model on three sample points from y = 2x - 1
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x=np.array([[-1.0], [0.0], [1.0]]), y=np.array([[-3.0], [-1.0], [1.0]]), epochs=5)

# Convert the model to LiteRT
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

The converted LiteRT model is not quantized and its data is in 32-bit floating-point precision.
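Both examples above produce 32-bit floating-point models. For the 16-bit floating-point precision listed at the top of this page, the TensorFlow converter supports float16 quantization: combining `tf.lite.Optimize.DEFAULT` with `target_spec.supported_types = [tf.float16]` stores the weights as float16, roughly halving the model size, while model inputs and outputs stay float32. The sketch below assumes a hypothetical `tf.Module` (`DenseModel`) saved to a temporary directory instead of a Keras model, so it does not depend on the installed Keras version.

```python
import tempfile

import tensorflow as tf


class DenseModel(tf.Module):
    """Hypothetical toy model with a 256x256 weight matrix, so size changes are visible."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([256, 256], seed=0))

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 256], dtype=tf.float32)])
    def __call__(self, x):
        return {"y": tf.matmul(x, self.w)}


saved_model_dir = tempfile.mkdtemp()
module = DenseModel()
tf.saved_model.save(module, saved_model_dir, signatures=module.__call__)

# 32-bit floating-point conversion (the converter's default)
fp32_model = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir).convert()

# 16-bit floating-point conversion: weights are stored as float16
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_model = converter.convert()

print("float32 model bytes:", len(fp32_model))
print("float16 model bytes:", len(fp16_model))
```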

Quantize models

When you convert models to LiteRT format with the Python APIs, you can also quantize them. Quantization reduces the size and computational requirements of models by converting high-precision values (such as 32-bit floating-point numbers) into lower-precision formats (such as 8-bit integers). Quantizing a neural network model involves the following steps:

Quantize weights and biases: These are already part of the trained model and can be quantized without additional input data. This is a static step.
Quantize activations: The ranges of activation outputs depend on the input data during forward propagation. A set of sample inputs, known as a calibration or representative dataset, is required to determine their minimum and maximum values.
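The calibration step described above can be illustrated in plain NumPy. This is a simplified sketch, assuming per-tensor asymmetric int8 affine quantization in which the observed range [min, max] is mapped onto [-128, 127] with scale = (max - min) / 255 and an integer zero point; LiteRT's converter applies its own rules (for example, per-channel scales for weights), so treat this only as an illustration. All function names and the calibration data are hypothetical.

```python
import numpy as np


def calibrate_range(batches):
    """Record the smallest and largest activation values seen in calibration data."""
    lo, hi = np.inf, -np.inf
    for batch in batches:
        lo = min(lo, float(np.min(batch)))
        hi = max(hi, float(np.max(batch)))
    # Widen the range to include 0.0 so that zero is exactly representable
    return min(lo, 0.0), max(hi, 0.0)


def int8_params(lo, hi):
    """Map the float range [lo, hi] onto the int8 range [-128, 127]."""
    qmin, qmax = -128, 127
    scale = max(hi - lo, 1e-8) / (qmax - qmin)
    zero_point = int(np.clip(round(qmin - lo / scale), qmin, qmax))
    return scale, zero_point


def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale


# Hypothetical calibration set: 10 batches of activations in roughly [-1, 3]
rng = np.random.default_rng(0)
calibration = [rng.uniform(-1.0, 3.0, size=(4, 8)).astype(np.float32) for _ in range(10)]

lo, hi = calibrate_range(calibration)
scale, zero_point = int8_params(lo, hi)
x = calibration[0]
x_restored = dequantize(quantize(x, scale, zero_point), scale, zero_point)
print(scale, zero_point, np.max(np.abs(x - x_restored)))
```

For values inside the calibrated range, the round-trip error is at most half a quantization step (scale / 2). Values outside that range are clipped, which is why the representative dataset should cover the inputs the model sees in practice.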

LiteRT provides post-training quantization techniques to convert a floating-point TensorFlow model into a quantized LiteRT model. For more information, see Post-training quantization. LiteRT supports the following types of post-training quantization:

Dynamic range quantization
Full integer quantization

Quantize models using dynamic range quantization

In dynamic range quantization, the converter statically quantizes only the weights, from floating-point to 8-bit integer precision. Activations are stored in 32-bit floating-point precision. To reduce latency during inference, dynamic-range operators:

Quantize activations to 8-bit integer precision on the fly, based on their observed range.
Perform computations with 8-bit weights and activations.

Because only the weights are quantized at conversion time, this method does not require calibration data.
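What a dynamic-range operator does can be modeled in NumPy for a single matrix multiplication: the weights are quantized once, ahead of time, and the float input is quantized at run time using the range of that particular input. This sketch assumes per-tensor symmetric int8 quantization for both weights and activations; LiteRT's actual kernels may differ (for example, by using per-channel weight scales), and the function names are hypothetical.

```python
import numpy as np


def quantize_symmetric(x):
    """Per-tensor symmetric int8 quantization (zero point fixed at 0)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)

# Conversion time: weights are quantized once and stored as int8
weights = rng.normal(size=(16, 8)).astype(np.float32)
w_q, w_scale = quantize_symmetric(weights)


def dynamic_range_matmul(x):
    """Inference time: quantize the float input using the range of this input."""
    x_q, x_scale = quantize_symmetric(x)
    # Multiply int8 values with int32 accumulation, then rescale to float
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)


x = rng.normal(size=(2, 16)).astype(np.float32)
y_float = x @ weights
y_dynamic = dynamic_range_matmul(x)
print(np.max(np.abs(y_float - y_dynamic)))
```

The result is returned in floating point, which matches the description above: only the internal computation uses 8-bit values, and no calibration data is needed because each input's range is measured when it arrives.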

The following script converts a TensorFlow model to LiteRT and applies dynamic range quantization:

import tensorflow as tf

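saved_model_dir = "/path/to/saved/model"
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Use only built-in LiteRT operators (this is also the converter's default)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
# Optimize.DEFAULT without a representative dataset applies dynamic range quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]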

tflite_model = converter.convert()
save_name = 'quantized_model.tflite'

print('Saving Dynamic Quantized LiteRT model')

with open(save_name, 'wb') as f:
    f.write(tflite_model)
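One way to confirm that dynamic range quantization took effect is to compare model sizes and look for int8 tensors in the converted model. In this self-contained sketch, `DenseModel` is a hypothetical toy model with a 512x512 weight matrix saved to a temporary directory; the converter may skip quantizing very small weight tensors, so a model with tiny weights might show no change. The input tensor should stay float32, because activations are quantized at run time inside the operators.

```python
import tempfile

import numpy as np
import tensorflow as tf


class DenseModel(tf.Module):
    """Hypothetical toy model; the 512x512 weight is large enough to be quantized."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([512, 512], seed=0))

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 512], dtype=tf.float32)])
    def __call__(self, x):
        return {"y": tf.matmul(x, self.w)}


saved_model_dir = tempfile.mkdtemp()
module = DenseModel()
tf.saved_model.save(module, saved_model_dir, signatures=module.__call__)

# Baseline: float32 conversion
float_model = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir).convert()

# Dynamic range quantization: Optimize.DEFAULT with no representative dataset
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_model = converter.convert()

# Inspect the tensor types stored in the quantized model
interpreter = tf.lite.Interpreter(model_content=dynamic_model)
interpreter.allocate_tensors()
tensor_dtypes = {d["dtype"] for d in interpreter.get_tensor_details()}
input_dtype = interpreter.get_input_details()[0]["dtype"]

print("float32 model bytes:", len(float_model))
print("dynamic range model bytes:", len(dynamic_model))
print("input dtype:", input_dtype, "tensor dtypes:", tensor_dtypes)
```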

Quantize models using full integer quantization

In full integer quantization, the converter runs a representative dataset through the model to calibrate the ranges of its activations, so that both weights and activations can be quantized to 8-bit integers. This produces a model better suited to fixed-point integer hardware, such as the Hexagon Tensor Processor (HTP) on Qualcomm development kits. The following script converts a TensorFlow model to a full integer quantized LiteRT model:

import tensorflow as tf

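# `dataset` is a placeholder for your own calibration data, typically a subset
# of around 100 to 500 samples from the training or validation set. Each yielded
# dict must map the model's input names (here "image" and "bias") to float32
# arrays that include the batch dimension.
def representative_dataset():
    for data in dataset:
        yield {"image": data.image, "bias": data.bias}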

saved_model_dir = "/path/to/saved/model"

# Load the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Set the representative dataset for quantization
converter.representative_dataset = representative_dataset

# Allow only int8 built-in ops; conversion fails if an op cannot be quantized
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8

# Convert the model
tflite_quant_model = converter.convert()
save_name = 'quantized_model_int8.tflite'

print('Saving Quantized LiteRT model')

with open(save_name, 'wb') as f:
    f.write(tflite_quant_model)

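Setting `converter.target_spec.supported_ops` to `[tf.lite.OpsSet.TFLITE_BUILTINS_INT8]` restricts the model to int8 built-in operators: if the converter encounters an operation that it cannot quantize, it raises an error instead of leaving that operation in floating point. Setting `inference_input_type` and `inference_output_type` makes the model's inputs and outputs integer as well.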
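The script above leaves `dataset` undefined. The following self-contained sketch fills that gap with hypothetical random calibration data and a toy `DenseModel` whose single input is named `image`, then shows how to feed an int8 model: quantize the float input with the input tensor's `(scale, zero_point)` and dequantize the int8 output. Random data is only a stand-in here; real calibration data should come from your training or validation set.

```python
import tempfile

import numpy as np
import tensorflow as tf


class DenseModel(tf.Module):
    """Hypothetical toy model: one matrix multiplication followed by ReLU."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([64, 10], seed=1))

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 64], dtype=tf.float32, name="image")])
    def __call__(self, image):
        return {"logits": tf.nn.relu(tf.matmul(image, self.w))}


saved_model_dir = tempfile.mkdtemp()
module = DenseModel()
tf.saved_model.save(module, saved_model_dir, signatures=module.__call__)

# Hypothetical calibration data: 100 samples shaped like the model input
calibration_images = np.random.default_rng(0).normal(size=(100, 1, 64)).astype(np.float32)


def representative_dataset():
    # Keys must match the input names of the serving signature ("image" here)
    for sample in calibration_images:
        yield {"image": sample}


converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
int8_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=int8_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a float input using the input tensor's (scale, zero_point)
in_scale, in_zero = inp["quantization"]
x = calibration_images[0]
x_q = np.clip(np.round(x / in_scale) + in_zero, -128, 127).astype(np.int8)
interpreter.set_tensor(inp["index"], x_q)
interpreter.invoke()

# Dequantize the int8 output and compare it with the float TensorFlow model
out_scale, out_zero = out["quantization"]
y = (interpreter.get_tensor(out["index"]).astype(np.float32) - out_zero) * out_scale
y_ref = module(tf.constant(x))["logits"].numpy()
print(inp["dtype"], out["dtype"], np.max(np.abs(y - y_ref)))
```

The remaining difference between `y` and `y_ref` is quantization error; on a real model, checking accuracy on a validation set is a more meaningful test than comparing a single sample.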

Convert models using the `tflite_convert` command

You can use the `tflite_convert` CLI tool, which is included in the TensorFlow pip package, for offline conversions with TensorFlow 2.x and later.

The `tflite_convert` command is suitable for basic conversion only. For post-training integer quantization, use the Python APIs.

The `tflite_convert` command requires the `--output_file` flag and either `--saved_model_dir` or `--keras_model_file`. Run `tflite_convert --help` for the full list of options.

Convert a SavedModel

To convert a TensorFlow model in SavedModel format, run:

tflite_convert \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite

Convert a Keras H5 model

To convert a Keras H5 model, run:

tflite_convert \
  --keras_model_file=/tmp/mobilenet_keras_model.h5 \
  --output_file=/tmp/mobilenet.tflite
