# Export a TensorFlow model to LiteRT

> Convert a TensorFlow or Keras model to LiteRT format and quantize it for deployment on Qualcomm Dragonwing IoT platforms.

You can convert TensorFlow models to LiteRT format and optimize them for on-device inference. For more information about LiteRT model conversion, see [Model conversion overview](https://ai.google.dev/edge/litert/models/convert).

LiteRT model conversion supports the following output precisions:

* 32-bit floating-point precision
* 16-bit floating-point precision
* 8-bit integer precision (int8 or uint8), produced by quantizing the model

The following table lists the conversion methods available in the TensorFlow framework:

**TensorFlow model conversion methods**

| Conversion method                 | Description                                                          |
| --------------------------------- | -------------------------------------------------------------------- |
| Python APIs                       | Converts, optimizes, and quantizes models to LiteRT format           |
| Command-line interface (CLI) tool | Converts models to LiteRT format; suitable for basic conversion only |

The Python APIs offer more flexibility to convert, optimize, and quantize models to suit your requirements.

## Convert models using Python APIs

The following table lists the Python APIs that TensorFlow provides to convert a TensorFlow SavedModel or a Keras model to LiteRT:

**TensorFlow Python APIs to convert models**

| API                                                        | Description                      |
| ---------------------------------------------------------- | -------------------------------- |
| `tf.lite.TFLiteConverter.from_saved_model()` (recommended) | Converts a TensorFlow SavedModel |
| `tf.lite.TFLiteConverter.from_keras_model()`               | Converts a Keras model           |

### Convert a TensorFlow SavedModel using the Python API

The following example converts a TensorFlow model in SavedModel format to LiteRT:

```python theme={null}
import tensorflow as tf

# Convert the model
saved_model_dir = "/path/to/tf/model/in/saved_model/format"
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Save the model
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

<Note>
  The converted LiteRT model is not quantized and its data is in 32-bit floating-point precision.
</Note>
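To check a converted model before deploying it, you can load it into the interpreter and inspect its tensor types. The following is a minimal sketch, not part of the original workflow: the `Doubler` module is a hypothetical stand-in for a real model, and the sketch assumes a TensorFlow release that still ships `tf.lite.Interpreter`.

```python
import tempfile

import numpy as np
import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical stand-in model: multiplies its input by 2."""

    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def __call__(self, x):
        return 2.0 * x


# Export the stand-in model as a SavedModel, then convert it to LiteRT.
saved_model_dir = tempfile.mkdtemp()
model = Doubler()
tf.saved_model.save(
    model, saved_model_dir,
    signatures=model.__call__.get_concrete_function())
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Load the converted model from memory and allocate its tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Run one inference on an all-ones input.
sample = np.ones(input_details["shape"], dtype=np.float32)
interpreter.set_tensor(input_details["index"], sample)
interpreter.invoke()
result = interpreter.get_tensor(output_details["index"])

print("input dtype:", input_details["dtype"])
print("output dtype:", output_details["dtype"])
print("result:", result)
```

For a model converted without any optimization flags, both reported types should be 32-bit floating point.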

### Convert a Keras model using the Python API

The following example converts a Keras model to LiteRT:

```python theme={null}
import numpy as np
import tensorflow as tf

# Create a model using high-level tf.keras.* APIs
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1]),
    tf.keras.layers.Dense(units=16, activation='relu'),
    tf.keras.layers.Dense(units=1)
])

# Compile and train the model on NumPy arrays of shape (samples, features)
x_train = np.array([[-1.0], [0.0], [1.0]], dtype=np.float32)
y_train = np.array([[-3.0], [-1.0], [1.0]], dtype=np.float32)
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x=x_train, y=y_train, epochs=5)

# Convert the model to LiteRT
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```

<Note>
  The converted LiteRT model is not quantized and its data is in 32-bit floating-point precision.
</Note>

## Quantize models

You can quantize a model while converting it to LiteRT format with the Python APIs. Quantization reduces a model's size and computational cost by converting high-precision values (such as 32-bit floating-point numbers) into lower-precision formats (such as 8-bit integers).

Quantization in neural network models involves the following steps:

1. **Quantize weights and biases**: These values are already part of the trained model, so they can be quantized without additional input data. This is a static step.

2. **Quantize activation layers**: The range of each activation layer's output depends on the input data seen during forward propagation. A set of sample inputs, known as a calibration or representative dataset, is required to determine the minimum and maximum values of each range.
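The calibration step can be illustrated with a small, framework-independent sketch. It assumes a simple affine scheme (one scale and one zero point per tensor, int8 output); the function names and sample values are hypothetical, and the LiteRT converter's internal calibration may differ in detail.

```python
def calibrate_range(samples):
    """Return the (min, max) observed across all calibration samples."""
    lo = min(min(sample) for sample in samples)
    hi = max(max(sample) for sample in samples)
    # Keep 0.0 inside the range so that zero maps to an exact integer.
    return min(lo, 0.0), max(hi, 0.0)


def affine_params(lo, hi, qmin=-128, qmax=127):
    """Derive the scale and zero point that map [lo, hi] onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # Degenerate all-zero range; any scale works.
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the nearest representable 8-bit integer."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an 8-bit integer back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activation values recorded for three calibration inputs.
calibration_samples = [[0.0, 0.5, 1.2], [0.1, 2.55, 0.3], [0.0, 1.0, 2.0]]
lo, hi = calibrate_range(calibration_samples)
scale, zero_point = affine_params(lo, hi)

q = quantize(1.0, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(scale, zero_point, q, restored)
```

The round trip through `quantize` and `dequantize` recovers the original value to within one quantization step, which is why a representative range matters: a range that is too wide wastes integer levels, and one that is too narrow clips real activations.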

LiteRT provides post-training quantization techniques for converting a floating-point TensorFlow model to a quantized LiteRT model. For more information, see [Post-training quantization](https://ai.google.dev/edge/litert/models/post_training_quantization).

LiteRT supports the following types of post-training quantization:

* [Dynamic range quantization](#quantize-models-using-dynamic-range-quantization)
* [Full integer quantization](#full-integer-quantization)

<h3 id="quantize-models-using-dynamic-range-quantization">
  Quantize models using dynamic range quantization
</h3>

In dynamic range quantization, only the weights are statically quantized from floating-point to 8-bit integer precision at conversion time. Activations are stored in 32-bit floating-point precision.

To reduce latency during inference, dynamic-range operators do the following at run time:

* Dynamically quantize activations to 8-bit integer precision based on their observed range.
* Perform computations with 8-bit weights and activations.
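The two run-time steps above can be sketched conceptually in plain Python. This is an illustration only: it assumes symmetric int8 quantization with one scale per tensor, uses hypothetical values, and does not reproduce the LiteRT kernels.

```python
def symmetric_quantize(values):
    """Quantize floats to int8 with a symmetric range; return (ints, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


# Conversion time: weights are quantized once and stored as 8-bit integers.
weights = [0.5, -1.0, 0.25]  # hypothetical float weights
q_weights, w_scale = symmetric_quantize(weights)


def dynamic_range_dot(activations):
    """Dot product that quantizes the float activations on the fly."""
    # Run time: the activation range is measured for this input only.
    q_acts, a_scale = symmetric_quantize(activations)
    # Integer multiply-accumulate on 8-bit operands.
    acc = sum(qa * qw for qa, qw in zip(q_acts, q_weights))
    # Rescale the integer accumulator back to floating point.
    return acc * a_scale * w_scale


activations = [2.0, 1.0, -4.0]
exact = sum(a * w for a, w in zip(activations, weights))
approx = dynamic_range_dot(activations)
print(exact, approx)
```

The quantized result stays close to the floating-point one, and no calibration data is involved because the activation range is computed from each input as it arrives.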

<Note>
  Because only the weights are quantized at conversion time, this method does not require a calibration dataset.
</Note>

The following script converts a TensorFlow model to LiteRT and applies dynamic range quantization:

```python theme={null}
import tensorflow as tf

# Path to the TensorFlow model in SavedModel format
exp_model_path = "/path/to/saved/model"
converter = tf.lite.TFLiteConverter.from_saved_model(exp_model_path)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
save_name = 'quantized_model.tflite'

print('Saving Dynamic Quantized LiteRT model')

with open(save_name, 'wb') as f:
    f.write(tflite_model)
```

<h3 id="full-integer-quantization">
  Quantize models using full integer quantization
</h3>

In full integer quantization, a representative dataset is used to quantize the activation layers within the model. This produces a model more suitable for fixed-point integer hardware, such as the Hexagon Tensor Processor on Qualcomm development kits.
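A representative dataset is supplied to the converter as a Python generator function. The sketch below shows one way to build such a function from an array of samples. The helper name, the `image` input name, the sample shape, and the random data are all assumptions for illustration; real calibration should use samples drawn from your training or validation data.

```python
import numpy as np


def make_representative_dataset(samples, input_name="image", max_samples=100):
    """Build a generator function that feeds calibration samples one at a time.

    `input_name` is a hypothetical signature input name; use the name that
    your own SavedModel signature expects.
    """
    def representative_dataset():
        for sample in samples[:max_samples]:
            # Add a batch dimension of 1 and cast to float32.
            batch = np.expand_dims(sample, axis=0).astype(np.float32)
            yield {input_name: batch}

    return representative_dataset


# Hypothetical calibration data: 8 random "images" of shape 32x32x3.
rng = np.random.default_rng(seed=0)
calibration_samples = rng.random((8, 32, 32, 3))

representative_dataset = make_representative_dataset(calibration_samples)
batches = list(representative_dataset())
print(len(batches), batches[0]["image"].shape, batches[0]["image"].dtype)
```

The returned function can be assigned to `converter.representative_dataset` in the same way as a hand-written generator.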

The following script converts and quantizes a TensorFlow model to a full integer quantized LiteRT model:

```python theme={null}
import tensorflow as tf

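# `dataset` is a placeholder for your own iterable of calibration samples.
# The dictionary keys must match the input names of the model signature.
def representative_dataset():
    for data in dataset:
        yield {"image": data.image, "bias": data.bias}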

saved_model_dir = "/path/to/saved/model"

# Load the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Set the representative dataset for quantization
converter.representative_dataset = representative_dataset

# For full integer quantization, restrict the supported ops to the int8 built-ins
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8

# Convert the model
tflite_quant_model = converter.convert()
save_name = 'quantized_model_int8.tflite'

print('Saving Quantized LiteRT model')

with open(save_name, 'wb') as f:
    f.write(tflite_quant_model)
```

<Note>
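  Setting `converter.target_spec.supported_ops` to `[tf.lite.OpsSet.TFLITE_BUILTINS_INT8]` restricts the converter to int8 built-in operators. Conversion fails if the model contains an operation that cannot be quantized.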
</Note>
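When `inference_input_type` and `inference_output_type` are integer types, the application must quantize its inputs and dequantize the outputs itself. The following sketch shows that arithmetic with NumPy. The helper names and the `scale` and `zero_point` values are hypothetical; with a real model, the interpreter reports them as a `(scale, zero_point)` pair under the `quantization` key of `get_input_details()` and `get_output_details()`.

```python
import numpy as np


def quantize_input(values, scale, zero_point, dtype=np.int8):
    """Map float inputs to the integer type a fully quantized model expects."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(values, dtype=np.float32) / scale) + zero_point
    # Values outside the representable range saturate at the type limits.
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize_output(q_values, scale, zero_point):
    """Map integer model outputs back to approximate float values."""
    return (np.asarray(q_values, dtype=np.float32) - zero_point) * scale


# Hypothetical quantization parameters, chosen for illustration only.
scale, zero_point = 0.5, -3

q = quantize_input([-1.0, 0.0, 1.5, 1000.0], scale, zero_point)
restored = dequantize_output(q, scale, zero_point)
print(q)
print(restored)
```

The last input is far outside the representable range, so it saturates at the int8 maximum instead of wrapping around.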

## Convert models using the `tflite_convert` command

The `tflite_convert` CLI tool is included in the TensorFlow pip package. Use it for offline conversion with TensorFlow 2.x.

<Note>
  The `tflite_convert` command is suitable for basic conversion only. For post-training integer quantization, use the Python APIs.
</Note>

The `tflite_convert` command requires the `--output_file` flag and either `--saved_model_dir` or `--keras_model_file`. Run `tflite_convert --help` for the full list of options.

### Convert a SavedModel

To convert a TensorFlow model in SavedModel format, run:

```shell theme={null}
tflite_convert \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite
```

### Convert a Keras H5 model

To convert a Keras H5 model, run:

```shell theme={null}
tflite_convert \
  --keras_model_file=/tmp/mobilenet_keras_model.h5 \
  --output_file=/tmp/mobilenet.tflite
```
