Hello,

I applied QAT-based training following the TensorFlow guidelines (Quantization aware training comprehensive guide | TensorFlow Model Optimization).

After that, I converted the quantized model to the TFLite format following the TF guideline (see below).

```
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Load the QAT model inside quantize_scope so the QAT wrapper layers deserialize
with tfmot.quantization.keras.quantize_scope():
    quant_aware_model = tf.keras.models.load_model(keras_qat_model_file)

converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open(self.output_model_path, 'wb') as f:
    f.write(quantized_tflite_model)
```

After conversion, the resulting TFLite model is, as expected, roughly 4x smaller than the original, since the weights are stored as 8-bit integers instead of 32-bit floats.

However, the input and output tensors are still float 32 bits. What is the recommended way to make the input and output 8 bits instead of float 32 bits?

Do I need to follow the same procedure as the one described for post-training quantization?

(Post-training integer quantization | TensorFlow Lite)