No model size reduction in TFLite with integer quantization

I have trained a PyTorch model, which I converted to Keras using the pytorch2keras lib.

I am using this Keras model to convert to TFLite. I want to run the TFLite model on Coral devices.

Things I noticed:

  • Keras model size: 57.6 MB
  • Using dynamic range quantization, the generated TFLite model is 15 MB
  • Using integer-only quantization, the generated TFLite model is also 15 MB

Ideally, we should be able to reduce the model size even further when converting fp32 to int8.
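For reference, the dynamic range model comes from a much simpler conversion, roughly like this sketch (reconstructed from the description above, not the exact notebook code; file names are placeholders):

import tensorflow as tf

# Minimal sketch of the dynamic range quantization path (assumed; the notebook may differ).
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    "./frozen_models/model_frozen_graph.pb",   # hypothetical frozen graph file name
    input_arrays=['x'],
    output_arrays=['Identity'])
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weights -> int8, activations stay float32
tflite_dynamic = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:   # placeholder output name
    f.write(tflite_dynamic)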

Can anyone help me understand why this is happening?

Sharing my conversion notebook and Keras model: data.zip - Google Drive

@letdive_deep The link you shared requires access.

@Divvya_Saxena, the link has open access; you can download the files.

For the most part, the model size is dominated by the weights. With dynamic range quantization the weights are stored in int8, the same as in integer-only quantization, so we wouldn't expect the full-int8 model to be dramatically smaller than the dynamic range quantized one (only the bias terms should make up the difference).
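A quick way to see this is to tally the bytes per tensor dtype in each .tflite file; the large weight tensors should already show up as int8 in the dynamic range model, and that is where almost all of the 15 MB comes from. This is just an inspection sketch, and the file names are placeholders:

import numpy as np
import tensorflow as tf

def summarize_tensor_dtypes(tflite_path):
    # Sum the bytes each dtype accounts for across the model's tensors
    # (approximate: intermediate activation tensors are counted too).
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    totals = {}
    for t in interpreter.get_tensor_details():
        n_bytes = int(np.prod(t['shape'])) * np.dtype(t['dtype']).itemsize
        name = np.dtype(t['dtype']).name
        totals[name] = totals.get(name, 0) + n_bytes
    print(tflite_path, totals)

# Placeholder file names -- substitute the actual dynamic range and full-integer models.
summarize_tensor_dtypes("model_dynamic_range.tflite")
summarize_tensor_dtypes("model_frozen_tflite_quant.tflite")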

That said, you should make sure that your model is actually fully integer (did you provide a representative dataset?).
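An easy way to confirm the conversion really produced a full-integer model is to check the input and output tensor types; with inference_input_type/inference_output_type set to tf.uint8 they should report uint8 rather than float32 (the file name here is a placeholder):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_frozen_tflite_quant.tflite")  # placeholder name
print("input dtype:", interpreter.get_input_details()[0]['dtype'])    # expect numpy.uint8
print("output dtype:", interpreter.get_output_details()[0]['dtype'])  # expect numpy.uint8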

Yes @David_Rim, I had provided the representative dataset:

def representative_data_gen():
    dataset_list = tf.data.Dataset.list_files('/home/ubuntu/livesense/lane_detection/GCO_BDD/bdd_images/bbox_images/*')
    for i in range(100):
        image = next(iter(dataset_list))
        image = tf.io.read_file(image)
        image = tf.io.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, [img_h, img_w])
        image = tf.cast(image / 255., tf.float32)
        image = tf.expand_dims(image, 0)
        image = tf.reshape(image, (1, 3, 288, 352))
        print("reshape shape:", image.shape)
        print(i)
        yield [image]


def frozen_to_tflite_quant(fname):
    path = "./frozen_models/" + fname + "_frozen_graph.pb"
    filename = fname + "_frozen_tflite_quant.tflite"

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
        path,                         # TensorFlow frozen graph .pb model file
        input_arrays=['x'],           # name of input arrays as defined in the torch.onnx.export call earlier
        output_arrays=['Identity'])   # name of output arrays as defined in the torch.onnx.export call earlier

    converter.optimizations = [tf.compat.v1.lite.Optimize.DEFAULT]

    # This sets the representative dataset so we can quantize the activations
    converter.representative_dataset = representative_data_gen

    # converter.experimental_new_converter = True

    # This ensures that if any ops can't be quantized, the converter throws an error
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # These set the input and output tensors to uint8
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    tf_lite_model = converter.convert()

    # Save the model.
    with open(filename, 'wb') as f:
        f.write(tf_lite_model)
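Calling the function and then comparing the resulting files on disk makes the size comparison concrete (hypothetical base name and placeholder file names):

import os

# Hypothetical base name; expects ./frozen_models/model_frozen_graph.pb to exist.
frozen_to_tflite_quant("model")

# Placeholder file names for the dynamic range and full-integer models.
for path in ["model_dynamic_range.tflite", "model_frozen_tflite_quant.tflite"]:
    print(path, round(os.path.getsize(path) / 1e6, 1), "MB")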