Why is PTQ better than QAT?

Hello everyone,

I wanted to compare the performance of QAT and PTQ on my model.
I know that QAT usually performs better than PTQ, but I am getting better accuracy with PTQ.

So, I replicated this exercise:

using PTQ, and I obtained these results: QAT: 0.9636, PTQ: 0.9615 (accuracy of the TFLite models).
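For anyone replicating this, here is a minimal sketch of how the accuracy of a `.tflite` classifier can be measured with the TFLite interpreter. The helper name and the float32 input are my assumptions, not necessarily what the exercise uses:

```python
import numpy as np
import tensorflow as tf

def evaluate_tflite_accuracy(tflite_path, images, labels):
    """Run a .tflite classifier over a test set and return top-1 accuracy."""
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    correct = 0
    for image, label in zip(images, labels):
        # Add a batch dimension; this sketch assumes a float32 input model.
        interpreter.set_tensor(input_index,
                               np.expand_dims(image, 0).astype(np.float32))
        interpreter.invoke()
        prediction = np.argmax(interpreter.get_tensor(output_index))
        correct += int(prediction == label)
    return correct / len(labels)
```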

How is this possible?

Thanks.

Hi @andre105, I trained the model and verified the accuracy of the QAT and PTQ models. I got a QAT accuracy of 0.96170 and a PTQ accuracy of 0.9611. The QAT and PTQ results are almost identical, as in your case.

Since you mention that PTQ gives better accuracy than QAT, could you please let us know which type of quantization you compared for QAT and for PTQ? Thank you.

Hi @Kiran_Sai_Ramineni.
In my project (not the replicated exercise mentioned above) I used:

  1. PTQ (dynamic range quantization), quantizing the weights according to these instructions: (Post-training quantization  |  TensorFlow Model Optimization); see the PTQ sketch after this list;
  2. QAT according to these instructions: (Quantization aware training in Keras example  |  TensorFlow Model Optimization); see the QAT sketch after this list.
    First, I trained my model for 100 epochs. Then I created the quantization-aware model (not yet quantized), because I wanted to quantize the entire model. Next, I trained it (q_aware_model.fit) for about 50 epochs, until the MAE (mean absolute error) was no longer decreasing on the validation set. Finally, I obtained the “qat_model.tflite” file.
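For clarity, here is roughly what step 1 looks like: a minimal sketch of dynamic range quantization, assuming an already-trained Keras model (the file names are illustrative):

```python
import tensorflow as tf

# Load the already-trained float model (file name is illustrative).
model = tf.keras.models.load_model("my_model.h5")

# Dynamic range quantization: Optimize.DEFAULT converts the weights to
# 8-bit integers, while activations remain float and are quantized
# dynamically at inference time.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
ptq_tflite_model = converter.convert()

with open("ptq_model.tflite", "wb") as f:
    f.write(ptq_tflite_model)
```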
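And a minimal sketch of step 2, the QAT workflow described above. The optimizer, loss, early-stopping settings, and the `x_train`/`y_train`/`x_val`/`y_val` variables are my assumptions, not the exact ones used in the project:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap the trained float model with fake-quantization nodes
# (quantizing the entire model rather than individual layers).
q_aware_model = tfmot.quantization.keras.quantize_model(model)

# quantize_model returns an uncompiled model, so re-compile before fitting.
q_aware_model.compile(optimizer="adam", loss="mae", metrics=["mae"])

# Fine-tune; early stopping mirrors "the MAE was no longer decreasing
# on the validation set".
q_aware_model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
    callbacks=[tf.keras.callbacks.EarlyStopping(
        monitor="val_mae", patience=5, restore_best_weights=True)])

# Convert to TFLite; Optimize.DEFAULT makes the converter honor the
# quantization annotations inserted by QAT.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = converter.convert()

with open("qat_model.tflite", "wb") as f:
    f.write(qat_tflite_model)
```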

I obtained these MAEs: 6.712 (qat_model.tflite); 5.681 (ptq_model.tflite).
I made many attempts, but in no case did QAT give better results than PTQ. I don't know why.
Moreover, I saved my two quantized models and noticed that the “ptq_model.tflite” file (the post-training quantized model) is 38 KB, whereas the “qat_model.tflite” file (the model quantized with QAT) is 30 KB. Why? (Maybe because the activations are quantized dynamically at inference?)
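One way to check that (a sketch, assuming the two file names above) is to count the tensor dtypes stored in each `.tflite` file, which shows what was actually quantized in each model:

```python
import tensorflow as tf

def summarize_tensor_dtypes(tflite_path):
    """Count the tensor dtypes in a .tflite file to see what got quantized."""
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    counts = {}
    for detail in interpreter.get_tensor_details():
        name = detail["dtype"].__name__  # e.g. 'float32', 'int8'
        counts[name] = counts.get(name, 0) + 1
    return counts

for path in ["ptq_model.tflite", "qat_model.tflite"]:
    print(path, summarize_tensor_dtypes(path))
```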