I quantized a MobileNetV2 model using the dynamic range and full integer quantization techniques, converted the models to TFLite (using the same code as the TensorFlow tutorial), and benchmarked inference time (with benchmark_model) on the CPU and GPU of an Android phone.
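For reference, the conversion followed the standard post-training quantization pattern from the TensorFlow docs, roughly like the sketch below. The tiny stand-in model and the random representative dataset are placeholders for illustration only; the real code used MobileNetV2 and real calibration images.

```python
import numpy as np
import tensorflow as tf

# Small stand-in model; tf.keras.applications.MobileNetV2 converts the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
])

# Dynamic range quantization: weights stored as int8,
# activations computed in float at runtime.
conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_tflite = conv.convert()

# Full integer quantization: needs a representative dataset
# so the converter can calibrate activation ranges.
def representative_data():
    for _ in range(10):
        yield [np.random.rand(1, 8).astype(np.float32)]

conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = representative_data
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
conv.inference_input_type = tf.int8
conv.inference_output_type = tf.int8
full_int_tflite = conv.convert()
```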
My results are as follows:

On GPU:
- Dynamic range quantization is slightly faster: 0.50 ms faster
- Full int quantization is slightly slower: 0.20 ms slower

On CPU (4 threads):
- Dynamic range quantization is markedly slower: 8 ms slower
- Full int quantization is slightly slower: 0.30 ms slower
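The benchmarks were run with the TFLite benchmark_model tool; an invocation along these lines (exact paths are illustrative):

```shell
# CPU run with 4 threads
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_v2_dynamic.tflite \
  --num_threads=4

# GPU delegate run
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_v2_dynamic.tflite \
  --use_gpu=true
```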
Can anyone explain why quantization is not accelerating the model in this case? Has anyone encountered the same problem with this or other models?