INT16 quantization of fully connected layers

I’m trying to apply different precisions to different layers, say Conv2D and fully_connected.
I’ve tried to quantize the dense layers with int16 by modifying the "LastValueQuantizer", but I’m getting the following error when allocating tensors after converting:

File "/home/shivaubuntu/.local/lib/python3.8/site-packages/tensorflow/lite/python/interpreter.py", line 513, in allocate_tensors
    return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/fully_connected.cc:166 input->type != kTfLiteFloat32 (INT8 != FLOAT32)
Node number 1 (FULLY_CONNECTED) failed to prepare.
Failed to apply the default TensorFlow Lite delegate indexed at 0

colab link: Google Colab
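
Roughly what I’m attempting, simplified from the Colab (a minimal sketch, not the exact notebook code; the class name Dense16BitQuantizeConfig and the toy model are placeholders):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quant = tfmot.quantization.keras

class Dense16BitQuantizeConfig(quant.QuantizeConfig):
    """Quantize Dense weights/activations with 16-bit quantizers."""

    def get_weights_and_quantizers(self, layer):
        # LastValueQuantizer with num_bits=16 instead of the default 8.
        return [(layer.kernel,
                 quant.quantizers.LastValueQuantizer(
                     num_bits=16, per_axis=False,
                     symmetric=True, narrow_range=False))]

    def get_activations_and_quantizers(self, layer):
        return [(layer.activation,
                 quant.quantizers.MovingAverageQuantizer(
                     num_bits=16, per_axis=False,
                     symmetric=False, narrow_range=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
        layer.activation = quantize_activations[0]

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

def annotate(layer):
    # Only Dense layers get the 16-bit config; everything else keeps
    # the default 8-bit behaviour.
    if isinstance(layer, tf.keras.layers.Dense):
        return quant.quantize_annotate_layer(
            layer, quantize_config=Dense16BitQuantizeConfig())
    return quant.quantize_annotate_layer(layer)

base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

annotated = tf.keras.models.clone_model(base_model, clone_function=annotate)
with quant.quantize_scope(
        {'Dense16BitQuantizeConfig': Dense16BitQuantizeConfig}):
    qat_model = quant.quantize_apply(annotated)
```

The quantization-aware model builds and trains fine; the error above only appears after converting to TFLite and calling allocate_tensors().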

As pointed out in [6], the first and last layers of a neural network are the most sensitive to quantization and have a relatively small number of computations. Therefore, those two layers can be computed in floating-point precision to achieve higher accuracy without a significant loss in speed.

Yeah, I’ve tried adding different "un-quantized" layers before and after it, and also tried quantizing only this one layer with int16, but it still gives the same issue. I mainly wanted to know whether int16 quantization is supported in TensorFlow as of now, because I want to implement varying quantization for different layers. I’m not worried about accuracy yet.
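
Is the converter’s experimental 16x8 mode (int16 activations with int8 weights) the intended route for this? Something like the following sketch, where the model and representative_dataset are just placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder model standing in for the real one from the Colab.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # Yield a few samples shaped like the model input (placeholder data).
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# 16x8 mode: int16 activations, int8 weights.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()
```

That mode applies to the whole graph, though, so it still wouldn’t give me per-layer control over the bit width.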