INT16 quantization of fully connected layers

I’m trying to apply different precision to different layers — say, Conv2D and fully-connected. I’ve tried to quantize the dense layers to int16 by modifying the “LastValueQuantizer”, but I’m getting the following error when allocating tensors after converting:

File “/home/shivaubuntu/.local/lib/python3.8/site-packages/tensorflow/lite/python/”, line 513, in allocate_tensors
    return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/ input->type != kTfLiteFloat32 (INT8 != FLOAT32)
Node number 1 (FULLY_CONNECTED) failed to prepare.
Failed to apply the default TensorFlow Lite delegate indexed at 0
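A type mismatch like this usually means an int8 tensor is being fed into an op that still expects a float32 input. One way to see which dtypes the converter actually produced is to list every tensor in the flatbuffer. A minimal sketch — the tiny Dense model here is just a placeholder so the snippet is self-contained; in practice you’d load your own converted model:

```python
import numpy as np
import tensorflow as tf

# Placeholder float model, only so there is something to convert and inspect.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Print every tensor's dtype; a FULLY_CONNECTED prepare failure like the one
# above shows up as an int8 tensor feeding a node whose kernel expects float32.
for d in interpreter.get_tensor_details():
    print(d["index"], d["name"], d["dtype"])
```

Running this on the failing model (via `model_path=` instead of `model_content=`) shows exactly where the int8/float32 boundary falls.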

colab link: Google Colab

As pointed out in [6], the first and last layers of a neural network are the most sensitive to quantization and have a relatively small number of computations. Therefore, those two layers can be computed in floating-point precision to achieve higher accuracy without a significant loss in speed.

Yeah, I’ve tried adding different “un-quantized” layers before and after it, and tried quantizing only this layer with int16, but it still gives the same error. What I mainly want to know is whether int16 quantization is supported in TensorFlow as of now, because I want to implement varying quantization for different layers. I’m not worried about accuracy yet.
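For what it’s worth, the only int16 route I know of in the TFLite converter itself is the experimental 16x8 post-training mode (int16 activations, int8 weights), and it applies to the whole model rather than per layer. A sketch with a toy model — the shapes and the random representative dataset are placeholders:

```python
import numpy as np
import tensorflow as tf

# Toy float model; replace with your own Keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

def representative_dataset():
    # Calibration samples; random data here only to keep the sketch runnable.
    for _ in range(10):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# 16-bit activations with 8-bit weights; this is model-wide, not per-layer.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()

# AllocateTensors succeeds because 16x8 kernels are registered for these ops.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
```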