How does full integer quantization work?

Hi, I am new to TensorFlow Lite. I plan to deploy an 8-bit LSTM on a self-developed chip, and I would like to use full integer quantization for simplicity. However, I do not fully understand how full integer quantization works. Can I extract the quantized weights of the LSTM and use them for computation on my chip?

The problem is that each tensor seems to have its own scale and zero point, and I do not know how an operation between two such tensors works. Take multiplication, for example: is the product of two 8-bit quantized tensors simply the product of their integer values? Do the scale and zero point play a role here?

Thanks in advance.

Hi @Yang_Lin,

I hope you have understood the concept of full integer quantization by now. Here are a few more references, ref1 and ref2, to help you better understand quantization, scale, and zero point. Please let us know if you need further assistance.
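On the multiplication question specifically: in the affine scheme TensorFlow Lite uses, a quantized integer q stands for the real value scale * (q - zero_point), so multiplying two quantized tensors is not just multiplying their raw int8 values. The zero points have to be subtracted first, and the integer product then has to be rescaled into the output tensor's own scale and zero point. The Python sketch below is only an illustration under those assumptions; the function names are hypothetical, and it is not the TFLite kernel, which performs the rescale with a fixed-point integer multiplier and shift rather than a float. The example scales are powers of two so that the arithmetic is exact.

```python
# Illustrative sketch of affine int8 quantization and an elementwise multiply
# done on the quantized values. Function names are hypothetical; this is NOT
# the TFLite kernel (TFLite replaces the float rescale with a fixed-point
# integer multiplier and shift, and its rounding can differ in tie cases).

INT8_MIN, INT8_MAX = -128, 127


def quantize(x, scale, zero_point):
    """Map a real value to int8: q = round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)


def quantized_mul(qa, sa, za, qb, sb, zb, s_out, z_out):
    """Multiply two int8 values and requantize into the output's scale/zero point.

    real_a * real_b = sa * sb * (qa - za) * (qb - zb), therefore
    q_out = z_out + (sa * sb / s_out) * (qa - za) * (qb - zb).
    """
    acc = (qa - za) * (qb - zb)   # pure integer product (an int32 accumulator on a chip)
    multiplier = sa * sb / s_out  # real rescale factor; fixed-point in TFLite
    q = round(multiplier * acc) + z_out
    return max(INT8_MIN, min(INT8_MAX, q))


# Usage: 1.5 * -2.0 with three different (scale, zero_point) pairs.
qa = quantize(1.5, 1 / 64, 0)     # 96
qb = quantize(-2.0, 1 / 32, 10)   # -54
qc = quantized_mul(qa, 1 / 64, 0, qb, 1 / 32, 10, 1 / 16, -5)
print(qc, dequantize(qc, 1 / 16, -5))  # prints: -53 -3.0
```

Note that the raw product 96 * -54 would be meaningless on its own; only after subtracting the zero points and applying the combined scale sa * sb / s_out does the result decode back to -3.0. On a custom chip, this means each multiply needs the zero-point offsets and a per-operation rescale, not just an 8-bit multiplier.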

Thank you