How does full integer quantization work?

Hi, I am new to TensorFlow Lite. I plan to deploy an 8-bit LSTM on a self-developed chip, and I would love to use full integer quantization for simplicity. But I cannot fully understand how full integer quantization works. Can I extract the quantized weights of the LSTM and use them for my own computation on the chip?

The problem is that each tensor seems to have its own scale and zero point, and I do not understand how an operation between two such tensors works. Take multiplication, for example: is the product of two 8-bit quantized tensors simply the product of their integer values? Do the scale and zero point play a role here?

Thanks in advance.

Overview. Integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and faster inference, which is valuable for low-power devices such as microcontrollers. Each quantized tensor carries a scale and a zero point, so a real value is represented as real = scale * (int8_value - zero_point); operations like multiplication must account for both parameters, not just the raw integer values.
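To make the scale/zero-point question concrete, here is a minimal NumPy sketch of an element-wise quantized multiply under the affine scheme real = scale * (q - zero_point). The scales and zero points below are hypothetical illustrative values, not taken from any real model; a real TFLite kernel also replaces the floating-point rescale with a fixed-point integer multiply and shift, which is omitted here for clarity.

```python
import numpy as np

# Affine quantization: real = scale * (q - zero_point).
# All scale/zero-point values in this sketch are made-up examples.

def quantize(real, scale, zero_point):
    """Map float values to int8 using the tensor's scale and zero point."""
    q = np.round(real / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8."""
    return scale * (q.astype(np.int32) - zero_point)

def quantized_mul(q1, s1, z1, q2, s2, z2, s3, z3):
    """Element-wise multiply of two quantized tensors.

    From r3 = r1 * r2 and r = s * (q - z):
        s3 * (q3 - z3) = s1 * (q1 - z1) * s2 * (q2 - z2)
    =>  q3 = z3 + (s1 * s2 / s3) * (q1 - z1) * (q2 - z2)
    So the scales and zero points are essential; the raw int8
    product alone is meaningless.
    """
    # Subtract zero points first, accumulate in wider integers.
    acc = (q1.astype(np.int32) - z1) * (q2.astype(np.int32) - z2)
    # Rescale into the output tensor's quantization parameters.
    q3 = np.round((s1 * s2 / s3) * acc) + z3
    return np.clip(q3, -128, 127).astype(np.int8)

# Example with hypothetical quantization parameters.
s1, z1 = 0.05, 0    # input 1
s2, z2 = 0.02, 3    # input 2
s3, z3 = 0.1, -5    # output

x = np.array([0.4, -1.2, 2.0], dtype=np.float32)
y = np.array([0.5, 0.25, -1.0], dtype=np.float32)

q1 = quantize(x, s1, z1)
q2 = quantize(y, s2, z2)
q3 = quantized_mul(q1, s1, z1, q2, s2, z2, s3, z3)
print(dequantize(q3, s3, z3))  # approximately x * y
```

The key point: the integer multiply itself happens on zero-point-shifted values, and the combined scale factor s1*s2/s3 rescales the wide accumulator back into the output's int8 range, so both parameters participate in every quantized op.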