Can anyone help me understand what exactly is done when quantizing an LSTM layer (allowing only integers)? I’m very familiar with the mathematical operations performed in a classic LSTM cell, but I’m having trouble understanding how sigmoids and tanhs are computed with lookup tables, and when rescaling is performed to avoid values that are too large and thus overflow. Also, I don’t get what you mean by “The input activations are quantized as uint8 on the interval [-1, 127/128].” I think I’ve found the source code here: https://github.com/tensorflow/tflite-micro/blob/be11bd79e4a8b28c9ee92c6f02ca0e85414fb768/tensorflow/lite/kernels/internal/reference/lstm_cell.h#L143

Hi @Lisa, Quantization is the process of reducing the number of bits used to represent the weights and activations of a neural network. This is done by dividing each floating-point value by a quantization scale, adding a zero point, and rounding the result to the nearest representable integer.
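As a rough sketch of that scale-and-round step (the scale and zero point here are illustrative values, not the ones TFLite computes for a real model):

```python
import numpy as np

def quantize(x, scale, zero_point, dtype=np.int8):
    """Map floats to integers: q = round(x / scale) + zero_point, then clamp."""
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    """Recover approximate floats: x ≈ (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: scale 1/128 with zero point 0 covers [-1, 127/128] in int8.
scale, zero_point = 1.0 / 128.0, 0
x = np.array([-1.0, 0.0, 0.5, 0.9921875], dtype=np.float32)
q = quantize(x, scale, zero_point)        # -> [-128, 0, 64, 127]
x_hat = dequantize(q, scale, zero_point)  # close to x, up to rounding error
```

Note that quantization is lossy: `x_hat` only matches `x` up to half a quantization step (scale/2).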

A lookup table contains pre-computed values of the activation function (sigmoid or tanh) for every possible quantized input. At runtime, the function value for a given input is simply retrieved from this table instead of being computed, so no floating-point math is needed. Please refer to this document to know more about lookup tables.
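A minimal sketch of how such a table can be built offline and used at runtime (the input scale and the Q0.7 output format here are assumptions for illustration, not the exact parameters the TFLite Micro kernel uses):

```python
import math

def build_tanh_lut(input_scale):
    """256-entry tanh table: index covers every int8 input -128..127.

    Each entry is the activation output re-quantized to Q0.7 (value/128).
    """
    lut = []
    for q_in in range(-128, 128):
        x = q_in * input_scale                        # dequantize the input
        y = math.tanh(x)                              # exact float activation
        q_out = max(-128, min(127, round(y * 128)))   # requantize to Q0.7
        lut.append(q_out)
    return lut

# Hypothetical input scale: int8 inputs span roughly [-4, 4).
lut = build_tanh_lut(input_scale=4.0 / 128.0)

def tanh_int8(q_in):
    """Runtime evaluation: one table lookup, no floating point."""
    return lut[q_in + 128]
```

The whole table costs 256 bytes, which is why this approach suits microcontrollers.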

If the scaled values are too large for the target data type, they would overflow. Rescaling (multiplying the wide intermediate accumulator by a fixed-point multiplier and shifting right) brings the values back within the representable limits before they are stored in the narrow type.
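A sketch of that rescaling step, in the style commonly used by integer-only kernels: a real-valued multiplier (here assumed to lie in (0, 1), e.g. `s_in * s_w / s_out`) is decomposed into an int32 multiplier plus a right shift, and the 32-bit accumulator is scaled down with integer ops only. The decomposition and rounding details are simplified for illustration:

```python
def quantize_multiplier(real_multiplier):
    """Decompose a real multiplier in (0, 1) into (int32 multiplier, right shift)."""
    shift = 0
    while real_multiplier < 0.5:       # normalize into [0.5, 1)
        real_multiplier *= 2.0
        shift += 1
    return round(real_multiplier * (1 << 31)), shift

def rescale(acc, multiplier, shift):
    """Scale a wide accumulator down to int8 range: (acc * m) >> (31 + shift)."""
    prod = (acc * multiplier + (1 << 30)) >> 31            # rounding high-mul
    if shift > 0:
        prod = (prod + (1 << (shift - 1))) >> shift        # rounding shift
    return max(-128, min(127, prod))                       # clamp to int8

# Example: emulate multiplying by 0.25 with integers only.
m, s = quantize_multiplier(0.25)
y = rescale(400, m, s)    # integer approximation of 400 * 0.25
```

The clamp at the end is what guarantees the result fits in the int8 output, even when the accumulator is far out of range.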

In that kernel, the activations are stored as uint8 with a zero point of 128 and a scale of 1/128, so a stored value u represents the real number (u - 128)/128. The 256 uint8 codes therefore cover the floating-point interval from -1 (u = 0) to 127/128 (u = 255) in steps of 1/128, which is what “quantized as uint8 on the interval [-1, 127/128]” means. Thank you.
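The mapping can be checked with a few lines:

```python
def uint8_to_real(u):
    """Dequantize: zero point 128, scale 1/128, as in the interval [-1, 127/128]."""
    return (u - 128) / 128.0

# The three landmark codes of the uint8 range:
lo  = uint8_to_real(0)      # smallest code  -> -1.0
mid = uint8_to_real(128)    # zero point     ->  0.0
hi  = uint8_to_real(255)    # largest code   ->  127/128
```

So the interval is asymmetric: -1 is exactly representable, but +1 is not; the largest representable value is 127/128.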