My question is about quantizing activations to e.g. int8 using a representative dataset. Assume a sigmoid activation function in what follows.
What confuses me: which parameters are scaled and shifted, i.e. quantized?
I understand that we monitor the typical output activations to determine their min and max ranges. What do we do with these values? My intuition says that we need to scale the entire activation function, because in the case of a sigmoid, a quantized input (e.g. some high int8 value caused by quantized weights) would otherwise always lead to saturated values (near 1 or 0).
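To make the question concrete, here is a minimal sketch of the usual affine (scale and zero-point) scheme, under the assumption that the recorded min/max are used only to derive a scale and zero point per tensor. All function names and the calibration range of [-6, 6] are made up for illustration and do not come from any particular framework. In this picture the int8 value 127 at the sigmoid input stands for a real value of about 6.0, not 127.

```python
import math

def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Derive an affine scale and zero point from an observed real range."""
    # Widen the range to include 0 so that real 0.0 has an integer code.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to its int8 code, clamping to the int8 range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an int8 code back to the real value it represents."""
    return scale * (q - zero_point)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical calibration result: sigmoid inputs were observed in [-6, 6].
in_scale, in_zp = quant_params(-6.0, 6.0)
# The sigmoid output always lies in [0, 1].
out_scale, out_zp = quant_params(0.0, 1.0)

def quantized_sigmoid(q_in):
    """Reference int8 -> int8 sigmoid via a float round trip."""
    real_in = dequantize(q_in, in_scale, in_zp)   # int8 code -> real input
    real_out = sigmoid(real_in)                   # unchanged float sigmoid
    return quantize(real_out, out_scale, out_zp)  # real output -> int8 code

if __name__ == "__main__":
    for q in (-128, -64, 0, 64, 127):
        print(q, quantized_sigmoid(q))
```

If this picture is right, the sigmoid itself is not rescaled; only the mapping between int8 codes and real values changes on each side of it. An actual integer kernel would presumably replace the float round trip with something like a precomputed 256-entry lookup table.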
I would be glad if anyone could clear up my confusion.