Specify precision in TFLite models

When converting floating-point TF models to TFLite models, how can I specify the precision of each operation? I understand how to specify the integer precision of weights and activations, but how can I find out (and set) the precision at which each operation is computed? For example, if I have an element-wise addition, how do I know whether the addition is computed in 8, 16, or 32 bits?

As a practical example, say I need to compute z = x + y for a residual connection, where x and y are both 8-bit tensors coming from previous conv layers. How can I compute x + y in 16 bits? It seems to me that TFLite doesn't offer this flexibility.

Hi @Wenjie_Lu ,

Maybe the details below will help you:

  • To compute z = x + y in 16 bits while x and y are 8-bit tensors, there are a few options (a QAT sketch follows this list):
    • QAT: Inject quantization nodes for the addition during training, guiding the model to learn a 16-bit representation for that part of the graph.
    • Custom operation: Implement a 16-bit addition kernel as a custom op and register it with the interpreter.
    • CPU delegation: Let the addition fall back to the floating-point CPU path instead of quantizing it.
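For the QAT route, a minimal sketch using the tensorflow_model_optimization (tfmot) Keras API might look like the following. The `Int16AddQuantizeConfig` class and the toy model are hypothetical illustrations, not a drop-in recipe; whether the converter ultimately emits a true 16-bit ADD kernel depends on what your runtime supports (TFLite's experimental 16x8 mode, `tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8`, does support 16-bit activations for ADD).

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize = tfmot.quantization.keras


class Int16AddQuantizeConfig(quantize.QuantizeConfig):
    # Hypothetical config: fake-quantize the Add layer's output with 16 bits.
    # Add has no weights, so the weight/activation hooks are no-ops.

    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        # 16-bit symmetric quantizer for the addition result.
        return [quantize.quantizers.MovingAverageQuantizer(
            num_bits=16, per_axis=False, symmetric=True,
            narrow_range=False)]

    def get_config(self):
        return {}


# Toy residual block: two conv branches joined by an annotated Add.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, padding="same")(inputs)
y = tf.keras.layers.Conv2D(8, 3, padding="same")(inputs)
z = quantize.quantize_annotate_layer(
    tf.keras.layers.Add(),
    quantize_config=Int16AddQuantizeConfig())([x, y])
model = tf.keras.Model(inputs, z)

# Annotate the remaining layers with their default 8-bit configs,
# then build the quantization-aware model.
annotated = quantize.quantize_annotate_model(model)
with quantize.quantize_scope(
        {"Int16AddQuantizeConfig": Int16AddQuantizeConfig}):
    qat_model = quantize.quantize_apply(annotated)
qat_model.summary()
```

After training, you would pass `qat_model` to `tf.lite.TFLiteConverter.from_keras_model` as usual, and you can verify the precision each op actually got by inspecting the converted model, for example with the Netron viewer or the interpreter's `get_tensor_details()`.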

I hope this helps.

Thanks.