TFLiteConverter adds (de)quantization blocks before (and after) operations on a weight variable

I’m moving the discussion of this issue here.

Briefly, I’m trying to convert a TensorFlow model to TFLite. This model should maintain a non-trainable state vector that I update at every inference, similar to an RNN. My issue is that when I convert it to TFLite, this state vector doesn’t seem to get interpreted as a quantized value, so the converter inserts quantization and dequantization operations every time the network needs to read from or write to it.
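For illustration, here is a minimal sketch of this kind of stateful model (not my exact gist; the shapes and names are placeholders):

```python
import tensorflow as tf

STATE_SIZE = 16  # placeholder size

class StatefulModel(tf.Module):
    def __init__(self):
        super().__init__()
        # A trainable weight and a non-trainable state vector that is
        # read and overwritten on every inference, similar to an RNN.
        self.w = tf.Variable(tf.random.normal([STATE_SIZE, STATE_SIZE]))
        self.state = tf.Variable(tf.zeros([1, STATE_SIZE]), trainable=False)

    @tf.function(input_signature=[tf.TensorSpec([1, STATE_SIZE], tf.float32)])
    def __call__(self, x):
        # Read the current state, combine it with the input, write it back.
        new_state = tf.tanh(tf.matmul(x, self.w) + self.state)
        self.state.assign(new_state)
        return new_state
```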

The model and the corresponding Netron graph are shown in the referenced GitHub issue, and can be reproduced with this gist.

Ultimately, I would like to deploy this on a Coral EdgeTPU, so I want to minimize unnecessary ops such as these quantize/dequantize blocks. This should be possible, since the state vector should be int8, not float32, in the TFLite model. How can I achieve that?
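For context, here is a rough sketch of the kind of full-integer conversion flow typically used when targeting the EdgeTPU, reusing the `StatefulModel` sketch above (the exact flags in my gist may differ, and the representative dataset here is just random placeholder data). Even with this setup, the state variable still ends up wrapped in dequantize/quantize ops in the converted graph:

```python
model = StatefulModel()
concrete_fn = model.__call__.get_concrete_function()

def representative_data():
    # Dummy calibration data; a real model would feed representative inputs.
    for _ in range(100):
        yield [tf.random.uniform([1, STATE_SIZE], dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn], model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("stateful_model.tflite", "wb") as f:
    f.write(tflite_model)
```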

@Malek_Itani,

Welcome to the TensorFlow Forum!

Sorry for the delay in responding. We see there is already an active discussion about this issue in #59390.

Please let us know if you need any assistance here.

Thank you!

Hi @chunduriv,

Thank you for the reply. I started this discussion here at the suggestion of the person who was then assigned to the GitHub issue you’ve linked. I still haven’t found a solution to the problem. If you have any ideas or insights, I would love to hear them.

Thank you