TensorFlow Lite inference time

vanduong0504 · July 23, 2021, 4:09am

I tried to convert my PyTorch models to TensorFlow Lite via ONNX, but inference with the TensorFlow Lite model is twice as slow as with TensorFlow and PyTorch. I am running the TensorFlow Lite model in Google Colab, and this is my first time using TensorFlow Lite.

Here is my code to convert from TensorFlow to TensorFlow Lite:

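import tensorflow as tf

# Load the TensorFlow SavedModel from the model/ directory
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
# Post-training float16 quantization: weights are stored as float16
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
model_lite = converter.convert()

# Write the converted model to disk
with open('model.tflite', 'wb') as f:
    f.write(model_lite)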

I used Python's time module to measure the latency of each framework. I don’t know why my Lite version is slower than the others. Any suggestions would help me a lot.
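
The exact timing code is not shown in the post. For illustration only, a minimal sketch of how such a latency measurement is often set up is given here: a few warm-up calls are discarded and the median of repeated runs is reported. The helper name `measure_latency` and its `warmup` and `runs` parameters are hypothetical, and the TFLite usage in the comment assumes an already prepared `tf.lite.Interpreter`.

```python
import statistics
import time


def measure_latency(run_once, warmup=5, runs=50):
    """Time a zero-argument callable; return (median_ms, mean_ms).

    `run_once`, `warmup`, and `runs` are illustrative names, not part of
    any framework API.
    """
    # Warm-up calls are not timed, so one-off setup cost is excluded.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)  # seconds -> ms

    return statistics.median(samples), statistics.mean(samples)


# Hypothetical usage with a TFLite model. This assumes `interpreter` is a
# tf.lite.Interpreter on which allocate_tensors() has been called and whose
# input tensor has already been set:
#   median_ms, mean_ms = measure_latency(interpreter.invoke)

if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any ML framework installed.
    median_ms, mean_ms = measure_latency(lambda: sum(range(10_000)))
    print(f"median {median_ms:.3f} ms, mean {mean_ms:.3f} ms")
```

Passing the same helper a callable for each framework keeps the comparison like for like, since every model is then timed with the same number of warm-up and measured runs.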

Sayak_Paul · July 23, 2021, 5:29am

TFLite is not really meant to perform well on commodity hardware. Its opset is optimized to run faster primarily on mobile hardware. However, if you build TFLite for your platform (preferably with XNNPACK enabled), you may see some benefit.

Here’s some more information:

vanduong0504 · July 23, 2021, 6:32am

So when I deploy it to mobile, this problem may go away? I am testing it in Colab but will deploy it to a mobile app in the future.

Sayak_Paul · July 23, 2021, 6:45am

Yes. You should test it on a real mobile device to set your expectations correctly.

You could also run benchmarks on Firebase Test Lab:

vanduong0504 · July 23, 2021, 6:50am

Thanks for your help.