TensorFlow Lite inference time

I converted my PyTorch models to TensorFlow Lite via ONNX, but the TensorFlow Lite inference time is twice as slow as TensorFlow and PyTorch. I'm running the TensorFlow Lite model in Google Colab, and this is my first time using TensorFlow Lite.
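Roughly, the PyTorch → ONNX → TensorFlow step looks like this (a simplified sketch; I'm showing onnx-tf as one example of the ONNX → TensorFlow conversion, and the model class and input shape are placeholders, not my actual ones):

import torch
import onnx
from onnx_tf.backend import prepare

# Export the PyTorch model to ONNX (MyModel and the input shape are placeholders)
model = MyModel()
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)

# Convert the ONNX graph to a TensorFlow SavedModel with onnx-tf
onnx_model = onnx.load("model.onnx")
tf_rep = prepare(onnx_model)
tf_rep.export_graph("model/")  # same directory the TFLite converter reads below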

Here is my code to convert from TensorFlow to TensorFlow Lite:

import tensorflow as tf

# Convert the SavedModel to TFLite with float16 weight quantization
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
model_lite = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(model_lite)

I used Python's time module to measure the latency of each framework (rough measurement code below). I don't know why the Lite version is slower than the others. Any suggestions would help a lot.
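For reference, this is roughly how I time the TFLite model (a sketch; the random input just stands in for my real data):

import time
import numpy as np
import tensorflow as tf

# Load the converted model and prepare its tensors
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Random input with the model's expected shape and dtype
dummy = np.random.rand(*input_details[0]["shape"]).astype(input_details[0]["dtype"])

start = time.perf_counter()
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
_ = interpreter.get_tensor(output_details[0]["index"])
print("TFLite latency: %.2f ms" % ((time.perf_counter() - start) * 1000))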

TFLite is not really meant to perform well on commodity hardware. Its op set is optimized to run fast primarily on mobile hardware. However, if you build TFLite for your platform (preferably with XNNPACK enabled), you may get some benefit.
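If you stay on the stock pip package, one thing worth trying is giving the interpreter more CPU threads (a minimal sketch; num_threads=4 is an arbitrary value, and whether XNNPACK is actually applied depends on how your TensorFlow wheel was built):

import tensorflow as tf

# More CPU threads usually helps TFLite on desktop-class CPUs
interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()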

Here’s some more information:


So this problem may be fixed when I deploy it to mobile? I'm testing it in Colab, but I'll deploy it to a mobile app in the future.

Yes. You should test it on a real mobile device to set expectations correctly.

You could also run benchmarks on Firebase Test Lab:

Thanks for your help.