TF Lite 2.5 vs 2.7: does setNumThreads(-1) behave differently?


I’ve been comparing the performance (inference time) of my models between TF Lite 2.5 and 2.7. With TF Lite 2.5, I found that setting the number of threads (setNumThreads) to -1 worked well on average: the inference time matched what I got with around 4 threads.

However, when I recently started working with TF Lite 2.7 and still set the number of threads to -1, the inference time matched what I got with 1 thread. Is this expected? I have only tested on Android so far.


Hi Uvin

Can you try the same benchmark with the 2.8 version released last week?

There’s also a change to this API specifically in that release: see the TensorFlow 2.8.0 release notes (tensorflow/tensorflow on GitHub).

One factor that I think is related is that XNNPack support was enabled by default for the C++ API; I think that would have been sometime around the TF Lite 2.7 timeframe.

Looking at the source code, I see that num_threads == -1 is treated as single-threaded for XNNPack, whereas for Eigen, which is used if XNNPack isn’t enabled, the default is to use 4 threads, and passing num_threads == -1 keeps the default.

So, I suspect some of the operations in your model were previously implemented using Eigen, but with TF Lite 2.7 are now using XNNPack by default, and so you now get 1 thread by default rather than 4.

The documentation leaves the effect of num_threads == -1 deliberately underspecified:

  /// If set to the value -1, the number of threads used
  /// will be implementation-defined and platform-dependent.

I suspect that the intent was that -1 should correspond to a reasonable number of threads that is likely to give good performance. But your mileage may vary, as they say.
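Since -1 is implementation-defined, one way to see what it actually does on a given device and TF Lite version is to time it against explicit thread counts. Here is a minimal sketch of such a comparison in plain Java. The class name ThreadCountBenchmark and the interpreterFactory callback are hypothetical: the callback is assumed to build an interpreter with the given thread count once and return a Runnable that performs a single inference. Nothing here calls TF Lite itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class ThreadCountBenchmark {
    /**
     * Returns the average wall-clock time per run, in milliseconds, for each
     * candidate thread count (in the order given).
     *
     * interpreterFactory is a hypothetical hook: given a thread count, it is
     * assumed to set up an interpreter with that count and return a Runnable
     * that runs one inference on it.
     */
    public static Map<Integer, Double> measure(
            int[] candidates,
            int warmupRuns,
            int timedRuns,
            IntFunction<Runnable> interpreterFactory) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        Map<Integer, Double> avgMillis = new LinkedHashMap<>();
        for (int numThreads : candidates) {
            // Build once per thread count so setup cost is not timed.
            Runnable runOnce = interpreterFactory.apply(numThreads);

            // Warm-up runs are executed but not measured.
            for (int i = 0; i < warmupRuns; i++) {
                runOnce.run();
            }

            long start = System.nanoTime();
            for (int i = 0; i < timedRuns; i++) {
                runOnce.run();
            }
            long elapsedNanos = System.nanoTime() - start;

            avgMillis.put(numThreads, elapsedNanos / 1e6 / timedRuns);
        }
        return avgMillis;
    }
}
```

Calling measure with candidates such as {-1, 1, 2, 4} and comparing the -1 entry with the others should show which explicit count, if any, the -1 setting matches for your model.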

My advice: if multithreading is critical to the performance of your app, try calling setNumThreads(4) rather than setNumThreads(-1).
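If you would rather not hard-code 4 everywhere, a small helper can turn -1 into an explicit count before it reaches setNumThreads. This is only a sketch under my own assumptions: the ThreadCounts class, its resolve method, and the choice to cap at the number of available cores are all hypothetical and not part of TF Lite.

```java
public class ThreadCounts {
    /**
     * Maps a requested thread count to an explicit positive value.
     *
     * A positive request is returned unchanged. Anything else (for example
     * -1) is replaced by the smaller of cap and availableCores, but never
     * less than 1. This policy is an assumption, not TF Lite behavior.
     */
    public static int resolve(int requested, int availableCores, int cap) {
        if (requested > 0) {
            return requested;
        }
        return Math.max(1, Math.min(cap, availableCores));
    }
}
```

On Android you could then pass ThreadCounts.resolve(-1, Runtime.getRuntime().availableProcessors(), 4) to setNumThreads on your Interpreter.Options, so the thread count no longer depends on how a particular TF Lite version interprets -1.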