Will the ARM Mali GPU accelerate a TensorFlow Lite model?

Hello - I am developing an object detection model in Python, closely following Google’s examples. This runs on a Raspberry Pi 4 and works quite well. There are many other SBCs, and many of them, such as the Orange Pi 5, include the ARM Mali G610 GPU and other processing elements. When Colab isn’t broken I train the model there; currently I am using my Mac.

Simple question, hoping for a simple answer - will a .tflite model make use of these GPUs/NPUs and run more frames per second, or be able to run a more detailed evaluation in the same time? Ideally this can all be done from Python without having to recompile any sources. Bizarrely, I found nothing on the 'net about this; I never thought this was a fringe activity.

Thank you in advance.

Hi @Charry2014, Yes, TensorFlow Lite enables the use of GPUs and other specialized processors through hardware drivers called delegates. Thank you.

Thank you - that is helpful, to a degree, but I am still a little confused. For ARM platforms this would apparently run through the ARM NN delegate. As this works on any ARM SoC according to the docs, including that of the stock Raspberry Pi, I am wondering if this just happens by magic, under the hood? I.e., is the framework smart enough to figure out what hardware it is running on (which CPU, GPU, NPU is there) and just extract the best from it?

My knowledge is a bit outdated, but I wasn’t aware that we have an Arm NN delegate. You create a TFLite GPU delegate and pass it to tflite::Interpreter::ModifyGraphWithDelegate (see TFLite GPU Delegate), and hopefully the delegate takes over the network. If the delegate doesn’t accept the network, you have to change the network so that it is compatible with the delegate. It’s not a simple process, and can require a fair amount of your time.
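For reference, the Python interpreter exposes the same delegate mechanism through load_delegate and the experimental_delegates argument, so the step described above can be sketched without C++. This is a minimal sketch under assumptions, not a tested recipe for the Mali G610: it assumes a delegate shared library has already been built or installed for the board, and the library name "libarmnnDelegate.so", the option keys, and the model file name "detect.tflite" are placeholders to be checked against the delegate vendor's documentation. The helper names (import_tflite, make_interpreter, run_on_device) are made up for this example.

```python
def import_tflite():
    """Return (Interpreter, load_delegate) from whichever TFLite package is installed."""
    try:
        from tflite_runtime.interpreter import Interpreter, load_delegate
    except ImportError:
        # Fall back to the full TensorFlow package if tflite_runtime is absent.
        import tensorflow as tf
        Interpreter = tf.lite.Interpreter
        load_delegate = tf.lite.experimental.load_delegate
    return Interpreter, load_delegate


def make_interpreter(model_path, delegate_lib, delegate_options,
                     interpreter_cls, load_delegate):
    """Build an interpreter that uses the delegate if it loads, else plain CPU.

    Returns (interpreter, used_delegate).
    """
    try:
        delegate = load_delegate(delegate_lib, delegate_options)
    except (ValueError, OSError, RuntimeError) as err:
        # A missing library or an unsupported platform ends up here.
        print(f"Delegate {delegate_lib!r} unavailable ({err}); using CPU.")
        return interpreter_cls(model_path=model_path), False
    interpreter = interpreter_cls(model_path=model_path,
                                  experimental_delegates=[delegate])
    return interpreter, True


def run_on_device(model_path="detect.tflite"):  # hypothetical model file name
    """Example wiring for a real board; not called automatically."""
    Interpreter, load_delegate = import_tflite()
    interpreter, accelerated = make_interpreter(
        model_path,
        "libarmnnDelegate.so",  # assumed library name; adjust to your install
        {"backends": "GpuAcc,CpuAcc"},  # assumed option keys for this delegate
        Interpreter,
        load_delegate,
    )
    interpreter.allocate_tensors()
    return accelerated
```

Calling run_on_device() on the target board reports whether the delegate library loaded. A successful load does not by itself guarantee that every operation in the model runs on the GPU, since unsupported operations may still be executed on the CPU, so it is worth timing inference with and without the delegate.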

OK - thank you for the clarification, I think that answers my question, but still with a little doubt - the ARM docs seem pretty clear that tflite does work with the ARM NN delegate provided by ARM, but…

Given that this is a Python project that I want to keep as simple as possible, it seems that a tflite model in Python on an ARM SoC such as the Rockchip RK3588S cannot use the Mali GPU or the NPU as accelerators. Given a move to C++ and the investment of a lot of time, this should be doable for someone more experienced, but it is not in scope for this project.