I’m contemplating whether TensorFlow Lite could be used as a backend (“execution provider”) for ONNX Runtime (ORT). ORT already supports most of the same accelerators that TFLite’s delegates cover, so such a backend would be largely redundant, but it would unlock using Coral Edge TPUs from ORT.
Both ORT and TFLite perform a graph-partitioning step, which I understand as assigning each node to the most capable hardware device that can execute its operator (a delegate in TFLite, an execution provider in ORT).
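To make the question concrete, here is a minimal sketch of the partitioning step as I understand it. The device names, the capability table, and the priority order are all made up for illustration; this is not a real TFLite or ORT API.

```python
# Hypothetical sketch of "assign each node to the most capable device".
# Device names and capability sets are invented, not real TFLite/ORT data.

# Devices in descending order of capability/preference.
DEVICE_PRIORITY = ["edge_tpu", "gpu", "cpu"]

# Which operators each (hypothetical) device claims to support.
SUPPORTED_OPS = {
    "edge_tpu": {"Conv2D", "DepthwiseConv2D", "Add"},
    "gpu": {"Conv2D", "Add", "Softmax"},
    "cpu": {"Conv2D", "DepthwiseConv2D", "Add", "Softmax", "TopK"},
}

def partition(nodes):
    """Assign each node (represented by its op-type string) to the
    first device in priority order that supports its operator."""
    assignment = {}
    for i, op in enumerate(nodes):
        for device in DEVICE_PRIORITY:
            if op in SUPPORTED_OPS[device]:
                assignment[i] = device
                break
    return assignment

print(partition(["Conv2D", "Softmax", "TopK"]))
# {0: 'edge_tpu', 1: 'gpu', 2: 'cpu'}
```

The open question below is where the `SUPPORTED_OPS`-style capability information would come from on the TFLite side.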
Accordingly, ORT would need to ask TFLite whether it can execute a given operator, and TFLite would in turn ask its delegates the same question. What TFLite API can I call from ORT to answer this?
A major complicating factor is that there are two layers of model conversion: from ONNX to a TensorFlow frozen graph, and from that to a TFLite flatbuffer. I need a method that answers the question for a given ONNX node. (Maybe I have to feed the nodes one by one through the converters?)
Is there an API that tells me which backend TFLite selected for a node? ORT may only want TFLite to execute subgraphs for which it has an accelerated backend, and keep anything that would run on TFLite’s CPU kernels for itself. (If not, that isn’t a showstopper.)
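The policy in the last paragraph could be sketched as follows: group the node sequence into maximal contiguous runs, hand TFLite only the runs an accelerated backend claims, and keep the rest in ORT. The per-node `accelerated` flags are assumed to come from whatever query API exists; the names here are hypothetical.

```python
from itertools import groupby

def split_subgraphs(nodes, accelerated):
    """Group a node sequence into maximal contiguous runs, routing
    accelerated runs to TFLite and the rest to ORT's own CPU
    provider. accelerated[i] is a made-up per-node flag that would
    come from querying TFLite / its delegates."""
    runs = []
    for is_accel, group in groupby(range(len(nodes)),
                                   key=lambda i: accelerated[i]):
        idxs = list(group)
        target = "tflite" if is_accel else "ort_cpu"
        runs.append((target, [nodes[i] for i in idxs]))
    return runs

print(split_subgraphs(["Conv", "Relu", "TopK", "Conv"],
                      [True, True, False, True]))
# [('tflite', ['Conv', 'Relu']), ('ort_cpu', ['TopK']), ('tflite', ['Conv'])]
```

In practice there would presumably also be a minimum-run-length heuristic, since bouncing tensors between runtimes for a single node is unlikely to pay off.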