I’m seeing slow object detection inference times on models trained with the `efficientdet_lite0` model spec.
I’m using the TF Lite Model Maker example notebook for object detection with a custom dataset, and I’m seeing inference times of 1.5–2 seconds on my MacBook Pro (single thread, no GPU). Setting num_threads to 4 brings this down to around 0.75 s, but that’s still far above the 37 ms latency the notebook mentions. I thought the overhead of loading the model might be the cause, but subsequent calls to the interpreter.invoke method yield similar times. My perf measurement is very basic: time.perf_counter() on either side of the invoke call. I’m quite new to all this, so am I doing something obviously wrong? Or am I missing something about post-training quantization?
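For reference, here’s a minimal sketch of how I’m timing it (the model path is a placeholder for my exported model, and I pass num_threads when constructing the interpreter):

```python
import time
import numpy as np
import tensorflow as tf

# Load the exported TF Lite model (path is a placeholder for my setup).
interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
# EfficientDet-Lite0 expects a 320x320x3 uint8 input; use random data for timing.
dummy_input = np.random.randint(0, 256, size=input_details[0]["shape"], dtype=np.uint8)
interpreter.set_tensor(input_details[0]["index"], dummy_input)

# Warm-up call so model-loading overhead isn't counted in the measurement.
interpreter.invoke()

start = time.perf_counter()
interpreter.invoke()
elapsed = time.perf_counter() - start
print(f"invoke() took {elapsed * 1000:.1f} ms")
```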
In Google Colab I’m seeing similar performance with the default notebook and the dataset it provides.