Have been training a custom TFLite object detection model using Model Maker. Any way to train models that can make detections faster?

Hi all,

I have been using Model Maker to train custom TFLite models for over a year now. Our objective is to train fast, on-device, real-time object detection models that can be used on mass-market smartphones.

Unfortunately, we find that models trained using this method take about 0.5 s to return a prediction. Is there any way we can speed up detection besides hardware improvements? (Bear in mind we are targeting the mass market, so requiring a high-end device is not pragmatic.)

I am using efficientdet_lite0. Do let me know what other parameters I can tune, or whether I should switch to another method of model training altogether.

Thanks

@Raymond_Wong

  • Opt for a smaller model variant, such as EfficientDet Lite0.
  • Apply post-training quantization to reduce model precision (see the sketch below).
  • Experiment with input size reduction, thread configuration, and batch size for optimal performance.
  • If available, leverage the TensorFlow Lite GPU delegate for accelerated inference.
  • Use profiling tools to identify and address any remaining performance bottlenecks.

Implementing these recommendations should help you achieve faster and more efficient real-time object detection on your target devices.
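
For the quantization point, here is a minimal sketch of generic post-training dynamic-range quantization with the standard tf.lite.TFLiteConverter API. It assumes you still have the trained detector available as a SavedModel (saved_model_dir below is a placeholder path), since, as far as I know, an already-exported .tflite file cannot simply be re-quantized.

import tensorflow as tf

# Placeholder: directory containing the trained detector as a SavedModel.
saved_model_dir = 'path/to/saved_model'

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

# Dynamic-range quantization: weights stored as int8, activations kept in float.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_quant_model = converter.convert()

with open('detector_quant.tflite', 'wb') as f:
    f.write(tflite_quant_model)

Whether this translates into lower latency depends on the target CPU; full-integer (int8) quantization additionally requires a representative dataset.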

@BadarJaffer, I am looking at how to apply post-training quantization.

Currently, I am using Model Maker to create a .tflite model. Am I still able to quantize the exported .tflite models?

If not, can I apply quantization after training but before the model export?

For your reference, this is my current training code:

model = object_detector.ObjectDetector.create(
    train_data=train_ds,        # training dataset
    model_spec=spec,            # model specification
    epochs=100,                 # number of training epochs
    validation_data=val_ds,     # validation subset
    batch_size=16,
    train_whole_model=True,
)

At the end of training, I export the .tflite file using:

model.export(export_dir=root_dir + '/tflite',
             tflite_filename='sample.tflite')
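
From what I can tell, Model Maker may support passing a quantization configuration directly to export(). A minimal sketch of what I think that would look like, assuming the tflite_model_maker.config.QuantizationConfig API and reusing the model and root_dir from above:

from tflite_model_maker.config import QuantizationConfig

# Dynamic-range quantization (int8 weights, float activations).
dynamic_config = QuantizationConfig.for_dynamic()

# Float16 quantization (smaller model; mainly useful with the GPU delegate).
fp16_config = QuantizationConfig.for_float16()

model.export(export_dir=root_dir + '/tflite',
             tflite_filename='sample_dynamic.tflite',
             quantization_config=dynamic_config)

model.export(export_dir=root_dir + '/tflite',
             tflite_filename='sample_fp16.tflite',
             quantization_config=fp16_config)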

I have tried thread configuration and believe I am already running with an optimal number of threads.

However, my understanding is that the input size is fixed. Is there any way I can modify the input size during training/export?
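
For context, this is roughly how I check the thread count and the model's input shape after export, using the plain tf.lite.Interpreter (a minimal sketch; the model path is a placeholder):

import tensorflow as tf

# Load the exported model with a chosen number of CPU threads.
interpreter = tf.lite.Interpreter(model_path='sample.tflite', num_threads=4)
interpreter.allocate_tensors()

# Inspect the input tensor; the shape is fixed at conversion time
# (for efficientdet_lite0 it is typically [1, 320, 320, 3]).
for detail in interpreter.get_input_details():
    print(detail['shape'], detail['dtype'])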

Thanks

Found this:

will try it out!

Unfortunately, neither dynamic-range nor float16 quantization produced a model with faster inference than the default quantization.