Can I use Early Stopping or Callbacks with Model Maker?

I am using Model Maker and I'd like to figure out the best number of epochs for my training. I tried passing a callback to object_detector.create() with callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3), but it seems Model Maker doesn't support it. Does anyone have an idea how I can find the best epoch to train a model?
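For reference, here is a minimal sketch of the kind of call I attempted. The dataset paths, label map, epoch count, and model spec are placeholders, and as far as I can tell create() exposes no argument that would accept the callback:

```python
# Sketch of the attempted setup; paths, label map, and spec are placeholders.
import tensorflow as tf
from tflite_model_maker import model_spec, object_detector

spec = model_spec.get('efficientdet_lite0')
train_data = object_detector.DataLoader.from_pascal_voc(
    images_dir='images/',                # placeholder paths
    annotations_dir='annotations/',
    label_map={1: 'my_label'})

early_stop = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)

# create() has no callbacks parameter, so there is nowhere to pass
# `early_stop`; training always runs for the full number of epochs.
model = object_detector.create(train_data, model_spec=spec, epochs=50,
                               batch_size=8, train_whole_model=True)
```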

Hi @Z_Borui , looping in @Tian_LIN :+1:

Hi @Z_Borui, currently it isn't supported. To suit different ML tasks, we only use built-in training strategies on a case-by-case basis. But you've raised a good feature request.

The difficulty is that "loss" may not be the only criterion for early stopping. Sometimes the loss flattens for a few epochs but then goes down again later. Also, for many complex real-world models, setting a good learning rate decay strategy and training to the end is much more efficient than using a fixed rate with early stopping.
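To illustrate the learning rate decay point, here is a minimal sketch in plain Keras (not Model Maker, which doesn't expose the optimizer); the rate, step counts, and optimizer choice are assumptions for illustration only:

```python
# Sketch (plain Keras, not Model Maker): a cosine learning-rate decay schedule
# instead of a fixed rate plus early stopping. All numbers are placeholders.
import tensorflow as tf

steps_per_epoch = 100      # depends on dataset size / batch size
epochs = 70

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.08,
    decay_steps=epochs * steps_per_epoch)   # anneal to ~0 by the final step

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```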

On the other hand, if you just want to know which epoch has the minimum loss, that is definitely possible today. For example, train the model slightly longer (say 20 epochs), and the metrics reported each epoch will show you that, say, the 15th epoch has the minimum loss.
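As an illustration of reading the best epoch off per-epoch metrics, here is a self-contained sketch in plain Keras; the toy model and data are placeholders. Model Maker prints comparable per-epoch metrics to its training log, so you can read the best epoch off the console output the same way:

```python
# Minimal sketch (plain Keras, not Model Maker): locate the epoch with the
# lowest training loss from the History object. Toy data/model are placeholders.
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 4).astype('float32')
y = np.random.rand(256, 1).astype('float32')

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')

history = model.fit(x, y, epochs=20, verbose=0)
best_epoch = int(np.argmin(history.history['loss'])) + 1  # logs are 1-indexed
print(f'Lowest training loss at epoch {best_epoch}')
```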

Hope this helps! And if you want to contribute the new feature, you're welcome to! :slight_smile:

@Yuqi_Li @Lu_Wang

@8bitmp3 @Tian_LIN Thank you all for the replies!

I set the number of epochs to 70 since my training dataset is quite large (>10k images), and the loss keeps dropping as training goes on. So far the 70th epoch has the minimum loss, so I'm not sure whether I need to keep increasing the number of epochs or not.

BTW, I really appreciate your work! Model Maker just makes ML much easier and I am looking forward to you adding more features to it!

If the 70th epoch has the lowest 'loss', you almost certainly should train for longer to see if the loss continues to decrease. To @Tian_LIN's point, you want to find your 'lower bound' and then reduce the number of epochs from that point (to decrease training time and overfitting).

As you're experimenting with your training, using checkpoints may be helpful (though it looks like the checkpoint feature for Model Maker has a bug: Tflite checkpoint restore · Issue #56117 · tensorflow/tensorflow · GitHub).
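For what it's worth, here is a small sketch of what checkpointing looks like in plain Keras while the Model Maker feature is affected by that bug; the file path is a placeholder:

```python
# Sketch (plain Keras): a checkpoint callback that keeps only the best weights,
# so the best epoch can be recovered even without early stopping.
import tensorflow as tf

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath='best.weights.h5',    # placeholder output path
    monitor='loss',
    save_best_only=True,           # overwrite only when the monitored loss improves
    save_weights_only=True)

# Passed as `callbacks=[checkpoint_cb]` to a standard model.fit() call;
# Model Maker's create() does not currently accept callbacks.
```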

I found this thread because I agree; an early stopping feature would be helpful in TFLite Model Maker.