TensorFlow Lite Model Maker - add more epochs after one training run

Hello,

I ran image_classifier.create(train_data, validation_data=validation_data, epochs=70)

I wanted to add more epochs and carry forward the learning gained from those 70 epochs.

How do I do that?


Did you try
image_classifier.create(train_data, validation_data=validation_data, model_dir="ckp_path", epochs=20)

@Kzyh don’t I need to pass use_hub_library as False if I use model_dir?

@Kzyh in addition to my previous comment… how do I create a model checkpoint directory to begin with?

You can use model.export with the right param

Yes

@Bhack are you referring to ExportFormat.SAVED_MODEL?

It is not clear if it is a “checkpoint” or the finalized model, and if it makes a difference.

I tried this, but it doesn’t seem to work. The accuracy does not improve at all between the runs, whereas earlier it used to reach > 90% within 10 epochs.

model = image_classifier.create(train_data, validation_data=validation_data, epochs=1)

model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1, model_dir='.', use_hub_library=False)

model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1, model_dir='.', use_hub_library=False)

…(9 times in total)

Note that I tried both '.' and './saved_model' in the create() call, but neither worked.

@Bhack please help. I tried the default export format too, but that didn’t work either.

It is not explicitly documented, but I suppose that model_dir is only used by the Keras callback to save the model, not to initialize the model weights. See

I suppose this is a feature request for this API.

Doesn’t it save a ckpt file every 20 epochs? You can check it like this:
model = image_classifier.create(train_data, validation_data=validation_data, epochs=30, model_dir='.', use_hub_library=False)
Then it should load this ckpt.
Or you can edit the train_image_classifier_lib.py file to save after every epoch.
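For anyone wanting to see what “save after every epoch” would look like in plain Keras: this is a sketch using the standard `tf.keras.callbacks.ModelCheckpoint` API, not Model Maker’s internal code, and the tiny model and random data here are made up purely for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A tiny stand-in model; Model Maker builds its own classifier internally.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

ckpt_dir = tempfile.mkdtemp()

# save_freq="epoch" writes a checkpoint at the end of every epoch,
# instead of only every N epochs.
callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(ckpt_dir, "model-{epoch:02d}.weights.h5"),
    save_weights_only=True,
    save_freq="epoch",
)

# Made-up training data, just to drive the callback.
x = np.random.rand(16, 8).astype("float32")
y = np.random.randint(0, 2, size=(16,))
model.fit(x, y, epochs=3, callbacks=[callback], verbose=0)

print(sorted(os.listdir(ckpt_dir)))  # one weights file per epoch
```

In Model Maker you would make the equivalent change inside train_image_classifier_lib.py, where the checkpoint callback is created.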

Yes, but he wants to load the checkpoint as the initial state, and that is a feature request for this API.

I think a simple PR could support this bootstrap.

You can open a ticket or a PR if you want to contribute to this feature.

@Kzyh I think you are talking about the notebook’s auto-save? I don’t see the model being saved - there are no additional folders or files being auto-created that I can see, unless it is in some other /var-like folder.

@Bhack is right. It won’t load the checkpoint file.
You can try adding this to train_image_classifier_lib.py in the train_model method:

status = tf.train.latest_checkpoint(hparams.model_dir)

if status:
 # Track the model's variables in a checkpoint object, then restore
 # from the latest checkpoint path found in model_dir.
 checkpoint = tf.train.Checkpoint(model=model)
 checkpoint.restore(status)
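To see that `tf.train.Checkpoint` restores weights the way that snippet intends, here is a self-contained round trip. The model here is a made-up stand-in; inside Model Maker, `model` would be the classifier being trained and `hparams.model_dir` would supply the directory.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A tiny stand-in for the Model Maker classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),
    tf.keras.layers.Dense(3),
])
original = model.get_weights()

# Save a checkpoint, as a training checkpoint callback would.
ckpt_dir = tempfile.mkdtemp()
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.save(os.path.join(ckpt_dir, "ckpt"))

# Clobber the weights to simulate a freshly initialized model.
model.set_weights([np.zeros_like(w) for w in original])

# The pattern from the snippet above: find the latest checkpoint
# and restore it into the model before training continues.
status = tf.train.latest_checkpoint(ckpt_dir)
if status:
    checkpoint.restore(status)

restored = model.get_weights()
```

After the restore, `restored` matches `original`, which is exactly the “carry forward the learning” behavior the thread is after.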

Hello all,
I have exactly the same problem. I am trying to continue the training, or to train the model with additional data, but I can’t find a solution.
Has anyone found a solid solution, or can anyone say whether @Kzyh’s solution works?
The search has already taken me days and I am grateful for any help.

Kind regards Daniel

I have tried this suggestion but it does not work.
Is there something missing or where exactly does the code need to be placed in the function?
I am grateful for any help!

I looked at the codebase in detail, and I don’t think incremental training will work without major changes to the API. The best bet is to use regular (non-Lite) TensorFlow to train a model and then convert it to a TFLite model - I haven’t tried it myself, so I don’t know what hurdles await there. Overall, given my recent experience with TensorFlow Lite Model Maker, I think the API is quite poor (just like the non-Lite version), and the documentation is misleading or wrong in places. The configurations do not have reasonable defaults (shuffle is False, for example). Augmentation is not well documented, and reading the code it seems to include a random crop, which may leave the subject out of the image entirely and produce false examples. It seems like the forum complaints do not lead to any change from the authors of the API at Google. Given this, I will probably switch to some other API.

Hi Marcus,

I guess for your use case (having the checkpoint and continuing training later) I’d go straight to the regular Keras API. Create a simple model and do the transfer learning.

For this use case, there’s a tutorial here: Retraining an Image Classifier | TensorFlow Hub
It even shows how to convert the model to TFLite later.
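The conversion step from that tutorial boils down to `tf.lite.TFLiteConverter.from_keras_model`. Here is a minimal sketch with a placeholder model; in the actual tutorial, the model would be a TF Hub feature extractor plus a Dense classification head, trained with `model.fit(...)` before converting.

```python
import tensorflow as tf

# Placeholder classifier standing in for the tutorial's TF Hub
# feature extractor plus a Dense classification head.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# After training, convert the Keras model directly to a TFLite
# flatbuffer and write it to disk.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```

Because the conversion starts from an ordinary Keras model, you can reload a saved model at any point, train it further, and only convert to TFLite when you are done.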

hope it helps

Yes, that is essentially what I have been doing. I made some tweaks to accommodate Google Colab Pro dying on me as well. To help out @Daniel_Kuhn, here are some snippets:

# Mount the Google Drive of 'XXX@gmail.com' account so we can access 'TRAINING_DATA.zip'
from google.colab import drive
drive.mount('/content/drive')

# Extract the training data from Google Drive
!unzip /content/drive/MyDrive/TRAINING_DATA.zip

# Path on Google Drive where we will save the model
SAVED_MODEL_PATH = f"./drive/MyDrive/my_saved_model_{model_name}"

# Train 5 epochs in total, one at a time
for i in range(0, 5):
  hist = model.fit(
      train_ds,
      epochs=1, steps_per_epoch=steps_per_epoch,
      validation_data=val_ds,
      validation_steps=validation_steps).history
  # Save the model to Drive after every epoch, so that it can be
  # reloaded to continue training if the notebook dies
  model.save(SAVED_MODEL_PATH)


# If the Colab notebook dies, remount Google Drive as before, reload the saved model, and continue training
model = tf.keras.models.load_model(SAVED_MODEL_PATH)

Rest of the code is similar to what @lgusm pointed us to: Retraining an Image Classifier | TensorFlow Hub


Colab free tier… a bit of a laugh that, in the middle of training, the platform comes back with “Hey, are you there? Please confirm or the environment is disconnected.”

I know it’s free, but free easily ends up taking more time than there are hours in the day, so… I upgraded to Pro and now it’s working for me. Free? No…

Hi everyone, for people who want to keep training the model with more data when using Model Maker, take a look at this thread: https://tensorflow-prod.ospodiscourse.com/t/how-to-stop-and-resume-object-detector-training-object-detection-model-maker/3487/14?u=lgusm

There’s a workaround and a PR to address this!
