TensorFlow Lite Model Maker - add more epochs after one training run

Hello,

I ran image_classifier.create(train_data, validation_data=validation_data, epochs=70)

I wanted to add more epochs and carry forward the learning gained from those 70 epochs.

How do I do that?


Did you try
image_classifier.create(train_data, validation_data=validation_data, model_dir="ckp_path", epochs=20)

@Kzyh don’t I need to pass use_hub_library as False if I use model_dir?

@Kzyh in addition to my previous comment… how do I create a model checkpoint directory to begin with?

You can use model.export with the right param

Yes

@Bhack are you referring to ExportFormat.SAVED_MODEL?

It is not clear if it is a “checkpoint” or the finalized model, and if it makes a difference.

I tried this, but it doesn’t seem to work. The accuracy does not improve at all between the runs, whereas earlier it used to reach > 90% within 10 epochs.

model = image_classifier.create(train_data, validation_data=validation_data, epochs=1)

model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1, model_dir='.', use_hub_library=False)

model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1, model_dir='.', use_hub_library=False)

…(9 times in total)

Note that I tried both '.' and './saved_model' in the create() call, but neither worked.

@Bhack please help. I tried the default export format too, but that didn’t work either.

It is not explicitly documented, but I suppose that model_dir is only used by the Keras callback to save the model, not to initialize the model weights. See

I suppose this is a feature request for this API.

Doesn’t it save a ckpt file every 20 epochs? You can check it like this:
model = image_classifier.create(train_data, validation_data=validation_data, epochs=30, model_dir='.', use_hub_library=False)
Then it should load this ckpt.
Or you can edit the train_image_classifier_lib.py file to save after every epoch.
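For anyone wanting to see what “save after every epoch” would look like in plain Keras: this is a sketch using the standard `tf.keras.callbacks.ModelCheckpoint` API, not Model Maker’s internal code, and the tiny model and random data here are made up purely for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A tiny stand-in model; Model Maker builds its own classifier internally.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

ckpt_dir = tempfile.mkdtemp()

# save_freq="epoch" writes a checkpoint at the end of every epoch,
# instead of only every N epochs.
callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(ckpt_dir, "model-{epoch:02d}.weights.h5"),
    save_weights_only=True,
    save_freq="epoch",
)

# Made-up training data, just to drive the callback.
x = np.random.rand(16, 8).astype("float32")
y = np.random.randint(0, 2, size=(16,))
model.fit(x, y, epochs=3, callbacks=[callback], verbose=0)

print(sorted(os.listdir(ckpt_dir)))  # one weights file per epoch
```

In Model Maker you would make the equivalent change inside train_image_classifier_lib.py, where the checkpoint callback is created.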

Yes, but he wants to load the checkpoint as the initial state, and that is a feature request for this API.

I think a simple PR could support this bootstrap.

You can open a ticket or a PR if you want to contribute to this feature.

@Kzyh I think you are talking about the notebook’s auto-save? I don’t see the model being saved - there are no additional folders or files being auto-created that I can see, unless it is in some other /var-like folder.

@Bhack is right. It won’t load the checkpoint file.
You can try adding this to train_image_classifier_lib.py in the train_model method:

status = tf.train.latest_checkpoint(hparams.model_dir)

if status:
 # Track the model's variables in a checkpoint object, then restore
 # from the latest checkpoint path found in model_dir.
 checkpoint = tf.train.Checkpoint(model=model)
 checkpoint.restore(status)
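To see that `tf.train.Checkpoint` restores weights the way that snippet intends, here is a self-contained round trip. The model here is a made-up stand-in; inside Model Maker, `model` would be the classifier being trained and `hparams.model_dir` would supply the directory.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A tiny stand-in for the Model Maker classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),
    tf.keras.layers.Dense(3),
])
original = model.get_weights()

# Save a checkpoint, as a training checkpoint callback would.
ckpt_dir = tempfile.mkdtemp()
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.save(os.path.join(ckpt_dir, "ckpt"))

# Clobber the weights to simulate a freshly initialized model.
model.set_weights([np.zeros_like(w) for w in original])

# The pattern from the snippet above: find the latest checkpoint
# and restore it into the model before training continues.
status = tf.train.latest_checkpoint(ckpt_dir)
if status:
    checkpoint.restore(status)

restored = model.get_weights()
```

After the restore, `restored` matches `original`, which is exactly the “carry forward the learning” behavior the thread is after.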

Hello all,
I have exactly the same problem. I am trying to continue the training, or to train the model with additional data, but I can’t find a solution.
Has anyone found a solid solution, or can anyone say whether @Kzyh’s solution works?
The search has already taken me days and I am grateful for any help.

Kind regards Daniel

I have tried this suggestion but it does not work.
Is there something missing or where exactly does the code need to be placed in the function?
I am grateful for any help!

I looked at the codebase in detail, and I don’t think incremental training will work without major changes to the API. The best bet is to use regular (non-Lite) TensorFlow to train a model and then convert it to a TFLite model - I haven’t tried it myself, so I don’t know what hurdles await there. Overall, given my recent experience with TensorFlow Lite Model Maker, I think the API is quite poor (just like the non-Lite version), and the documentation is misleading or wrong in places. The configurations do not have reasonable defaults (shuffle is False, for example). Augmentation is not well documented, and reading the code it seems to include a random crop, which may leave the subject out of the image entirely and produce false examples. It seems like the forum complaints do not lead to any change from the authors of the API at Google. Given this, I will probably switch to some other API.

Hi Marcus,

I guess for your use case (having the checkpoint and continuing training later) I’d go straight to the regular Keras API. Create a simple model and do the transfer learning.

For this use case, there’s a tutorial here: Retraining an Image Classifier | TensorFlow Hub
It even shows how to convert the model to TFLite later.
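The conversion step from that tutorial boils down to `tf.lite.TFLiteConverter.from_keras_model`. Here is a minimal sketch with a placeholder model; in the actual tutorial, the model would be a TF Hub feature extractor plus a Dense classification head, trained with `model.fit(...)` before converting.

```python
import tensorflow as tf

# Placeholder classifier standing in for the tutorial's TF Hub
# feature extractor plus a Dense classification head.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# After training, convert the Keras model directly to a TFLite
# flatbuffer and write it to disk.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```

Because the conversion starts from an ordinary Keras model, you can reload a saved model at any point, train it further, and only convert to TFLite when you are done.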

hope it helps

Yes, that is essentially what I have been doing. I made some tweaks to accommodate Google Colab Pro dying on me as well. To help out @Daniel_Kuhn, here are some snippets:

# Mount the Google Drive of 'XXX@gmail.com' account so we can access 'TRAINING_DATA.zip'
from google.colab import drive
drive.mount('/content/drive')

# Extract the training data from Google Drive
!unzip /content/drive/MyDrive/TRAINING_DATA.zip

# Path on Google Drive where we will save the model
SAVED_MODEL_PATH = f"./drive/MyDrive/my_saved_model_{model_name}"

# Train 5 epochs in total, one at a time
for i in range(0, 5):
  hist = model.fit(
      train_ds,
      epochs=1, steps_per_epoch=steps_per_epoch,
      validation_data=val_ds,
      validation_steps=validation_steps).history
  # Save the model to Drive after every epoch, so that it can be
  # reloaded to continue training if the notebook dies
  model.save(SAVED_MODEL_PATH)


# If the Colab notebook dies, remount Google Drive as before, reload the saved model, and continue training
model = tf.keras.models.load_model(SAVED_MODEL_PATH)

Rest of the code is similar to what @lgusm pointed us to: Retraining an Image Classifier | TensorFlow Hub


Colab free tier… a bit of a laugh that, in the middle of training, the platform comes back with “Hey, are you there? Please confirm or the environment is disconnected.”

I know it’s free, but free easily ends up taking more time than there are hours in the day, so… I upgraded to Pro and now it’s working for me. Free? No…

Hi everyone, for people who want to keep training the model with more data when using Model Maker, take a look at this thread: https://tensorflow-prod.ospodiscourse.com/t/how-to-stop-and-resume-object-detector-training-object-detection-model-maker/3487/14?u=lgusm

There’s a workaround and a PR to address this!
