Correct way to retrain a Keras model

Once I have trained a Keras model, I save it using:

model.save('model.keras')

for me to retrain it I import via:

from tensorflow.keras.models import load_model

I load the model using:

model = load_model('model.keras')

and use

model.fit()

Is this correct?

Hi @marcocintra ,

Yes, that is correct. To retrain a saved Keras model, those steps are all you need: load it with load_model() and call fit() on your new data.

Note that load_model() restores the compile configuration (optimizer, loss function, and metrics) from the saved file, so you only need to call compile() again if you want to change any of them. Adjust the parameters of fit() (epochs, batch_size, etc.) based on your retraining requirements.
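Put together, those steps look roughly like this (a minimal sketch with toy data; the file name, shapes, and architecture are placeholders):

```python
# Minimal sketch of the save -> load -> retrain flow; toy data and
# a placeholder architecture stand in for the real model.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# Initial training run.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, batch_size=8, verbose=0)

# Save in the native .keras format (architecture, weights, compile config).
model.save("model.keras")

# Later: reload and continue training on new data. No re-compile is
# needed, because load_model restores the compile configuration.
restored = load_model("model.keras")
restored.fit(x, y, epochs=2, batch_size=8, verbose=0)
```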

After retraining, your model will have updated weights and be ready for evaluation or further use.

Thanks.


Ok, thanks! One more follow-up question please. When I retrain a model, sometimes it starts with a slightly higher loss than the last loss of the previous model, is this normal?

Hi @marcocintra ,

Yes, it is normal to see a slightly slightly higher loss at the start of retraining. The weights themselves are restored from the saved file, but other parts of the training state, such as the optimizer's accumulated statistics, may not carry over depending on how the model was saved, and the first batches of the new run will generally differ from the last batches of the previous one.

As the model trains, the loss will decrease and the model will become more accurate.

Thanks


Ok, thanks, but aren't the weights loaded from the previous model, so that it starts with the same weights?

And one more supplementary question, please. I read the following excerpt in a book:

"In TF.Keras, we can restore a model architecture and/or the model parameters (weights and biases). Restoring a model architecture is generally done for loading a prebuilt model, while loading both the model architecture and model parameters is generally done for transfer learning (discussed in chapter 11).

Note that loading the model and model parameters is not the same as checkpointing, in that we are not restoring the current state of hyperparameters. Therefore, this approach should not be used for continuous learning:

from tensorflow.keras.models import load_model

model = load_model('mymodel') (Loads a pretrained model)

In the next code example, the trained weights/biases for a model are loaded into the corresponding prebuilt model, using the load_weights() method:

from tensorflow.keras.models import load_weights

model = load_model('mymodel') (Loads a prebuilt model)

model.load_weights('myweights') (Loads pretrained weights for the model)"

This left me very confused, since the way I asked here about loading and retraining the model seems to be correct, while according to the book it isn't (I didn't even understand what the correct way is according to this excerpt). Can you enlighten me on this please?

Hi @marcocintra ,

A pre-trained model is a model that has already been trained, typically on a large dataset, for a task such as image classification or object detection.

A checkpointed model is a model saved at a specific point during training. This is useful for resuming training or for evaluating the model's performance at different points in the training process. A full checkpoint can include the optimizer's internal state in addition to the architecture and weights, which is what lets training resume exactly where it left off.

load_model('model.keras') restores the architecture, the weights, and the compile configuration in one call, so you can call fit() directly to continue training on new data.

model.load_weights(...) restores only the weights. You must first build the same architecture in code and compile it yourself. This is the pattern the book shows for transfer learning, where pre-trained weights are loaded into a model you then adapt.

Also note that load_weights is a method on a model instance, not a standalone function; the book's line "from tensorflow.keras.models import load_weights" will not work, which may be adding to the confusion.

So for your case: to continue training a model you saved yourself, loading the whole model with load_model() and calling fit(), exactly as you described, is correct. The book's warning about "continuous learning" refers to the fact that some training state beyond the weights may not be restored, so resuming can behave slightly differently from never having stopped.

I hope this helps to clarify the difference between loading a model, loading only the model's weights, and checkpointing.
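The two restore paths can be sketched side by side like this (file names and the toy architecture are placeholders; the build_model helper is hypothetical, standing in for whatever code defines your architecture):

```python
# Sketch of the two restore paths: whole-model restore vs. weights-only
# restore. build_model() is a hypothetical stand-in for your architecture.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

def build_model():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Create and save a model so both restore paths have something to load.
original = build_model()
original.compile(optimizer="adam", loss="mse")
original.save("model.keras")                  # whole model
original.save_weights("model.weights.h5")     # weights only

# (a) Whole model: architecture + weights + compile config in one call.
#     Use this to continue training where you left off.
model = load_model("model.keras")

# (b) Weights only: rebuild the same architecture in code, pull in the
#     weights, and compile yourself. Note load_weights is a *method* on
#     the model, not an importable function.
rebuilt = build_model()
rebuilt.load_weights("model.weights.h5")
rebuilt.compile(optimizer="adam", loss="mse")
```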

Thanks.

Eh, I don’t think the state of the optimizer is saved by default. So momentum and such disappear and have to be “learnt” again, which might lead to a slightly higher loss after the first bit of training. It could also be because you changed the batch size or some other parameter, or it could simply be noise in the training procedure if it’s after the first epoch.

If you evaluate the model, save it, load it, and evaluate again then that should be exactly the same loss. On the same data of course.

A common reason not to get the same loss would be that the data processing wasn’t the same between the training of the model and loading and evaluating.
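That sanity check can be sketched like this (toy model and data; the only point is that the loss before saving and after reloading should match on identical data):

```python
# Sanity check: evaluate, save, reload, evaluate again on the same data.
# The two losses should agree up to floating-point tolerance.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=1, verbose=0)

loss_before = model.evaluate(x, y, verbose=0)
model.save("model.keras")
loss_after = load_model("model.keras").evaluate(x, y, verbose=0)
```

If these two numbers differ, the usual suspect is a mismatch in data preprocessing between runs rather than the save/load itself.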

In my case I only load a model of my own after training; I am not using a “pre-trained model” downloaded from TensorFlow Hub, for example, if that’s what you mean (although I don’t see any difference for this specific trained model, if there is one). So should I use a checkpoint or not? Thanks.

So how can I save the optimizer state and, after training, load it again? No, I didn’t change any parameters. Thanks.

You still haven’t answered whether model.evaluate(data) and then saving and loading the model followed by model.evaluate(data) gives the same output.