Model.restore in TensorFlow is not working properly

Anshuman_Sinha · September 21, 2022, 10:18pm

I am trying to use the TensorFlow-based deepxde module. It uses the checkpoint callback (checker) and restore, which go through site-packages/tensorflow/python/training/saver.py. Both have been given a path to the folder ‘results’, and that folder is where the files are being saved, as shown in the code below.

I think the code line model.restore("/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt-" + str(np.argmin(model.losshistory.loss_test)*100), verbose=0) is not able to get the correct path.

My save_str is as follows: save_str = func_str+'_Seed_'+str(seed)+'_Samples_'+str(samples)+'_X_'+str(exponent_truth)+'_'+str(exponent_approx)+'_epochs_'+str(epochs)+'_blayers_'+str(b_layers)+'_neurons_'+str(neurons)

The files saved in the model/ folder are named like this (I have added a picture of the folder as well): Levin1_Seed_1_Samples_10_X_13_4_epochs_100_blayers_3_neurons_125model.ckpt-100.ckpt.data-00000-of-00001

As far as I can tell, the code before model.restore is saving the files in the folder ‘model’, but model.restore is somehow not able to access them. Can someone please explain the error and suggest a possible solution?

Thanks.

Code

model = dde.Model(data, net)
model.compile("adam", lr=lr, metrics=[mean_squared_error])
checker = dde.callbacks.ModelCheckpoint(
    "/Users/anshumansinha/Desktop/Project/model/" + save_str + "model.ckpt",
    save_better_only=False,
    period=100,
)
# Training model, batch_size = 10000
losshistory, train_state = model.train(epochs=epochs, callbacks=[checker])
# For plotting the residuals and the training history: isplot=True will plot
dde.saveplot(losshistory, train_state, issave=False, isplot=False)

# Restore the best test loss model
model.restore("/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt-" + str(np.argmin(model.losshistory.loss_test)*100), verbose=0)
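
A side note on the restore line above: `np.argmin(...) * 100` only lands on a saved step if the loss history holds exactly one entry per 100 steps. A minimal sketch of an alternative, assuming the history also records the step of each entry, picks the step directly and snaps it to the checkpoint period. The `steps` and `loss_test` values and the helper name `best_saved_step` are hypothetical stand-ins, not values from the run in this post.

```python
def best_saved_step(steps, loss_test, period):
    """Return a checkpointed step near the entry with the lowest test loss."""
    # Index of the smallest test loss among the recorded entries.
    best_index = min(range(len(loss_test)), key=lambda i: loss_test[i])
    best_step = steps[best_index]
    # Checkpoints are assumed to exist only at positive multiples of `period`.
    return max(period, round(best_step / period) * period)


# Hypothetical history: one recorded entry every 1000 steps.
steps = [0, 1000, 2000, 3000]
loss_test = [0.9, 0.2, 0.05, 0.07]
print(best_saved_step(steps, loss_test, period=100))
```

With these made-up values, multiplying the argmin index by 100 would give step 200, while the lowest loss was recorded at step 2000.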

The error is as follows:

Traceback (most recent call last):
  File "/Users/anshumansinha/Desktop/Project/file/./main.py", line 302, in <module>
    NN_MSEs_test, NN_MSEs_train = DeepONet(samples, split, y/np.max(np.abs(y)) , I, inds, neurons, epochs, b_layers)
  File "/Users/anshumansinha/Desktop/Project/file/./main.py", line 282, in DeepONet
    model.restore("/Users/anshumansinha/Desktop/Project/model/"+save_str+"model.ckpt-" + str(np.argmin(model.losshistory.loss_test)*100), verbose=0)
  File "/Users/anshumansinha/venv/lib/python3.10/site-packages/deepxde/model.py", line 914, in restore
    self.saver.restore(self.sess, save_path)
  File "/Users/anshumansinha/venv/lib/python3.10/site-packages/tensorflow/python/training/saver.py", line 1409, in restore
    raise ValueError("The passed save_path is not a valid checkpoint: " +
ValueError: The passed save_path is not a valid checkpoint: /Users/anshumansinha/Desktop/Project/model/Levin1_Seed_1_Samples_100_X_13_10_epochs_100_blayers_7_neurons_500model.ckpt-100

The folder “results” contains all of these files, including the ‘checkpoint’ file.

[Image: the folder ‘results’]
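
One detail in the listing above may matter: the saved file name ends in `model.ckpt-100.ckpt.data-00000-of-00001`, while the path in the error message stops at `model.ckpt-100`. A TensorFlow checkpoint prefix is the file name with the `.index` or `.data-...` suffix removed, so the prefix on disk appears to end in `.ckpt-100.ckpt`. A minimal standard-library sketch for listing the prefixes actually present in a folder follows. The helper name `list_checkpoint_prefixes` and the temporary demo folder are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def list_checkpoint_prefixes(folder):
    """Return the checkpoint prefixes found in `folder`, sorted by name."""
    # Each checkpoint has one `<prefix>.index` file next to its data shards.
    return sorted(str(p)[: -len(".index")] for p in Path(folder).glob("*.index"))


# Demo in a temporary folder, with file names shaped like the one in the post.
with tempfile.TemporaryDirectory() as demo_dir:
    for suffix in (".index", ".data-00000-of-00001", ".meta"):
        (Path(demo_dir) / ("run_model.ckpt-100.ckpt" + suffix)).touch()
    for prefix in list_checkpoint_prefixes(demo_dir):
        print(prefix)
```

If a prefix printed for the real folder ends in `.ckpt-100.ckpt`, passing that exact string to model.restore is worth trying before anything else.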

Renu_Patel · September 29, 2023, 6:59am

Hi @Anshuman_Sinha

As you are saving the model’s checkpoints using the ModelCheckpoint callback, you can use the load_weights method to reload a checkpoint:

model.load_weights(checkpoint_path)

Or use the code below to select and reload the latest checkpoint:

latest = tf.train.latest_checkpoint(checkpoint_dir)
model.load_weights(latest)

Please refer to the Manual checkpointing doc if you want to create checkpoints manually and restore them.
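
For context, `tf.train.latest_checkpoint` reads the small text file named `checkpoint` in the directory and returns the prefix recorded under `model_checkpoint_path`. A minimal standard-library sketch of a similar lookup can show which prefix was recorded without loading TensorFlow. The helper name `recorded_checkpoint` and the demo file contents are assumptions for illustration. This is a stand-in, not TensorFlow's own implementation, and it does not verify that the checkpoint files exist.

```python
import re
import tempfile
from pathlib import Path


def recorded_checkpoint(checkpoint_dir):
    """Return the prefix stored in the `checkpoint` file, or None if absent."""
    index_file = Path(checkpoint_dir) / "checkpoint"
    if not index_file.is_file():
        return None
    match = re.search(
        r'^model_checkpoint_path:\s*"(.*)"\s*$',
        index_file.read_text(),
        flags=re.MULTILINE,
    )
    if match is None:
        return None
    # A relative entry is resolved against the directory; an absolute one is kept.
    return str(Path(checkpoint_dir) / match.group(1))


# Demo with a hypothetical `checkpoint` file in a temporary folder.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "checkpoint").write_text(
        'model_checkpoint_path: "run_model.ckpt-100.ckpt"\n'
        'all_model_checkpoint_paths: "run_model.ckpt-100.ckpt"\n'
    )
    print(recorded_checkpoint(demo_dir))
```

The string it prints is a checkpoint prefix, which is the form a restore call expects as its path argument.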