ModelCheckpoint fails to format filename if save_freq is used

Hi everyone.

I’m trying to save a Keras model with ModelCheckpoint callback every two epochs.

If I save the model every epoch with save_freq="epoch", everything works and I can use val_mean_absolute_error to format the filename. However, if I set save_freq to 2 * int(ceil(train_size / batch_size)), which is the number of batches in two epochs, Keras raises an error:

KeyError: 'Failed to format this callback filepath: "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5". Reason: \'val_mean_absolute_error\'' 
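The message points at the filename formatting step rather than at the save itself. Here is a minimal sketch of that step using plain str.format, with no Keras involved. The log dictionaries and their numbers are made up for illustration, and format_path is a hypothetical helper, not a Keras function; the assumption is that the callback fills the template from whatever logs it is handed at that moment.

```python
# Filepath template from the post.
filepath = "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5"

# Assumed log contents (illustrative numbers only): validation metrics
# appear in the logs only after validation has run, i.e. at epoch end.
batch_logs = {"loss": 27.4, "mean_absolute_error": 4.4}
epoch_logs = {**batch_logs, "val_loss": 27.9, "val_mean_absolute_error": 4.37}


def format_path(template, epoch, logs):
    """Hypothetical helper: fill the template from the epoch number and logs."""
    return template.format(epoch=epoch, **logs)


# With epoch-level logs every placeholder can be filled.
epoch_name = format_path(filepath, 2, epoch_logs)
print(epoch_name)

# With batch-level logs the validation key is absent, so str.format
# raises KeyError naming the missing placeholder.
try:
    format_path(filepath, 2, batch_logs)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]
print("missing key:", missing_key)
```

If the logs passed to the callback at save time do not contain val_mean_absolute_error, the same KeyError would be expected from the real callback.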

Below is my code (I got it from here):

import tensorflow as tf
from tensorflow import keras

def get_model():
    model = keras.Sequential()
    model.add(keras.layers.Dense(1, input_dim=784))
    return model

# Load example MNIST data and pre-process it
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Limit the data to 1000 samples
x_train = x_train[:1000]
y_train = y_train[:1000]
x_test = x_test[:1000]
y_test = y_test[:1000]

nSteps = int(tf.math.ceil(len(x_train)/128))
filepath = "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5"

callbacks = [
    tf.keras.callbacks.ModelCheckpoint(filepath=filepath, monitor='val_mean_absolute_error', verbose=1,
            save_best_only=False, mode='min', save_freq=2*nSteps)
]

model = get_model()
history =

I’m not sure if it’s a bug, but something is not right!

Thank you.


After a bit of debugging, I found this code in

  def _implements_train_batch_hooks(self):
    # Only call batch hooks when saving on batch
    return self.save_freq != 'epoch'

  def _implements_train_batch_hooks(self):
    """Determines if this Callback should be called for each train batch."""
    return (not generic_utils.is_default(self.on_batch_begin) or
            not generic_utils.is_default(self.on_batch_end) or
            not generic_utils.is_default(self.on_train_batch_begin) or
            not generic_utils.is_default(self.on_train_batch_end))

  def _implements_test_batch_hooks(self):
    """Determines if this Callback should be called for each test batch."""
    return (not generic_utils.is_default(self.on_test_batch_begin) or
            not generic_utils.is_default(self.on_test_batch_end))

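Accordingly, if I set save_freq='epoch', self.on_train_batch_end() is skipped and the checkpoint is written from self.on_epoch_end(), whose logs include the validation metrics, so the filename is formatted correctly. With an integer save_freq, the save is triggered from the batch hook, where val_mean_absolute_error is not in the logs yet. So I think this is a bug: the code should either detect that save_freq corresponds to a whole number of epochs, or provide another parameter that says whether the frequency is counted in epochs or in steps.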
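In the meantime, one possible workaround is to do the "every N epochs" check in an epoch-end hook, where the validation metrics are already in the logs. The sketch below is framework-free so it can be run as is: EveryNEpochsSaver and save_fn are hypothetical names, not part of Keras, and the validation values are made up. It only assumes the usual on_epoch_end(epoch, logs) signature with a 0-based epoch index.

```python
class EveryNEpochsSaver:
    """Hypothetical helper: save every `n_epochs` epochs from the epoch-end hook."""

    def __init__(self, filepath, n_epochs, save_fn):
        self.filepath = filepath
        self.n_epochs = n_epochs
        self.save_fn = save_fn  # in a real script this could be model.save

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # `epoch` is 0-based here; the filename uses the 1-based epoch number.
        if (epoch + 1) % self.n_epochs != 0:
            return
        # Epoch-end logs are assumed to include the val_* metrics, so the
        # template can be filled without a KeyError.
        self.save_fn(self.filepath.format(epoch=epoch + 1, **logs))


# Stand-in for model.save: just record the paths that would be written.
saved = []
saver = EveryNEpochsSaver(
    "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5",
    n_epochs=2,
    save_fn=saved.append,
)

# Simulate four epochs with made-up validation metrics.
for epoch, val_mae in enumerate([4.50, 4.37, 4.21, 4.10]):
    saver.on_epoch_end(epoch, {"val_mean_absolute_error": val_mae})

print(saved)
```

In an actual training script the same on_epoch_end body could go into a keras.callbacks.Callback subclass with self.model.save as the save function, or be passed as on_epoch_end to tf.keras.callbacks.LambdaCallback. I have not tested either variant against the code above, so treat it as a starting point.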