ModelCheckpoint fails to format filename when save_freq is an integer

Hi everyone.

I’m trying to save a Keras model with the ModelCheckpoint callback every two epochs.

If I save the model every epoch with save_freq="epoch", everything works and I can use val_mean_absolute_error in the filename. However, if I set save_freq=2*int(ceil(train_size/batch_size)), which corresponds to two epochs, Keras raises this error:
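An integer save_freq is counted in batches (steps), not epochs, so "every two epochs" has to be converted by hand. For the data used in the example below (1000 training samples, batch size 128), the arithmetic works out as follows; the variable names are just for illustration:

```python
import math

train_size = 1000  # samples kept in the example below
batch_size = 128

# An integer save_freq counts batches, so convert epochs to batches
steps_per_epoch = math.ceil(train_size / batch_size)  # 8 (the last batch is partial)
save_freq = 2 * steps_per_epoch                       # 16 batches = two epochs

print(steps_per_epoch, save_freq)  # 8 16
```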

KeyError: 'Failed to format this callback filepath: "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5". Reason: \'val_mean_absolute_error\'' 
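The underlying error is what Python's str.format raises when the template names a key that is not among its keyword arguments. A minimal reproduction without Keras, assuming the callback fills the template roughly as filepath.format(epoch=..., **logs) (an assumption about its internals; the metric values below are made-up placeholders):

```python
filepath = "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5"

# Placeholder logs (values made up) as seen after a training batch: no val_* keys
batch_logs = {"loss": 12.3, "mean_absolute_error": 2.9}
# Placeholder logs at epoch end, after validation: val_* keys are present
epoch_logs = {**batch_logs, "val_loss": 11.8, "val_mean_absolute_error": 2.75}


def format_path(logs, epoch):
    # Assumed to be roughly what ModelCheckpoint does when naming the file
    return filepath.format(epoch=epoch, **logs)


print(format_path(epoch_logs, 2))  # saved-model_02_2.75.h5

try:
    format_path(batch_logs, 2)
except KeyError as missing:
    print("KeyError:", missing)  # KeyError: 'val_mean_absolute_error'
```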

Below is the code, which I took from here:

import tensorflow as tf
from tensorflow import keras

def get_model():
    model = keras.Sequential()
    model.add(keras.layers.Dense(1, input_dim=784))
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=0.1),
        loss="mean_squared_error",
        metrics=["mean_absolute_error"],
    )
    return model

# Load example MNIST data and pre-process it
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Limit the data to 1000 samples
x_train = x_train[:1000]
y_train = y_train[:1000]
x_test = x_test[:1000]
y_test = y_test[:1000]


# Batches per epoch: ceil(1000 / 128) = 8
nSteps = int(tf.math.ceil(len(x_train)/128))
filepath = "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5"

callbacks = [
    # An integer save_freq counts batches, so 2*nSteps = 16 batches = every two epochs
    tf.keras.callbacks.ModelCheckpoint(filepath=filepath, monitor='val_mean_absolute_error', verbose=1,
            save_best_only=False, mode='min', save_freq=2*nSteps)
]

model = get_model()
history = model.fit(
    x_train,
    y_train,
    validation_data=(x_test,y_test),
    batch_size=128,
    epochs=4,
    verbose=1,
    callbacks=callbacks,
)

I’m not sure if it’s a bug, but something is not right!

Thank you.

==================
Edited:

After a bit of debugging, I found this code in callbacks.py:

  # In ModelCheckpoint:
  def _implements_train_batch_hooks(self):
    # Only call batch hooks when saving on batch
    return self.save_freq != 'epoch'

  # In the base Callback class:
  def _implements_train_batch_hooks(self):
    """Determines if this Callback should be called for each train batch."""
    return (not generic_utils.is_default(self.on_batch_begin) or
            not generic_utils.is_default(self.on_batch_end) or
            not generic_utils.is_default(self.on_train_batch_begin) or
            not generic_utils.is_default(self.on_train_batch_end))

  def _implements_test_batch_hooks(self):
    """Determines if this Callback should be called for each test batch."""
    return (not generic_utils.is_default(self.on_test_batch_begin) or
            not generic_utils.is_default(self.on_test_batch_end))
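A toy, pure-Python model of the training loop makes the effect of this gating concrete. It is not Keras code: it only models the order in which fit() runs things (training batches, then validation, then on_epoch_end) and which keys are in the logs at each save attempt; the metric values are placeholders and the batch counter is a simplification of ModelCheckpoint's own bookkeeping.

```python
def simulate_fit(save_freq, epochs=4, steps_per_epoch=8):
    """Toy model (not Keras code) of when a checkpoint callback tries to save.

    Returns (epoch, hook_name, val_metric_available) for every save attempt.
    """
    attempts = []
    batches_seen = 0
    # Mirrors the ModelCheckpoint override: batch hooks only for non-"epoch" save_freq
    batch_hooks_enabled = save_freq != "epoch"
    for epoch in range(1, epochs + 1):
        for _ in range(steps_per_epoch):
            batches_seen += 1
            # After a training batch, the logs hold training metrics only
            logs = {"loss": 1.0, "mean_absolute_error": 0.5}
            if batch_hooks_enabled and batches_seen % save_freq == 0:
                attempts.append((epoch, "on_train_batch_end", "val_mean_absolute_error" in logs))
        # Validation runs after the epoch's last training batch,
        # so only the epoch-end logs contain the val_* metrics
        logs = {**logs, "val_loss": 0.9, "val_mean_absolute_error": 0.45}
        if save_freq == "epoch":
            attempts.append((epoch, "on_epoch_end", "val_mean_absolute_error" in logs))
    return attempts


print(simulate_fit(save_freq=16))
# [(2, 'on_train_batch_end', False), (4, 'on_train_batch_end', False)]
print(simulate_fit(save_freq="epoch"))
# [(1, 'on_epoch_end', True), (2, 'on_epoch_end', True), (3, 'on_epoch_end', True), (4, 'on_epoch_end', True)]
```

Even though save_freq=16 falls exactly on the end of epochs 2 and 4, the save is attempted in the last batch's hook, before validation has produced val_mean_absolute_error.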

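Accordingly, with save_freq='epoch' the batch hooks are skipped and the model is saved from on_epoch_end, whose logs already contain the validation metrics, so the filename is formatted correctly. With an integer save_freq, the model is saved from on_train_batch_end instead; at that point validation has not run yet, the logs only contain training metrics such as loss and mean_absolute_error, and val_mean_absolute_error is missing. So I think this is a bug: the callback should either detect when save_freq is a whole number of epochs, or provide another parameter that says whether save_freq is counted in epochs or in steps.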
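Until the callback supports this directly, one possible workaround is to keep save_freq='epoch' (so saving happens in on_epoch_end, after validation) and simply skip the epochs you don't want. This is a sketch, not an official API: EveryNEpochsCheckpoint and every_n_epochs are hypothetical names, and it only overrides ModelCheckpoint's public on_epoch_end hook.

```python
import tensorflow as tf


class EveryNEpochsCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    """Hypothetical helper (not part of Keras): save every `every_n_epochs` epochs.

    Forces save_freq="epoch" so the save happens in on_epoch_end, where the
    validation metrics are already present in `logs`.
    """

    def __init__(self, filepath, every_n_epochs=2, **kwargs):
        kwargs["save_freq"] = "epoch"  # never save from the batch hooks
        super().__init__(filepath, **kwargs)
        self.every_n_epochs = every_n_epochs

    def on_epoch_end(self, epoch, logs=None):
        # `epoch` is 0-based, so with every_n_epochs=2 this saves after epochs 2, 4, ...
        if (epoch + 1) % self.every_n_epochs == 0:
            super().on_epoch_end(epoch, logs)


# Usage with the example above (replaces the callbacks list):
# callbacks = [EveryNEpochsCheckpoint(filepath, every_n_epochs=2,
#                                     monitor='val_mean_absolute_error',
#                                     mode='min', verbose=1)]
```

Note that with save_best_only=True, the comparison against the best value would only see every N-th epoch, since the skipped epochs never reach ModelCheckpoint.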