When training a keypoint detection model, I first tried a custom training loop with **tf.GradientTape()**. Data loading uses a **tf.keras.utils.Sequence** generator; the code is shown below:

```python
epoch = 20
learning_rate = 0.001

model = MyModel()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

for i in range(epoch):
    # iterate over every batch of images and labels
    for j in range(len(my_training_batch_generator)):
        # images: (4, 224, 224, 3), labels: (4, 56, 56, 17)
        images, labels = my_training_batch_generator[j]
        with tf.GradientTape() as tape:
            y_pred = model(images)  # model output: (4, 56, 56, 17)
            loss = tf.square(labels - y_pred)
            loss = tf.reduce_mean(loss)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_variables))
```
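
For reference, the generator is a plain **tf.keras.utils.Sequence**. My real one reads images and keypoint heatmaps from disk; the sketch below is a simplified stand-in (the class name and random data are just for illustration) that yields batches with the same shapes:

```python
import numpy as np
import tensorflow as tf

class TrainingBatchGenerator(tf.keras.utils.Sequence):
    """Simplified stand-in for my real generator, which loads data from disk."""

    def __init__(self, num_batches=100, batch_size=4):
        super().__init__()
        self.num_batches = num_batches
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return self.num_batches

    def __getitem__(self, idx):
        # images: (4, 224, 224, 3), labels: (4, 56, 56, 17) keypoint heatmaps
        images = np.random.rand(self.batch_size, 224, 224, 3).astype("float32")
        labels = np.random.rand(self.batch_size, 56, 56, 17).astype("float32")
        return images, labels

my_training_batch_generator = TrainingBatchGenerator()
```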

It works very well and achieves good detection results. But I want to make use of Keras callbacks, so I tried training with **model.fit** instead:

```python
def loss_function(y_true, y_pred):
    loss = tf.square(y_true - y_pred)
    loss = tf.reduce_mean(loss)
    return loss

model.compile(
    loss=loss_function,
    # loss=tf.keras.losses.MSE,
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
)

model.fit(
    x=my_training_batch_generator,  # tf.keras.utils.Sequence
    epochs=epoch,
)
```
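
To rule out the loss function itself, a quick check on random tensors (not my real data) shows that the compiled loss and the loss computed in the tape loop produce the same scalar, since both are just the mean squared error over all axes:

```python
import tensorflow as tf

y_true = tf.random.uniform((4, 56, 56, 17))
y_pred = tf.random.uniform((4, 56, 56, 17))

tape_loss = tf.reduce_mean(tf.square(y_true - y_pred))  # GradientTape-loop version
fit_loss = loss_function(y_true, y_pred)                # compiled version

# Passes: both reduce the squared error over every axis to one scalar.
tf.debugging.assert_near(tape_loss, fit_loss)
```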

No other settings changed, yet the accuracy after training is very bad, and I observe that the loss value stays the same in every epoch.
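
This is roughly how I observed it, using a small logging callback (illustrative only, since callbacks are what I want **model.fit** for in the first place):

```python
import tensorflow as tf

# Print the mean loss reported at the end of each epoch
log_loss = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda e, logs: print(f"epoch {e}: loss = {logs['loss']:.6f}")
)

model.fit(
    x=my_training_batch_generator,
    epochs=epoch,
    callbacks=[log_loss],  # the printed loss barely moves between epochs
)
```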

Is there any difference between these two methods of training? This question has really bothered me for days; thanks for your answers.