When creating a custom loss function with additional arguments, an error occurs when saving the model

Sangmin_Suh · January 29, 2024, 9:44am

My two development environments are (1) Windows 10 with native TF 2.10 GPU and (2) Ubuntu 22.04 with TF 2.12 GPU. The symptoms are the same in both.

As far as I know, a custom loss can be written as either a class or a function. With a class, I can easily debug the code inside the loss in eager mode by setting 'tf.config.run_functions_eagerly(True)' (I set it back to 'False' for training). With a function, however, I could not debug the code inside the loss even with 'tf.config.run_functions_eagerly(True)'; @tf.function still seems to be active. So I am currently designing the loss as a class and then converting the class-based code to a function-based loss. The two versions are shown below; the additional input is anchorboxes_xy, which is a tf tensor.

class ssd_loss(tf.keras.losses.Loss):
    def __init__(self, anchorboxes_xy, name="ssd_loss"):
        super(ssd_loss, self).__init__(name=name)
        self.anchorboxes_xy = anchorboxes_xy  # external tf tensor
        …

    def call(self, y_true, y_pred):
        y_true_cls_batch, y_true_boxes_batch = y_true[:, :, :1], y_true[:, :, 1:]
        y_pred_cls_batch, y_pred_boxes_batch = y_pred[:, :, :param['n_classes']], y_pred[:, :, param['n_classes']:]

        true_class = tf.zeros([param['batch_size'], param['n_anchors'], 1])
        true_boxes = tf.zeros([param['batch_size'], param['n_anchors'], 4])

        for i in range(param['batch_size']):
            # IoU between y_true_boxes_batch and self.anchorboxes_xy
            obj_loc_idx = tf.where(y_true_cls_batch[i] != 0)[:, 0]
            labels = tf.gather(y_true_cls_batch[i], obj_loc_idx)
            boxes = tf.gather(y_true_boxes_batch[i], obj_loc_idx)
            iou = calc_iou_2D(boxes, self.anchorboxes_xy)
            ...

        # calc loss
        code ...
        total_loss = func(code...)

        return total_loss

    def get_config(self):
        return {'anchorboxes_xy': self.anchorboxes_xy}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

def ssd_loss_func(anchorboxes_xy):

    anchorboxes_xy = anchorboxes_xy  # external tf tensor

    …

    def ssd_loss(y_true, y_pred):
        y_true_cls_batch, y_true_boxes_batch = y_true[:, :, :1], y_true[:, :, 1:]
        y_pred_cls_batch, y_pred_boxes_batch = y_pred[:, :, :param['n_classes']], y_pred[:, :, param['n_classes']:]

        true_class = tf.zeros([param['batch_size'], param['n_anchors'], 1])
        true_boxes = tf.zeros([param['batch_size'], param['n_anchors'], 4])

        for i in range(param['batch_size']):
            # IoU between y_true_boxes_batch and anchorboxes_xy
            obj_loc_idx = tf.where(y_true_cls_batch[i] != 0)[:, 0]
            labels = tf.gather(y_true_cls_batch[i], obj_loc_idx)
            boxes = tf.gather(y_true_boxes_batch[i], obj_loc_idx)
            iou = calc_iou_2D(boxes, anchorboxes_xy)
            ...

        # calc loss
        code ...
        total_loss = func(code...)

        return total_loss

    return ssd_loss

With the class-based custom loss, the following error appears when saving the model. Because I need to call "model.predict" on the reconstructed model, I have to use model.save(…).

-> model.save(param['results_dir'] + 'model.h5')
(Pdb) n
TypeError: Unable to serialize [[0. 0. 0.14285715 0.14285715]
[0.14285715 0. 0.2857143 0.14285715]

…

[0.71428573 0.85714287 0.85714287 1. ]
[0.85714287 0.85714287 1. 1. ]] to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.

The displayed tensor value is exactly the same as the external "anchorboxes_xy".

Summary

With the class-based loss, debugging works but model.save() fails.
With the function-based loss, debugging does not work but model.save() works.

I would like to use the class-based loss because it is easier to debug, but I am not sure how to fix the saving error.

Tim_Wolfe · January 29, 2024, 3:24pm

To fix the model saving issue with your class-based custom loss function in TensorFlow, you need to ensure that all attributes of the class are serializable. The error you’re encountering is due to the anchorboxes_xy tensor attribute in your ssd_loss class, which TensorFlow can’t serialize into JSON format.

To resolve this, you have a few options:

Convert the Tensor to a Serializable Format: In your get_config method, convert the anchorboxes_xy tensor to a list or NumPy array (for example with .numpy().tolist(), which yields a plain nested list), which is serializable. Then reconstruct the tensor from this list or array in the from_config method.
Use a Wrapper Function: Instead of a class, use a wrapper function that takes anchorboxes_xy as an input and returns your loss function. This avoids the need to serialize the tensor as part of the class.
Custom Layer for Loss Calculation: Implement the loss calculation inside a custom Keras layer that takes anchorboxes_xy as an input. This integrates the tensor into the model’s computational graph, making serialization straightforward.
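
A minimal sketch of the first option, assuming anchorboxes_xy is a float tensor and the code runs eagerly when get_config is called. The class name SerializableSSDLoss and the mean-squared-error body of call are hypothetical placeholders; the real SSD loss from the question is elided there and is not reproduced here. Only the get_config / from_config round trip is the point.

```python
import json

import tensorflow as tf


class SerializableSSDLoss(tf.keras.losses.Loss):
    """Hypothetical stand-in for ssd_loss; only the config handling matters."""

    def __init__(self, anchorboxes_xy, name="ssd_loss"):
        super().__init__(name=name)
        # Accept a tensor or a nested list (from_config passes a list).
        self.anchorboxes_xy = tf.convert_to_tensor(anchorboxes_xy, dtype=tf.float32)

    def call(self, y_true, y_pred):
        # Placeholder loss (assumption): the real SSD loss body goes here.
        return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

    def get_config(self):
        # Return plain Python values so the config can be written as JSON.
        return {
            "anchorboxes_xy": self.anchorboxes_xy.numpy().tolist(),
            "name": self.name,
        }

    @classmethod
    def from_config(cls, config):
        # __init__ converts the nested list back into a tensor.
        return cls(**config)


# Round trip through JSON, roughly what saving and loading the config involves.
anchors = tf.constant([[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]])
loss = SerializableSSDLoss(anchors)
config_json = json.dumps(loss.get_config())
restored = SerializableSSDLoss.from_config(json.loads(config_json))
```

When reloading a model compiled with such a loss, the class usually has to be supplied through the custom_objects argument of tf.keras.models.load_model. If the reloaded model is only used for model.predict, passing compile=False to load_model skips restoring the loss altogether.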

For debugging in eager mode, ensure tf.config.run_functions_eagerly(True) is set before model compilation to allow step-through debugging of the loss function.
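
One way to check whether eager execution is actually in effect is to record tf.executing_eagerly() from inside a tf.function. The sketch below is independent of the loss code in this thread; probe and seen are hypothetical names used only for illustration.

```python
import tensorflow as tf

seen = []  # records whether the function body ran eagerly on each Python-level run


@tf.function
def probe(x):
    # This Python line runs on every call in eager mode,
    # but only while tracing in graph mode.
    seen.append(tf.executing_eagerly())
    return x * 2.0


# With the switch on, the body runs as ordinary Python and breakpoints work.
tf.config.run_functions_eagerly(True)
probe(tf.constant(1.0))

# With the switch off, the body is traced into a graph.
tf.config.run_functions_eagerly(False)
probe(tf.constant(1.0))
```

Keras also accepts run_eagerly=True in model.compile, which may be a more targeted switch when only the training step (including the loss) needs to be stepped through.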

Sangmin_Suh · January 29, 2024, 10:52pm

Wow, that was the perfect solution. Thank you so much, Tim.