Custom loss function not working properly

I am training a convolutional neural network which predicts one label. I want to ignore the loss resulting from some predictions during training, so I wrote a custom binary cross-entropy loss function for this. I want to ignore the loss whenever y_true=999.

    def custom_loss_mask(y_true, y_pred):
        y_true = tf.cast(y_true, dtype=y_pred.dtype)

        # 1.0 where the label is valid, 0.0 where it is 999
        idx_ = (y_true != 999)
        idx = tf.cast(idx_, dtype=y_pred.dtype)

        loss = tf.keras.losses.binary_crossentropy(y_true * idx, y_pred * idx)

        return loss

I tested the model for 10 epochs, but the loss function does not seem to be working, i.e., the val_loss is not changing over the epochs and I am getting an AUC (area under the ROC curve) of 0.5.

    Epoch 1/10
    26/26 [==============================] - 11s 179ms/step - loss: 1.3587 - accuracy: 0.9106 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 2/10
    26/26 [==============================] - 4s 159ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 3/10
    26/26 [==============================] - 4s 158ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 4/10
    26/26 [==============================] - 4s 158ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 5/10
    26/26 [==============================] - 4s 158ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 6/10
    26/26 [==============================] - 4s 158ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 7/10
    26/26 [==============================] - 4s 159ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 8/10
    26/26 [==============================] - 4s 159ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 9/10
    26/26 [==============================] - 4s 160ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    Epoch 10/10
    26/26 [==============================] - 4s 159ms/step - loss: 1.2572 - accuracy: 0.9185 - val_loss: 1.2569 - val_accuracy: 0.9185
    testing
    **********Testing model**********
    training AUC : 0.5
    testing AUC : 0.5

Some more information on the experiment: I am using an Adam optimizer with a 1e-4 learning rate, and I am training the model with a data generator via model.fit_generator().
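For reference, the setup looks roughly like this (a sketch; train_generator and val_generator are placeholder names):

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss=custom_loss_mask,
                  metrics=['accuracy'])
    model.fit_generator(train_generator,
                        validation_data=val_generator,
                        epochs=10)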

Does anyone see any mistakes here?
If so, what might be a possible fix?

Hey,
can you please explain idx_ = (y_true!=999)?

Does changing the code to

    def custom_loss_mask(y_true, y_pred):
        y_true = tf.cast(y_true, dtype=y_pred.dtype)
        if tf.math.not_equal(y_true, tf.constant([999.0])):
            idx_ = (y_true != 999)
            idx = tf.cast(idx_, dtype=y_pred.dtype)
            loss = tf.keras.losses.binary_crossentropy(y_true * idx, y_pred * idx)
            return loss

work?

Hi @varungupta
with idx_ = (y_true!=999) I am trying to get the indices of all the labels that are not 999.
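For example, with made-up values:

    y_true = tf.constant([0.0, 1.0, 999.0])
    idx_ = (y_true != 999)   # --> [True, True, False]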
I tried the above code, but it does not work.
Is there a possible fix?

@Athreya_Prakash Are you able to train the model using standard (non-custom) losses? Do the training metrics update as desired?

@varungupta I cannot use binary cross-entropy or any other standard loss, since my y_true has 3 labels: 0, 1, and 999 (don't care). I want to ignore the cases where it is 999.

I think more info on your data is needed. You basically don't want to train the model on certain data (ignore the optimization/loss when the label is 999). Why include such data in the first place? Is it not possible to simply NOT feed the model ground-truth data with the 999 value, and use a standard loss instead?
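For instance, filtering those samples out up front would look something like this (a sketch with made-up arrays):

    import numpy as np

    y = np.array([0, 1, 999, 1, 999])   # made-up labels
    x = np.arange(len(y))               # made-up inputs
    keep = (y != 999)                   # boolean mask of usable samples
    x, y = x[keep], y[keep]             # drop the 999 ("don't care") rows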

Further, your y_true has 3 labels [0, 1, 999], right? Are you one-hot encoding these?
What you are currently doing is setting the model to predict class 0 for category 999 (NOT 'don't care'/'ignore'). If you're not one-hot encoding these, then, let's say:

y_true → [0, 1, 999] (tensor)
idx_ = (y_true != 999) → [True, True, False] → cast → [1, 1, 0] → idx * y_true → [1, 1, 0] * [0, 1, 999] → [0, 1, 0] (final y_true)

You have converted 999 to 0 and are then computing the binary CE loss in your custom loss. Similarly for y_pred. So whenever 999 data is fed, you automatically set both y_true and y_pred to 0. This contributes to the loss (minimizes it) and is not being ignored. For example, in the second snippet below I removed the last 0.0 pair, and the loss went up:

    tf.keras.losses.binary_crossentropy([0.0, 1.1, 1.0, 0.0, 0.0], [0.1, 0.88, 0.98, 0.2, 0.0])
    <tf.Tensor: shape=(), dtype=float32, numpy=0.055459328>

    tf.keras.losses.binary_crossentropy([0.0, 1.1, 1.0, 0.0], [0.1, 0.88, 0.98, 0.2])
    <tf.Tensor: shape=(), dtype=float32, numpy=0.06932416>
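To actually ignore an entry, it has to be dropped from the computation rather than replaced by 0. A minimal sketch of that, with made-up values, using tf.boolean_mask:

    y_true = tf.constant([0.0, 1.0, 999.0])
    y_pred = tf.constant([0.1, 0.9, 0.5])
    mask = tf.not_equal(y_true, 999.0)
    # Drop the masked entries entirely so they contribute nothing to the mean.
    tf.keras.losses.binary_crossentropy(tf.boolean_mask(y_true, mask),
                                        tf.boolean_mask(y_pred, mask))
    # ≈ 0.105: the mean BCE over the two kept entries only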

Thus, there seem to be a few logical gaps in your implementation.
The 'ignore' effect you want the model to have on the 999 values is not happening.
My best suggestion would be to reconsider the data you are feeding the model, limiting it to the samples you want the model to work on. The custom loss itself seems to be running; I still don't have a solid explanation for why the metrics are not updating, maybe the 999 samples are balancing the loss? Also check your validation data: does it consist mostly of 999-value samples?

This is all I can say without looking at the exact problem statement, architecture, and data samples.
All the best.

@varungupta Let me clarify the problem statement.
We are predicting two labels (A, B) using sigmoid activations. One of the challenges we had was missing ground truth: the dataset size reduces to 50% if we want to train with both A and B (there are many entries where A is available but not B, and vice versa).
If we used binary cross-entropy (applied to each output), we would end up using only 50% of the data. So we thought of having a loss function where we can ignore the loss term when one label is missing.
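A minimal sketch of such a per-label masked loss, assuming y_true has shape (batch, 2), sigmoid outputs in y_pred, and 999 marking a missing label (tf.keras.backend.binary_crossentropy is used because it returns the element-wise loss without reducing):

    import tensorflow as tf

    def masked_bce(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # 1.0 where the label is present, 0.0 where it is 999 (missing)
        mask = tf.cast(tf.not_equal(y_true, 999.0), y_pred.dtype)
        # Replace 999 with a dummy 0 so the element-wise BCE stays finite;
        # those elements are zeroed out by the mask right after.
        elementwise = tf.keras.backend.binary_crossentropy(y_true * mask, y_pred)
        elementwise *= mask
        # Average only over the labels that are actually present.
        return tf.reduce_sum(elementwise) / tf.maximum(tf.reduce_sum(mask), 1.0)

The key difference from the version above is that the loss elements themselves (not the labels) are zeroed and excluded from the average, so missing entries contribute no gradient.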