Unexpected value of binary_crossentropy loss function in classifier network with two outputs

will-cern · July 27, 2022, 2:21pm

Hello,

I’m having trouble understanding what Keras does with the binary_crossentropy loss function during evaluate (and training) when the network has two outputs, one for the probability of each class of a binary classifier. I already know how to get the desired result (switch to the categorical_crossentropy loss function), but I am still puzzled about what happens when binary_crossentropy is used on such a network. Here’s a minimal reproducer:

from tensorflow import keras
import numpy as np

loss_func = keras.losses.BinaryCrossentropy()
nn = keras.Sequential([
  keras.layers.Dense(2**8, input_shape=(1,), activation='relu'),
  keras.layers.Dense(2, activation='softmax')
])
nn.compile(loss=loss_func,optimizer='adam')
train_x = np.array([0.4])
train_y = np.array([[0,1]])
print(nn.predict(train_x))
print("Evaluated loss = ",nn.evaluate(train_x,train_y))
print("Function loss = ",loss_func(train_y,nn.predict(train_x)).numpy())
print("Manual loss = ",np.average( -train_y*np.log(nn.predict(train_x)) -(1-train_y)*np.log(1. - nn.predict(train_x)) ))

Producing:

[[0.5152152  0.48478484]]
1/1 [==============================] - 0s 92ms/step - loss: 0.7085
Evaluated loss =  0.7084982991218567
Function loss =  0.72405
Manual loss =  0.7240501642227173

The function loss and manual loss make complete sense to me. The evaluated loss does not. What is the calculation that is being performed to reach 0.70849 in this case?
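For reference, the 0.72405 figure can be reproduced from the printed probabilities alone, without Keras. The following is a minimal NumPy sketch, assuming the prediction [[0.5152152, 0.48478484]] shown above; the helper names manual_binary_crossentropy and manual_categorical_crossentropy are made up for illustration and are not Keras APIs. It also shows that, for a two-output softmax whose outputs sum to 1, the element-wise binary cross-entropy and the categorical cross-entropy give essentially the same value, so neither of them accounts for the number that evaluate reports.

```python
import numpy as np

# Values copied from the printed output above (not recomputed by a model).
y_true = np.array([[0.0, 1.0]])
y_pred = np.array([[0.5152152, 0.48478484]])


def manual_binary_crossentropy(y_true, y_pred):
    """Element-wise binary cross-entropy, averaged over every output."""
    per_output = -y_true * np.log(y_pred) - (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(np.mean(per_output))


def manual_categorical_crossentropy(y_true, y_pred):
    """Categorical cross-entropy: sum over classes, then average over samples."""
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return float(np.mean(per_sample))


bce = manual_binary_crossentropy(y_true, y_pred)
cce = manual_categorical_crossentropy(y_true, y_pred)
print("Manual binary crossentropy      =", bce)
print("Manual categorical crossentropy =", cce)
```

Both values come out at about 0.72405, matching the "Function loss" and "Manual loss" lines, while the evaluated loss of 0.70850 matches neither.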

Thanks
Will

will-cern · July 27, 2022, 2:21pm

Hello,

I am trying to understand the calculation Keras performs on a classifier network with two outputs when using the binary_crossentropy loss function. I know I can fix my issue by switching to categorical_crossentropy, but what I really want to understand is whether there is a bug in Keras or, if not, what calculation it is actually doing. I have seen others use this loss function on such a network (rather than on a network with a single output).

from tensorflow import keras
import numpy as np

loss_func = keras.losses.BinaryCrossentropy()
nn = keras.Sequential([
  keras.layers.Dense(2**8, input_shape=(1,), activation='relu'),
  keras.layers.Dense(2, activation='softmax')
])
nn.compile(loss=loss_func,optimizer='adam')
train_x = np.array([0.4])
train_y = np.array([[0,1]])
print(nn.predict(train_x))
print("Evaluated loss = ",nn.evaluate(train_x,train_y))
print("Function loss = ",loss_func(train_y,nn.predict(train_x)).numpy())
print("Manual loss = ",np.average( -train_y*np.log(nn.predict(train_x)) -(1-train_y)*np.log(1. - nn.predict(train_x)) ))

produces:

[[0.5152152  0.48478484]]
1/1 [==============================] - 0s 92ms/step - loss: 0.7085
Evaluated loss =  0.7084982991218567
Function loss =  0.72405
Manual loss =  0.7240501642227173

The latter two numbers make complete sense to me. It is the first number (from the evaluate method) that I do not understand: how was it computed?

Does anyone know?
Thanks!
Will

will-cern · August 1, 2022, 4:22pm

Bumping this issue, as there has been no response yet.