I’m having trouble understanding what Keras does with the binary crossentropy loss function during evaluate (and training) when it is used with a network that has two outputs corresponding to the probabilities of the two classes of a binary classifier. I already know how to get the desired result (switch to the categorical_crossentropy loss function), but it remains puzzling what actually happens when binary_crossentropy is used on such a network. Here’s a minimal reproducer:
```python
from tensorflow import keras
import numpy as np

loss_func = keras.losses.BinaryCrossentropy()

nn = keras.Sequential([
    keras.layers.Dense(2**8, input_shape=(1,), activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
nn.compile(loss=loss_func, optimizer='adam')

train_x = np.array([0.4])
train_y = np.array([[0, 1]])

print(nn.predict(train_x))
print("Evaluated loss = ", nn.evaluate(train_x, train_y))
print("Function loss = ", loss_func(train_y, nn.predict(train_x)).numpy())
print("Manual loss = ", np.average(
    -train_y * np.log(nn.predict(train_x))
    - (1 - train_y) * np.log(1. - nn.predict(train_x))
))
```
```
[[0.5152152  0.48478484]]
1/1 [==============================] - 0s 92ms/step - loss: 0.7085
Evaluated loss =  0.7084982991218567
Function loss =  0.72405
Manual loss =  0.7240501642227173
```
The function loss and manual loss make complete sense to me. The evaluated loss does not. What is the calculation that is being performed to reach 0.70849 in this case?
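For comparison, here is a small self-contained check (a sketch using the softmax output printed above as a fixed array, so it is independent of the network's random initialization) showing that categorical crossentropy on this two-class softmax output coincides with the "manual"/"function" number, since the elementwise binary crossentropy averaged over the two complementary outputs reduces to the same value:

```python
import numpy as np
import tensorflow as tf

# Prediction copied from the run above; the two softmax outputs sum to 1.
y_true = np.array([[0., 1.]])
y_pred = np.array([[0.5152152, 0.48478484]])

# Categorical crossentropy: -sum(y_true * log(y_pred)) per sample.
cce = tf.keras.losses.CategoricalCrossentropy()
print("CCE    =", cce(y_true, y_pred).numpy())  # ~0.7240

# Elementwise binary crossentropy averaged over both outputs,
# i.e. the "manual loss" from the reproducer.
bce_manual = np.mean(-y_true * np.log(y_pred)
                     - (1 - y_true) * np.log(1. - y_pred))
print("Manual =", bce_manual)  # ~0.7240
```

Both print roughly 0.7240, so the 0.70849 reported by evaluate is the only number that does not fit this formula.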