Hello,

I’m having trouble understanding what Keras does with the binary cross-entropy loss during `evaluate()` (and training) when the network has two outputs corresponding to the probabilities of the two classes of a binary classifier. I already know how to get the result I expect (switch to the `categorical_crossentropy` loss), but it remains puzzling what is actually computed when `binary_crossentropy` is used on such a network. Here’s a minimal reproducer:

```python
from tensorflow import keras
import numpy as np

loss_func = keras.losses.BinaryCrossentropy()
nn = keras.Sequential([
    keras.layers.Dense(2**8, input_shape=(1,), activation='relu'),
    keras.layers.Dense(2, activation='softmax'),
])
nn.compile(loss=loss_func, optimizer='adam')

train_x = np.array([0.4])
train_y = np.array([[0, 1]])

# the model is untrained, so predict() returns the same values each call
pred = nn.predict(train_x)
print(pred)
print("Evaluated loss = ", nn.evaluate(train_x, train_y))
print("Function loss = ", loss_func(train_y, pred).numpy())
print("Manual loss = ", np.average(-train_y * np.log(pred) - (1 - train_y) * np.log(1. - pred)))
```

Producing:

```
[[0.5152152 0.48478484]]
1/1 [==============================] - 0s 92ms/step - loss: 0.7085
Evaluated loss = 0.7084982991218567
Function loss = 0.72405
Manual loss = 0.7240501642227173
```
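For completeness, the “Manual loss” line is just the element-wise binary cross-entropy averaged over both outputs; it reproduces standalone in plain numpy (the hard-coded prediction comes from the `predict()` output above, and the epsilon clipping mirrors what Keras does to avoid `log(0)`):

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    # element-wise binary cross-entropy, averaged over all elements;
    # eps clipping avoids log(0) as Keras does internally
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-y_true * np.log(p) - (1 - y_true) * np.log(1 - p)))

y_true = np.array([[0., 1.]])
y_pred = np.array([[0.5152152, 0.48478484]])  # the predict() output above
print(bce(y_true, y_pred))  # ≈ 0.7240, matching "Function loss" and "Manual loss"
```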

The function loss and the manual loss make complete sense to me; the evaluated loss does not. What calculation is being performed to arrive at 0.70849 in this case?
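In case it helps frame the question: one computation I can construct that lands near 0.7085 is the element-wise *sigmoid* cross-entropy applied to pre-softmax logits (i.e. as if the loss were operating with `from_logits=True` on the final Dense layer’s raw outputs rather than on the softmax probabilities). I don’t know whether that is what `evaluate()` actually does; the logits below are a hypothetical zero-mean pair I chose so that their softmax matches the printed prediction:

```python
import numpy as np

def sigmoid_xent(y_true, logits):
    # numerically stable element-wise sigmoid cross-entropy, averaged over
    # all elements (the formula tf.nn.sigmoid_cross_entropy_with_logits uses)
    return float(np.mean(np.maximum(logits, 0) - logits * y_true
                         + np.log1p(np.exp(-np.abs(logits)))))

# hypothetical logits: softmax([0.0304, -0.0304]) ≈ [0.5152, 0.4848],
# matching the predict() output above
y_true = np.array([[0., 1.]])
logits = np.array([[0.0304, -0.0304]])
print(sigmoid_xent(y_true, logits))  # ≈ 0.7085
```

With these assumed logits this comes out at ≈ 0.7085, suspiciously close to the evaluated loss, but I can’t tell from the documentation whether that is a coincidence.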

Thanks

Will