How is BinaryCrossentropy loss calculated in keras fit/evaluate?

Hello,

I am having trouble understanding the behaviour of the BinaryCrossentropy loss function in Keras's evaluate method in certain situations …

Here’s my minimal reproducer:

from tensorflow import keras
import numpy as np
loss_func = keras.losses.BinaryCrossentropy()
nn = keras.Sequential([
  keras.layers.Dense(2**8, input_shape=(1,), activation='relu'),
  keras.layers.Dense(2,activation='softmax')
])
nn.compile(loss=loss_func,optimizer='adam')
train_x = np.array([0.4]) # this is an arbitrary input
train_y = np.array([[1.,0.]])
train_q = nn.predict(train_x)
print("train_q = ",train_q)
print("Evaluated loss = ",nn.evaluate(train_x,train_y))
print("Function loss = ",loss_func(train_y,train_q).numpy())
print("Manual loss = ", -np.log(train_q[0,0]) )

Yielding the following output:

train_q =  [[0.5108817  0.48911828]]
1/1 [==============================] - 0s 438ms/step - loss: 0.6823
Evaluated loss =  0.682330846786499
Function loss =  0.671617
Manual loss =  0.67161715

The function loss makes complete sense: it equals the loss value I calculate ‘manually’ (by hand). What doesn’t make sense is the loss calculated in the evaluate call. How did it get 0.68233 here?
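For anyone following along, `BinaryCrossentropy` averages the per-class terms over the last axis, and with a two-way softmax the two outputs sum to one, so both terms collapse to `-log(q[0,0])` — which is why the “manual” shortcut matches the function loss. A numpy-only sketch of that reduction, using the predicted values printed above:

```python
import numpy as np

# Predicted probabilities and one-hot label from the reproducer above.
q = np.array([[0.5108817, 0.48911828]])
y = np.array([[1.0, 0.0]])

# Binary cross-entropy, averaged over the class axis:
# mean_j [ -y_j*log(q_j) - (1 - y_j)*log(1 - q_j) ]
bce = np.mean(-y * np.log(q) - (1.0 - y) * np.log(1.0 - q), axis=-1)

# Because the two probabilities sum to 1, both per-class terms
# equal -log(q[0, 0]), so the mean does too.
print(bce[0])            # the "function loss"
print(-np.log(q[0, 0]))  # the "manual loss"
```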

Many thanks!
Will

Bumping this simple question, as it’s still unanswered!

Have you tried nn(train_x) or nn.predict_step(train_x) instead of nn.predict(train_x)?

Hi, thanks for the suggestion. I just tried it, but both of those alternatives yield the same output as nn.predict, so it still seems like nn.evaluate is doing something wrong here. It’s like there’s a bug in nn.evaluate?

If you suspect a bug, filing this in the TensorFlow issue tracker might make sense.
Which library versions are you using, by the way?

I did open an issue months ago (“unexpected value of binary_crossentropy loss function in network with” · Issue #56910 · tensorflow/tensorflow · GitHub), but at the time the person who responded wasn’t very helpful and basically told me to post in the keras repo instead …

You can copy-paste my example into Google Colab and it will reproduce the issue, so that’s TF 2.8.2.

I can reproduce the same results:


# Seed value
# Apparently you may use different seed values at each stage
seed_value= 0

# 1. Set the `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set the `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set the `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(seed_value)

loss_func = keras.losses.BinaryCrossentropy()
nn = keras.Sequential([
  keras.layers.Dense(2**8, input_shape=(1,), activation='relu', kernel_initializer=keras.initializers.GlorotUniform(seed=1)),
  keras.layers.Dense(2,activation='softmax', kernel_initializer=keras.initializers.GlorotUniform(seed=1))
])
nn.compile(loss=loss_func,optimizer='adam')
train_x = np.array([0.4]) # this is an arbitrary input
train_y = np.array([[1.,0.]])
train_q = nn.predict_step(train_x)
print("train_q = ",train_q)
print("Evaluated loss = ",nn.evaluate(train_x,train_y))
print("Function loss = ",loss_func(train_y,train_q).numpy())
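One possible explanation worth investigating (this is an assumption about the compiled-loss path, not something verified against the Keras source here): Keras activations can cache their pre-activation inputs on the output tensor, and the loss wrapper used inside evaluate may then compute the cross-entropy from those logits via element-wise sigmoids, whereas calling loss_func on a plain numpy array has no cached logits and works from the probabilities. Treating *softmax* logits as if they were independent *sigmoid* logits gives a different number, of roughly the size of the gap seen above. A numpy-only sketch of that arithmetic, with made-up logits (the real values depend on the network weights):

```python
import numpy as np

# Hypothetical pre-softmax logits for one sample, chosen only so the
# softmax output lands near [0.51, 0.49] as in the reproducer.
z = np.array([0.02, -0.02])
y = np.array([1.0, 0.0])

# Probabilities the softmax layer would output.
q = np.exp(z) / np.sum(np.exp(z))

# (a) BCE computed from probabilities, as loss_func(y, q) does
#     when given a plain numpy array.
bce_from_probs = np.mean(-y * np.log(q) - (1 - y) * np.log(1 - q))

# (b) BCE computed from the logits via element-wise sigmoids,
#     i.e. a sigmoid-cross-entropy-from-logits style calculation.
sig = 1.0 / (1.0 + np.exp(-z))
bce_from_logits = np.mean(-y * np.log(sig) - (1 - y) * np.log(1 - sig))

print(bce_from_probs, bce_from_logits)  # the two values differ
```

If this is what is happening, the fix on the user side would be either a single sigmoid output unit with BinaryCrossentropy, or keeping the two-way softmax and switching to CategoricalCrossentropy.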