Difference between "train_on_batch()" and "test_on_batch()" return values

take_hamster · September 8, 2021, 1:57am

Hi.

I’m using Keras.

version:
Python 3.6.2
keras 2.6.0
tensorflow 2.6.0

train_on_batch() and test_on_batch() return different loss values.
What is the reason for this?

train_on_batch(), test_on_batch(), evaluate()
--------------------------------------------------------------
:
0.5317689776420593, 0.5236611366271973, 0.5236611366271973
0.5239976644515991, 0.519664466381073, 0.519664466381073
0.5211538076400757, 0.515764057636261, 0.515764057636261
0.5445187091827393, 0.5118800401687622, 0.5118800401687622
0.5287842750549316, 0.5079948902130127, 0.5079948902130127
0.49349671602249146, 0.5042303800582886, 0.5042303800582886
:

The losses from test_on_batch() and evaluate() were the same.

The difference between the values from train_on_batch() and test_on_batch() can be extremely large, which makes it unclear which value is correct.
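
One contributor to the gap can be sketched under simplifying assumptions: train_on_batch() reports the loss from the forward pass made before the weights are updated, while a following test_on_batch() already sees the updated weights. The one-parameter model and plain gradient descent below are illustrative stand-ins, not the Keras implementation; layers such as Dropout and BatchNormalization, which behave differently in training and inference, add further differences on top of this.

```python
def mse(w, xs, ys):
    """Mean squared error of the toy model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_step(w, xs, ys, lr=0.1):
    """Return (loss measured BEFORE the update, updated weight)."""
    loss_before = mse(w, xs, ys)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return loss_before, w - lr * grad


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data, true weight is 2
w = 0.0
history = []
for step in range(3):
    train_loss, w = train_step(w, xs, ys)  # analogous to train_on_batch()
    test_loss = mse(w, xs, ys)             # analogous to test_on_batch()
    history.append((train_loss, test_loss))
    print(f"step {step}: train={train_loss:.4f} test={test_loss:.4f}")
```

In this toy setting the test loss at each step equals the training loss reported at the next step, because both are measured with the same weights on the same data.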

Bhack · September 8, 2021, 3:37am

Take a look at:

take_hamster · September 8, 2021, 5:02am

Hi, thanks for the info.

I understand things like Dropout.

However, there was another mystery related to this.
I will try to reproduce it.

BRs
take hamster

take_hamster · September 8, 2021, 5:29am

Hi.

After evaluate() is called, the loss reported by train_on_batch() jumps back up.
Does anyone know why this happens?
And how can I solve it?

The output is shown below.
I call train_on_batch() 10 times, then evaluate().
The loss just before evaluate() was “0.002397470874711871”;
after evaluate(), it goes back up to “1.6820437908172607”.

start training
[3.641401767730713, 0.2685714364051819]    <- evaluate()

D - train on batch set 1/50
D - train on batch 1/10 (1/50)
train: [1.8222469091415405, 0.6342856884002686]    <- train_on_batch()
43.84374737739563 [s]
D - train on batch 2/10 (1/50)
train: [0.0025117292534559965, 1.0]    <- train_on_batch()
43.25669574737549 [s]
D - train on batch 3/10 (1/50)
train: [0.0025033268611878157, 1.0]    <- train_on_batch()
46.86725831031799 [s]
D - train on batch 4/10 (1/50)
train: [0.0029449830763041973, 1.0]    <- train_on_batch()
47.83594560623169 [s]
D - train on batch 5/10 (1/50)
train: [0.0016231434419751167, 1.0]    <- train_on_batch()
44.506582498550415 [s]
D - train on batch 6/10 (1/50)
train: [0.0032135520596057177, 1.0]    <- train_on_batch()
44.22938537597656 [s]
D - train on batch 7/10 (1/50)
train: [0.0022874141577631235, 1.0]    <- train_on_batch()
44.848825216293335 [s]
D - train on batch 8/10 (1/50)
train: [0.0031975528690963984, 1.0]    <- train_on_batch()
46.838255405426025 [s]
D - train on batch 9/10 (1/50)
train: [0.00267593702301383, 1.0]    <- train_on_batch()
45.724446058273315 [s]
D - train on batch 10/10 (1/50)
train: [0.002397470874711871, 1.0]    <- train_on_batch()
45.83753442764282 [s]
evaluate: [3.362639904022217, 0.3028571307659149]    <- evaluate()
D - train on batch set 2/50
D - train on batch 1/10 (2/50)
train: [1.6820437908172607, 0.6514285802841187]    <- train_on_batch() ?
46.7341628074646 [s]
D - train on batch 2/10 (2/50)
train: [0.0026804585941135883, 1.0]
45.43524193763733 [s]
D - train on batch 3/10 (2/50)
train: [0.0029256173875182867, 1.0]
44.48356604576111 [s]
D - train on batch 4/10 (2/50)
train: [0.0018705641850829124, 1.0]
44.31945013999939 [s]
D - train on batch 5/10 (2/50)
train: [0.0022483128122985363, 1.0]
45.50228929519653 [s]
D - train on batch 6/10 (2/50)
train: [0.002211614977568388, 1.0]
45.58434748649597 [s]
D - train on batch 7/10 (2/50)
train: [0.0023816321045160294, 1.0]
45.84853410720825 [s]
D - train on batch 8/10 (2/50)
train: [0.001954358071088791, 1.0]
45.299145221710205 [s]
D - train on batch 9/10 (2/50)
train: [0.0018027760088443756, 1.0]
45.008939266204834 [s]
D - train on batch 10/10 (2/50)
train: [0.0030839790124446154, 1.0]
45.06165862083435 [s]
evaluate: [2.3175859451293945, 0.36571428179740906]
D - train on batch set 3/50
D - train on batch 1/10 (3/50)
train: [1.1597096920013428, 0.6828571557998657]
45.89957118034363 [s]
D - train on batch 2/10 (3/50)
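
The numbers in this log are consistent with one reading, offered here as an assumption rather than a confirmed diagnosis: the first train_on_batch() result after evaluate() is the average of the evaluate() result and the fresh batch result, as if a running metric had not been reset in between. The check below uses only values copied from the log; the helper name `implied_batch_value` is hypothetical, and equal weighting of the two results is assumed.

```python
def implied_batch_value(reported, previous_eval):
    """Solve reported = (previous_eval + batch) / 2 for batch."""
    return 2.0 * reported - previous_eval


# (evaluate() result, first train_on_batch() result after it), copied from
# the log. Each result is [loss, accuracy].
pairs = [
    ([3.641401767730713, 0.2685714364051819],
     [1.8222469091415405, 0.6342856884002686]),
    ([3.362639904022217, 0.3028571307659149],
     [1.6820437908172607, 0.6514285802841187]),
    ([2.3175859451293945, 0.36571428179740906],
     [1.1597096920013428, 0.6828571557998657]),
]

implied = []
for eval_result, train_result in pairs:
    loss = implied_batch_value(train_result[0], eval_result[0])
    acc = implied_batch_value(train_result[1], eval_result[1])
    implied.append((loss, acc))
    print(f"implied batch loss={loss:.6f}, implied batch accuracy={acc:.6f}")
```

Under that assumption, the implied accuracies come out at 1.0 to within float32 rounding, and the implied losses are of the same order of magnitude as the other per-batch losses in the log.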

BRs
take hamster

take_hamster · September 8, 2021, 1:03pm

Hi.

Calling evaluate() changes the loss.
I wrote some sample code.

(network)

model = Sequential(name='sample_02')

model.add(Input(shape=(3,)))
model.add(BatchNormalization())
model.add(Dense(10))
model.add(Activation('relu'))
model.add(Dense(4))

model.compile(optimizer='adam', loss='mean_squared_error')

    |
    v

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
batch_normalization (BatchNo (None, 3)                 12        
_________________________________________________________________
dense (Dense)                (None, 10)                40        
_________________________________________________________________
activation (Activation)      (None, 10)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 44        
=================================================================
Total params: 96
Trainable params: 90
Non-trainable params: 6
_________________________________________________________________
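
The parameter counts in this summary can be checked by hand. The sketch below assumes the standard layouts: a Dense layer has inputs × units weights plus one bias per unit, and BatchNormalization keeps four values per feature (trainable gamma and beta, non-trainable moving mean and moving variance). The helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def batchnorm_params(n_features):
    """Return (trainable, non_trainable) parameter counts."""
    trainable = 2 * n_features      # gamma and beta
    non_trainable = 2 * n_features  # moving mean and moving variance
    return trainable, non_trainable


bn_trainable, bn_non_trainable = batchnorm_params(3)
first_dense = dense_params(3, 10)
second_dense = dense_params(10, 4)

trainable = bn_trainable + first_dense + second_dense
non_trainable = bn_non_trainable
total = trainable + non_trainable

print("per layer:", bn_trainable + bn_non_trainable, first_dense, second_dense)
print("total / trainable / non-trainable:", total, trainable, non_trainable)
```

The six non-trainable values are the moving statistics. They are updated by training-mode forward passes rather than by the optimizer, and they are what the layer uses at inference time.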

Base training code is:

(source code)

# starts training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
#        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
#        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))
print()


(Dump result)

0.584841251373291
0.5780714750289917
0.5713894367218018
0.5647997856140137
0.5583046674728394
0.5519051551818848
0.5456019043922424
:
:

The loss is the same when test_on_batch() is added.

(source code)

# starts training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
#        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))
print()


(Dump result)

0.584841251373291
0.5780714750289917
0.5713894367218018
0.5647997856140137
0.5583046674728394
0.5519051551818848
0.5456019043922424
0.5394853949546814
:
:

The loss differs depending on whether evaluate() is called.
I would like to know the cause and how to fix it.

(source code)

# starts training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
#        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))
print()


(Dump result)

0.584841251373291        <- 1st same
0.5547470450401306      <- different! 
0.5488174557685852
0.5429757833480835
0.537223756313324
0.531562328338623
:
:

BRs
take hamster

Bhack · September 8, 2021, 6:38pm

I can’t run a minimal example because you haven’t shared dummy x and y data.

It would be better if you could share a Colab that needs no external input data.

take_hamster · September 8, 2021, 9:22pm

Hi.

I have attached the smallest executable source code.
To build a shared network first, enable the commented-out part and run it once.

import numpy as np
from keras.models import Model
from keras.models import Sequential
from keras.layers import Input, Dense, Activation, Dropout
from keras.layers import BatchNormalization
from keras.models import load_model



# Enable this on the first run and build the network.
'''
model = Sequential(name='sample_01')

model.add(Input(shape=(3,)))
model.add(BatchNormalization())
model.add(Dense(10))
model.add(Activation('relu'))
model.add(Dense(4))

model.compile(optimizer='adam', loss='mean_squared_error')

model.summary()

model.save("model_sample_01.h5")

exit()
'''



model = load_model("model_sample_01.h5")

x_data = [
[0.0, 0.0, 0.0],
[0.0, 0.0, 1.0],
[0.0, 1.0, 0.0],
[0.0, 1.0, 1.0],
[1.0, 0.0, 0.0],
[1.0, 0.0, 1.0],
[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
]

y_data = [
[0.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.0],
]

x_data_keras = np.array(x_data)
y_data_keras = np.array(y_data)

max_loop = 1
max_in_loop = 10





# training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
#        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))


print("end")

BRs
take hamster

Bhack · September 8, 2021, 11:24pm

The problem is not evaluate(); it is that you are not fixing the seeds for a reproducible run.

I’ve slightly modified your example:

import os

os.environ["PYTHONHASHSEED"]=str(1234)

import numpy as np
import unittest
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import load_model
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

# See https://github.com/tensorflow/tensorflow/issues/31149
initializer = tf.keras.initializers.GlorotUniform(seed=42)

def get_model():
  model = Sequential()
  model.add(Input(shape=(3,)))
  model.add(BatchNormalization())
  model.add(Dense(10,kernel_initializer=initializer))
  model.add(Activation('relu'))
  model.add(Dense(4,kernel_initializer=initializer))
  model.compile(optimizer='adam', loss='mean_squared_error')
  model.summary()
  return model

x_data = [
[0.0, 0.0, 0.0],
[0.0, 0.0, 1.0],
[0.0, 1.0, 0.0],
[0.0, 1.0, 1.0],
[1.0, 0.0, 0.0],
[1.0, 0.0, 1.0],
[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
]

y_data = [
[0.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.0],
]

x_data_keras = np.array(x_data)
y_data_keras = np.array(y_data)

max_loop = 1
max_in_loop = 10




model = get_model()
result_1={}
result_2={}
result_3={}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

model = get_model()
result_1_1 = {}
result_2_2 = {}
result_3_3 = {}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1_1)

case = unittest.TestCase()
case.assertDictEqual(result_1,result_1_1)
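
The idea behind fixing seeds can be shown with the standard library alone: reseeding a generator with the same value replays the same sequence, so two runs start from identical random draws. This sketch uses `random.Random` as a stand-in and does not touch the TensorFlow or NumPy generators seeded above; the helper name `draw_weights` is hypothetical.

```python
import random


def draw_weights(seed, n=5):
    """Draw n pseudo-random initial values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


run_a = draw_weights(seed=42)
run_b = draw_weights(seed=42)  # same seed: identical values
run_c = draw_weights(seed=43)  # different seed: different values

print("same seed, same values:", run_a == run_b)
print("different seed, same values:", run_a == run_c)
```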

take_hamster · September 9, 2021, 1:10am

Hi.

In the first training loop, I commented out the test_on_batch() and evaluate() calls.

# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

|
v

# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        #result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

The whole source code is below:

import os

os.environ["PYTHONHASHSEED"]=str(1234)

import numpy as np
import unittest
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import load_model
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

# See https://github.com/tensorflow/tensorflow/issues/31149
initializer = tf.keras.initializers.GlorotUniform(seed=42)

def get_model():
  model = Sequential()
  model.add(Input(shape=(3,)))
  model.add(BatchNormalization())
  model.add(Dense(10,kernel_initializer=initializer))
  model.add(Activation('relu'))
  model.add(Dense(4,kernel_initializer=initializer))
  model.compile(optimizer='adam', loss='mean_squared_error')
  model.summary()
  return model

x_data = [
[0.0, 0.0, 0.0],
[0.0, 0.0, 1.0],
[0.0, 1.0, 0.0],
[0.0, 1.0, 1.0],
[1.0, 0.0, 0.0],
[1.0, 0.0, 1.0],
[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
]

y_data = [
[0.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.0],
]

x_data_keras = np.array(x_data)
y_data_keras = np.array(y_data)

max_loop = 1
max_in_loop = 10




model = get_model()
result_1={}
result_2={}
result_3={}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        #result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

model = get_model()
result_1_1 = {}
result_2_2 = {}
result_3_3 = {}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1_1)

case = unittest.TestCase()
case.assertDictEqual(result_1,result_1_1)

Dump result

AssertionError: {0: 0[20 chars]: 0.730724573135376, 2: 0.7224830389022827, 3:[153 chars]1531} != {0: 0[20 chars]: 0.7113834619522095, 2: 0.704352080821991, 3:[151 chars]1235}
  {0: 0.7390658855438232,
-  1: 0.730724573135376,
-  2: 0.7224830389022827,
-  3: 0.7143421769142151,
-  4: 0.7063030004501343,
-  5: 0.6983664035797119,
-  6: 0.6905329823493958,
-  7: 0.6828033328056335,
-  8: 0.6751779317855835,
-  9: 0.6676561236381531}
+  1: 0.7113834619522095,
+  2: 0.704352080821991,
+  3: 0.6973998546600342,
+  4: 0.6905273199081421,
+  5: 0.683735191822052,
+  6: 0.6770237684249878,
+  7: 0.6703934669494629,
+  8: 0.663844645023346,
+  9: 0.6573768854141235}
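
To see exactly where the two runs part ways, the histories from this dump can be compared step by step with a tolerance instead of exact equality. This is a minimal sketch using only `math.isclose`; the helper name `first_divergence` and the tolerance are assumptions, and only the first three steps of each run are copied in.

```python
import math


def first_divergence(a, b, rel_tol=1e-6):
    """Return the first key where histories a and b differ, or None."""
    for key in sorted(a):
        if not math.isclose(a[key], b[key], rel_tol=rel_tol):
            return key
    return None


# Values copied from the dump (first three steps of each run).
without_evaluate = {0: 0.7390658855438232,
                    1: 0.730724573135376,
                    2: 0.7224830389022827}
with_evaluate = {0: 0.7390658855438232,
                 1: 0.7113834619522095,
                 2: 0.704352080821991}

step = first_divergence(without_evaluate, with_evaluate)
print("first diverging step:", step)
```

The runs agree at step 0 and first differ at step 1, which is the first train_on_batch() call that follows an evaluate() call.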

The loss differs depending on whether evaluate() is called or not.
Why does this happen?

BRs
take hamster

Bhack · September 9, 2021, 2:05am

Yes, I think this is a known issue:

github.com/tensorflow/tensorflow

Calling model.test_on_batch after model.evaluate returns corrupted values for the loss and the metrics

opened 09:36AM - 23 Feb 21 UTC

closed 04:51AM - 10 Dec 21 UTC

ogrisel

stat:awaiting response type:bug stalled comp:keras TF 2.4

**System information**
- Google colab with tf 2.4.1 (v2.4.1-0-g85c8b2a817f ) -… with CPU or GPU runtimes, it does not matter

**Describe the current behavior**
Calling `model.test_on_batch` after calling `model.evaluate` gives incorrect results.

**Describe the expected behavior**
Calling `model.test_on_batch` should return a value that does not depend on whether `model.evaluate` was called before.

**Standalone code to reproduce the issue**
Let's define a randomly initialized model evaluated on randomly generated data. If we call `model.test_on_batch` directly, everything is fine:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

rng = np.random.RandomState(0)
batch_size = 32
n_samples, n_features = batch_size * 10, 5
X = rng.normal(size=(n_samples, n_features))
y = rng.randint(low=0, high=2, size=X.shape[0])

model = Sequential([Dense(1, input_shape=(n_features,), activation="sigmoid")])
model.compile(optimizer="adam", loss='binary_crossentropy', metrics=['accuracy'])

print("model.test_on_batch without model.evaluate")
for i in range(3):
    loss, acc = model.test_on_batch(X[:batch_size], y[:batch_size])
    print(loss, acc)
```

output:

```
model.test_on_batch without model.evaluate
6.294709205627441 0.5625
6.294709205627441 0.5625
6.294709205627441 0.5625
```

If we then call `evaluate`, the first call of `model.test_on_batch` returns an incorrect value:

```python
normal_loss, normal_acc = model.evaluate(
    X, y, batch_size=batch_size, verbose=0)

print("model.test_on_batch *after* model.evaluate")
for i in range(3):
    loss, acc = model.test_on_batch(X[:batch_size], y[:batch_size])
    print(loss, acc)
```

output:

```
model.test_on_batch *after* model.evaluate
6.585799694061279 0.4801136255264282
6.294709205627441 0.5625
6.294709205627441 0.5625
```

Note: this is a minimal reproducer of the report found in the comments of a similar standalone keras issue: https://github.com/keras-team/keras/issues/14086

It is probably also the cause or at least related to corrupted loss/metric values reported when using `evaluate_generator`: https://github.com/keras-team/keras/issues/13780
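
The pattern described in the issue (one wrong value right after `evaluate`, then correct values again) matches what a stateful metric does when one call leaves accumulated state behind and the next call adds to it. The toy class below is a hypothetical model of that mechanism, not Keras code; the names `RunningMean`, `toy_evaluate`, and `toy_test_on_batch` and all numeric values are illustrative assumptions.

```python
class RunningMean:
    """Toy stateful metric: a sample-weighted running average."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_mean, n_samples):
        self.total += batch_mean * n_samples
        self.count += n_samples

    def result(self):
        return self.total / self.count


def toy_evaluate(metric, batches):
    """Reset at the start, accumulate every batch, leave the state behind."""
    metric.reset()
    for batch_mean, n_samples in batches:
        metric.update(batch_mean, n_samples)
    return metric.result()


def toy_test_on_batch(metric, batch_mean, n_samples):
    """Accumulate one batch, read the result, then reset."""
    metric.update(batch_mean, n_samples)
    value = metric.result()
    metric.reset()
    return value


metric = RunningMean()
clean = toy_test_on_batch(metric, 0.5, 32)   # no leftover state
toy_evaluate(metric, [(0.25, 32)] * 10)      # leaves 10 batches of state
leaked = toy_test_on_batch(metric, 0.5, 32)  # averaged with evaluate's state
again = toy_test_on_batch(metric, 0.5, 32)   # state was reset, clean again
print(clean, leaked, again)
```

Only the call immediately after `toy_evaluate` is affected, because that call also clears the leftover state. Keras models expose `reset_metrics()` for clearing metric state; whether calling it right after `evaluate()` avoids the problem in a given version is something to verify rather than assume.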

take_hamster · September 9, 2021, 9:39pm

Hi.

I read the issue.
For the time being, I’m going to use only test_on_batch(), not evaluate().
Thanks

BRs
take hamster

Bhack · September 9, 2021, 9:58pm

Yes, please upvote and subscribe to the ticket.

Bhack · September 15, 2021, 3:51pm

I have submitted a candidate fix at https://github.com/keras-team/keras/pull/15342