MeanMetricWrapper produces inconsistent results on multiple runs

Hi, I have noticed an inconsistency where wrapping the metric with MeanMetricWrapper and passing the plain function produce different results some of the time (see the Google Colab notebook).
If you run the last cell multiple times, you will see instances where mean_squared_wrapped and mean_squared_error_fn are not equal to each other.
How can we explain this?

Thanks

I think you need to fix your colab:

NameError: name 'custom_mean_squared_error' is not defined

Thanks for pointing that out. I just updated the colab with the missing function:

def custom_mean_squared_error(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))

I suppose that you need to use something like:

def mean_squared_error_fn(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))

def squared_error_fn(y_true, y_pred):
    return tf.square(y_true - y_pred)
    
mean_squared_wrapped = tf.keras.metrics.MeanMetricWrapper(fn=squared_error_fn, name='mean_squared_wrapped')
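As a quick sanity check (just a sketch, assuming TF 2.x eager execution; the y_true/y_pred values are made up), the wrapper should agree with the reduced version on a single batch, because MeanMetricWrapper simply keeps a running mean of whatever squared_error_fn returns:

y_true = tf.constant([[0.0, 1.0], [1.0, 0.0]])
y_pred = tf.constant([[0.1, 0.8], [0.6, 0.2]])

mean_squared_wrapped.update_state(y_true, y_pred)
print(mean_squared_wrapped.result().numpy())                # mean of the per-element squared errors
print(tf.reduce_mean(tf.square(y_true - y_pred)).numpy())   # same value for a single batch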

I just tried using squared_error_fn (and updated the colab as well). It still gives inconsistent results on the first eval (after compile) sometimes.

I think that you need to keep both if you want to compare the wrapped one with mean_squared_error_fn.

Try to run this:

import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)   

tf.random.set_seed(0)
np.random.seed(0)

def squared_error_fn(y_true, y_pred):
    return tf.square(y_true - y_pred)

def mean_squared_error_fn(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))

mean_squared_wrapped = tf.keras.metrics.MeanMetricWrapper(fn=squared_error_fn, name='mean_squared_wrapped')

def custom_mean_squared_error(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))

def get_compiled_model():
    
    inputs = keras.Input(shape=(784,), name="digits")
    x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
    x = layers.Dense(64, activation="relu", name="dense_2")(x)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
              loss=custom_mean_squared_error,
              metrics=['accuracy', mean_squared_wrapped, mean_squared_error_fn])
    return model

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255

y_train = y_train.astype("float32")
y_test = y_test.astype("float32")

print("TF version: ",tf.__version__)
compiled_model = get_compiled_model()
one_hot_y_train = tf.one_hot(y_train, depth=10)
print(compiled_model.evaluate(x_train, one_hot_y_train, verbose=2))
print(compiled_model.evaluate(x_train, one_hot_y_train, verbose=2))
print(compiled_model.evaluate(x_train, one_hot_y_train, verbose=2))

Thanks for taking a look. So the issue happens only on repeated compiles, not on the first compile of the model.
I added a for loop in the new colab so that we can run it once and see the mismatch. I am not sure if this is expected.

Pasted the same here

import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)   

tf.random.set_seed(0)
np.random.seed(0)

def squared_error_fn(y_true, y_pred):
    return tf.square(y_true - y_pred)

def mean_squared_error_fn(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))
    
mean_squared_wrapped = tf.keras.metrics.MeanMetricWrapper(fn=squared_error_fn, name='mean_squared_wrapped')

def custom_mean_squared_error(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true - y_pred))

def get_compiled_model():
    inputs = keras.Input(shape=(784,), name="digits")
    x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
    x = layers.Dense(64, activation="relu", name="dense_2")(x)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
              loss=custom_mean_squared_error,
              metrics=['accuracy', mean_squared_wrapped, mean_squared_error_fn])
    return model

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255

y_train = y_train.astype("float32")
y_test = y_test.astype("float32")

print("TF version: ",tf.__version__)

one_hot_y_train = tf.one_hot(y_train, depth=10)

for i in range(3):
    compiled_model = get_compiled_model()
    eval1 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    eval2 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    
    metric_index = 2 # mean_squared_wrapped
    if abs(eval1[metric_index] - eval2[metric_index]) > 1e-5:
        print(f"mismatch found in compile: {i}")
        print("eval1: ", eval1)
        print("eval2: ", eval2)

Have you tried moving:
mean_squared_wrapped = tf.keras.metrics.MeanMetricWrapper(fn=squared_error_fn, name='mean_squared_wrapped')

inside the get_compiled_model function's scope?
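i.e. something like this (just a sketch of the same model-building function as above; the only change is that the metric is created inside the function, so every compile gets a fresh metric object):

def get_compiled_model():
    inputs = keras.Input(shape=(784,), name="digits")
    x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
    x = layers.Dense(64, activation="relu", name="dense_2")(x)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    # fresh metric instance per model, so no state is shared across compiles
    mean_squared_wrapped = tf.keras.metrics.MeanMetricWrapper(fn=squared_error_fn, name='mean_squared_wrapped')
    model.compile(optimizer='adam',
              loss=custom_mean_squared_error,
              metrics=['accuracy', mean_squared_wrapped, mean_squared_error_fn])
    return model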


I just tried that and it fixed the issue. No more mismatch. Thanks.

So what is the conclusion for this issue? That MeanMetricWrapper has side effects?

I don’t know if something is cached internally.

/cc @Scott_Zhu What do you think?


"that MeanMetricWrapper has side effects?"

"something is cached internally."

The MeanMetricWrapper does have state (the running mean); is it possible that it's just not getting reset correctly?
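For example (a minimal standalone sketch reusing squared_error_fn from above; the numbers and the 'demo' name are just illustrative), the wrapped metric keeps accumulating across update_state calls until it is explicitly reset:

m = tf.keras.metrics.MeanMetricWrapper(fn=squared_error_fn, name='demo')
m.update_state([[1.0]], [[0.0]])   # squared error = 1.0
print(m.result().numpy())          # 1.0
m.update_state([[1.0]], [[1.0]])   # squared error = 0.0
print(m.result().numpy())          # 0.5 -- the running mean still carries the earlier batch
m.reset_state()                    # reset_states() on older TF versions
print(m.result().numpy())          # 0.0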

I have bumped into a related issue before, where compile edits the metric object and multiple compile calls stack up the modifications. This feels a little similar.

I think it also works with:

for i in range(3):
    compiled_model = get_compiled_model()
    eval1 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    eval2 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    compiled_model.reset_metrics()

Yes, this works:

for i in range(3):
    compiled_model = get_compiled_model()
    eval1 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    eval2 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    compiled_model.reset_metrics()

but this doesn’t:

for i in range(3):
    compiled_model = get_compiled_model()
    compiled_model.reset_metrics()
    eval1 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)
    eval2 = compiled_model.evaluate(x_train, one_hot_y_train, verbose=2)

I suppose that evaluate doesn’t reset the metrics at the end of the run, only at the beginning: