Wrong metric calculations for masked-out time series with sample weights

I have a time-series dataset where I mask out the missing data of shorter sequences using sample_weight_mode='temporal'. So far so good: the losses are computed as expected, but the metrics seem to ignore the sample weights completely. Is this a bug, or am I doing something wrong?

Minimal working example:

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

input_layer = layers.Input(shape=(500, 5))
input_lstm = layers.LSTM(30, return_sequences=True)(input_layer)
output1 = layers.Dense(1)(input_lstm)
output2 = layers.Dense(1)(input_lstm)

model = tf.keras.Model(inputs=input_layer, outputs=[output1, output2])

model.compile(optimizer="adam", run_eagerly=False, sample_weight_mode='temporal', loss="mse", metrics=[["mae"], ["mae"]])

x = np.random.random((2000, 500, 5))

sample_weights = np.ones(x.shape[:-1])
# draw at least 1 padded step: with 0, the slice -0: would zero the whole sequence
amnt_zeros = np.random.randint(1, 500, size=2000)
for idx, zeros in enumerate(amnt_zeros):
    sample_weights[idx, -zeros:] = 0.0

x = x*sample_weights[...,None]
y1 = ((np.sum(x, axis=-1) + 20) * sample_weights)[..., None]
y2 = ((np.sum(x, axis=-1) + 10) * sample_weights)[...,None]

# masked y2 targets are set to a very large value to expose the wrong metric calculation
y2_testsample_weights = np.full_like(y2, -50000) * ((sample_weights - 1)[...,None])
y2 = y2 + y2_testsample_weights

history = model.fit(x=x, y=[y1, y2], sample_weight=sample_weights, epochs=500)

For anyone who is interested, this is my solution so far. It also reverses the standardisation to give an absolute representation of the data:

class Masked_MAE(tf.keras.metrics.Metric):

    def __init__(self, name='masked_mae', mean=0, std=1.0, **kwargs):
        super(Masked_MAE, self).__init__(name=name, **kwargs)
        self.mean = mean
        self.std = std
        self.factor = self.std
        self.total = self.add_weight(name='total', initializer='zeros')
        self.count = self.add_weight(name='count', initializer='zeros')

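    def update_state(self, y_true, y_pred, sample_weight=None):
        dtype = self.total.dtype
        y_true = tf.cast(y_true, dtype)
        y_pred = tf.cast(y_pred, dtype)
        if sample_weight is not None:
            # count only timesteps whose weight is non-zero
            mask = tf.cast(tf.not_equal(sample_weight, 0), dtype)
            mask = tf.expand_dims(mask, axis=-1)
            multp = y_true.shape[-1]
        else:
            mask = tf.ones_like(y_true)
            multp = 1.0
        masked_error = tf.abs(y_true - y_pred) * mask
        # accumulate the summed error; result() divides it by the count
        self.total.assign_add(tf.reduce_sum(masked_error))
        self.count.assign_add(tf.reduce_sum(mask) * multp)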

    def result(self):
        return tf.math.divide_no_nan(self.total, self.count) * self.factor


Welcome to the TensorFlow Forum!

You are not doing anything wrong, this is expected behaviour.

From the Keras documentation on the fit method:

sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance, instead provide the sample_weights as the third element of x. Note that sample weighting does not apply to metrics specified via the metrics argument in compile(). To apply sample weighting to your metrics, you can specify them via the weighted_metrics in compile() instead.

The documentation states explicitly that sample_weight only weights the loss function during training. To apply sample weighting to your metrics as well, pass them via the weighted_metrics argument of compile() instead of metrics.
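As a rough sketch of the difference (the toy arrays, the tiny model and the name weighted_mae are made up for illustration and are not taken from the post above), the same MeanAbsoluteError metric can be updated with and without a temporal weight array, and then registered via weighted_metrics so that fit() forwards sample_weight to it:

```python
import numpy as np
import tensorflow as tf

# One sequence of 4 timesteps; the last two timesteps are padding.
y_true = np.array([[[1.0], [2.0], [100.0], [100.0]]], dtype="float32")
y_pred = np.zeros_like(y_true)
weights = np.array([[1.0, 1.0, 0.0, 0.0]], dtype="float32")

# Without weights, the padded timesteps count towards the metric.
plain_mae = tf.keras.metrics.MeanAbsoluteError()
plain_mae.update_state(y_true, y_pred)
unweighted = float(plain_mae.result())

# With weights, timesteps whose weight is 0 drop out of the average.
masked_mae = tf.keras.metrics.MeanAbsoluteError()
masked_mae.update_state(y_true, y_pred, sample_weight=weights)
weighted = float(masked_mae.result())

print("unweighted MAE:", unweighted)
print("weighted MAE:  ", weighted)

# In compile(), only entries of weighted_metrics receive sample_weight.
inputs = tf.keras.Input(shape=(4, 2))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="mse",
    metrics=["mae"],  # ignores sample_weight
    weighted_metrics=[tf.keras.metrics.MeanAbsoluteError(name="weighted_mae")],
)

x = np.random.random((8, 4, 2)).astype("float32")
y = np.random.random((8, 4, 1)).astype("float32")
w = np.tile(weights, (8, 1))  # shape (samples, sequence_length)
history = model.fit(x, y, sample_weight=w, epochs=1, verbose=0)
```

With the toy values above, the unweighted result averages over all four timesteps, while the weighted one only averages over the two timesteps with non-zero weight. For the two-output model in the question, the weighted metrics would presumably be given per output, in the same nested-list form already used for metrics.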

Thank you!

First of all, thanks a lot for the answer! I misread this and thought that the weights used for the loss were also applied to the metrics, but that doesn't seem to be the case.
Just one last question: I wanted to weight the metrics to undo the normalisation of the inputs. Is there a way to do this natively, or do I have to write my own metrics?

I think you can define your own custom metric function that applies the inverse normalization to the inputs before computing the metric.
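To illustrate the idea outside of Keras, here is a minimal NumPy sketch. It assumes the targets were standardised as (y - mean) / std with std > 0. The helper name masked_mae_original_units and all values are hypothetical. It undoes the standardisation before taking the masked mean absolute error, and also shows that for MAE this is equivalent to multiplying the standardised MAE by std, because the mean cancels in the difference:

```python
import numpy as np

def masked_mae_original_units(y_true_std, y_pred_std, weights, mean, std):
    """Masked MAE after undoing an assumed (y - mean) / std standardisation."""
    # Invert the standardisation: value = standardised * std + mean
    y_true = y_true_std * std + mean
    y_pred = y_pred_std * std + mean
    abs_err = np.abs(y_true - y_pred)
    # Broadcast the (samples, timesteps) weights over the feature axis.
    w = np.broadcast_to(weights[..., None], abs_err.shape)
    total_weight = w.sum()
    if total_weight == 0:
        return 0.0
    return float((abs_err * w).sum() / total_weight)

rng = np.random.default_rng(0)
y_true_std = rng.normal(size=(3, 5, 1))
y_pred_std = rng.normal(size=(3, 5, 1))
weights = np.array(
    [[1, 1, 1, 0, 0],
     [1, 1, 1, 1, 1],
     [1, 0, 0, 0, 0]],
    dtype=float,
)
mean, std = 20.0, 4.0  # hypothetical statistics of the targets

# Inverse transform applied inside the metric.
direct = masked_mae_original_units(y_true_std, y_pred_std, weights, mean, std)
# Same metric on standardised data, rescaled by std afterwards.
scaled = std * masked_mae_original_units(y_true_std, y_pred_std, weights, 0.0, 1.0)

print(direct, scaled)
```

This shortcut only holds for metrics that are linear in the error, such as MAE. For a squared-error metric the factor would be std squared, so applying the inverse transform explicitly is the more general route.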

Thank you!