I’m confused about the handling of masks, since there seems to be a conflict.
The docs for tf.keras.layers.Masking say: "If any downstream layer does not support masking yet receives such an input mask, an exception will be raised."
The guide "Understanding masking & padding", in the section "Passing mask tensors directly to layers", says: "Layers that can handle masks (such as the LSTM layer) have a mask argument in their __call__ method."
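For reference, the pattern from that section looks roughly like this (a sketch along the guide's lines; the input values and layer sizes are just illustrative):

import numpy as np
import tensorflow as tf

# Toy padded batch of token ids, where 0 is the padding id
padded_inputs = np.array([[1, 2, 3, 0, 0],
                          [4, 5, 0, 0, 0]])
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=16, mask_zero=True)
x = embedding(padded_inputs)
mask = embedding.compute_mask(padded_inputs)      # boolean mask, shape (2, 5)
output = tf.keras.layers.LSTM(32)(x, mask=mask)   # LSTM's __call__ accepts `mask`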
Based on these two statements, my understanding is: a layer is capable of handling masks only if its __call__ method has a mask argument, and if a layer without this argument sits downstream of a masking layer, an exception will be raised.
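To make the second point concrete, here is a minimal sketch of what I would expect to raise (assuming a custom layer whose call has no mask argument and which leaves supports_masking at its default of False):

import numpy as np
import tensorflow as tf

class PlainLayer(tf.keras.layers.Layer):
    # call() has no `mask` argument; supports_masking defaults to False
    def call(self, inputs):
        return inputs

x = np.random.random([2, 4, 3]).astype(np.float32)
x[:, 1, :] = 0.
model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=0., input_shape=(4, 3)),
    PlainLayer(),
])
model(x)  # I would expect a TypeError ("... does not support masking ...") here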
But the following example (modified from this example) doesn’t match this understanding: no exception is thrown, even though TransformerEncoder’s call method doesn’t have mask (it does have attention_mask, but that is distinct).
import tensorflow as tf
import numpy as np
import tensorflow_models as tfm

samples, timesteps, features = 32, 10, 8
inputs = np.random.random([samples, timesteps, features]).astype(np.float32)
# Zero out two timesteps so the Masking layer treats them as padding
inputs[:, 3, :] = 0.
inputs[:, 5, :] = 0.

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(tfm.nlp.models.TransformerEncoder(
    num_layers=1,
    num_attention_heads=2,
    intermediate_size=16,
))
output = model(inputs)
Why is no exception raised here? Is the masking layer actually working?
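For what it’s worth, here is how I would poke at it, running right after the example above (_keras_mask is an internal attribute, so this is just a diagnostic guess):

# Mask computed by the Masking layer
mask = model.layers[0].compute_mask(inputs)
print(mask.shape)                            # expected: (32, 10)
print(mask[0].numpy())                       # timesteps 3 and 5 should be False
# Check whether a mask is attached to the model output
print(getattr(output, "_keras_mask", None))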
Any help would be appreciated!