Does the TransformerEncoder layer accept a built-in mask?

I want to use the TensorFlow/Keras TransformerEncoder layer, but I am not sure whether it accepts a built-in mask (e.g. one generated by the Masking() layer) or whether the “padding_mask” argument is the only way to pass in the masking information. My code looks like this:

masked_embedding = Masking(mask_value=0.)(pre_masked_embedding)
cont_emb = TransformerEncoder(num_heads=4, intermediate_dim=32)(masked_embedding)
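For context on what "built-in mask" means here: the Masking layer marks a timestep as masked only when every feature in that timestep equals mask_value. The sketch below reproduces that rule in plain Python so it can be run without TensorFlow; the function name and the sample values are made up for illustration and are not part of Keras.

```python
def masking_layer_mask(batch, mask_value=0.0):
    """Return a [batch][time] list of booleans, True where a timestep is kept.

    Follows the rule used by keras.layers.Masking: a timestep is masked
    (False) only when all of its features equal mask_value.
    """
    return [
        [any(feature != mask_value for feature in timestep) for timestep in sequence]
        for sequence in batch
    ]


# One sequence with three timesteps of two features each (made-up values).
batch = [[[0.5, 0.0], [0.0, 0.0], [1.2, -0.3]]]
print(masking_layer_mask(batch))  # [[True, False, True]]
```

Note that the first timestep is kept even though one of its features is 0: a single zero feature does not mask the step.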

I don’t know whether the code above is enough. I also tried the approach below:

masked_embedding = Masking(mask_value=0.)(pre_masked_embedding)
cont_emb = TransformerEncoder(num_heads=4, intermediate_dim=32)(masked_embedding, padding_mask=masked_embedding._keras_mask)

But this throws a warning and an error:

WARNING:absl:You are explicitly setting `padding_mask` while the `inputs` have built-in mask, so the built-in mask is ignored.
TypeError                                 Traceback (most recent call last)
/Users/amin/Desktop/PhD/3rd Project/Python codes/STraTS-main/Bert_TS.ipynb Cell 11 in <cell line: 1>()
----> 1 history =
      2     train_input,
      3     train_output,
      4     epochs=1000,
      5     batch_size=70,
      6     validation_split=0.1,
      7      callbacks=[
      8         EarlyStopping(monitor="val_loss", patience=5, mode="min")
      9     ],
     10 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/utils/, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File /var/folders/3m/_t8llt6n10z5xzvm7vxh37nw0000gp/T/, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/engine/", line 254, in __array__
        raise TypeError(

    TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_15'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Keras inputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

Can someone tell me whether my first approach is OK (I have no idea how to check whether the masking is working properly), and if not, how I should do it?
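One generic way to check whether a mask is honoured is a perturbation test: change the values at the padded timesteps only, and confirm that the outputs at the real timesteps do not move. With a Keras model the same idea would be applied by comparing predictions on two such inputs. The sketch below illustrates the principle on a tiny hand-written single-head attention in plain Python; it is a simplified stand-in, not the TransformerEncoder implementation, and every name and value in it is made up for illustration.

```python
import math


def masked_self_attention(x, mask):
    """Single-head dot-product self-attention over one sequence.

    x:    list of timesteps, each a list of floats (used as query, key and value).
    mask: list of booleans, True for real timesteps, False for padding.
    Padded keys receive zero attention weight.
    """
    outputs = []
    for query in x:
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in x]
        # Softmax restricted to the unmasked keys.
        max_score = max(s for s, keep in zip(scores, mask) if keep)
        weights = [math.exp(s - max_score) if keep else 0.0
                   for s, keep in zip(scores, mask)]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * value[d] for w, value in zip(weights, x))
                        for d in range(len(query))])
    return outputs


def real_steps_match(out_a, out_b, mask):
    """True if the two outputs agree at every unmasked timestep."""
    return all(
        math.isclose(a, b, abs_tol=1e-9)
        for step_a, step_b, keep in zip(out_a, out_b, mask) if keep
        for a, b in zip(step_a, step_b)
    )


mask = [True, True, False]
seq_a = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.0]]
seq_b = [[1.0, 2.0], [0.5, -1.0], [9.0, -7.0]]  # only the padded step differs

with_mask = real_steps_match(masked_self_attention(seq_a, mask),
                             masked_self_attention(seq_b, mask), mask)
no_mask = [True, True, True]
without_mask = real_steps_match(masked_self_attention(seq_a, no_mask),
                                masked_self_attention(seq_b, no_mask), mask)
print(with_mask, without_mask)  # True False
```

If the outputs at real timesteps change when only padded values change, the mask is not reaching the attention computation.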

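Hi @Amin_Sh, the TransformerEncoder layer does accept a built-in mask. For example, passing mask_zero=True to the embedding layer makes it attach a mask to its output, which TransformerEncoder then picks up: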

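inputs = keras.Input(shape=(None,), dtype="int32")
# The arguments after the first one on each call were cut off in the post;
# sequence_length, embedding_dim and intermediate_dim below are illustrative assumptions.
outputs = keras_nlp.layers.TokenAndPositionEmbedding(vocabulary_size=1000,
    sequence_length=10, embedding_dim=16, mask_zero=True)(inputs)
outputs = keras_nlp.layers.TransformerEncoder(num_heads=4, intermediate_dim=32)(outputs)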

You can also pass the mask explicitly through the padding_mask argument:

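inputs = keras.Input(shape=(None,), dtype="int32")
mask = tf.keras.Input(shape=(10,), dtype='int32')
# The arguments after the first one on each call were cut off in the post;
# sequence_length, embedding_dim and intermediate_dim below are illustrative assumptions.
outputs = keras_nlp.layers.TokenAndPositionEmbedding(vocabulary_size=1000,
    sequence_length=10, embedding_dim=16)(inputs)
outputs = keras_nlp.layers.TransformerEncoder(num_heads=4, intermediate_dim=32)(outputs, padding_mask=mask)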
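As a rough illustration of what gets fed to the explicit mask input: it is expected to have one entry per token, shaped (batch, sequence_length), marking real tokens as kept and padding as masked. The sketch below builds such a mask from padded token IDs in plain Python. The helper name, the choice of 0 as the padding ID and the sample IDs are assumptions for illustration; the result would still need converting to an array before being passed to a model.

```python
def padding_mask_from_ids(token_ids, pad_id=0):
    """Return a [batch][time] mask: True for real tokens, False for padding."""
    return [[token != pad_id for token in sequence] for sequence in token_ids]


# Two sequences padded with 0 to a common length of 5 (made-up token IDs).
batch_ids = [[12, 7, 431, 0, 0],
             [5, 0, 0, 0, 0]]
for row in padding_mask_from_ids(batch_ids):
    print(row)
# [True, True, True, False, False]
# [True, False, False, False, False]
```

This is the same element-wise rule that mask_zero=True applies to integer inputs, so the two approaches should describe the same padded positions when 0 is the padding ID.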

Thank you.
