create_padding_mask in the Transformer code uses the encoder input sequence to create the padding mask for the 2nd attention block of the decoder

I am going through the Transformer tutorial code on tensorflow.org.

def create_masks(self, inp, tar):
    # Encoder padding mask (Used in the 2nd attention block in the decoder too.)
    padding_mask = create_padding_mask(inp)

    # Used in the 1st attention block in the decoder.
    # It is used to pad and mask future tokens in the input received by
    # the decoder.
    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    look_ahead_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return padding_mask, look_ahead_mask
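
For reference, the two helper functions this method calls are defined earlier in the same tutorial. As far as I can tell they look roughly like this (reproduced from the tutorial from memory, so treat it as a sketch rather than an exact copy):

import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token id is 0 (padding), 0.0 elsewhere.
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Add extra dimensions so the mask can be added to the attention logits.
    return seq[:, tf.newaxis, tf.newaxis, :]  # (batch_size, 1, 1, seq_len)

def create_look_ahead_mask(size):
    # Strictly upper-triangular ones: position i may not attend to positions > i.
    mask = 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)
    return mask  # (seq_len, seq_len)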

The Transformer class has a method called create_masks which creates the padding and look-ahead masks. I understand that the encoder's padding mask should be built from the input sequence (the input to the encoder). What I do not understand is why that same encoder input sequence is also used to create the padding mask for the second attention block of the decoder (the first line of the method). I would have expected the decoder's padding mask to be created from the target sequence (which is fed to the decoder).
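
To make the question concrete, here is a minimal sketch using create_padding_mask as quoted above (the token ids, batch size, and sequence lengths are made up; 0 is the padding id as in the tutorial):

import tensorflow as tf

# Toy batch: encoder input is 6 tokens long, decoder target is 4 tokens long.
inp = tf.constant([[5, 7, 2, 0, 0, 0]])  # (1, 6), trailing zeros are padding
tar = tf.constant([[3, 9, 4, 0]])        # (1, 4), trailing zero is padding

padding_mask = create_padding_mask(inp)             # (1, 1, 1, 6), built from inp
dec_target_padding_mask = create_padding_mask(tar)  # (1, 1, 1, 4), built from tar

print(padding_mask.shape, dec_target_padding_mask.shape)

My expectation was that the mask passed to the decoder's second attention block would have the second shape (built from tar), not the first one (built from inp).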

Please help me understand why this is done.
