How does the Bidirectional layer handle the masked timesteps when merging the outputs of the forward and backward LSTMs?

Hello!

My sequences have variable lengths, so I need to pad them to a fixed number of timesteps and mask the padded steps.

I was wondering how exactly the Bidirectional layer handles the masked timesteps when merging the outputs of the forward and backward LSTMs (the LSTM has return_sequences=True)?

For example, suppose an input sequence is [1.0, 2.0, 3.0], and I pad it to length 5 with -1.0, so it becomes [1.0, 2.0, 3.0, -1.0, -1.0]. I use a Masking layer to mask the last two timesteps, and then feed the masked sequence to Bidirectional(LSTM) like the following:

output = Bidirectional(LSTM(1, return_sequences=True))(masked_input)
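
For concreteness, here is a minimal, runnable sketch of the setup I mean (assuming tf.keras; the variable names `padded` and `masked_input` are just for illustration):

```python
import numpy as np
from tensorflow.keras.layers import Input, Masking, Bidirectional, LSTM
from tensorflow.keras.models import Model

# One batch entry: a length-3 sequence padded to length 5 with -1.0
padded = np.array([[[1.0], [2.0], [3.0], [-1.0], [-1.0]]], dtype="float32")

# Variable-length input so the same model also accepts the unpadded sequence
inputs = Input(shape=(None, 1))
masked_input = Masking(mask_value=-1.0)(inputs)
output = Bidirectional(LSTM(1, return_sequences=True), merge_mode="concat")(masked_input)
model = Model(inputs, output)

print(model.predict(padded))  # shape (1, 5, 2): [forward, backward] values per timestep
```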

Suppose the output of the forward LSTM is [0.1, 0.2, 0.3, 0.0, 0.0] since the last two timesteps are masked, and the output of the backward LSTM is [0.0, 0.0, 0.4, 0.5, 0.6]. If using concatenate mode, will the Bidirectional layer merge these two outputs like the following?

[[0.1, 0.0], [0.2, 0.0], [0.3, 0.4], [0.0, 0.5], [0.0, 0.6]]

Or will it merge them like the following?

[[0.1, 0.4], [0.2, 0.5], [0.3, 0.6], [0.0, 0.0], [0.0, 0.0]]

I hope it is the second case.

If it is the first case, the result would differ from feeding the unpadded input [1.0, 2.0, 3.0] directly, which is not what we want. It would be especially bad when the padding is longer than the original sequence (e.g. a length-3 sequence padded to length 7): the valid timesteps of the two directions would not overlap at all, so every real output value would be concatenated with a 0.0 from the other direction. The check below shows how I would compare the two runs.
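
Continuing from the sketch above, this is the comparison I have in mind: run the masked, padded sequence and the raw length-3 sequence through the same model. If the backward output is re-aligned before merging (the second case), the first three timesteps should agree.

```python
unpadded = np.array([[[1.0], [2.0], [3.0]]], dtype="float32")

out_padded = model.predict(padded)[:, :3, :]  # first 3 (unmasked) timesteps
out_unpadded = model.predict(unpadded)

print(np.allclose(out_padded, out_unpadded, atol=1e-6))
```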

I would appreciate advice from the Keras team or community on how the Bidirectional layer does the merge when there are masked timesteps.

Thank you very much!


@lht9916 I’m also dealing with the zero-padding and masking issue. Please read my post and share your thoughts. If you have not received a solution to your problem yet, let me know and I will look into your issue to see what actually happens.