I’ve scoured various tutorials on applying attention to an LSTM, but they either implement custom Attention layers or use Keras/TF’s classes in examples that don’t relate to my study. I’m building a Bidirectional LSTM with Attention model for time series forecasting. After various attempts, I’ve landed on the following setup, which still has a few problems that need ironing out:
```python
input = Input(batch_input_shape=self.batch_input_shape)
layer = None
output = input

output = LayerNormalization(name="Normalize", epsilon=1e-7)(output)

cell = LSTMCell(
    name="LSTM",
    units=self.lstm_units,
    activation=self.activation,
    recurrent_activation=self.recurrent_activation,
    recurrent_regularizer=self.lstm_recurrent_regularizer,
    kernel_regularizer=self.lstm_kernel_regularizer,
    bias_regularizer=self.lstm_bias_regularizer,
    activity_regularizer=self.lstm_activity_regularizer,
    dropout=self.dropout,
    recurrent_dropout=self.recurrent_dropout,
)

if self.attention_type == "bahdanau":
    mechanism = BahdanauAttention(units=self.attention_units, memory=output)
else:
    mechanism = LuongAttention(units=self.attention_units, memory=output)

cell = AttentionWrapper(cell, mechanism, name="Attention", output_attention=False)

layer = RNN(cell, stateful=self.stateful, return_sequences=True)
output = Bidirectional(layer, name="Bidirectional")(output)
output = Dense(1, name="Reduce")(output)
```
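For context, the attention classes in the snippet are the ones from TensorFlow Addons (tfa.seq2seq), since that’s what the TensorFlow example I followed used, and the rest are standard Keras layers, so the imports would look roughly like this:

```python
# Assumed imports for the snippet above: attention classes from TensorFlow Addons,
# everything else from tf.keras.
from tensorflow.keras.layers import (
    Input, LayerNormalization, LSTMCell, RNN, Bidirectional, Dense,
)
from tensorflow_addons.seq2seq import (
    BahdanauAttention, LuongAttention, AttentionWrapper,
)
```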
- Why do the attention mechanisms require the `memory` argument? According to the docs, it’s optional. The way I’ve set this up, it receives the normalized inputs, whereas it should be receiving the LSTM’s hidden states, no?
- The current error I’m receiving is:

  ```
  TypeError: To be compatible with tf.eager.defun, Python functions must return zero or more Tensors; in compilation of <function while_loop..wrapped_body at 0x7f6ceb5386a8>, found return value of type <class 'keras.engine.keras_tensor.KerasTensor'>, which is not a Tensor.
  ```
- Is this even the optimal flow? Input > Normalize > LSTM > Attention > Bidirectional > Dense
- I’m using BahdanauAttention/LuongAttention with AttentionWrapper solely because of a TensorFlow example, but if there’s a way to use Keras’ Attention class in a simpler fashion, I’d be happy to learn how that works; something along the lines of the sketch below is what I had in mind.
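To make the last point concrete, this is roughly what I imagine the Keras-native version would look like (the shapes and unit counts are placeholders, and I’ve kept the layer names from my snippet), though I don’t know whether it’s actually equivalent to the AttentionWrapper setup:

```python
import tensorflow as tf
from tensorflow.keras.layers import (
    Input, LayerNormalization, LSTM, Bidirectional, AdditiveAttention, Dense,
)
from tensorflow.keras.models import Model

# Placeholder shapes/hyperparameters, for illustration only.
timesteps, features, lstm_units = 30, 8, 64

inputs = Input(shape=(timesteps, features))
x = LayerNormalization(name="Normalize", epsilon=1e-7)(inputs)

# Bidirectional LSTM returns the full sequence so attention can score every timestep.
seq = Bidirectional(
    LSTM(lstm_units, return_sequences=True), name="Bidirectional"
)(x)

# Keras' built-in Bahdanau-style attention; query and value are both the encoded
# sequence here, i.e. self-attention over the LSTM outputs.
context = AdditiveAttention(name="Attention")([seq, seq])

# Same "Reduce" step as in my snippet: one predicted value per timestep.
output = Dense(1, name="Reduce")(context)

model = Model(inputs, output)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

(As I understand it, `tf.keras.layers.Attention` would be the dot-product/Luong-style counterpart to `AdditiveAttention`.)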