LSTM - need your help with basic sequences

Hi guys!

I need your help to better understand LSTMs on what I think is a relatively simple sequence. The code below runs, but I am not getting the expected results. I suspect the problem is in how I shape the data or how I define the sequence. Could you please shed some light?

import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
import numpy as np 
import matplotlib.pyplot as plt

#function to plot training
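def plot_graphs(history, string):
    # Assumed minimal body: plot one recorded metric against the epoch number
    plt.plot(history.history[string])
    plt.xlabel('Epochs')
    plt.ylabel(string)
    plt.show()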

#the data
# eleven samples, each sample has been divided in  5 sub-sequences of 3 elements each
# e.g.    1,2,3  is one part of the first line / sequence,   10,11,12 is the next 

# Expected behaviour:
#  input  50,51   output 52
#  input  61,62   output 70
#  input  2002,2003   output 2010

data = np.array([    

#I am not sure if this is the right way to shape the data
data = data.reshape(11,5,3)


#slice the data , so the 3rd element of each subsequence of 3 elements is the label  and the first 2 are the input
#e.g.  1,2,3       1,2  is the input,  3 is the label

xs = data[:,:,:-1]
ys = data[:,:,-1:]

#print ('xs')
#print (xs)

#print ('ys')
#print (ys)

#define the model
lossf =  tf.keras.losses.MeanAbsoluteError()

model = Sequential()

#tried this but didn't make a difference for good
  #model.add(tf.keras.layers.BatchNormalization(input_shape=( 5,  2) ))
  #model.add(Bidirectional(LSTM(150, activation='relu')))

model.add(Bidirectional(LSTM(50, activation='relu' ), input_shape=( 5,  2)))
adam = Adam(learning_rate=0.0001,  )
model.compile(loss=lossf, optimizer=adam, metrics=['accuracy'])

history = model.fit(xs, ys, epochs=120,
                  verbose=1,
                  validation_split=0.1,
                  #,  shuffle=True
                  )

plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')
plot_graphs(history, 'val_accuracy')
plot_graphs(history, 'val_loss')

#try it
predicted =   model.predict( [[[50, 51]]], verbose=0)   # expected  52 
print ('Predicted value ' ,  predicted )

predicted =   model.predict( [[[61, 62]]], verbose=0)   # expected  70
print ('Predicted value ' ,  predicted )

predicted =   model.predict( [[[2002, 2003]]], verbose=0)   # expected  2010
print ('Predicted value ' ,  predicted )

For series data like that, you could try to convert the series into windows. Feeding the whole series to the neural net is possible but won’t give accurate results.

Take a look at this.


So @GeorgeMR look at it this way.

The way you are creating your sequence training data is incorrect for your expected output.


What data.reshape(11,5,3) does is equivalent to creating 11 batches with 5 sequences in each batch, each sequence of length 3. If you look at the individual sequences in your training data, they always look like the one below:
[401,402,403] and so on.

You never encounter a sequence in your training data like the ones below:
[1022,1030,1031] and so on.

This happens because you are splitting each sequence of length 15 into 5 subsequences of length 3, which no longer represent the original length-15 sequence as a whole. The shorter length-3 subsequences cannot capture the seasonality present in the longer sequence.
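To make the point above concrete, here is a small sketch that compares the triples produced by the reshape with the triples a stride-1 sliding window would produce. The 15-value row is made up for illustration (only its first six values follow the example in the question), so treat the numbers as an assumption rather than the actual training data.

```python
import numpy as np

# Hypothetical row of 15 values: runs of three consecutive numbers with a
# jump between runs (not the actual data from the question).
row = np.array([1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42])

# What the reshape does to each row: five non-overlapping chunks of 3.
chunks = row.reshape(5, 3)

# Every length-3 window with stride 1, for comparison.
windows = np.array([row[i:i + 3] for i in range(len(row) - 2)])

# Windows that the chunked split never shows to the model.
chunk_set = {tuple(c) for c in chunks.tolist()}
missing = [w for w in windows.tolist() if tuple(w) not in chunk_set]

print(chunks.shape)   # (5, 3)
print(windows.shape)  # (13, 3)
print(missing[:2])    # [[2, 3, 10], [3, 10, 11]]
```

With the chunked split, the label is always "previous value plus one", so a target such as 10 after the inputs 2, 3 never appears during training.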

You need to use the sliding window approach mentioned by @Jean (Time series forecasting | TensorFlow Core) to preprocess your sequence and generate the training data. Hope this helps!
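As a rough illustration of the sliding window idea, the sketch below turns one long series into (input window, next value) pairs using plain NumPy. The helper name make_windows and the short series are hypothetical, and this is only one possible way to build the windows, not the method from the linked tutorial.

```python
import numpy as np

def make_windows(series, window_size):
    """Return (X, y): X[i] holds window_size consecutive values and
    y[i] is the value that immediately follows them."""
    series = np.asarray(series, dtype=np.float32)
    n = len(series) - window_size
    X = np.stack([series[i:i + window_size] for i in range(n)])
    y = series[window_size:]
    # Keras LSTM layers expect inputs shaped (samples, timesteps, features).
    return X[..., np.newaxis], y

# Hypothetical series, not the data from the original post.
series = [1, 2, 3, 10, 11, 12, 20, 21, 22]
X, y = make_windows(series, window_size=2)

print(X.shape, y.shape)    # (7, 2, 1) (7,)
print(X[1].ravel(), y[1])  # [2. 3.] 10.0
```

With data prepared this way, the model's input_shape would be (window_size, 1), and the windows that cross a jump (such as 2, 3 followed by 10) are part of the training set.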


Thanks @Jean and @aditya1601, I will try the windows technique. Regarding aditya's comment: the model doesn't even learn to add 1 to every number, which is exactly what the current split supplies. I get that the rest of the sequence is 'disconnected', but the pattern shared by every set of 3 consecutive numbers is not being picked up by the model. Instead it gives something like: input 51, 52 … prediction = 4.23


Hi @GeorgeMR. There are many reasons why the model may not be learning. For starters, the window size is too short: try experimenting with a window size of at least 5. Try a simpler model and check that it can overfit the data. Try different optimizers and learning rates. Tuning these hyperparameters should help you narrow down the problem. Please share with us what worked in the end!


Time series are harder to process than many other kinds of data. When converting the series into windows, keep all windows the same size. If you are using tf.data, set drop_remainder=True to drop every window that is shorter than the size you specified.
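The effect of dropping short windows can be imitated in a few lines of plain Python. This sketch is not the tf.data implementation: the function windows below is hypothetical and defaults to a shift of 1, whereas tf.data.Dataset.window takes its own size, shift and drop_remainder arguments.

```python
def windows(series, size, shift=1, drop_remainder=False):
    """Plain-Python imitation of windowing with an optional remainder drop."""
    out = []
    for start in range(0, len(series), shift):
        w = list(series[start:start + size])
        if drop_remainder and len(w) < size:
            break  # every later window would be short as well
        out.append(w)
    return out

series = list(range(6))  # hypothetical data: 0..5

print(windows(series, size=4))
# [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5], [4, 5], [5]]
print(windows(series, size=4, drop_remainder=True))
# [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
```

Without the drop, the trailing windows have different lengths and cannot be stacked into one fixed-shape batch for the LSTM.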

In addition to @aditya1601's ideas on improving the model, you can try scheduling the learning rate.

I ran a similar experiment and tweaked different hyperparameters, but what accounted for the improvements in the metric (mean absolute error) was using the learning rate scheduler and keeping the model simple. It might not be the same in your case, but this is what I observed.
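The reply does not say which schedule was used, so the following is only a sketch of one common choice, a step decay. The function name and all constants (initial rate, drop factor, epochs per drop) are hypothetical. The function accepts (epoch, lr), which is the form the tf.keras.callbacks.LearningRateScheduler callback calls its schedule with, so it could be passed to that callback.

```python
def step_decay(epoch, lr=None, initial_lr=1e-3, drop=0.5, epochs_per_drop=20):
    """Halve the learning rate every `epochs_per_drop` epochs.

    `lr` (the current rate) is accepted but ignored, so the function can be
    called as schedule(epoch, lr). All constants are illustrative assumptions.
    """
    return initial_lr * drop ** (epoch // epochs_per_drop)

# Learning rate at a few selected epochs.
for epoch in (0, 19, 20, 40):
    print(epoch, step_decay(epoch))
```

The rate stays at its initial value for the first 20 epochs, then is halved at epoch 20 and again at epoch 40.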

If you find it hard to write a series processing function with tf.data, you can refer to this or this (CC: @Laurence_Moroney).
