LSTM false predictions when timestep changes

Hi everyone,
I am trying to train an LSTM network whose training input has shape (80, 27) (batch size not included), since each sequence in my training data has 80 timesteps and 27 features.
For the prediction part I want to predict each timestep separately, not all 80 together.

For example, my prediction batch has shape (1, 1, 27), but the result is wrong (MAE value 4.0000), since the model was trained with 80 timesteps in the input data.

I tried switching to a stateful model for prediction and transferring the weights from the trained model to the new one, but the MAE value is still 4. So what am I missing?

Network Architecture:

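from tensorflow import keras
from tensorflow.keras import layers

def Model():
    # Variable-length input: (timesteps, 27 features); the batch axis is implicit.
    inputs = keras.Input(shape=(None, 27))
    # Stacked bidirectional LSTMs; return_sequences=True keeps one output per timestep.
    x = layers.Bidirectional(layers.LSTM(512, return_sequences=True))(inputs)
    x = layers.Bidirectional(layers.LSTM(512, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

    # Dense layers act on the last axis, so they are applied at every timestep.
    x = layers.Dense(512, activation="selu")(x)
    x = layers.Dense(256, activation="selu")(x)

    # One regression value per timestep: output shape (batch, timesteps, 1).
    x = layers.Dense(1)(x)

    return keras.Model(inputs=inputs, outputs=x)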

Thanks in advance

You should define the input shape in the first layer as tf.keras.Input(shape=(80, 27)).
If you want to get a prediction for 1 timestep out of every 80 x 27 data matrix, you can insert x = tf.keras.layers.GlobalAveragePooling1D()(x) after the last Bidirectional layer, or use GlobalMaxPooling1D instead.
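
To show what the pooling does to the shape, here is a minimal sketch in plain Python rather than Keras. The function name global_average_pool_1d and the toy data are made up for illustration; the real layer does the same averaging over the timestep axis, only on tensors. In the model above it would sit after the last Bidirectional layer, where the feature axis is 256 wide (2 x 128) rather than 27.

```python
def global_average_pool_1d(sequence):
    """Average a (timesteps, features) nested list over the timestep axis.

    Hypothetical helper for illustration only; it mimics the effect of
    GlobalAveragePooling1D on a single sample.
    """
    n_steps = len(sequence)
    n_features = len(sequence[0])
    return [sum(step[f] for step in sequence) / n_steps for f in range(n_features)]


# Toy sequence: 80 timesteps, 27 features, every value equal to its timestep index.
sequence = [[float(t)] * 27 for t in range(80)]
pooled = global_average_pool_1d(sequence)

print(len(sequence), len(sequence[0]))  # shape before pooling: 80 27
print(len(pooled))                      # shape after pooling: 27
```

So the 80 per-timestep outputs collapse into a single vector, and the Dense layers after it would then give one prediction per sequence instead of 80.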

But how should I proceed in the prediction part? I want to predict each timestep separately, e.g. model.predict((1,27)), and based on that predict the next timestep with the same input shape.

I’m not sure I understand your question. To train an LSTM model, you should pass multiple samples, where each sample has the shape n_previous_timesteps x n_features.
In the base case the model predicts one next value for each of the samples.
To preprocess the data you can use tf.keras.preprocessing.sequence.TimeseriesGenerator (TensorFlow Core v2.8.0).
You pass the tabular input data as the first argument and the corresponding next values as the second argument, and set length, which in this case should be 80.
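
Roughly, the windowing described above pairs each run of 80 consecutive rows with the value that follows it. A minimal sketch in plain Python, assuming a toy table of 100 rows; make_windows is a made-up helper, not a TensorFlow function, and it ignores batching, stride and shuffling:

```python
def make_windows(data, targets, length):
    """Pair each run of `length` consecutive rows with the target that follows it.

    Hypothetical helper for illustration; it only mirrors the basic
    data/targets/length behaviour of a sliding-window generator.
    """
    samples, labels = [], []
    for end in range(length, len(data)):
        samples.append(data[end - length:end])  # rows end-length .. end-1
        labels.append(targets[end])             # value at the next timestep
    return samples, labels


# Toy table: 100 rows, 27 features; the target at row i is simply i.
data = [[float(i)] * 27 for i in range(100)]
targets = [float(i) for i in range(100)]

samples, labels = make_windows(data, targets, length=80)

# 100 rows with a window of 80 give 20 samples of shape 80 x 27.
print(len(samples), len(samples[0]), len(samples[0][0]))
print(labels[0])  # target for the first window is row 80
```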

Thank you, I think I have understood the training process. My problem now is in the prediction phase. When it comes to predicting, I have only one timestep available at a time.
For example I want to predict a sample with shape (1,1,27) and based on the outcome of that prediction I want to predict the next sample with the same shape.
If I try to predict a sample with shape (1,80,27) I get 80 different outcomes which is not what I want.
I want to predict one timestep at a time until I reach timestep 80, then the next sequence comes in.

Sorry again if I am not very clear
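
One possible way to get one prediction per incoming timestep, sketched under assumptions: keep the timesteps seen so far in a buffer, run the model on the whole buffer, and keep only the last output. Here predict_sequence is a hypothetical stand-in for the trained model (a real call would be something like model.predict on a batch of shape (1, t, 27)); it is a dummy so the loop runs without TensorFlow. Keep in mind that with Bidirectional layers the output at a timestep also depends on the later timesteps, so the outputs for a partial sequence need not match what the model produces when it sees all 80 steps.

```python
def predict_sequence(steps):
    """Hypothetical stand-in for the trained model: one output per timestep.

    This dummy returns the running mean of the first feature so that the
    example is runnable; it is NOT the model from the thread.
    """
    outputs, total = [], 0.0
    for i, step in enumerate(steps, start=1):
        total += step[0]
        outputs.append(total / i)
    return outputs


def predict_stepwise(sequence, max_steps=80):
    """Feed timesteps one at a time, re-running the model on the growing buffer."""
    buffer, predictions = [], []
    for step in sequence:
        buffer.append(step)
        # Run on everything seen so far and keep only the newest output.
        predictions.append(predict_sequence(buffer)[-1])
        if len(buffer) == max_steps:
            buffer = []  # sequence finished; the next one starts fresh
    return predictions


# Toy sequence: 80 timesteps, 27 features.
sequence = [[float(t)] * 27 for t in range(80)]
preds = predict_stepwise(sequence)
print(len(preds))  # one prediction per incoming timestep
```

This re-runs the model on a longer input at every step, so it trades speed for not having to manage LSTM states by hand.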