Help! I don't understand "input_shape" anymore

ChrisXY_Zhao · July 14, 2021, 10:44pm

Hey there,

Maybe some of you can help clear up my misunderstanding of input_shape.

I am using a tf.data.Dataset to generate the data.

The window size is 10, and I split each window into x (the first 5 steps) and y (the last 5 steps).
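To make the shapes concrete, here is a minimal plain-Python sketch of that split. It does not use TensorFlow, `make_windows` is a hypothetical helper rather than part of tf.data, and it leaves out shuffling and batching. Each window of 10 values becomes an x of 5 steps and a y of 5 steps; in the full code every step also carries a trailing feature axis of size 1, so one x sample has shape (5, 1).

```python
def make_windows(series, win_sz=10, shift=1):
    """Slide a window over `series` and split each window into (x, y) halves."""
    half = win_sz // 2
    pairs = []
    # Stop early enough that every window is full (like drop_remainder=True).
    for start in range(0, len(series) - win_sz + 1, shift):
        window = series[start:start + win_sz]
        pairs.append((window[:half], window[half:]))
    return pairs


pairs = make_windows(list(range(12)))
x0, y0 = pairs[0]
print(len(pairs), len(x0), len(y0))  # 3 windows, each split 5 / 5
```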

The dataset then goes to the model. I am just using a simple Sequential model and give input_shape (3, 1) to the first layer, e.g. an LSTM.

I can fit the model.
I can also predict on random inputs with shapes from (1, 1) up to (10, 1).

So what does input_shape actually mean? I am completely lost…

Full code:

import numpy as np
import tensorflow as tf

data = np.random.randn(5000)
data = data[...,np.newaxis]

sp = int(len(data) * .5)
train_data = data[:sp]
valid_data = data[sp:] 

def windowed_set(data): 
  win_sz=10
  ds = tf.data.Dataset.from_tensor_slices(data)
  ds = ds.window(size=win_sz, shift=1, drop_remainder=True).flat_map(lambda w: w.batch(win_sz))
  ds = ds.shuffle(win_sz).map(lambda w: (w[:int(win_sz*0.5)], w[int(win_sz*0.5):])) 
  ds = ds.batch(32).prefetch(1)
  return ds

train_set  = windowed_set(train_data)
valid_set = windowed_set(valid_data) 

#Model definition
model = tf.keras.models.Sequential([                           
   tf.keras.layers.LSTM(32, 
                        input_shape=[3, 1],
                        return_sequences=True),
   tf.keras.layers.Dense(1)                                
])

model.summary()
model.compile(loss=tf.keras.losses.Huber(), 
              optimizer=tf.keras.optimizers.Adam())


#Training
model.fit(train_set,
          validation_data=valid_set,
          epochs=50, 
          verbose=0)

#Prediction
for i in range(1, 11):
  print(f"prediction {i} bit:")
  pred_input = np.random.rand(i)
  pred_input = np.expand_dims(pred_input, axis=-1)
  print(pred_input.shape)
  pred = model.predict(np.reshape(pred_input, (1,) + pred_input.shape))
  print(pred)
  print()

anon10279914 · July 15, 2021, 10:07am

Input shape is the shape of one input sample, not counting the batch dimension. For example, for image data a Keras model needs an input_shape of (height, width, num_channels). If you feed a model inputs of shape (3, 1), it will learn dependencies across three consecutive elements. The larger the window, the more information the model has about the temporal dependencies between observations in the sequence.

Sometimes you can have dynamic input shapes, e.g. images with variable height/width. You can find more details here: tf.keras.Input | TensorFlow Core v2.8.0
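For sequence data, the dynamic dimension is usually the number of time steps. Here is a minimal sketch, assuming TensorFlow 2.x and an arbitrary layer size of 8 units (both are assumptions, not taken from the question): declaring the time dimension as None tells Keras that the model accepts sequences of any length.

```python
import numpy as np
import tensorflow as tf

# Time dimension is None (variable length), feature dimension is 1.
inputs = tf.keras.Input(shape=(None, 1))
hidden = tf.keras.layers.LSTM(8, return_sequences=True)(inputs)
outputs = tf.keras.layers.Dense(1)(hidden)
variable_len_model = tf.keras.Model(inputs, outputs)

# The same model handles batches with different numbers of time steps.
short_out = variable_len_model(np.zeros((2, 3, 1), dtype="float32"))
long_out = variable_len_model(np.zeros((2, 7, 1), dtype="float32"))
print(short_out.shape, long_out.shape)
```

With return_sequences=True the model emits one value per time step, so the output length follows the input length.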

ChrisXY_Zhao · July 15, 2021, 10:18am

Thanks, @anon10279914.

According to your statement, training is driven by the declared input_shape rather than by the shape of the samples in each batch, right? Correct me if I am wrong.

So what I mean is: if, for example, the samples have a window of 100, will training still use only 3 steps and ignore the remaining 97?

Btw, the compiler doesn’t give any warnings there…

anon10279914 · July 15, 2021, 10:49am

You have the following architecture:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 3, 32)             4352      
_________________________________________________________________
dense (Dense)                (None, 3, 1)              33        
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
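The parameter counts in this summary can be checked by hand. The sketch below uses the standard LSTM count (four gates, each with an input kernel, a recurrent kernel, and a bias); `lstm_params` and `dense_params` are hypothetical helper names. Note that the number of time steps does not appear anywhere in the formulas.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * input_dim + units * units + units)


def dense_params(units, input_dim):
    # One weight per input feature plus one bias, for each output unit.
    return units * (input_dim + 1)


print(lstm_params(32, 1))   # 4352
print(dense_params(1, 32))  # 33
```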

If you feed one batch created with your code into the model:

# Take one (x, y) batch without overwriting the dataset variable.
batch = next(iter(train_set))
out = model(batch[0])  # batch[0] is x, with shape (32, 5, 1)

you will get this warning:

WARNING:tensorflow:Model was constructed with shape (None, 3, 1) for input KerasTensor(type_spec=TensorSpec(shape=(None, 3, 1), dtype=tf.float32, name='lstm_1_input'), name='lstm_1_input', description="created by layer 'lstm_1_input'"), but it was called on an input with incompatible shape (32, 5, 1)

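So the model was declared with 3 time steps but is being fed 5. Keras only warns here instead of failing, because the LSTM weights do not depend on the number of time steps, and the layer still runs over all 5 steps you feed it. The input_shape should nevertheless describe your data, which here means (5, 1). Have a look at "How to Reshape Input Data for Long Short-Term Memory Networks in Keras"; it could help you understand the input logic of LSTM models.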
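As a rough illustration of how a recurrent layer treats the time dimension, here is a toy recurrence in plain Python. It is a deliberately simplified stand-in for an LSTM (a single tanh unit with made-up weights, not the Keras implementation). The same weights are applied at every step, so when one output is returned per step, as with return_sequences=True, the output length follows the input length.

```python
import math


def run_toy_rnn(sequence, w_x=0.5, w_h=0.8, b=0.1):
    """Apply h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) and return every h_t."""
    h = 0.0
    outputs = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        outputs.append(h)
    return outputs


# Same weights, different sequence lengths.
print(len(run_toy_rnn([0.1, 0.2, 0.3])))            # 3
print(len(run_toy_rnn([0.1, 0.2, 0.3, 0.4, 0.5])))  # 5
```

This is consistent with the original post, where the model trained on 5-step windows and then predicted on inputs of 1 to 10 steps.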