timeseries_dataset_from_array returns future samples instead of previous ones

Victor_Soby · November 5, 2021, 3:05am

I have a dataframe in my ML project that initially has the following format:

         | Feature 1 | Feature 2 | Feature 3 | Feature 4 | etc.
Time 1   |           |           |           |           |
Time 2   |           |           |           |           |
Time 3   |           |           |           |           |
etc.     |           |           |           |           |

I am trying to turn this dataframe into a 3-D array: each cell gets an extra dimension ("into the screen") that holds the same feature's values over the previous 192 timesteps.
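For reference, the 3-D layout described above (each row holding the previous 192 values of every feature) can be sketched in plain NumPy with numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+). This is only an illustration of the target shape, not what Keras does internally; make_trailing_windows is a hypothetical helper name and the data is a toy stand-in.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_trailing_windows(values: np.ndarray, past: int) -> np.ndarray:
    """Return an array of shape (T - past + 1, past, n_features).

    Row k holds timesteps k .. k + past - 1, i.e. the `past` values that end
    at (and include) timestep k + past - 1.
    """
    # Windowing along axis 0 gives shape (T - past + 1, n_features, past).
    windows = sliding_window_view(values, window_shape=past, axis=0)
    # Put the window axis before the feature axis: (samples, past, features).
    return windows.transpose(0, 2, 1)


# Toy data: 6 timesteps, 2 features; feature 0 is the time index, feature 1 is 10x it.
data = np.column_stack([np.arange(6), np.arange(6) * 10])
w = make_trailing_windows(data, past=3)
print(w.shape)        # (4, 3, 2)
print(w[0][:, 0])     # [0 1 2]  -> window ending at timestep 2
print(w[-1][:, 0])    # [3 4 5]  -> window ending at the last timestep
```

With a DataFrame, something like make_trailing_windows(x_val.to_numpy(), 192) would give one sample per timestep from the 192nd onward, which matches the "Time 192 ... Time End" rows of the expected table.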

I am trying to do this with the built-in function keras.preprocessing.timeseries_dataset_from_array(), but it returns the opposite of what I am trying to achieve.

I expect it to return:

         | Feature 1 | Feature 2 | Feature 3 | Feature 4 | etc.
Time 192 | [1-192]   | [1-192]   | [1-192]   |           |
Time 193 |           |           |           |           |
Time 194 |           |           |           |           |
Time End |           |           |           |           |

Instead, it returns:

             | Feature 1 | Feature 2 | Feature 3 | Feature 4 | etc.
Time 1       | [192-1]   | [192-1]   | [192-1]   |           |
Time 2       |           |           |           |           |
Time 3       |           |           |           |           |
Time End-192 |           |           |           |           |

Basically, every sample contains the next 192 values instead of the previous 192 values of the dataset. As a result, the output ends 192 samples before it should and starts 192 samples too early.
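The effect can be reproduced without Keras. The sketch below is a small NumPy model of the pairing rule documented for timeseries_dataset_from_array (targets[i] belongs to the window that starts at index i). keras_style_pairs is a hypothetical helper, not part of Keras, and it deliberately ignores sampling_rate, sequence_stride, shuffling and batching.

```python
import numpy as np


def keras_style_pairs(data, targets, sequence_length):
    """Model of the documented rule: window i is data[i : i + sequence_length]
    and is paired with targets[i]. Only windows that fit in data (and have a
    target) are produced."""
    num_windows = min(len(data) - sequence_length + 1, len(targets))
    return [(data[i:i + sequence_length], targets[i]) for i in range(num_windows)]


past = 3                 # small stand-in for 192
x = np.arange(8)         # feature value == time index, so windows are easy to read
y = np.arange(8)         # target has the same length as x, as in the question

for window, target in keras_style_pairs(x, y, past):
    print(f"target at t={target}: window {window.tolist()}")
```

Each target sits at the first timestep of its window, so the window looks forward in time, and the last past - 1 targets (t=6 and t=7 here) get no window at all.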

My code is the following:

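# past is the window length (192 timesteps)
# x_val is the 2-D dataframe (all features, ordered from first to last timestep)
# y_val is one of the columns of that dataframe (the target, Feature 1)
from tensorflow import keras

past = 192

dataset_historic_train = keras.preprocessing.timeseries_dataset_from_array(
    x_val,
    y_val,
    sequence_length=past,
    batch_size=len(x_val),  # put every window into a single batch
)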

Here x_val is my entire 2-D dataframe, ordered from the first to the last timestep, and y_val is my target feature, which is Feature 1 in this case.


Ekaterina_Dranitsyna · November 5, 2021, 8:25am

You pass x and y of equal length to the dataset constructor. The function pairs y[i] with the window that starts at row i, i.e. x[i : i+192], so each target sits at the first timestep of its window instead of the last. Only len(x) - 191 full windows fit into x, so the last 191 values of y have no window and are dropped.
To make it work as expected, pass x and y[192:] instead. Then y[i+192] is paired with x[i : i+192], the 192 rows immediately before it, and it is the first 192 values of y (which have no complete history) that go unused.
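A quick way to check the suggested fix without Keras is to model the documented pairing rule (targets[i] goes with the window that starts at index i) in NumPy and feed it the shifted targets. keras_style_pairs below is a hypothetical stand-in for the real function, assuming sampling_rate=1 and sequence_stride=1 with no shuffling.

```python
import numpy as np


def keras_style_pairs(data, targets, sequence_length):
    """Model of the documented pairing rule: targets[i] goes with the window
    data[i : i + sequence_length]; targets beyond the last window are unused."""
    num_windows = min(len(data) - sequence_length + 1, len(targets))
    return [(data[i:i + sequence_length], targets[i]) for i in range(num_windows)]


past = 3                 # small stand-in for 192
x = np.arange(8)         # feature value == time index
y = np.arange(8)

# Shift the targets by `past`: pass y[past:] instead of y.
for window, target in keras_style_pairs(x, y[past:], past):
    print(f"target at t={target}: previous {past} steps {window.tolist()}")
```

Every window now ends one step before its target, which is the usual next-step forecasting setup; the Keras docs use the same offset idea (data[:-n] as input, data[n:] as targets). If the window should instead include the target's own timestep, as in the expected table in the question, the offset would be past - 1 (y[past - 1:]); because y is also a column of x, that would put the target value inside its own input window.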