Aligning time series model predictions with input series?

I’ve been running a simple time series model that predicts a scalar variable (the price of diesel). I need to align the input with the results from model.predict() so I can plot them together and visually assess model performance. Since the input to model.fit() and the output from model.predict() are both NumPy arrays, there is no explicit time information attached to the data.
The model uses a window of ten past time steps and predicts ten steps into the future, with a shift of one when moving the window. I expected model.predict() to return the same number of points as were input (dropping the past 10 and adding the future 10), or 10 fewer (dropping the past values but not adding predicted ones). The prediction is close to the latter but not quite, and the documentation has not helped me.
Can someone help me understand what is going on?
Thanks in advance!!

Hi @brendonwp .
You start by talking about a "simple time series model predicting a scalar variable", then a few lines later you write "the model uses a window of ten past time steps and predicts ten steps into the future", which doesn’t seem quite consistent.
Another thing is that you don’t show any code, which makes it quite difficult to "help [you] understand what is going on".
Can you please share a reproducible example, e.g. in Colab, replacing your actual data with random-number tensors/arrays of the appropriate shape?
Thank you.

Hi @tagoma - thanks, I am simplifying the notebook and will share the link tomorrow. Looking forward to your comments :slight_smile:


Hi All

Here is my notebook performing time series forecasting on a “toy” dataset: 1,000 “observed” points in total in a synthetic series. When I pass the full dataset and the fitted model to model.predict() I receive many fewer points than were observed. I need to align the predicted and observed values to assess the fit, and I don’t know how to cope with this missing data.
I drop 10 points (the past window observations) and pass 990 points to my plotting function, which also shows the predicted values. But there are only 960 predicted values to plot, so there is a “tail” of observations with no predictions.
The notebook is open for comments, so if you can take a look at the code, input would be greatly appreciated.
The global variable DEBUG controls the amount of diagnostic information displayed.

import tensorflow as tf

def model_forecast(model, series, window_size, batch_size):
    # Slide a window of length window_size over the series, advancing one step at a time.
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    # Each window is a nested dataset; flatten it into a single tensor of window_size values.
    ds = ds.flat_map(lambda w: w.batch(window_size))
    # Group windows into batches; drop_remainder=True discards the final partial batch.
    ds = ds.batch(batch_size, drop_remainder=True).prefetch(1)
    forecast = model.predict(ds)
    return forecast

In case you haven’t figured it out yet: your batch size is 32, so your model is actually asked to predict 30 × 32 = 960 samples, because drop_remainder=True when you call batch().
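To make the arithmetic concrete, here is a rough count, assuming the 1,000-point series, window_size=10, and batch_size=32 from your notebook:

n_points = 1000
window_size = 10
batch_size = 32

# window(shift=1, drop_remainder=True) yields one window per valid start index.
n_windows = n_points - window_size + 1   # 991
# batch(drop_remainder=True) keeps only complete batches of 32.
n_batches = n_windows // batch_size      # 30
n_predictions = n_batches * batch_size   # 960 -- hence the missing tail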

Also,

def normalize_series(data, min, max):
    data = data - min
    data = data / max
    return data

assuming that you’re trying to perform min-max scaling, this is slightly wrong: you’re supposed to divide by the difference between the max and the min, i.e. data = data / (max - min).
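A corrected version might look like this (I’ve also renamed the arguments, since min and max shadow Python’s built-ins):

def normalize_series(data, data_min, data_max):
    # Min-max scaling: shift so the minimum becomes 0, then divide by the range.
    data = data - data_min
    data = data / (data_max - data_min)
    return data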


Thanks @noctavian. I had not figured this out and your answer makes complete sense :man_facepalming: Much appreciated!!

I need to align my series so I can calculate the MAE, so I will drop the observed points at the end.
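Something like this, I think (a sketch; observed is a placeholder for my 990-point input and forecast for the 960 predictions, assuming a 1-D array of predicted values):

import numpy as np

# Drop the trailing observed points that have no matching prediction.
observed_aligned = observed[:len(forecast)]
mae = np.mean(np.abs(observed_aligned - forecast))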

The comment on scaling is also helpful…

I have a similar issue. When I look at the prediction and compare it to the “last” row, I find the model treats the value in row 132 as the row to base the prediction on. When I look at the value in the last row, #417, it is massively off the mark.

If I understand your response, @noctavian, I need to go back to the BiLSTM layer’s Unit_LSTM = 32 and change the value 32 to 13.9 or 14? That would be 417/30, based on your response.

Can I take a shortcut here and change “ds = ds.window(window_size, shift=1, drop_remainder=True)” to use drop_remainder=False?

My data sets are likely going to vary in length due to cleaning and adding more data sets.