Misleading example in the tf.keras.utils.timeseries_dataset_from_array documentation

Vincent_Yuan · May 9, 2022, 2:01pm

The documentation for tf.keras.utils.timeseries_dataset_from_array provides three examples. The second is misleading: it slices away more of the input data than necessary, so the resulting dataset contains fewer sequences than it could.

The original example is:

# Example 2: Temporal regression.

# Consider an array data of scalar values, of shape (steps,). 
# To generate a dataset that uses the past 10 timesteps to predict the next timestep, you would use:

input_data = data[:-10]
targets = data[10:]
dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    input_data, targets, sequence_length=10)
for batch in dataset:
  inputs, targets = batch
  assert np.array_equal(inputs[0], data[:10])  # First sequence: steps [0-9]
  assert np.array_equal(targets[0], data[10])  # Corresponding target: step 10
  break

With data = tf.range(20), printing the dataset gives a single sequence:

Input:[[0 1 2 3 4 5 6 7 8 9]], target:[10]

Say we set data = tf.range(20). The example then generates fewer sequences than it should, because slicing input_data as data[:-10] drops nine more steps than necessary. To predict the next single step, only the last step (which has no following target) needs to be dropped, so the example should be:

import numpy as np
import tensorflow as tf

data = tf.range(20)
input_data = data[:-1]  # Drop only the last step, which has no next step to predict
targets = data[10:]     # targets[i] is the step that follows window i
dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    input_data, targets, sequence_length=10)
for batch in dataset:
  inputs, batch_targets = batch
  assert np.array_equal(inputs[0], data[:10])  # First sequence: steps [0-9]
  assert np.array_equal(batch_targets[0], data[10])  # Corresponding target: step 10
  break

# Print every batch; the names avoid shadowing the built-in input()
for batch in dataset.as_numpy_iterator():
  inputs, labels = batch
  print(f"Input:{inputs}, target:{labels}")

It returns:

Input:[[ 0  1  2  3  4  5  6  7  8  9]
 [ 1  2  3  4  5  6  7  8  9 10]
 [ 2  3  4  5  6  7  8  9 10 11]
 [ 3  4  5  6  7  8  9 10 11 12]
 [ 4  5  6  7  8  9 10 11 12 13]
 [ 5  6  7  8  9 10 11 12 13 14]
 [ 6  7  8  9 10 11 12 13 14 15]
 [ 7  8  9 10 11 12 13 14 15 16]
 [ 8  9 10 11 12 13 14 15 16 17]
 [ 9 10 11 12 13 14 15 16 17 18]], target:[10 11 12 13 14 15 16 17 18 19]
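
The difference between the two slicings can also be checked without TensorFlow. Below is a minimal pure-Python sketch that mimics the windowing as I understand it from the documentation: window i covers input_data[i : i + sequence_length] and is paired with targets[i], and the number of pairs cannot exceed the number of targets. The helper sliding_pairs is hypothetical and is not part of Keras; it assumes the default stride and sampling rate of 1.

```python
def sliding_pairs(input_data, targets, sequence_length):
    """Hypothetical helper (not a Keras API): pair each window with targets[i]."""
    num_windows = len(input_data) - sequence_length + 1
    # The number of pairs is limited by both the windows and the targets.
    num_pairs = max(0, min(num_windows, len(targets)))
    return [
        (input_data[i:i + sequence_length], targets[i])
        for i in range(num_pairs)
    ]


data = list(range(20))

# Slicing from the original documentation example.
original = sliding_pairs(data[:-10], data[10:], sequence_length=10)
# Slicing proposed in this post.
corrected = sliding_pairs(data[:-1], data[10:], sequence_length=10)

print(len(original))   # 1
print(len(corrected))  # 10
print(corrected[-1])   # ([9, 10, 11, 12, 13, 14, 15, 16, 17, 18], 19)
```

Under these assumptions, data[:-10] leaves room for only one window, while data[:-1] yields one window for every available target.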

Surya_Y · September 20, 2023, 6:00am

Hi @Vincent_Yuan,

Welcome to the TensorFlow Forum.

I have gone through Example 2 in the TF documentation. With the existing code and data = tf.range(20), only one input sequence is generated. I agree with your concern that users may be confused by input_data = data[:-10], since it yields so few sequences and discards usable data. Although the example is only meant to demonstrate generating sequences with the API, it is better to use input_data = data[:-1]: this avoids the confusion and also shows how to generate every possible sequence without losing data.

We will raise a PR to correct this, and will also try to add a code snippet that generates the batches of data, which should help users better understand how to use this API.
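
As a rough idea of what such a snippet could look like, here is a minimal sketch. It is not the code from the PR; the batch_size of 4 and the use of np.arange(20) are assumptions chosen only to make the batches small enough to read.

```python
import numpy as np
import tensorflow as tf

data = np.arange(20)
sequence_length = 10

dataset = tf.keras.utils.timeseries_dataset_from_array(
    data[:-1],               # drop only the last step, which has no target
    data[sequence_length:],  # targets[i] is the step right after window i
    sequence_length=sequence_length,
    batch_size=4,            # illustrative value, not from the documentation
)

# Collect the batches as NumPy arrays and show what each one contains.
batches = list(dataset.as_numpy_iterator())
for inputs, targets in batches:
    print(f"inputs shape: {inputs.shape}, targets: {targets}")
```

With 10 sequences and a batch size of 4, this should give three batches holding 4, 4, and 2 sequences respectively.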

Thank you and keep contributing.

Surya_Y · September 20, 2023, 6:51am

Opened a PR for this in keras-team/tf-keras repo here.