tf.data.Dataset.from_generator: How to read data from a list into a tf.data.Dataset

My data is saved in a list, and each data instance has a variable length. The data is shown in the following screenshot:

[Screenshot: the data list, in which each instance is an array of variable length]

How can I store this list of data in a tf.data.Dataset to build an efficient input pipeline?


Do you have a few-line dummy example?

You can use

tf.data.Dataset.from_tensor_slices(data)

to store your list in a tf.data.Dataset object. I believe it doesn't really matter if the arrays have variable lengths.
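A caveat on the suggestion above: passing a plain Python list of arrays with different lengths to from_tensor_slices usually fails, because TensorFlow first tries to convert the whole list into one dense tensor. A minimal sketch, assuming each instance is a NumPy array of shape (n_i, 6) as in the from_generator code later in the thread, packs the list into a tf.RaggedTensor first. The dummy data and its lengths are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical dummy data: three instances with 3, 5 and 2 rows of 6 features each
data = [np.random.rand(n, 6) for n in (3, 5, 2)]

# A list of unequal-length arrays cannot become one dense tensor, so build a
# RaggedTensor from all rows concatenated plus the row count of each instance.
ragged = tf.RaggedTensor.from_row_lengths(
    values=np.concatenate(data, axis=0),
    row_lengths=[len(x) for x in data],
)

# Slicing along the first axis yields one (n_i, 6) tensor per instance.
dataset = tf.data.Dataset.from_tensor_slices(ragged)

for element in dataset:
    print(element.shape)  # prints (3, 6), then (5, 6), then (2, 6)
```

Note that this still builds the full ragged tensor in memory, so for very large inputs the generator approach discussed below may be preferable.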

@Bhack
Here is my code:

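import tensorflow as tf

# data: list of float64 arrays, each of shape (n_i, 6); batch_size is defined elsewhere
train_var_len = tf.data.Dataset.from_generator(
    lambda: data,
    output_signature=tf.TensorSpec(shape=(None, 6), dtype=tf.float64),
)

# Shuffle instances, then zero-pad each batch to the length of its longest instance
ds_series_batch = train_var_len.shuffle(batch_size).padded_batch(batch_size)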
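For a few-line dummy example of the generator approach, the sketch below uses hypothetical data (four all-ones instances with 2, 4, 3 and 1 rows) and skips the shuffle so the output is deterministic. It illustrates that padded_batch pads each batch with zeros only up to the longest instance in that batch, not up to a global maximum.

```python
import numpy as np
import tensorflow as tf

# Hypothetical dummy data standing in for the real list: 4 instances of shape (n_i, 6)
data = [np.ones((n, 6)) for n in (2, 4, 3, 1)]
batch_size = 2

train_var_len = tf.data.Dataset.from_generator(
    lambda: data,
    output_signature=tf.TensorSpec(shape=(None, 6), dtype=tf.float64),
)

# No shuffle here so the output is reproducible; padded_batch pads every
# instance in a batch with zeros up to the longest instance in that batch.
batched = train_var_len.padded_batch(batch_size)

for batch in batched:
    print(batch.shape)  # prints (2, 4, 6), then (2, 3, 6)
```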

Now I want to merge the data with its labels. The labels have shape (200, 7). The code is as follows:

train_ds_one = (
    tf.data.Dataset.from_tensor_slices((ds_series_batch, train_y))
)

However, it gives me the following error:

    raise ValueError("Slicing dataset elements is not supported for rank 0.")
ValueError: Slicing dataset elements is not supported for rank 0.

Any idea why it gives this error and how I should solve it?

@Abhiraam_Eranti I can use tf.data.Dataset.from_tensor_slices(data) directly, but sometimes the input is excessively large. To handle this, I use a generator to manage GPU memory. I then use tf.data.Dataset.from_tensor_slices to combine the data and labels. However, I am getting the following error:

    raise ValueError("Slicing dataset elements is not supported for rank 0.")
ValueError: Slicing dataset elements is not supported for rank 0.

How should I fix this error? For details, see the code in my comment above. Thanks.
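Since the features already come from a generator, one option that avoids combining two datasets altogether is to have the generator yield each (features, label) pair and declare a tuple output_signature. This is a sketch under assumptions: the dummy data, the generator name pairs, and batch_size = 32 are hypothetical, and train_y is assumed to be a NumPy array of shape (200, 7) aligned row by row with data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins: 200 variable-length instances of 6 features, labels of shape (200, 7)
rng = np.random.default_rng(0)
data = [rng.random((rng.integers(1, 10), 6)) for _ in range(200)]
train_y = rng.random((200, 7))
batch_size = 32

def pairs():
    # Yield each instance together with its own label row so they stay paired
    for x, y in zip(data, train_y):
        yield x, y

train_ds = tf.data.Dataset.from_generator(
    pairs,
    output_signature=(
        tf.TensorSpec(shape=(None, 6), dtype=tf.float64),
        tf.TensorSpec(shape=(7,), dtype=tf.float64),
    ),
)

# Shuffle whole (x, y) pairs, then batch; every label already has shape (7,),
# so in practice only the feature component receives zero padding.
train_ds = train_ds.shuffle(200).padded_batch(batch_size)
```

Because each label travels with its instance through the pipeline, shuffling cannot separate them.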

You will have to use

tf.data.dataset.zip((ds_series_batch, train_y)) to combine features and labels.
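The rank-0 error indicates what went wrong: from_tensor_slices expects tensors or arrays that it slices along their first axis, and a tf.data.Dataset passed into it is treated as a single rank-0 element that cannot be sliced. With zip, the labels also need to be a Dataset. A sketch, assuming hypothetical stand-ins for data and train_y: zip the two unbatched datasets first, then shuffle and batch the pairs. Shuffling the features before zipping them with unshuffled labels, as in the original code, would pair instances with the wrong labels.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the thread's data list and train_y array
rng = np.random.default_rng(0)
data = [rng.random((rng.integers(1, 10), 6)) for _ in range(200)]
train_y = rng.random((200, 7))
batch_size = 32

features = tf.data.Dataset.from_generator(
    lambda: data,
    output_signature=tf.TensorSpec(shape=(None, 6), dtype=tf.float64),
)
# zip() combines datasets, so the label array is wrapped in a Dataset too
labels = tf.data.Dataset.from_tensor_slices(train_y)

# Zip the unbatched, unshuffled datasets first so element i of each stays paired,
# then shuffle and batch the (features, label) pairs together.
train_ds = (
    tf.data.Dataset.zip((features, labels))
    .shuffle(200)
    .padded_batch(batch_size)
)
```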

Also, could you explain your GPU memory problem?

Thank you for the answer. I'm getting this error:

AttributeError: module 'tensorflow._api.v2.data' has no attribute 'dataset'

Sorry for the spelling mistake; it's tf.data.Dataset.zip.

https://www.tensorflow.org/api_docs/python/tf/data/Dataset#zip