Dataset shuffle problem for RNN

melanzanefritte · February 2, 2022, 10:18am

Hi everyone,
I’m approaching RNNs and I have a supervised signal recognition task.
I have 7955 different signals, each consisting of 300 samples.
I created a tensor for the signals (7955, 300, 1) and a tensor for the labels (7955, 75) (75 different classes, one-hot encoded).
Then I wrapped them up in a dataset using tf.data.Dataset.from_tensors().
That led to my problem: now I’m trying to shuffle and split this dataset in order to have a train and a test dataset.
I’m trying to use the common .skip() and .take() methods, but it doesn’t work because the cardinality of the dataset seems to be 1.

Any idea how to get around this?
Thanks in advance, and sorry for any grammar mistakes.

Renu_Patel · August 11, 2023, 10:39am

Hi @melanzanefritte

Welcome to the TensorFlow Forum!

It’s tough to understand the issue without the code showing how you created the tensors and called tf.data.Dataset.from_tensors().
Please share a minimal reproducible example that replicates the error so we can understand and fix the issue. Thank you.

talhariaz5193 · August 12, 2023, 11:17pm

Hey there,

It sounds like you’re on the right track with using TensorFlow for your RNN task, but hit a hiccup with the data manipulation part. If I understand correctly, when you wrapped your tensors using tf.data.Dataset.from_tensors(), you essentially created a dataset of 1 item, where that item contains all your signals and labels. This would explain why you’re seeing a cardinality of 1.

You might want to use tf.data.Dataset.from_tensor_slices() instead. This method is designed to create a dataset from individual slices of arrays, which seems more aligned with what you’re trying to achieve.

Something like:

dataset = tf.data.Dataset.from_tensor_slices((signals_tensor, labels_tensor))
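
To see the difference concretely, here is a minimal sketch that builds both kinds of dataset from the same arrays and compares their cardinality. The arrays are small zero-filled stand-ins (10 signals instead of 7955), and the variable names are just placeholders, not anything from your code:

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 10 signals of 300 samples, 75 one-hot classes.
# (Hypothetical sizes; the real tensors are (7955, 300, 1) and (7955, 75).)
signals = np.zeros((10, 300, 1), dtype=np.float32)
labels = np.zeros((10, 75), dtype=np.float32)

# from_tensors wraps the whole pair as ONE element.
whole = tf.data.Dataset.from_tensors((signals, labels))

# from_tensor_slices slices along the first axis: one element per signal.
sliced = tf.data.Dataset.from_tensor_slices((signals, labels))

n_whole = int(whole.cardinality())
n_sliced = int(sliced.cardinality())
print(n_whole, n_sliced)

# Each element of the sliced dataset is a single (signal, label) pair.
signal_shape = tuple(sliced.element_spec[0].shape)
label_shape = tuple(sliced.element_spec[1].shape)
print(signal_shape, label_shape)
```

With the sliced version, .take() and .skip() operate on individual signals rather than on the single all-in-one element.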

After this, you can shuffle and split the dataset as you intended:

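# reshuffle_each_iteration=False keeps the shuffled order fixed, so take() and skip() don't overlap
dataset = dataset.shuffle(buffer_size=7955, reshuffle_each_iteration=False)
train_dataset = dataset.take(int(0.8 * 7955))  # assuming you want an 80-20 split
test_dataset = dataset.skip(int(0.8 * 7955))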
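
One thing to watch out for: by default, shuffle() draws a new order every time the dataset is iterated, so a take()/skip() split made after it can end up with train and test sets that share examples. Passing reshuffle_each_iteration=False keeps the order fixed. Here is a rough sketch to check that the split is disjoint, using a dataset of integer ids as a stand-in for the real (signal, label) pairs. The size 20 and the seed are arbitrary choices for illustration:

```python
import tensorflow as tf

n = 20                      # hypothetical small size; the real dataset has 7955 examples
n_train = int(0.8 * n)      # 80-20 split, as above

# Integer ids stand in for the (signal, label) elements.
ids = tf.data.Dataset.range(n)

# Shuffle once and keep that order for every later iteration.
shuffled = ids.shuffle(buffer_size=n, seed=0, reshuffle_each_iteration=False)

train_ids = {int(i) for i in shuffled.take(n_train)}
test_ids = {int(i) for i in shuffled.skip(n_train)}

# No id should appear in both splits, and together they should cover everything.
overlap = train_ids & test_ids
covered = train_ids | test_ids
print(len(train_ids), len(test_ids), len(overlap))
```

If you want the training set reshuffled on every epoch, you can call .shuffle() again on train_dataset after the split.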

Hope this helps you out. And don’t worry about grammar mistakes; we’re all here to learn and help each other out!

Stay curious,
Ahmad