I am looking at Kaggle code that builds a tf.data.Dataset object and then splits the data placed in that object for further processing. I am having a hard time understanding what happens to the data in the following two steps:
- Creating the dataset object from a tuple made of data and labels
- Splitting the dataset object again into a tuple made of a tuple and a list
Please see below:
    def split_labels(x, y):
        return (x, x), y

    t_dataset = (
        tf.data.Dataset.from_tensor_slices(
            (
                df_train[['premise', 'hypothesis']].values,
                keras.utils.to_categorical(df_train['label'], num_classes=3)
            )
        )
    )

    x_preprocessed = t_dataset.map(split_labels)
Do I understand correctly that the only difference in the data structure before and after the call to split_labels is the following?
- Before the call, the data structure is a tuple made up of two lists.
- After the call, the data structure is a tuple made up of a tuple and a list.
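
For reference, here is a minimal sketch of how the structure could be inspected before and after the map, using the dataset's `element_spec` property. The two-row `features` array and the `labels` values are made-up stand-ins for `df_train` (an assumption, not data from the Kaggle notebook); `split_labels` is the same function as above.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Made-up stand-in for df_train[['premise', 'hypothesis']].values (assumption)
features = np.array([
    ["a man sleeps", "a man is awake"],
    ["a dog runs", "an animal moves"],
])
# Made-up stand-in for df_train['label'] (assumption), one-hot encoded to 3 classes
labels = keras.utils.to_categorical([2, 0], num_classes=3)


def split_labels(x, y):
    # Duplicate the features and keep the label unchanged
    return (x, x), y


t_dataset = tf.data.Dataset.from_tensor_slices((features, labels))
x_preprocessed = t_dataset.map(split_labels)

# element_spec describes the nesting, shape, and dtype of a single dataset element
print("before map:", t_dataset.element_spec)
print("after map: ", x_preprocessed.element_spec)

# Pull one element from each dataset to look at the actual values
for x, y in t_dataset.take(1):
    print("before map, x:", x.numpy(), "y:", y.numpy())
for (x1, x2), y in x_preprocessed.take(1):
    print("after map, x1:", x1.numpy(), "x2:", x2.numpy(), "y:", y.numpy())
```

Comparing the two printed specs shows the per-element nesting on each side of the call to `map`.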