I used the CutMix method to augment the data; the code is available in the Keras example "CutMix data augmentation for image classification".
`tf.data.Dataset.from_tensor_slices` is used in this code to build the input pipeline. To feed the training data to CutMix, two shuffled copies of the training dataset are created:
```python
train_ds_one = (
    tf.data.Dataset.from_tensor_slices((train_x, train_y))
    .shuffle(batch_size * 100)
)
train_ds_two = (
    tf.data.Dataset.from_tensor_slices((train_x, train_y))
    .shuffle(batch_size * 100)
)
train_ds = tf.data.Dataset.zip((train_ds_one, train_ds_two))
train_ds_cmu = (
    train_ds.shuffle(batch_size * 100)
    .map(cut_mix, num_parallel_calls=AUTO)
    .batch(batch_size)
    .prefetch(AUTO)
)
```
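To check my understanding of why this pipeline yields exactly one augmented image per input, here is a minimal pure-Python sketch of the pairing logic (no TensorFlow; `toy_cut_mix` is a placeholder I made up, not the real `cut_mix` from the linked example):

```python
import random

# Toy stand-in for the real cut_mix: the actual function pastes a patch
# from the second image into the first and mixes the labels accordingly.
def toy_cut_mix(sample1, sample2):
    (x1, y1), (x2, y2) = sample1, sample2
    return (x1, x2), (y1 + y2) / 2  # placeholder combination

# Two independently shuffled copies of the same training data
train = [(f"img{i}", float(i % 2)) for i in range(200)]
ds_one = random.sample(train, len(train))
ds_two = random.sample(train, len(train))

# zip pairs element i of ds_one with element i of ds_two:
# one augmented sample comes out of each pair, so 200 inputs give 200 outputs.
augmented = [toy_cut_mix(a, b) for a, b in zip(ds_one, ds_two)]
print(len(augmented))  # 200
```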
In CutMix, a new virtual image is generated by combining two images from the current batch.
- I'm using a batch size of 200, and the `cut_mix` function returns 200 augmented images. How do we end up with exactly 200 augmented images? Why not more or fewer?
- Which two images are chosen to create a new virtual sample? As far as I can tell, it is a completely random selection of two images. Suppose Image1 and Image5 are paired to create a new image; why not pair Image1 with Image2, then Image1 with Image3, then Image1 with Image4, and so on?
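To make my mental model of the pairing concrete, here is a small pure-Python sketch showing that which two images end up together is determined entirely by the two independent shuffles, positionally zipped, rather than by any systematic enumeration of pairs (names here are illustrative only):

```python
import random

images = [f"img{i}" for i in range(1, 6)]

def make_pairs(seed):
    rng = random.Random(seed)
    one = rng.sample(images, len(images))   # shuffled copy 1
    two = rng.sample(images, len(images))   # shuffled copy 2
    return list(zip(one, two))              # positional pairing after shuffling

# Each shuffle gives an essentially random pairing; there is no attempt
# to enumerate all combinations like (img1, img2), (img1, img3), ...
print(make_pairs(0))
print(make_pairs(1))
```

Note that every image appears exactly once on each side of the pairing, which also matches the one-output-per-input behaviour I observe.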