Cutmix Data Augmentation: How many new samples are generated in each batch using Cutmix?

I used the Cutmix method to augment my data; the code is available at "CutMix data augmentation for image classification". The tf.data API is used in this code to prepare the input pipeline. To feed the training data to Cutmix, two copies of the training data are generated like this:

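  # train_x is an assumed name for the image array
  train_ds_one = (tf.data.Dataset.from_tensor_slices((train_x, train_y)).shuffle(batch_size * 100))
  train_ds_two = (tf.data.Dataset.from_tensor_slices((train_x, train_y)).shuffle(batch_size * 100))
  train_ds = tf.data.Dataset.zip((train_ds_one, train_ds_two))
  train_ds_cmu = (
    train_ds.shuffle(batch_size * 100)
    .map(cut_mix, num_parallel_calls=AUTO)
  )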

In Cutmix, a new virtual image is generated by combining two images from the current batch.
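To make the combination step concrete, here is a minimal NumPy sketch of how one pair of images could be blended. It is not the `cut_mix` function from the linked code: the name `cutmix_pair`, the `alpha` parameter, and the `(H, W, C)` image layout are assumptions made for illustration.

```python
import numpy as np

def cutmix_pair(img_a, label_a, img_b, label_b, rng, alpha=1.0):
    """Blend one pair of (H, W, C) images into a single virtual sample.

    Hypothetical helper for illustration; labels are one-hot vectors.
    """
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)              # target share kept from img_a
    cut_h = int(h * np.sqrt(1.0 - lam))       # patch height
    cut_w = int(w * np.sqrt(1.0 - lam))       # patch width
    cy, cx = rng.integers(h), rng.integers(w) # random patch centre

    # Clip the patch so it stays inside the image.
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    # Paste the patch from img_b onto a copy of img_a.
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]

    # Recompute the mixing ratio from the clipped patch area.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label
```

Note that the function takes two samples and returns exactly one: the inputs themselves are not kept alongside the mixed result.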


  1. I’m using a batch size of 200, and the Cutmix function returns 200 augmented images. How do we end up with exactly 200 augmented images? Why not more or fewer?
  2. Which two images are chosen to create a new virtual sample? As I understand it, this could be a completely random selection of two images. Suppose Image1 and Image5 are selected to create a new image; why not select Image1 and Image2, then Image1 and Image3, then Image1 and Image4, and so on?
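Both questions can be explored with a small simulation of the pipeline above. This is only a sketch in plain Python, not tf.data: `paired_batches` is a hypothetical helper, it assumes the mapped function emits one output per zipped pair, and it uses a full shuffle where tf.data uses a buffered one.

```python
import random

def paired_batches(n_samples, batch_size, seed=0):
    """Simulate two shuffled copies, zipped and then batched (illustrative only)."""
    rng = random.Random(seed)
    first = list(range(n_samples))
    second = list(range(n_samples))
    rng.shuffle(first)    # stands in for train_ds_one.shuffle(...)
    rng.shuffle(second)   # stands in for train_ds_two.shuffle(...)

    # zip pairs the i-th element of one copy with the i-th of the other,
    # so n_samples inputs give exactly n_samples pairs.
    pairs = list(zip(first, second))
    rng.shuffle(pairs)

    # Each pair would become one mixed image; group them into batches.
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]

batches = paired_batches(n_samples=1000, batch_size=200)
print(len(batches), len(batches[0]))  # 5 200
```

Under these assumptions each batch holds 200 pairs, hence 200 mixed images: the zip replaces the samples one-for-one rather than adding to them. The partner of each image is whatever the two independent shuffles happen to line up, so Image1 meets one random partner per epoch. Enumerating Image1 with Image2, Image3, and so on would instead produce on the order of N² samples per epoch.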

Thank you for the comment. Could you give more detail, for example how many extra samples should be generated?