Does ImageDataGenerator apply data augmentation to validation data if validation_split is specified?

Nithin_A_R · June 27, 2021, 2:11pm

Hi, I was going through the tutorials at tf.keras.preprocessing.image.ImageDataGenerator | TensorFlow Core v2.8.0
and Data augmentation | TensorFlow Core when a question came up.
Suppose I have a training directory with some images and I use ImageDataGenerator to augment the data with validation_split = 0.2, as shown below.

from tensorflow import keras

# train_dir, test_dir, and img_size are assumed to be defined earlier.

# Generator with augmentation; validation_split reserves 20% of the files in train_dir.
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255, width_shift_range=0.2,
    shear_range=0.2, height_shift_range=0.2,
    zoom_range=0.2, validation_split=0.2,
    horizontal_flip=True)
# Generator for the test set: rescaling only.
test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Both subsets below are drawn from the same generator, train_datagen.
train_ds = train_datagen.flow_from_directory(
    train_dir, seed=42,
    target_size=img_size, subset='training',
    batch_size=32)
valid_ds = train_datagen.flow_from_directory(
    train_dir, seed=42,
    target_size=img_size, subset='validation',
    batch_size=32)

test_ds = test_datagen.flow_from_directory(
    test_dir, seed=42,
    target_size=img_size,
    batch_size=32)

My question is this:
Does the image augmentation apply to valid_ds by default? If so, wouldn’t it create more bias towards the original training data? (As mentioned in Data augmentation | TensorFlow Core, we should not augment the validation data.)

What if the validation_split argument were provided in the model.fit() method instead? Does that mean the validation split would be taken from the augmented training data?

Bhack · June 28, 2021, 10:41am

Check Split train data into training and validation when using ImageDataGenerator and model.fit_generator · Issue #5862 · keras-team/keras · GitHub

It was closed just 4 days ago.

Nithin_A_R · June 28, 2021, 2:33pm

Thanks a lot!
So, one approach described there is to create separate ImageDataGenerators for the training and validation subsets while keeping a constant seed value. It works!
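
For anyone landing here later, this is a minimal sketch of that two-generator approach, not the exact code from the issue. The names SHARED, AUGMENT, and build_split_generators are made up for illustration. It assumes both generators must be given the same validation_split so that they partition train_dir identically; as far as I understand, flow_from_directory picks each subset from the sorted file list, and the seed mainly affects shuffling.

```python
# Settings shared by both generators. The same validation_split is passed to
# both so that they carve train_dir into the same two subsets (assumption).
SHARED = dict(rescale=1.0 / 255, validation_split=0.2)

# Augmentation settings, passed to the training generator only.
AUGMENT = dict(
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)


def build_split_generators(train_dir, img_size, batch_size=32, seed=42):
    """Return (train_ds, valid_ds); only train_ds is augmented."""
    # Imported here so the settings above can be inspected without TensorFlow.
    from tensorflow import keras

    train_datagen = keras.preprocessing.image.ImageDataGenerator(**SHARED, **AUGMENT)
    valid_datagen = keras.preprocessing.image.ImageDataGenerator(**SHARED)

    train_ds = train_datagen.flow_from_directory(
        train_dir, target_size=img_size, batch_size=batch_size,
        seed=seed, subset="training")
    # Same directory and seed, but a generator without augmentation.
    valid_ds = valid_datagen.flow_from_directory(
        train_dir, target_size=img_size, batch_size=batch_size,
        seed=seed, subset="validation")
    return train_ds, valid_ds
```

Usage would look something like train_ds, valid_ds = build_split_generators(train_dir, img_size), followed by model.fit(train_ds, validation_data=valid_ds, ...). The validation images are then only rescaled, while the training images get the random shifts, shear, zoom, and flips.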