Can't get TF Dataset to work with Keras ImageDataGenerator.flow_from_directory()

So far I have been using a Keras ImageDataGenerator with flow_from_directory() to train my Keras model on all images in the per-class input folders. Now I want to train on multiple GPUs, so it seems I need to use a TensorFlow Dataset object.

Thus I came up with this solution:

keras_model = build_model()
train_datagen = ImageDataGenerator()
training_img_generator = train_datagen.flow_from_directory(
    ...,  # image directory (omitted)
    target_size=(image_size, image_size),
)
train_dataset = tf.data.Dataset.from_generator(
    lambda: training_img_generator,
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, image_size, image_size, 3], [None, len(image_classes)]),
)
# similar for validation_dataset = ...

This seems to work: the model trains as usual. However, when I use a mirrored strategy, I get the following warning during training:

AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Did not find a shardable source, walked to a node which is not a dataset

So I added the following lines between creating the data sets and calling fit():

options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA

However, I still get the same warning.
This leads me to these two questions:

  1. What do I need to do in order to get rid of this warning?
  2. Even more important: Why is TF not able to split the dataset with the default AutoShardPolicy.FILE policy, since I am using thousands of images per class in the input folder?
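
On the second question, a rough intuition may help. According to the tf.data documentation, the FILE policy assigns each worker a subset of the input files, while the DATA policy lets every worker read the whole dataset and keep only its own share of the elements. A Python generator wrapped in a Dataset exposes no file list to tf.data, because the images are read inside Python. The following is a pure-Python illustration of that difference, not TensorFlow's actual implementation; all function and file names in it are hypothetical.

```python
def shard_by_file(files, num_workers, worker_index):
    """FILE-style: each worker is assigned a disjoint subset of input files."""
    return files[worker_index::num_workers]


def shard_by_data(elements, num_workers, worker_index):
    """DATA-style: each worker iterates everything, keeps every n-th element."""
    return [x for i, x in enumerate(elements) if i % num_workers == worker_index]


def opaque_generator():
    # Stands in for a Python generator: it yields batches but exposes
    # no list of files that could be divided between workers.
    for batch_id in range(6):
        yield f"batch_{batch_id}"


# A file-based source can be split before any data is read.
files = ["a.tfrecord", "b.tfrecord", "c.tfrecord", "d.tfrecord"]
worker0_files = shard_by_file(files, num_workers=2, worker_index=0)
worker1_files = shard_by_file(files, num_workers=2, worker_index=1)

# A generator can only be split element by element, after producing everything.
worker1_batches = shard_by_data(opaque_generator(), num_workers=2, worker_index=1)

print(worker0_files, worker1_files, worker1_batches)
```

Seen this way, having thousands of image files on disk does not help: what matters is whether the dataset graph starts from a file-based source that tf.data itself can see.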

Use it like this:

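train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
validation_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))
data_augmentation = tf.keras.Sequential(
    [
        # ... augmentation layers (omitted); one of them is configured with
        #     height_factor=0.2, width_factor=0.2
    ]
)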
def preprocess_image(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # tf.image.resize (default bilinear) already returns float32, so only rescaling is needed
    image = image / 255.0
    return image, label
# Training Pipeline
pipeline_train = (
    train_ds
    .map(preprocess_image, num_parallel_calls=AUTO)
    .map(lambda x, y: (data_augmentation(x), y), num_parallel_calls=AUTO)
)

# Validation Pipeline
pipeline_validation = (
    validation_ds
    .map(preprocess_image, num_parallel_calls=AUTO)
)

Thanks, but we load images from a directory rather than from tensor slices.
Can’t we use a Dataset with the flow_from_directory() function?
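
For what it's worth, one possible way to get a Dataset straight from class folders, without going through flow_from_directory(), is tf.keras.utils.image_dataset_from_directory. The sketch below assumes a recent TensorFlow 2.x release; the toy folder, the class names "cats" and "dogs", and the helper make_toy_image_folder are made up for the example. Note that this may well still trigger the same sharding warning, since the file paths are held in memory rather than read through a file-based reader.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_toy_image_folder(root, class_names=("cats", "dogs"), per_class=4):
    """Write a few random PNGs into one sub-folder per class (demo data only)."""
    for name in class_names:
        class_dir = os.path.join(root, name)
        os.makedirs(class_dir, exist_ok=True)
        for i in range(per_class):
            pixels = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
            png_bytes = tf.io.encode_png(pixels)
            tf.io.write_file(os.path.join(class_dir, f"{i}.png"), png_bytes)


image_size = 32
root = tempfile.mkdtemp()
make_toy_image_folder(root)

# Yields (images, one-hot labels) batches, comparable to what
# flow_from_directory() produces with its default class_mode="categorical".
train_dataset = tf.keras.utils.image_dataset_from_directory(
    root,
    label_mode="categorical",
    image_size=(image_size, image_size),
    batch_size=2,
)

images, labels = next(iter(train_dataset))
print(images.shape, labels.shape)
```

The result is an ordinary tf.data.Dataset, so it can be passed to fit() inside a strategy scope and combined with map() or with_options() like any other dataset.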

I am not sure about that, but the preferred way is to use tensor slices. Follow this tutorial for an overview.

Ok, so I ended up using your notebook.
However, this leads to exactly the same warning when using a mirrored strategy :(

What is the warning?

AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"

Even though I tried it with these options:

options = tfd.Options()
options.experimental_distribute.auto_shard_policy = tfd.experimental.AutoShardPolicy.DATA

The warning is still the same.
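
One thing that may be worth checking, since the lines above do not show it: creating an Options object has no effect until it is attached to the dataset, and with_options() returns a new dataset instead of modifying the existing one. The following is a minimal sketch with made-up toy data standing in for the real images; it is not taken from the original code.

```python
import tensorflow as tf

# Toy stand-in for the real image dataset (hypothetical data).
features = tf.zeros([8, 4, 4, 3])
labels = tf.one_hot([0, 1, 0, 1, 0, 1, 0, 1], depth=2)
train_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.DATA
)

# with_options() returns a new dataset and leaves the original unchanged,
# so the result has to be assigned and then passed to fit().
train_dataset = train_dataset.with_options(options)

applied_policy = train_dataset.options().experimental_distribute.auto_shard_policy
print(applied_policy)
```

If the policy was already attached this way, then the cause of the remaining warning lies elsewhere and this sketch does not explain it.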

Here is the full source code:

Thanks will look into this.
