Suggestions regarding a `tf.data` pipeline

I am currently using the RandAugment class from tf-models (from official.vision.beta.ops import augment). RandAugment().distort(), however, does not accept batched inputs, and it is computationally expensive as well (especially with more than two augmentation operations).

So, following suggestions from this guide, I wanted to be able to map RandAugment().distort() after my dataset is batched. Any workaround for that?

Here’s how I am building my input pipeline for now:

import tensorflow as tf
from official.vision.beta.ops import augment

AUTO = tf.data.AUTOTUNE

# The RandAugment paper recommends num_layers (N) = 2, magnitude (M) = 9
augmenter = augment.RandAugment(num_layers=3, magnitude=10)

dataset = load_dataset(filenames)
dataset = dataset.shuffle(batch_size * 10)
# distort() expects a single image, so it has to be mapped before batching
dataset = dataset.map(augmenter.distort, num_parallel_calls=AUTO)
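One workaround I can think of (a sketch I have not benchmarked, reusing augmenter, AUTO, batch_size, and load_dataset from above, and assuming distort() takes a single [height, width, channels] image tensor): batch first, then fan the batch back out per image with tf.map_fn inside the mapped function:

def batched_distort(images):
    # tf.map_fn applies the per-image distort() along the batch dimension.
    # It still runs once per image, so this keeps the batch -> map ordering
    # but does not actually vectorize the augmentation ops.
    return tf.map_fn(augmenter.distort, images)

dataset = load_dataset(filenames)
dataset = dataset.shuffle(batch_size * 10)
dataset = dataset.batch(batch_size)
dataset = dataset.map(batched_distort, num_parallel_calls=AUTO)

tf.vectorized_map might be faster if every op inside distort() can be vectorized, but the random branching inside RandAugment often cannot, so that would need to be verified.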

Yes, part of the issue is that we also seem to have duplicated ops: cutout, for example, is unbatched in the official.vision namespace but batched in TFA.
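To illustrate the duplication (a sketch: tfa.image.random_cutout is the batched TFA op, while the per-image augment.cutout signature is assumed from the original AutoAugment reference code, so double-check it):

import tensorflow as tf
import tensorflow_addons as tfa
from official.vision.beta.ops import augment

image = tf.zeros([224, 224, 3], dtype=tf.uint8)      # single image
images = tf.zeros([8, 224, 224, 3], dtype=tf.uint8)  # batch of images

# Per-image cutout in tf-models (signature assumed: cutout(image, pad_size, replace))
single = augment.cutout(image, pad_size=16, replace=0)

# Batched cutout in TensorFlow Addons; mask_size values must be even
batched = tfa.image.random_cutout(images, mask_size=(32, 32), constant_values=0)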

These are the origins of the current status:


So, currently, no workaround, right?


My opinion is that we just need to decide how we want to standardize our image-processing ops across the ecosystem. I think these duplicates are going to create confusion.
