How is preprocessing applied? On the whole dataset or on batches?

Fadi_Badine · October 21, 2021, 12:24pm

Hi,

I wanted to know how preprocessing in general and normalization in particular works:

When preprocessing is applied to a dataset using .map, is the mean, for example, calculated over the whole dataset, or per batch?
Same question when the Normalization is part of the model as a preprocessing layer.

Thanks!

Regards,
Fadi Badine

lgusm · October 21, 2021, 2:47pm

Hi Fadi, I’d say it’s all evaluated lazily so it’d be per batch

markdaoust · October 21, 2021, 5:49pm

all evaluated lazily

tf.data doesn’t run anything until you iterate over the dataset, that part is lazy.
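A minimal sketch of that laziness, assuming TensorFlow 2.x. The `record` helper and the `calls` list are hypothetical names used only for this illustration; `tf.py_function` is used so the Python side effect happens when elements are actually produced.

```python
import tensorflow as tf

calls = []  # hypothetical log of the elements the mapped function has seen


def record(x):
    # Runs once per element, only when the dataset is iterated.
    calls.append(int(x))
    return x


ds = tf.data.Dataset.range(3).map(
    lambda x: tf.py_function(record, [x], tf.int64)
)

calls_before = list(calls)  # nothing has been processed yet
values = [int(v) for v in ds]  # iterating triggers the map function
calls_after = sorted(calls)
```

Note that the mapped function only ever sees one element (or one batch, if you map after .batch) at a time, so .map by itself does not compute any dataset-wide statistic.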

But if you look for examples on tensorflow.org you’ll see they all set the mean/variance one way or another before using the Normalization layer, or any other similar layer.

You can either pass them as arguments to the constructor, or call .adapt, which sets the statistics based on all the data you give it.
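Roughly, the two options look like this. This is a sketch that assumes a TensorFlow version where the layer is available as `tf.keras.layers.Normalization` (older releases have it under `tf.keras.layers.experimental.preprocessing`); the toy data is made up for the example.

```python
import numpy as np
import tensorflow as tf

# Toy data: 4 samples, 1 feature (values chosen only for illustration).
data = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")

# Option 1: pass precomputed statistics to the constructor.
norm_fixed = tf.keras.layers.Normalization(axis=-1, mean=2.5, variance=1.25)

# Option 2: let adapt() compute mean and variance over all the data it is given.
norm_adapted = tf.keras.layers.Normalization(axis=-1)
norm_adapted.adapt(data)

out_fixed = np.asarray(norm_fixed(data))
out_adapted = np.asarray(norm_adapted(data))

# The stored statistics are reused as-is: a smaller batch is normalized
# with the same mean/variance, not with its own batch statistics.
out_half = np.asarray(norm_adapted(data[:2]))
```

Either way, the statistics are fixed once they are set, so the output for a given sample does not depend on which batch it arrives in.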

If you want it to calculate the mean per batch (or an EMA across batches), use BatchNormalization.
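For comparison, a small sketch of the BatchNormalization behaviour, again assuming TensorFlow 2.x and made-up toy data. With training=True the layer normalizes using the current batch's statistics and updates its moving averages; with training=False it uses those moving averages instead.

```python
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(momentum=0.9)
batch = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")

# Training mode: normalize with this batch's own mean and variance.
train_out = np.asarray(bn(batch, training=True))

# Inference mode: normalize with the moving averages accumulated so far,
# which after a single update are still far from the batch statistics.
infer_out = np.asarray(bn(batch, training=False))
```

So the same input gives different outputs in the two modes until the moving averages have converged, which is the main practical difference from a Normalization layer with fixed statistics.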