Hi, when using tf.distribute.MirroredStrategy, is the effective batch size the configured batch size divided by the number of GPUs, or just the batch size configured in the code? For example, if I want an effective batch size of 1 without tf.distribute.MirroredStrategy, I just use batch size = 1; if I use tf.distribute.MirroredStrategy with 8 GPUs, should I set batch size = 8 or batch size = 1? Thanks in advance!
Hi @marcocintra, when using distributed training, the input given to the model is divided equally among the replicas. For instance, if you are using
MirroredStrategy with 2 GPUs and a batch size of 10, the batch will be split across the 2 GPUs, with each receiving 5 input examples per step.
In your case, if you use 8 GPUs and set the batch size to 8, each GPU will receive 1 input example per step. It is better to increase the batch size to make effective use of the extra computing power. Thank you.
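To make the splitting concrete, here is a minimal pure-Python sketch of how a global batch is divided evenly among replicas, mirroring the behavior described above. The helper name `split_global_batch` is hypothetical, for illustration only; it is not a TensorFlow API.

```python
def split_global_batch(batch, num_replicas):
    """Mimic how MirroredStrategy splits one global batch evenly
    across replicas: each replica gets len(batch) // num_replicas examples."""
    per_replica = len(batch) // num_replicas
    return [batch[i * per_replica:(i + 1) * per_replica]
            for i in range(num_replicas)]

# A global batch of 8 examples on 8 GPUs: each replica sees 1 example per step.
shards = split_global_batch(list(range(8)), num_replicas=8)
# A global batch of 10 examples on 2 GPUs: each replica sees 5 examples per step.
halves = split_global_batch(list(range(10)), num_replicas=2)
```

In real code the split happens inside `strategy.experimental_distribute_dataset`, so you only supply the global batch size to `tf.data.Dataset.batch`.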
Thanks, @Kiran_Sai_Ramineni. So whether or not I use MirroredStrategy, the batch size that I define is the effective batch size?
Hi @marcocintra, The choice of batch size depends on the available GPU memory. Larger batch sizes make training take less time but require more memory, and vice versa for smaller batch sizes. You can experiment with different batch sizes depending on the available GPU memory for effective training. Thank you.
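A common pattern when experimenting is to keep the per-replica batch size fixed and scale the global batch size with the number of replicas (in TensorFlow, `strategy.num_replicas_in_sync` gives that count). The sketch below shows only the arithmetic; the helper name `global_batch_size` is hypothetical.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Global batch size that keeps each replica's per-step batch constant,
    i.e. the value you would pass to tf.data.Dataset.batch()."""
    return per_replica_batch * num_replicas

# 1 example per replica on 8 GPUs -> global batch size of 8.
gb_small = global_batch_size(per_replica_batch=1, num_replicas=8)
# 32 examples per replica on 8 GPUs -> global batch size of 256.
gb_large = global_batch_size(per_replica_batch=32, num_replicas=8)
```

So to answer the question above: yes, the batch size you pass to the dataset is the global (effective) batch size, and each replica processes its share of it.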