When trying to find the optimal batch size, I run multiple experiments with a gradually increasing batch size.
Often I notice the GPU cache is almost full (only a few MB free), yet I can still increase the batch size by ~25% without hitting an out-of-memory error, and the reported cache usage stays the same.
Example: batch_size=10, GPU almost full.
batch_size=13, GPU almost full, still trains fine.
Why is that? Is it a good idea to go with 13 instead of 10?
Hi @Maxime_G, this may be due to gradient accumulation (running the mini-batches sequentially while accumulating the gradients). As for choosing the batch size: if it is too large, the model may overfit, but training will be faster; if it is too small, training will be slower. It is better to train with the optimal batch size. In your case, since the difference is small, you can go with 13. Thank you.
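For what it's worth, the trial-and-error search described in the question can be automated. Below is a minimal sketch of a doubling-plus-binary search for the largest batch size that fits; the `fits` callback is hypothetical — in practice it would run one forward/backward pass at the given batch size and return False on an out-of-memory error:

```python
def find_max_batch_size(fits, low=1, high_cap=4096):
    """Return the largest batch size in [low, high_cap] for which
    fits(batch_size) is True. Assumes fits(low) is True and that
    fits is monotone (if b fits, every smaller batch also fits)."""
    # Doubling phase: grow until the batch no longer fits (or cap hit).
    size = low
    while size * 2 <= high_cap and fits(size * 2):
        size *= 2
    # Binary-search phase between the last fitting size and twice it.
    lo, hi = size, min(size * 2, high_cap)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    # Stand-in check: pretend anything up to 13 samples fits in memory.
    print(find_max_batch_size(lambda b: b <= 13))  # -> 13
```

This avoids running a full training epoch per candidate size: a single forward/backward pass per probe is usually enough to trigger the out-of-memory error if one is coming.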