Questions about Keras Beginner Tutorial on Basic Image Classification

Brocco_Lee · September 28, 2021, 1:51am

Hi,

I have a few very simple questions. I just started learning how to use TensorFlow and am going through the first beginner tutorial. Can someone explain to me what these two numbers in the console output, 1875 and 313, represent?

According to the tutorial, there are 70k images. Are all 60k training images used for updating the weights and biases in the neural network at every iteration, or is it only 1875? Similarly, are the remaining 10k used for testing, or is it 313?

Thanks.

Ekaterina_Dranitsyna · September 28, 2021, 8:37am

It’s the number of batches processed in each epoch. In this case the model is trained on all available training samples every epoch, which total 1875 batches. Then it is evaluated on all test samples, which total 313 batches.
If you divide the number of samples by the batch size (32 by default in fit()) and round up, you should get the same numbers: 60,000 / 32 = 1875, and 10,000 / 32 = 312.5, which rounds up to 313 because the last batch is only partly filled.
In this basic example the number of batches just shows the training progress. But there are some use cases where it could be beneficial to set the “steps_per_epoch” argument in fit() to some smaller number of batches and validate the model more often.
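
If it helps, the division can be checked with a few lines of plain Python. This is only a sketch: the helper name batches_per_epoch is made up for illustration, and it assumes the batch size of 32 and the 60k/10k split discussed in this thread.

```python
import math

def batches_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to cover every sample once.

    Hypothetical helper for illustration. The result is rounded up
    because the last batch may hold fewer than batch_size samples.
    """
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(60_000)  # 60,000 / 32 = 1875 exactly
test_batches = batches_per_epoch(10_000)   # 10,000 / 32 = 312.5, rounded up

print(train_batches, test_batches)  # prints: 1875 313
```

The second number is not a clean division, which is why 313 × 32 is slightly more than 10,000: the final test batch contains only 16 images.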

Brocco_Lee · September 28, 2021, 3:24pm

Hi Ekaterina,

Thank you for your answer. Based on what you said, 60k/1875 and 10k/313 both come out to about 32, so that is the common number. So does it take 32 iterations of feeding 1875 training samples into the network in order to pass all 60k per epoch, OR is it the other way round: 1875 iterations of 32 samples? The way the number was reported in the debug console suggests it’s the latter, but I’m not sure. Could you clarify a little?

Thanks.

Ekaterina_Dranitsyna · September 29, 2021, 7:36am

During one epoch the model “sees” every training sample exactly once.
Training samples are grouped into batches to make data processing more efficient. In your case each batch contains 32 samples, so one epoch is 1875 iterations of 32 samples each.
The number of epochs defines how many times the model will see every training sample.
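
To make the epoch/batch relationship concrete, here is a rough sketch in plain Python that only simulates the bookkeeping, with no TensorFlow involved. The function iterate_batches and the choice of 3 epochs are assumptions for illustration; the real fit() also shuffles the data by default, which this sketch leaves out.

```python
from collections import Counter

def iterate_batches(num_samples, batch_size):
    """Yield index ranges, one per batch (hypothetical helper, no shuffling)."""
    for start in range(0, num_samples, batch_size):
        yield range(start, min(start + batch_size, num_samples))

num_samples, batch_size, epochs = 60_000, 32, 3

seen = Counter()       # how many times each sample index was visited
steps_per_epoch = 0    # batches (iterations) in the most recent epoch

for epoch in range(epochs):
    steps_per_epoch = 0
    for batch in iterate_batches(num_samples, batch_size):
        # In real training, one weight update would happen here per batch.
        seen.update(batch)
        steps_per_epoch += 1

print(steps_per_epoch)      # iterations per epoch
print(set(seen.values()))   # visit counts across all samples
```

Under these assumptions each epoch takes 1875 iterations, and after 3 epochs every one of the 60,000 samples has been visited 3 times.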

Brocco_Lee · September 29, 2021, 3:42pm

Got it! One day I will get the nuances right! Thank you, Ekaterina!

8bitmp3 · October 1, 2021, 9:39pm

Some resources that you may find useful:

Distributed training: Note about dataset batching - Multi-GPU and distributed training
TPUs: Rule of thumb: pick efficient values for batch and feature dimensions - Cloud TPU performance guide | Google Cloud
TPUs: Batch size, learning rate, steps_per_execution - Tensor Processing Units (TPUs) Documentation | Kaggle
Multi-GPU training: “batch size considerations depend on your training framework” - Train With Mixed Precision :: NVIDIA Deep Learning Performance Documentation
“Increasing batch size while mitigating accuracy degradation is actively researched in the ML and systems communities” (Augment your batch: better training with larger batches - Hoffer et al., 2019)
A Recipe for Training Neural Networks (2019, by Andrej Karpathy)