Batch size and Keras.Sequential models

When dealing with large sets of images, we read batches of images at a time, but I am assuming we feed the CNN model one image at a time. This brings up my confusion around the input_shape definition.

Consider the following code for preprocessing images and modeling:

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    batch_size = Batch_Size, 
    image_size = (Image_Size, Image_Size), 
    shuffle = True)

# Image Processing: Rescaling and Resizing
resize_and_rescale = tf.keras.Sequential([
    layers.experimental.preprocessing.Resizing(Image_Size, Image_Size),
    layers.experimental.preprocessing.Rescaling(1./255),
])

input_shape = (Batch_Size, Image_Size, Image_Size, Channels)
num_classes = 3

model = models.Sequential([
    resize_and_rescale,
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu', input_shape=input_shape),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.Conv2D(128, (3,3), activation='relu'),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.Conv2D(128, (3,3), activation='relu'),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])

The first line of the Sequential model is the preprocessing step, which is carried out on a single image. This is clear from the image_dataset_from_directory() call, where image_size = (Image_Size, Image_Size).

The second line, the call to Conv2D, specifies an input shape that also includes Batch_Size.

My confusion is, if the model works on one image at a time, why does the batch size need to be included in the Conv2D call?

Hi @Nader_Afshar,

The following is my comprehension of the information:

The batch size appears in the shapes here because the convolutional layers in a CNN are applied to a whole batch of images at a time, not one image at a time. This is done for efficiency: by processing a batch of images at once, the convolutional operations can be executed in parallel on modern hardware.

The Batch_Size dimension tells the convolutional layer how many images are processed together in each step.

Let’s say you have a batch size of 32 and an image size of 128x128 with 3 color channels. Your input shape would be (32, 128, 128, 3). This signifies that each batch consists of 32 images, each with a size of 128x128 and 3 color channels. The CNN operations within each batch are performed in parallel on these 32 images.
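As a rough illustration (plain Python, not actual Keras code), here is how that (32, 128, 128, 3) batch shape evolves through a stack of 3x3 "valid" convolutions like the one above: the spatial size shrinks by 2 at each layer, while the leading batch dimension of 32 is carried along unchanged the whole way.

```python
# Plain-Python sketch (not Keras) of how a batch shape flows through
# stacked 3x3 "valid" convolutions. Filter counts follow the model above.
def conv2d_output_shape(shape, filters, kernel=3):
    """(batch, height, width, channels) -> shape after one valid conv."""
    batch, h, w, _ = shape
    return (batch, h - kernel + 1, w - kernel + 1, filters)

shape = (32, 128, 128, 3)              # batch of 32 RGB images, 128x128
for filters in [32, 64, 128, 64, 128, 64]:
    shape = conv2d_output_shape(shape, filters)
    print(shape)                       # batch dimension stays 32 throughout
```

The first printed shape is (32, 126, 126, 32) and the last is (32, 116, 116, 64): every layer sees the batch dimension, but no layer changes it.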

In summary, the batch size is included in the input shape of the model to ensure that the model processes data efficiently and takes advantage of the parallel processing capabilities of modern hardware. While individual operations within each batch might be performed on single images, the overall network operations involve multiple images within each batch.

I hope this clears up your confusion.


@Laxma_Reddy_Patlolla Thank you for this clarification. If the CNN model does indeed work on all the batched images at once, then why is the preprocessing (1st step) before the call to Conv2D set up with a single-image shape? See the resize_and_rescale function.

Is this line not part of the CNN model, processing a single image at a time?

It is a bit confusing as to where in the model definition the “batching” occurs and how that is determined by Keras.

Thanks again

Hi @Nader_Afshar ,

The resize_and_rescale step you’ve mentioned, which includes resizing and rescaling, is a preprocessing step defined per image before the data reaches the CNN layers. This step is not explicitly part of the CNN model; it’s a separate preprocessing operation that happens before the data is fed into the CNN. It’s applied to each image individually, and the intention is to standardize the images before they enter the model.
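One way to picture this (plain-Python sketch; rescale_image is a hypothetical stand-in for the Rescaling(1./255) layer, not a Keras API): the preprocessing is written as a per-image operation, and the batch dimension only appears because the pipeline maps that operation over every image in the batch.

```python
# Sketch: a per-image operation, mapped over a batch by the pipeline.
# rescale_image is a hypothetical stand-in for a Rescaling(1./255) layer.
def rescale_image(image):
    """Scale pixel values from [0, 255] to [0, 1] for one image."""
    return [[pixel / 255.0 for pixel in row] for row in image]

# A "batch" of two tiny 2x2 grayscale stand-in images.
batch = [
    [[0, 255], [128, 64]],
    [[255, 255], [0, 0]],
]

# The data pipeline, not the per-image op, supplies the batch dimension:
processed = [rescale_image(img) for img in batch]
print(processed[0][0])  # [0.0, 1.0]
```

So the per-image shape in the preprocessing step and the batched shape seen by the model are not in conflict; they describe the same data at different stages.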

In a Keras model, the “batching” itself is determined by the data generator or pipeline that prepares your data before it enters the model. The model architecture and input shape work together to process the data efficiently in batches as it flows through the layers.
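The batching itself can be pictured as nothing more than chunking the dataset into groups, which is roughly what image_dataset_from_directory does with its batch_size argument (plain-Python sketch with hypothetical names, not the TensorFlow implementation):

```python
# Sketch of how a data pipeline groups individual samples into batches.
def batched(samples, batch_size):
    """Yield successive batches (lists) of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(10))            # 10 stand-in "images"
batches = list(batched(samples, 4))
print([len(b) for b in batches])     # [4, 4, 2] -- last batch is smaller
```

Note the last batch can be smaller than batch_size, which is one reason Keras models treat the batch dimension as variable rather than fixed.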

I hope this clarifies.


@Laxma_Reddy_Patlolla Thank you kindly for these clarifying explanations. I think I now understand to some extent, especially after looking at the TensorFlow docs for the Resizing method.

These questions came about as I was trying to find out why I seem to need to skip the normalization step on a new single image that I feed into the model in order to get a correct prediction.
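For what it's worth, one possible explanation (an assumption based on the code earlier in this thread, not something confirmed here): if the resize_and_rescale step is saved as part of the model, then normalizing the image again before calling predict would rescale the pixels twice, which could be why skipping the external normalization gives the correct prediction. A related single-image gotcha is that the model still expects a leading batch dimension, so a lone image has to be wrapped into a batch of one; the sketch below shows the idea in plain Python (in TensorFlow you would typically use tf.expand_dims(image, axis=0) or NumPy's image[np.newaxis, ...]).

```python
# Sketch: a model trained on batches expects a leading batch dimension,
# so a single image must be wrapped into a batch of one before predict().
image = [[0, 255], [128, 64]]        # one tiny 2x2 stand-in image

batch_of_one = [image]               # shape goes (2, 2) -> (1, 2, 2)
print(len(batch_of_one))             # 1 -- the batch dimension
```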

If you have time, please visit my other post, “Prediction after loading a saved model”.

I will look there for any thoughts you may have.

Thanks again

Hi @Nader_Afshar ,

Sure, I checked the other post you mentioned above; it seems you got the solution there. Do you still need any help with it?


Thank you again. I am all set now.
