Is it normal to have zero performance gains when processing grayscale?

I’m processing images that actually have a bit-depth of 1. They aren’t just grayscale, they’re black and white. I have a couple of questions that I haven’t been able to find an answer for, but they mostly revolve around “why aren’t I getting a performance improvement over RGB”?

I would expect shifting from RGB to grayscale to generate an improvement in performance by reducing the input set. Is RGB just automatically converted to grayscale, or did I miss something in the declaration of my Conv2d?

Is there a convenient way in Keras to combine clusters of grayscale bits into integer or fp values, or is that something I’d have to use the lower level interface for?

If there is a place where I could derive these answers, please let me know. My Google skills have failed me here.


Can you please clarify what performance gains you are looking for?

Is it in terms of speed or accuracy?

Ideally, RGB images carry more information and are expected to give better accuracy, but training slows down since we are dealing with 3× as many values.

Therefore, the trade-off is to use grayscale values, which carry less information but give a significant increase in speed.

If the problem being solved does not depend much on colour information, grayscale images should be enough; otherwise you might have to deal with RGB images (for example, the CV models trained on the ImageNet dataset are trained on 224x224x3 images).
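If you want to convert tensors yourself rather than relying on the loader's `color_mode="grayscale"`, `tf.image.rgb_to_grayscale` does the channel collapse. A minimal sketch (the random batch here is just a stand-in for real image data):

```python
import tensorflow as tf

# Toy RGB batch; in your case all three channels would be identical 0/255 values.
rgb = tf.cast(tf.random.uniform((1, 90, 800, 3), maxval=256, dtype=tf.int32),
              tf.float32)

# Collapses the channel axis from 3 to 1 via a luminance-weighted sum.
gray = tf.image.rgb_to_grayscale(rgb)
print(rgb.shape, "->", gray.shape)  # (1, 90, 800, 3) -> (1, 90, 800, 1)
```

Note that this only shrinks the channel axis; the spatial 90x800 grid, which dominates the compute, is unchanged.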

Thank you!

Thanks for responding, Chunduriv. Any input on what I’m doing wrong is welcome.

I’m expecting improvements in speed. Right now the training is taking 15 minutes per epoch regardless of whether I pass it in as RGB or grayscale. Pretty sure I’m doing something wrong.

The images are actually signatures, so the pixels are binary “did the pen touch the paper” values. Signatures are either “good” or “bad,” so two classes. I receive them in RGB PNG, but all three channels are identical, and either 0 or 255.

I translated the corpus to grayscale PNG in a separate directory. Each image is 90x800.

The samples are in directories:


Here are the parts of the code that I think matter.
color_mode is either “grayscale” or “rgb”
input_shape is either (90, 800, 1) or not included so it defaults (input_shape excludes the batch dimension)

	train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
		data_dir, validation_split=0.2, subset="both",
		color_mode="grayscale", seed=123, image_size=(90, 800))

	train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
	val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

	model = Sequential([
		# input_shape excludes the batch dimension; only the first layer needs it
		layers.Rescaling(1./255, input_shape=(90, 800, 1)),
		layers.Conv2D(16, 3, padding='same', activation='relu'),
		layers.Conv2D(32, 3, padding='same', activation='relu'),
		layers.Conv2D(64, 3, padding='same', activation='relu'),
		layers.Dense(128, activation='relu'),


The code is correct. One reason might be the image size: at 90x800, each image has 72,000 pixel values.

To improve speed, we need to decrease the image size and run on a GPU.

Keep decreasing the image size as long as there is no drop in accuracy and there is an improvement in speed.
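The resize happens in the loader itself via the `image_size` argument, so no separate preprocessing pass over the corpus is needed. A runnable sketch, halving your resolution to 45x400 (it builds a tiny throwaway corpus of blank PNGs so the call executes; point `data_dir` at your real signature directory instead):

```python
import os
import tempfile

import tensorflow as tf

# Throwaway stand-in corpus: two class subdirectories with a few blank PNGs.
data_dir = tempfile.mkdtemp()
for label in ("good", "bad"):
    os.makedirs(os.path.join(data_dir, label))
    for i in range(4):
        img = tf.zeros((90, 800, 1), dtype=tf.uint8)
        tf.io.write_file(os.path.join(data_dir, label, f"{i}.png"),
                         tf.io.encode_png(img))

# Same loader call as in your code, but with a smaller target size;
# images are resized on the fly as they are read.
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset="both",
    color_mode="grayscale", seed=123, image_size=(45, 400))

for images, labels in train_ds.take(1):
    print(images.shape)  # (batch, 45, 400, 1)
```

Halving both dimensions quarters the per-layer work of every Conv2D, which is where the speedup comes from.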

Thank you!

Ok, things to do to reduce time:

  • Reduce image size
  • Run on GPU

Can you tell me why reducing bit depth doesn’t perform the first of those two? I’m still working on getting my Intel Mac to run on GPU. It keeps thinking the GPU has 0 memory available.


Reducing the bit depth should speed up the training. But I am curious to know the training set size as well. Slow training is usually caused by the depth of the model, the training set size, the image resolution, and of course the number of GPU cores.
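That said, the channel count only reaches the very first Conv2D; every later layer sees the previous layer's 16/32/64 feature maps regardless of whether the input was RGB or grayscale. A rough sketch comparing parameter counts for your conv stack (Dense tail omitted) makes this concrete:

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

def head(channels):
    # Same conv stack as in the thread, parameterized by input channels.
    return Sequential([
        layers.Input(shape=(90, 800, channels)),
        layers.Conv2D(16, 3, padding='same', activation='relu'),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
    ])

print(head(1).count_params())  # 23296 (grayscale)
print(head(3).count_params())  # 23584 (RGB) — only the first layer grew
```

The RGB version has just 288 more weights (16 filters × 3×3 kernel × 2 extra channels) out of ~23k, and the extra multiply-adds are likewise confined to the first layer, so grayscale input barely dents the epoch time. Shrinking the 90x800 spatial size, by contrast, cuts the work of every layer.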

Thank you!

The training set is 41k images in the “good” category and 2.7k images in the “bad” category. I’m starting to think the training is being done on the file names instead of the actual file contents. I’m having trouble stepping through it in the debugger due to the multi-threaded nature of TensorFlow. I’ll reply when I have something more concrete.