How do I need to choose the dimensions for my CNN?

When I run the following code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

cnn5 = Sequential()

# input layer
cnn5.add(Conv2D(32, kernel_size=(21,21), strides=(1,1), padding='same', activation='relu', input_shape=(256,256,1)))

# convolutional layer
cnn5.add(Conv2D(64, kernel_size=(15,15), strides=(1,1), padding='same', activation='relu'))
cnn5.add(MaxPool2D(pool_size=(2,2)))

cnn5.add(Conv2D(128, kernel_size=(9,9), strides=(1,1), padding='same', activation='relu'))
cnn5.add(MaxPool2D(pool_size=(2,2)))

# add another layer
cnn5.add(Conv2D(256, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
cnn5.add(MaxPool2D(pool_size=(2,2)))

# flatten output of conv
cnn5.add(Flatten())

# dense connected layers
cnn5.add(Dense(1000))  # activation='relu' currently commented out

# output layer
cnn5.add(Dense(10, activation='softmax'))

#Model compiling and fitting
cnn5.compile(optimizer='adam',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
run_cnn5 = cnn5.fit(x_train, y_train_onehot, epochs=20, validation_data=(x_test, y_test_onehot))

I get an accuracy of 9% after the first epoch and 6.5% after each of the other 19 epochs, without any improvement. The images I use for training are spectrograms created from this audio dataset (lewtun/music_genres_small · Datasets at Hugging Face).

How can one determine what kind of issue a CNN has if the accuracy does not change by even a tiny amount after the first epoch? What could be the issue here? Changing kernel sizes or adding/removing layers or channels does not change anything.

Hi @Laulito, apart from the model architecture, the accuracy also depends on the preprocessing of the data. If possible, could you please share the preprocessing steps you followed? Thank you.

@Laulito

In your case, you're working with spectrogram images from an audio dataset. Try one of these:

  1. Spectrograms are 2D representations of audio signals over time and frequency. Check the dimensions of your spectrogram images to determine the number of time steps and frequency bins.
  2. The input_shape parameter in the first layer of your CNN (Conv2D layer) should match the dimensions of your spectrogram images. For example, if your spectrograms are 256x256 pixels, use input_shape=(256, 256, 1).
  3. If your spectrograms are grayscale, use input_shape=(height, width, 1). If they are in color (RGB), use input_shape=(height, width, 3).
  4. Normalize pixel values to be within the range [0, 1]. You can do this by dividing the pixel values by the maximum value (e.g., 255).
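A minimal sketch of step 4, with a random array standing in for your spectrogram batch (the shape and the name x_train are assumptions based on your snippet, not your actual loader):

```python
import numpy as np

# Hypothetical stand-in for a batch of 4 grayscale 256x256 spectrograms
# stored as 8-bit pixel values
x_train = np.random.randint(0, 256, size=(4, 256, 256, 1), dtype=np.uint8)

# Scale pixel values into [0, 1] before feeding them to the network
x_train = x_train.astype('float32') / 255.0
```

Apply the same scaling to x_test so training and validation data are on the same scale.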

Here’s an example of how you might modify your code to handle input dimensions:

# Assuming spectrogram dimensions are 256x256
input_shape = (256, 256, 1)  # Adjust dimensions based on your data

cnn5 = Sequential()
cnn5.add(Conv2D(32, kernel_size=(21, 21), strides=(1, 1), padding='same', activation='relu', input_shape=input_shape))
# ... (rest of your model)

Ensure that input_shape matches the dimensions of your spectrogram images. If your data has a different structure, adjust the input dimensions accordingly.
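As a quick sanity check (the array name and shape here are assumptions, not taken from your loader), you can print the batch shape and append a channel axis if your images come back as plain 2D arrays, since Conv2D expects (height, width, channels):

```python
import numpy as np

# Hypothetical batch loaded without a channel axis: (N, 256, 256)
x_train = np.zeros((4, 256, 256), dtype='float32')

# Conv2D needs a trailing channel dimension; add one if it is missing
if x_train.ndim == 3:
    x_train = x_train[..., np.newaxis]

print(x_train.shape)  # (4, 256, 256, 1)
```

If the printed shape disagrees with the input_shape you pass to the first Conv2D layer, Keras will raise a shape error at fit() time, which is a useful first thing to rule out.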

Let me know if this helped.