The cost function gets stuck at 120 epochs

I wrote a neural network in C++ to recognize handwritten digits from the MNIST dataset, without any pre-existing neural network libraries. My network has 784 input neurons (the pixels of the image), 100 neurons in the single hidden layer, and 10 output neurons. I also have 1 bias neuron per layer. My activation function is the sigmoid (to get outputs between 0 and 1) and my cost function is the root mean squared error. I use a learning rate of 0.1 and a gradient descent momentum of 0.3.
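For reference, here is a minimal sketch of the two pieces named above (the sigmoid and the RMSE cost); the names are illustrative, my actual code may differ in details:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid activation: squashes any pre-activation into (0, 1).
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Root mean squared error between the 10 outputs and a one-hot target.
double rmse(const std::vector<double>& output, const std::vector<double>& target) {
    double sum = 0.0;
    for (std::size_t i = 0; i < output.size(); ++i) {
        double d = output[i] - target[i];
        sum += d * d;
    }
    return std::sqrt(sum / output.size());
}
```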

I get an accuracy of about 90% on the MNIST training set. However, on the test set I only get about 30% accuracy. I checked the error rate during training and saw that it decreases for the first 120 epochs, then gets stuck at 0.30.
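(By accuracy I mean the usual argmax check: the predicted digit is the index of the largest of the 10 outputs, compared against the label. A minimal sketch of that check, with illustrative names:)

```cpp
#include <cstddef>
#include <vector>

// Index of the largest output activation = the predicted digit.
std::size_t argmax(const std::vector<double>& outputs) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i)
        if (outputs[i] > outputs[best]) best = i;
    return best;
}

// Fraction of examples whose prediction matches the true label.
double accuracy(const std::vector<std::vector<double>>& all_outputs,
                const std::vector<std::size_t>& labels) {
    std::size_t correct = 0;
    for (std::size_t n = 0; n < labels.size(); ++n)
        if (argmax(all_outputs[n]) == labels[n]) ++correct;
    return static_cast<double>(correct) / labels.size();
}
```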

This is a plot of the cost function over all 60,000 epochs of training:

[image: cost function, full training run]

This is the same cost function during the first 600 epochs only:

[image: cost function, first 600 epochs]

And this is a plot of the cost function on the validation set (10,000 epochs):

[image: cost function, validation set]

First I suspected an overfitting problem, so I tried stopping the learning right after the plateau appeared (at 120 epochs). It was worse: on the test set I only got 10% accuracy. I also tried dropout regularization, which didn't work either, so I don't really know if it's an overfitting-related problem.
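By dropout I mean randomly zeroing hidden activations during training; a minimal sketch of the inverted variant, assuming a keep probability `keep_prob` (a hyperparameter I'm naming just for illustration):

```cpp
#include <random>
#include <vector>

// Inverted dropout: each hidden activation survives with probability
// keep_prob and is scaled by 1/keep_prob, so the expected activation
// is unchanged and nothing special is needed at test time.
void apply_dropout(std::vector<double>& activations, double keep_prob,
                   std::mt19937& rng) {
    std::bernoulli_distribution keep(keep_prob);
    for (double& a : activations)
        a = keep(rng) ? a / keep_prob : 0.0;
}
```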

I tried using the ReLU activation function for the hidden layers instead of the sigmoid, to avoid the vanishing gradient problem, but without success (accuracy dropped to 10%).
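For reference, the standard ReLU pair looks like this (a sketch; the details of my implementation may differ):

```cpp
#include <algorithm>

// ReLU passes positive pre-activations through and zeroes the rest.
// A known caveat (not confirmed here, just a common one): with a large
// learning rate, units can get stuck at zero ("dying ReLU").
double relu(double z)       { return std::max(0.0, z); }
double relu_prime(double z) { return z > 0.0 ? 1.0 : 0.0; }
```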

Then I tried changing the learning rate. I thought that would at least delay the appearance of the plateau, but it still showed up at 120 epochs. I also tried disabling the gradient descent momentum, without any success.
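By gradient descent momentum I mean the classical update below (a sketch with illustrative names); setting `momentum` to 0.0 reduces it to plain gradient descent:

```cpp
#include <cstddef>
#include <vector>

// Classical momentum: velocity accumulates a decaying sum of past
// gradients, and the weights move along the velocity.
void momentum_step(std::vector<double>& weights,
                   std::vector<double>& velocity,
                   const std::vector<double>& gradient,
                   double learning_rate, double momentum) {
    for (std::size_t i = 0; i < weights.size(); ++i) {
        velocity[i] = momentum * velocity[i] - learning_rate * gradient[i];
        weights[i] += velocity[i];
    }
}
```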

I thought my cost function was stuck in a local minimum (which would explain the plateau at 0.3), so I tried decreasing the learning rate while the network was training (starting at 0.5 and decreasing it by a fraction of the number of epochs), hoping it would unstick the cost function from that local minimum, but it didn't.
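The schedule I mean is along these lines (a sketch; `decay_rate` is an assumed hyperparameter, not my exact constant):

```cpp
// 1/t-style decay: the learning rate shrinks as the epoch count grows.
// With lr0 = 0.5 and decay_rate = 0.01, epoch 120 gives roughly 0.23.
double decayed_learning_rate(double lr0, double decay_rate, int epoch) {
    return lr0 / (1.0 + decay_rate * epoch);
}
```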

I tried changing the size of the hidden layer, and even adding one more, but it didn't change anything: there is still a plateau, and it still starts at 120 epochs.

What can I do to solve this problem?

Hi @kripi,

It seems like you have tried several approaches to improving the performance of your neural network but haven't achieved the desired results. Could you please try batch normalization, or a different network architecture (a CNN for image classification)?
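For example, batch normalization standardizes a layer's values over each mini-batch before a learned scale and shift; a minimal sketch for a single neuron (training-time only, the running statistics used at inference are omitted, and the names are illustrative):

```cpp
#include <cmath>
#include <vector>

// Batch norm for one neuron across a mini-batch: normalize to zero mean
// and unit variance, then scale by gamma and shift by beta (both learned).
// epsilon guards against division by zero.
std::vector<double> batch_norm(const std::vector<double>& batch,
                               double gamma, double beta,
                               double epsilon = 1e-5) {
    double mean = 0.0;
    for (double x : batch) mean += x;
    mean /= batch.size();

    double variance = 0.0;
    for (double x : batch) variance += (x - mean) * (x - mean);
    variance /= batch.size();

    std::vector<double> out;
    out.reserve(batch.size());
    for (double x : batch)
        out.push_back(gamma * (x - mean) / std::sqrt(variance + epsilon) + beta);
    return out;
}
```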

I hope this helps!

Thanks.