Large MAPE gap between LSTM Optimizers

I used MAPE to compare Adam, Adagrad, and SGD on a stacked LSTM with network size 256. There is a large gap between the results:

  • MAPE Adam: 4.43

  • MAPE Adagrad: 86.30

  • MAPE SGD: 70.07

Is it normal to have such a large difference?

Hi @Narsis, generally the Adam, Adagrad, and SGD optimizers update the model parameters in different ways (a comparison sketch follows the list below):

  • SGD (Stochastic Gradient Descent): updates every parameter with the same global learning rate, stepping each parameter in the direction opposite to its gradient on the current mini-batch.

  • Adagrad: an adaptive learning-rate algorithm that scales the learning rate of each parameter based on its historical gradients. It is designed to give larger updates to infrequently updated parameters and smaller updates to frequently updated ones.

  • Adam: adapts the step size of each parameter using estimates of both the mean (first moment) and the uncentered variance (second moment) of its past gradients, combining the ideas behind momentum and adaptive learning rates.
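
To see how much of the gap comes from the optimizer alone, you can train the same architecture once per optimizer and compare the validation MAPE. Here is a minimal TensorFlow/Keras sketch; the sine-wave data, window length, epoch count, and the library-default learning rates are assumptions for illustration, not your actual setup:

```python
import numpy as np
import tensorflow as tf

# Synthetic, strictly positive series just so the sketch runs end to end;
# replace this with your own data and windowing.
t = np.arange(0, 200, 0.1, dtype="float32")
series = np.sin(t) + 2.0
window = 30
X = np.stack([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:][:, None]
split = int(0.8 * len(X))
x_train, y_train = X[:split], y[:split]
x_val, y_val = X[split:], y[split:]

def build_model():
    # Stacked LSTM with 256 units per layer, matching the size in the question.
    return tf.keras.Sequential([
        tf.keras.layers.LSTM(256, return_sequences=True, input_shape=(window, 1)),
        tf.keras.layers.LSTM(256),
        tf.keras.layers.Dense(1),
    ])

# Library-default learning rates; in practice each optimizer usually needs
# its own tuned learning rate for a fair comparison.
optimizer_classes = {
    "adam": tf.keras.optimizers.Adam,
    "adagrad": tf.keras.optimizers.Adagrad,
    "sgd": tf.keras.optimizers.SGD,
}

results = {}
for name, opt_cls in optimizer_classes.items():
    model = build_model()
    model.compile(optimizer=opt_cls(), loss="mse",
                  metrics=[tf.keras.metrics.MeanAbsolutePercentageError(name="mape")])
    model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
    _, mape = model.evaluate(x_val, y_val, verbose=0)
    results[name] = round(float(mape), 2)

print(results)  # e.g. {'adam': ..., 'adagrad': ..., 'sgd': ...}
```

With untuned, default learning rates, SGD and Adagrad often need many more epochs (or larger learning rates) to reach the error Adam gets quickly, which is one common source of gaps like the one you observed.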

Because the optimizers update the model parameters differently, the trained models end up with different weights and therefore produce different predictions. This is likely why you are getting such different mean absolute percentage error (MAPE) values.
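
For reference, MAPE simply averages the percentage errors of the predictions, so targets close to zero or a poorly converged model can inflate it quickly. A minimal NumPy sketch of the metric:

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-7):
    # Mean absolute percentage error, returned as a percentage.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / np.maximum(np.abs(y_true), eps)))

print(mape([10, 20, 30], [9, 22, 33]))  # 10.0
```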

Also, whenever we define a neural network, random initial weights are assigned to the model, and different optimization algorithms may converge to different local minima depending on that random initialization.
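
If you want to rule out initialization as a factor, you can re-seed before building each model so every optimizer starts from identical weights. A minimal sketch, assuming TensorFlow 2.7+ where tf.keras.utils.set_random_seed is available:

```python
import tensorflow as tf

# Re-seeding Python, NumPy, and TensorFlow before each build gives every
# model identical initial weights, so any remaining MAPE gap comes from the
# optimizer's update rule rather than a lucky or unlucky initialization.
tf.keras.utils.set_random_seed(42)
model_a = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])

tf.keras.utils.set_random_seed(42)
model_b = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])

# Both models start from the same weights.
print(all((a == b).all() for a, b in zip(model_a.get_weights(), model_b.get_weights())))  # True
```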

Choosing the right optimizer depends on your specific problem, dataset, model architecture, and so on.

Thank you.

Thank you very much for your clear response.