I am going through the code of the Transformer model [here].
I noticed that in the `call` method of the `Decoder` class, the input encoding is multiplied by the square root of `d_model`. No explanation is given for this step. Can someone please explain why this is done in the `Decoder` class?
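
To make the question concrete, here is a minimal sketch of the step I mean. It is written in NumPy and is not copied from the linked code: the names `embed_tokens`, `positional_encoding`, and `embedding_table` are hypothetical, and I am assuming the scaled result is then summed with a sinusoidal positional encoding.

```python
import numpy as np


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding of shape (seq_len, d_model) (assumed form)."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model)[None, :]          # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])    # even feature indices
    pe[:, 1::2] = np.cos(angles[:, 1::2])    # odd feature indices
    return pe


def embed_tokens(token_ids, embedding_table, scale=True):
    """Look up token embeddings, optionally scale them, then add positions."""
    d_model = embedding_table.shape[1]
    x = embedding_table[token_ids]           # (seq_len, d_model)
    if scale:
        x = x * np.sqrt(d_model)             # the multiplication I am asking about
    return x + positional_encoding(len(token_ids), d_model)


# Hypothetical toy setup: vocabulary of 10 tokens, d_model = 8.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10, 8))
token_ids = np.array([1, 3, 5])
scaled = embed_tokens(token_ids, embedding_table, scale=True)
unscaled = embed_tokens(token_ids, embedding_table, scale=False)
```

Comparing `scaled` with `unscaled` shows that the only difference is the embedding term being multiplied by `sqrt(d_model)` before the positional encoding is added.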