Please explain why the following backpropagation calculations are done

TheLongNight · August 30, 2023, 1:20pm

Hi,

I have been teaching myself the mathematics of backpropagation and have been using Keras to check my results for errors.

Using a simple model with an input layer with 1 input, a dense layer with 1 neuron (d1), a dense layer with 1 neuron (d2), and an output layer with 1 neuron (o1), I expect the following calculation to have been performed:

(error derivative) * (o1 activation derivative) * (d2 output value) * (d2 activation derivative) * (d1 output value) * (d1 activation derivative) * (input value)

Instead, the result comes from the following calculation:
(error derivative) * (o1 activation derivative) * (d1 output value) * (d2 activation derivative) * (d1 activation derivative)

No matter how many layers are added in a row, only the gradient for the final output layer is multiplied by the output value of the previous neuron connected to it. Why?
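
For reference, the chain rule for the three-neuron chain described above can be checked numerically without Keras. The sketch below is a minimal, framework-free illustration and not the code behind the results in this post: it assumes sigmoid activations, no biases, and a squared-error loss (none of which are stated above), and the names `forward`, `analytic_grads`, and `numeric_grads` are hypothetical. Under those assumptions, each step back through a layer multiplies by that layer's weight and activation derivative, and only the layer whose weight is being differentiated contributes its input value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, w1, w2, w3):
    # input -> d1 -> d2 -> o1, one neuron per layer, no biases (assumption)
    z1 = w1 * x
    a1 = sigmoid(z1)
    z2 = w2 * a1
    a2 = sigmoid(z2)
    z3 = w3 * a2
    y = sigmoid(z3)
    return z1, a1, z2, a2, z3, y

def loss(y, target):
    # squared error with a 1/2 factor (assumption)
    return 0.5 * (y - target) ** 2

def analytic_grads(x, target, w1, w2, w3):
    z1, a1, z2, a2, z3, y = forward(x, w1, w2, w3)
    delta3 = (y - target) * sigmoid_prime(z3)   # error derivative * o1 activation derivative
    delta2 = delta3 * w3 * sigmoid_prime(z2)    # passed back through the WEIGHT w3
    delta1 = delta2 * w2 * sigmoid_prime(z1)    # passed back through the WEIGHT w2
    # each weight's gradient uses the input that fed that weight
    return delta1 * x, delta2 * a1, delta3 * a2

def numeric_grads(x, target, w1, w2, w3, eps=1e-6):
    # central finite differences, one weight at a time
    weights = [w1, w2, w3]
    grads = []
    for i in range(3):
        up = list(weights)
        dn = list(weights)
        up[i] += eps
        dn[i] -= eps
        l_up = loss(forward(x, *up)[-1], target)
        l_dn = loss(forward(x, *dn)[-1], target)
        grads.append((l_up - l_dn) / (2 * eps))
    return tuple(grads)

# arbitrary illustrative values
analytic = analytic_grads(0.5, 1.0, 0.8, -1.2, 0.6)
numeric = numeric_grads(0.5, 1.0, 0.8, -1.2, 0.6)
print(analytic)
print(numeric)
```

If the two printed tuples agree to several decimal places, the hand-derived formula matches the numerical gradient for this toy setup, which gives a baseline to compare the Keras values against.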

Having understood that this happens and adjusted for it, I moved on to layers with more neurons, using the following model:

An input layer with 3 inputs, a dense layer with 1 neuron (d1), a dense layer with 4 neurons (d2), and an output layer with 2 neurons (o1). I expect the following calculations to have been performed:

foreach output neuron:
gradient = (error derivative) * (o1 activation derivative) * (d2 output value)

foreach d2 neuron:
gradient = (sum of gradient from each output neuron) * (d2 activation derivative)

Finally:
(sum of gradient from each d2 neuron) * (d1 activation derivative)

Instead, in the final calculation I am seeing:

((sum of gradient from each d2 neuron) / (number of input neurons)) * (d1 activation derivative)

Why is there a division when calculating the gradients for the weights from the input layer?

Note: batch size = 1, epochs = 1
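
The same kind of check can be written for the 3-1-4-2 model described above. This is again only a sketch under assumptions not stated in the post: sigmoid activations, no biases, a summed squared-error loss, and hypothetical names (`forward`, `grad_W1`, `numeric_grad_W1`). It computes the gradient for the weights between the input layer and d1 by the plain chain rule and compares it with a finite-difference estimate for a single sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, W1, W2, W3):
    # W1: 3 weights (inputs -> d1), W2: 4 weights (d1 -> each d2 neuron),
    # W3: 2 rows of 4 weights (d2 -> each output neuron); no biases (assumption)
    z1 = sum(w * xi for w, xi in zip(W1, x))
    a1 = sigmoid(z1)
    z2 = [w * a1 for w in W2]
    a2 = [sigmoid(z) for z in z2]
    z3 = [sum(row[j] * a2[j] for j in range(len(a2))) for row in W3]
    y = [sigmoid(z) for z in z3]
    return z1, a1, z2, a2, z3, y

def loss(y, t):
    # summed squared error (assumption; a mean-reduced loss adds a constant factor)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def grad_W1(x, t, W1, W2, W3):
    z1, a1, z2, a2, z3, y = forward(x, W1, W2, W3)
    # per-output-neuron error signal
    delta_o = [(y[k] - t[k]) * sigmoid_prime(z3[k]) for k in range(len(y))]
    # each d2 neuron sums the output signals weighted by its outgoing weights
    delta_d2 = [
        sum(delta_o[k] * W3[k][j] for k in range(len(y))) * sigmoid_prime(z2[j])
        for j in range(len(z2))
    ]
    # d1 sums the d2 signals weighted by W2
    delta_d1 = sum(delta_d2[j] * W2[j] for j in range(len(W2))) * sigmoid_prime(z1)
    # gradient for each input weight is the d1 signal times that input value
    return [delta_d1 * xi for xi in x]

def numeric_grad_W1(x, t, W1, W2, W3, eps=1e-6):
    grads = []
    for i in range(len(W1)):
        up = list(W1)
        dn = list(W1)
        up[i] += eps
        dn[i] -= eps
        l_up = loss(forward(x, up, W2, W3)[-1], t)
        l_dn = loss(forward(x, dn, W2, W3)[-1], t)
        grads.append((l_up - l_dn) / (2 * eps))
    return grads

# arbitrary illustrative values
x = [0.2, -0.4, 0.9]
t = [1.0, 0.0]
W1 = [0.5, -0.3, 0.8]
W2 = [0.7, -0.6, 0.1, 0.4]
W3 = [[0.3, -0.2, 0.5, 0.9], [-0.7, 0.6, 0.2, -0.1]]

analytic_w1 = grad_W1(x, t, W1, W2, W3)
numeric_w1 = numeric_grad_W1(x, t, W1, W2, W3)
print(analytic_w1)
print(numeric_w1)
```

One thing worth checking when comparing against Keras: its built-in mean squared error averages over the output axis, whereas this sketch sums, so the two can differ by a constant factor. Whether that relates to the division reported above cannot be determined without the original code.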

Renu_Patel · October 26, 2023, 11:32am

Hi @TheLongNight

Welcome to the TensorFlow Forum!

Could you please share reproducible code that replicates the issue? It is difficult to understand the problem from the description alone.

Also, please have a look at this TensorFlow AutoDiff doc for reference; it might be helpful. Thank you.