Discrepancy in the number of parameters

Stochastic Depth [1] is a training-time regularization technique that randomly drops the outputs of a layer (typically a residual branch).

Here (Colab Notebook) I’m using Stochastic Depth, which should not change the number of model parameters, but unfortunately the parameter count drops when I enable it. Otherwise, there’s something wrong with my understanding, and I’d appreciate any help.


[1] https://arxiv.org/abs/1603.09382

The Stochastic Depth implementation comes from here:

It was originally adapted from timm's implementation. I verified the above-mentioned post with and without Stochastic Depth (stochastic_depth_rate=0.1 and stochastic_depth_rate=0.0, respectively). The model parameters were exactly the same, as they should be.
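For reference, a minimal NumPy sketch of the timm-style "drop path" operation (not the exact code from the notebook or from timm): it is a purely stochastic masking step with no trainable weights, which is why it cannot change a model's parameter count.

```python
import numpy as np

def drop_path(x, drop_rate, training, rng=None):
    """Stochastic Depth ("drop path"), NumPy sketch of the timm-style scheme.

    During training, zero out the residual branch for a random subset of
    samples and rescale the survivors by 1 / keep_prob so the expected
    activation is unchanged. Holds no trainable parameters.
    """
    if not training or drop_rate == 0.0:
        return x  # identity at inference time or when disabled
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_rate
    # One Bernoulli draw per sample; broadcast over the remaining axes.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = rng.binomial(1, keep_prob, size=shape).astype(x.dtype)
    return x / keep_prob * mask
```

At inference (`training=False`) the function is the identity, matching the behavior described in the paper.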

Ccing @Luke_Wood

Did some sanity checking by combining the layer with other, simpler components to see if the number of parameters varies. Even if the drop rate is set to a non-zero floating-point number (in the [0, 1] range), the number of parameters stays the same with and without it. It likely confirms there’s something else that’s wrong:
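The original snippet isn’t reproduced here; a minimal sketch of that kind of check, using a hypothetical toy residual block rather than the actual model, looks like this:

```python
import numpy as np

class ToyResidualBlock:
    """Toy residual block: a dense layer (with weights) followed by a
    parameter-free Stochastic Depth step on the residual branch."""

    def __init__(self, dim, drop_rate=0.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(dim, dim))  # trainable weights
        self.b = np.zeros(dim)                # trainable bias
        self.drop_rate = drop_rate

    def num_params(self):
        # Only the dense weights/bias count; drop path adds nothing.
        return self.w.size + self.b.size

    def __call__(self, x, training=False, rng=None):
        branch = x @ self.w + self.b
        if training and self.drop_rate > 0.0:
            rng = rng or np.random.default_rng()
            keep = 1.0 - self.drop_rate
            mask = rng.binomial(1, keep, size=(x.shape[0], 1))
            branch = branch * mask / keep
        return x + branch
```

Comparing `num_params()` for blocks built with `drop_rate=0.0` and `drop_rate=0.1` gives identical counts, which is the expected outcome of the sanity check.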

Sorry for the flood of replies.

I figured out that I had made a mistake in the Transformer block. Specifically, before applying the final residual connection, I was assigning the wrong variable. Fixing that resolved the issue.
