Default weight initialization

I’m wondering why all Keras layers use Glorot initialization by default. Since ReLU is the most popular activation function, shouldn’t He be the default initialization?
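For what it’s worth, the gap is easy to see numerically: under ReLU, Glorot-scaled weights (variance 2/(fan_in+fan_out)) roughly halve the signal’s second moment at every layer, while He scaling (variance 2/fan_in) preserves it. A minimal NumPy sketch of that effect (the function name and sizes are just illustrative):

```python
import numpy as np

# Compare the post-ReLU second moment of activations under Glorot vs He
# weight scaling. He's 2/fan_in variance compensates for ReLU zeroing
# roughly half of the pre-activations.
rng = np.random.default_rng(0)
fan_in = fan_out = 512
x = rng.standard_normal((10_000, fan_in))  # unit-variance inputs

def relu_second_moment(weight_std):
    w = rng.standard_normal((fan_in, fan_out)) * weight_std
    h = np.maximum(x @ w, 0.0)  # ReLU
    return float(np.mean(h ** 2))

glorot = relu_second_moment(np.sqrt(2.0 / (fan_in + fan_out)))  # ~0.5: signal shrinks
he = relu_second_moment(np.sqrt(2.0 / fan_in))                  # ~1.0: signal preserved
```

Stacked over many ReLU layers, that per-layer 0.5 factor compounds, which is the usual argument for He as the ReLU default.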

The prebuilt application models, such as ResNet50, also use Glorot initialization by default, and there is no parameter you can pass to change it.

Are you talking about this:

Exactly! In my case I’m using the default ResNet50, trained from scratch, and the network is training and converging. My inputs have an arbitrary number of channels, which is why I cannot use ImageNet weights. However, I’m wondering if initialization with the He method would improve the results. I noticed a big difference in overfitting from run to run depending on the initial weights of each run.
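Since the applications constructors expose no initializer argument, one workaround is to build the model with `weights=None` and then overwrite each kernel with a He draw. A rough sketch (the `reinit_he` helper is my own name, not a Keras API):

```python
import tensorflow as tf

def reinit_he(model, seed=0):
    """Overwrite the kernels of an already-built model with He-normal draws."""
    init = tf.keras.initializers.HeNormal(seed=seed)
    for layer in model.layers:
        # Only layers that own a kernel (Dense, Conv2D, ...) are touched;
        # biases and BatchNorm parameters keep their defaults.
        if getattr(layer, "kernel", None) is not None:
            layer.kernel.assign(init(layer.kernel.shape, layer.kernel.dtype))
    return model

# e.g. model = reinit_he(tf.keras.applications.ResNet50(weights=None))
```

Re-initializing after construction this way sidesteps the missing constructor argument, at the cost of the brief double initialization.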

@markdaoust Do you know the history of this default?

Interesting. I wonder how they trained the VGG19 in keras.applications.

Here it is in mid 2016:

It’s probably one of those things that got set at one point when it made sense and then got locked in by backwards compatibility guarantees.

Aside from updating keras.applications to accept initializers as arguments, another possible solution would be for Keras to implement a global “default_initializer” or something like that. Either one would take some work.
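In the meantime, something like that global default can be approximated in user code by pre-binding the initializer onto the layer constructors (a sketch; `HeDense`/`HeConv2D` are just names I made up, not Keras symbols):

```python
from functools import partial
import tensorflow as tf

# Approximate a project-wide default initializer by pre-binding it onto
# the layer constructors, since Keras has no global "default_initializer".
HeDense = partial(tf.keras.layers.Dense, kernel_initializer="he_normal")
HeConv2D = partial(tf.keras.layers.Conv2D, kernel_initializer="he_normal")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    HeConv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    HeDense(10),
])
```

It only covers your own layers, of course, not the ones built inside keras.applications.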

But if I remember correctly something similar didn’t pass in 2019:


Ah, so scratch that one. Thanks.

…Or, with the team changes in 2021, we could have a different evaluation.