Clarity on weight_decay_factor

dhruvil_karani · July 19, 2022, 11:18am

I was looking at the documentation of the optimizers for TPUs (specifically the SGD optimizer).

One of its parameters is weight_decay_factor. The docs describe it as follows:

amount of weight decay to apply; None means that the weights are not decayed. Weights are decayed by multiplying the weight by this factor each step.

So if the value of the factor is 0.3, are the weights (w) updated as follows?

w = (1-0.3)*w
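Written as code, the update being asked about looks like the minimal Python sketch below. It assumes the interpretation w = (1 - weight_decay_factor) * w, applied once per step on its own (the gradient update is left out). The helper name apply_weight_decay is hypothetical and is not part of the TensorFlow API.

```python
def apply_weight_decay(weights, weight_decay_factor):
    """One decay-only step: w <- (1 - weight_decay_factor) * w.

    Hypothetical helper illustrating the interpretation asked about;
    it is not part of the TensorFlow API.
    """
    if weight_decay_factor is None:
        # Mirrors the docs: None means the weights are not decayed.
        return list(weights)
    # Each weight keeps a fraction (1 - factor) of its current value.
    return [(1.0 - weight_decay_factor) * w for w in weights]


weights = [1.0, -2.0, 0.5]
decayed = apply_weight_decay(weights, 0.3)
print(decayed)  # each weight keeps 70% of its value
```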

I want to understand how to set the value for this parameter. What are some standard ranges?

Thank you!

chunduriv · February 24, 2023, 11:10am

Welcome to the TensorFlow Forum!

So if the value of the factor is 0.3, are the weights (w) updated as follows?

w = (1-0.3)*w

Yes. As for ranges, there is no single standard value for weight_decay_factor; a good setting depends on the dataset and the specific use case.

In general, it is a hyperparameter that is set through experimentation and tuning. A common starting point is to try values between 1e-4 and 1e-2.
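One way to see why such small values are typical: under the (1 - factor) * w interpretation the decay compounds, so after N decay-only steps a weight retains a fraction (1 - factor)^N of its value. The Python sketch below assumes that interpretation and a hypothetical run of 1,000 steps. It builds a log-spaced grid of candidates between 1e-4 and 1e-2 for a tuning sweep and prints the retained fraction for each, alongside the 0.3 from the question. Gradients are ignored here, and in real training they push the weights back, so the numbers only indicate how strongly each setting pulls weights toward zero, not what the trained weights will be.

```python
def log_spaced_grid(low, high, num):
    """Return `num` values spaced evenly on a log scale from low to high."""
    ratio = (high / low) ** (1.0 / (num - 1))
    return [low * ratio**i for i in range(num)]


def retained_fraction(weight_decay_factor, num_steps):
    """Fraction of a weight left after `num_steps` decay-only steps,
    assuming w <- (1 - weight_decay_factor) * w on every step."""
    return (1.0 - weight_decay_factor) ** num_steps


NUM_STEPS = 1_000  # hypothetical training length, for illustration only

# Candidates for a sweep: 1e-4, ~3.2e-4, 1e-3, ~3.2e-3, 1e-2, plus 0.3.
candidates = log_spaced_grid(1e-4, 1e-2, 5) + [0.3]
for factor in candidates:
    frac = retained_fraction(factor, NUM_STEPS)
    print(f"weight_decay_factor={factor:.2e}  "
          f"retained after {NUM_STEPS} steps: {frac:.3e}")
```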

Thank you!