Is AdamW identical to Adam in Keras?

python - AdamW and Adam with weight decay - Stack Overflow

According to the StackOverflow discussion above, Adam and AdamW are identical in Keras (except for the default value of weight_decay). I have checked the update_step method of both classes and they are indeed identical, whereas the pseudocode provided in the link above shows that they should differ.
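
For anyone who wants to reproduce the comparison, here is a minimal sketch (assuming TF >= 2.11, where Adam also accepts a weight_decay argument; the toy values are made up):

```python
import tensorflow as tf

def one_step(optimizer):
    # Apply a single optimizer step to a fresh toy variable.
    var = tf.Variable([1.0, 2.0, 3.0])
    grad = tf.constant([0.1, 0.1, 0.1])
    optimizer.apply_gradients([(grad, var)])
    return var.numpy()

adam = tf.keras.optimizers.Adam(learning_rate=0.01, weight_decay=0.004)
adamw = tf.keras.optimizers.AdamW(learning_rate=0.01, weight_decay=0.004)

# If the two classes really are identical up to the weight_decay default,
# these two results should match.
print(one_step(adam))
print(one_step(adamw))
```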

Is this a Keras bug? And can anyone tell me whether the Keras implementation of these classes corresponds to Adam or to AdamW?

Jon.

Hi @Jonathan_Ford ,

As per my understanding, the Keras implementation of AdamW is not a bug, but it does differ from the reference implementation described in the paper “Decoupled Weight Decay Regularization”.

The main difference is the ordering of the two terms: the Keras implementation of AdamW applies weight decay after the parameter update, while the reference implementation applies weight decay before the parameter update, an ordering the paper's authors show leads to better performance in practice.
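
In schematic form (this is my own sketch of the two orderings, not the actual Keras source; the function names and the precomputed adam_step are hypothetical), for a single parameter theta:

```python
def reference_adamw_step(theta, adam_step, lr, wd):
    # Paper ordering: decay theta first, then apply the Adam step,
    # where adam_step = lr * m_hat / (sqrt(v_hat) + eps).
    theta = theta * (1.0 - lr * wd)
    return theta - adam_step

def keras_adamw_step(theta, adam_step, lr, wd):
    # Keras ordering as described above: Adam step first, then decay.
    theta = theta - adam_step
    return theta * (1.0 - lr * wd)
```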

Whether the Keras implementation of Adam is a bug is a matter of opinion. Some would argue it is a bug because it does not implement the AdamW algorithm exactly as described; others would argue it is not, because it is still a valid implementation of the Adam algorithm, even if it does not include the AdamW weight decay term.

Please let me know if you have any pointers on the above explanation.

Thanks.
