I hope you are doing well. Since last year, the computer vision community has experienced a boom in self-supervised pretraining methods for images. While most of these methods share common recipes (augmentation, projection head, LR schedules, etc.),
reconstruction-based pretraining methods differ from them and make the process simpler and more scalable. One such method is Masked Autoencoders (MAE), released a couple of days ago by FAIR. Today, we (@ariG23498 and myself) are happy to share a pure TensorFlow implementation of the method along with commentary and promising results:
It is similar in spirit to BERT's pretraining objective, masked language modeling: a large portion of the input is masked out, and the model learns to reconstruct it.
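For a rough idea of what that translates to for images, here is a minimal TensorFlow sketch of the random patch masking at the heart of the method. The function name `random_masking`, the 75% `mask_ratio` default, and the tensor shapes are illustrative choices on our part, not the exact code from our implementation:

```python
import tensorflow as tf

def random_masking(patches, mask_ratio=0.75):
    """Split patch embeddings into a visible subset and a masked subset.

    The encoder only ever sees the visible subset; the masked patches
    are what the model is later asked to reconstruct.
    """
    batch, num_patches, _ = patches.shape
    num_keep = int(num_patches * (1 - mask_ratio))

    # A random permutation of patch indices for every image in the batch.
    noise = tf.random.uniform((batch, num_patches))
    shuffled = tf.argsort(noise, axis=-1)
    keep_idx = shuffled[:, :num_keep]   # patches fed to the encoder
    mask_idx = shuffled[:, num_keep:]   # patches to reconstruct

    visible = tf.gather(patches, keep_idx, batch_dims=1)
    return visible, keep_idx, mask_idx

# 196 patches (14 x 14) of dimension 768, as for a ViT-B on 224 x 224 images.
patches = tf.random.normal((2, 196, 768))
visible, keep_idx, mask_idx = random_masking(patches)
print(visible.shape)  # (2, 49, 768): only 25% of the patches reach the encoder
```

Since the encoder processes only the visible 25% of the patches, pre-training gets cheaper essentially for free, which is where much of the method's speed comes from.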
Some advantages of this method:
- Does not rely upon sophisticated augmentation transforms
- Easy to implement (barring a few nuts and bolts); see the loss sketch after this list
- Considerably faster pre-training
- Implicit handling of representation collapse
- On par with the SoTA for self-supervision in computer vision
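On the "easy to implement" point: the objective is just a mean squared error computed on the masked patches only. Below is a hedged sketch of that loss, where `decoder_out` and `target_patches` are hypothetical names for the decoder's predictions and the ground-truth patches:

```python
import tensorflow as tf

def masked_mse(target_patches, decoder_out, mask_idx):
    """Reconstruction loss computed only on the masked patches."""
    target = tf.gather(target_patches, mask_idx, batch_dims=1)
    pred = tf.gather(decoder_out, mask_idx, batch_dims=1)
    return tf.reduce_mean(tf.square(target - pred))
```

Because the model must reproduce the actual content of each masked patch, a collapsed (constant) representation cannot minimize this loss, which is what we mean by implicit handling of representation collapse.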
I hope you folks will find the article useful, and as always, we are happy to answer questions.