Masked Autoencoders for self-supervised pretraining of images

Hi folks,

I hope you are doing well. Over the last year, the computer vision community has experienced a boom in self-supervised pretraining methods for images. While most of these methods share common recipes (augmentation, projection head, LR schedules, etc.), reconstruction-based pretraining methods differ from them and make the process simpler and more scalable. One such method is Masked Autoencoders (MAE), released a couple of days ago by FAIR. Today, we (@ariG23498 and myself) are happy to share a pure TensorFlow implementation of the method along with commentary and promising results:

Masked image modeling with Autoencoders.

The objective is similar in spirit to BERT’s pretraining objective: masked language modeling.

Some advantages of this method:

  • Does not rely on sophisticated augmentation transforms
  • Easy to implement (barring a few nuts and bolts)
  • Considerably faster pre-training
  • Implicit handling of representation collapse
  • On par with SoTA for self-supervision in the field of computer vision
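For intuition, the core idea behind masked autoencoding is simple: split an image into patches, hide a large random fraction of them, and train the model to reconstruct the hidden patches from the visible ones. Here is a minimal sketch of the patching and random-masking steps in plain NumPy (the function names, shapes, and the 75% mask ratio below are illustrative assumptions, not the exact code from the article):

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    patches = image.reshape(
        h // patch_size, patch_size, w // patch_size, patch_size, c
    ).transpose(0, 2, 1, 3, 4)  # group the two spatial grid axes together
    return patches.reshape(-1, patch_size * patch_size * c)

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Keep a random subset of patches; return visible patches and the indices.

    In MAE-style pretraining, only the visible patches are fed to the encoder,
    and the decoder is trained to reconstruct the masked ones.
    """
    rng = np.random.default_rng(seed)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)
    keep_idx, mask_idx = perm[:num_keep], perm[num_keep:]
    return patches[keep_idx], keep_idx, mask_idx

# Toy example: a 32x32 RGB image with 8x8 patches -> 16 patches of length 192.
image = np.random.rand(32, 32, 3).astype("float32")
patches = patchify(image, patch_size=8)
visible, keep_idx, mask_idx = random_masking(patches)
print(patches.shape, visible.shape, len(mask_idx))  # (16, 192) (4, 192) 12
```

With a 75% mask ratio, only a quarter of the patches ever reach the encoder, which is a big part of why pre-training is faster, and the reconstruction target makes representation collapse a non-issue by construction.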

I hope you folks will find the article useful and as always, we are happy to answer questions.


Great work on this Sayak, this will definitely be useful for many. :pray:t2::raised_hands:t2::pray:t2::raised_hands:t2:

Small thing, but the Keras page is future-dated. But this is so cool it might as well be from the future :sunglasses: