I hope you are doing well. Since last year, the computer vision community has experienced a boom in self-supervised pretraining methods for images. While most of these methods share common recipes (augmentation, projection head, LR schedules, etc.),
reconstruction-based pretraining methods differ from them and make the process simpler and more scalable. One such method is Masked Autoencoders (MAE), released a couple of days ago by FAIR. Today, we (@ariG23498 and myself) are happy to share a pure TensorFlow implementation of the method along with commentary and promising results:
It is similar in spirit to BERT's pretraining objective, masked language modeling: a large portion of the input is masked out, and the model learns to reconstruct it.
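For a rough idea of what that translates to for images, here is a minimal TensorFlow sketch of the random patch masking at the heart of the method. The function name `random_masking`, the 75% `mask_ratio` default, and the tensor shapes are illustrative choices on our part, not the exact code from our implementation:

```python
import tensorflow as tf

def random_masking(patches, mask_ratio=0.75):
    """Split patch embeddings into a visible subset and a masked subset.

    The encoder only ever sees the visible subset; the masked patches
    are what the model is later asked to reconstruct.
    """
    batch, num_patches, _ = patches.shape
    num_keep = int(num_patches * (1 - mask_ratio))

    # A random permutation of patch indices for every image in the batch.
    noise = tf.random.uniform((batch, num_patches))
    shuffled = tf.argsort(noise, axis=-1)
    keep_idx = shuffled[:, :num_keep]   # patches fed to the encoder
    mask_idx = shuffled[:, num_keep:]   # patches to reconstruct

    visible = tf.gather(patches, keep_idx, batch_dims=1)
    return visible, keep_idx, mask_idx

# 196 patches (14 x 14) of dimension 768, as for a ViT-B on 224 x 224 images.
patches = tf.random.normal((2, 196, 768))
visible, keep_idx, mask_idx = random_masking(patches)
print(visible.shape)  # (2, 49, 768): only 25% of the patches reach the encoder
```

Since the encoder processes only the visible 25% of the patches, pre-training gets cheaper essentially for free, which is where much of the method's speed comes from.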
Some advantages of this method:
- Does not rely upon sophisticated augmentation transforms
- Easy to implement (barring a few nuts and bolts); see the loss sketch after this list
- Considerably faster pre-training
- Implicit handling of representation collapse
- On par with the SoTA for self-supervision in computer vision
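On the "easy to implement" point: the objective is just a mean squared error computed on the masked patches only. Below is a hedged sketch of that loss, where `decoder_out` and `target_patches` are hypothetical names for the decoder's predictions and the ground-truth patches:

```python
import tensorflow as tf

def masked_mse(target_patches, decoder_out, mask_idx):
    """Reconstruction loss computed only on the masked patches."""
    target = tf.gather(target_patches, mask_idx, batch_dims=1)
    pred = tf.gather(decoder_out, mask_idx, batch_dims=1)
    return tf.reduce_mean(tf.square(target - pred))
```

Because the model must reproduce the actual content of each masked patch, a collapsed (constant) representation cannot minimize this loss, which is what we mean by implicit handling of representation collapse.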
I hope you folks will find the article useful, and as always, we are happy to answer questions.