I wanted to share my new work with Pin-Yu Chen (IBM Research): “Vision Transformers are Robust Learners”.
For some time now, Transformers have taken the vision world by storm. In this work, we examine the robustness of Vision Transformers (ViTs). Specifically, we investigate the question:
By virtue of self-attention, can Vision Transformers provide improved robustness to common corruptions, perturbations, and other distribution shifts? If so, why?
We build on existing work and investigate the robustness of ViTs. Through a series of six systematically designed experiments, we present both quantitative and qualitative analyses to explain why ViTs are indeed more robust learners.