Image classification with MobileViT

Sayak_Paul · October 23, 2021, 10:29am

Combining the benefits of convolutions (for spatial relationships) and transformers (for global relationships) is an emerging research trend in computer vision. In my latest example, I walk through the MobileViT architecture (Mehta et al.), which offers a simple yet unique way to reap the benefits of both.
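
To make the "global relationships" part a bit more concrete, here is a rough NumPy sketch of the unfold, attend, fold idea: a feature map is split into non-overlapping patches, each pixel position attends to the same position in every other patch, and the result is folded back into a feature map. This is only an illustration under my own assumptions, not the code from the example. The function names (`unfold`, `fold`, `self_attention`, `global_mixing`) are hypothetical, the attention is a parameter-free stand-in for a real transformer layer, and the convolutions that handle local structure are left out.

```python
import numpy as np


def unfold(x, patch):
    """Rearrange an (H, W, C) map into (P, N, C): P pixels per patch, N patches."""
    h, w, c = x.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    nh, nw = h // patch, w // patch
    # Split both spatial axes into (patch index, offset inside the patch).
    x = x.reshape(nh, patch, nw, patch, c)
    # Bring the in-patch offsets to the front: (patch, patch, nh, nw, C).
    x = x.transpose(1, 3, 0, 2, 4)
    return x.reshape(patch * patch, nh * nw, c)


def fold(tokens, h, w, patch):
    """Inverse of unfold: (P, N, C) back to (H, W, C)."""
    c = tokens.shape[-1]
    nh, nw = h // patch, w // patch
    x = tokens.reshape(patch, patch, nh, nw, c)
    # Back to (nh, patch, nw, patch, C) before merging the spatial axes.
    x = x.transpose(2, 0, 3, 1, 4)
    return x.reshape(h, w, c)


def self_attention(tokens):
    """Parameter-free single-head attention across the N (patch) axis.

    Stand-in for a learned transformer layer: no projections, no MLP.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)  # (P, N, N)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens  # (P, N, C)


def global_mixing(x, patch=2):
    """Unfold, mix information across patches, fold back to (H, W, C)."""
    h, w, _ = x.shape
    return fold(self_attention(unfold(x, patch)), h, w, patch)


if __name__ == "__main__":
    feature_map = np.random.default_rng(0).normal(size=(8, 8, 4))
    print(unfold(feature_map, 2).shape)
    print(global_mixing(feature_map, 2).shape)
```

Because `fold` exactly undoes `unfold`, the spatial layout of the feature map is preserved, which is what lets a block like this sit between ordinary convolution layers.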

With about a million parameters, it achieves a top-1 accuracy of ~86% on the tf_flowers dataset at 256x256 resolution. Furthermore, the training recipe is simple and the model runs efficiently on mobile devices (which is atypical for transformer-based models).

8bitmp3 · October 29, 2021, 6:12pm

That’s amazing! Great work as always