Swin Transformers in TensorFlow

Sayak_Paul · May 17, 2022, 5:06pm

Original vision transformers don’t provide multi-scale hierarchical representations of images. This may not be a problem for tasks like image classification, but dense prediction tasks such as object detection and segmentation benefit from multi-scale representations.

Swin Transformers introduce a sense of hierarchy by operating on windows of patches and then shifting the windows to create connections between them. They use a variant of multi-head attention whose complexity is linear in the input image size. This makes Swin Transformers a better backbone for tasks such as object detection and segmentation, which require high-resolution inputs and outputs.
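To make the linear-complexity point concrete, here is a small sketch that evaluates the cost estimates given in the Swin Transformer paper for global multi-head self-attention and for window-based self-attention. The function names are hypothetical, and the example values (a 56×56 token grid, 96 channels, window size 7) are assumptions chosen to resemble the first stage of a tiny variant; softmax cost is ignored, as in the paper's estimate.

```python
def global_msa_flops(h: int, w: int, c: int) -> int:
    """Estimated cost of global self-attention on an h x w token grid with c channels.

    Formula from the Swin Transformer paper: 4*h*w*C^2 + 2*(h*w)^2*C.
    The second term is quadratic in the number of tokens.
    """
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_flops(h: int, w: int, c: int, m: int) -> int:
    """Estimated cost of window self-attention with m x m windows.

    Formula from the Swin Transformer paper: 4*h*w*C^2 + 2*M^2*h*w*C.
    For a fixed window size m, every term is linear in the number of tokens.
    """
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    channels, window = 96, 7  # assumed example values
    for side in (56, 112, 224):  # token-grid side lengths
        g = global_msa_flops(side, side, channels)
        win = window_msa_flops(side, side, channels, window)
        print(f"{side}x{side} tokens: global={g:.3e}, windowed={win:.3e}")
```

Doubling the side length of the token grid quadruples the number of tokens. The windowed estimate grows by exactly that factor, while the global estimate grows much faster because of its quadratic term.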

In my latest project, I implement 13 different variants of Swin Transformers in TensorFlow and port the original pre-trained parameters into them. Code, pre-trained TF models, notebooks, etc. are available here:

The project comes with TF implementations of window self-attention and shifted-window self-attention, which give Swin Transformers time complexity that is linear in the image size and allow them to scale to larger image resolutions.
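For readers unfamiliar with the two operations, here is a minimal sketch of window partitioning and the cyclic shift that precedes shifted-window attention. It is written in NumPy to stay dependency-light and is not taken from the project; the function names are hypothetical, and the attention computation and the mask needed after shifting are deliberately left out. In TensorFlow the same steps would typically use `tf.reshape`, `tf.transpose`, and `tf.roll`.

```python
import numpy as np


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split a (B, H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (B * num_windows, window_size, window_size, C).
    H and W are assumed to be divisible by window_size.
    """
    b, h, w, c = x.shape
    x = x.reshape(b, h // window_size, window_size, w // window_size, window_size, c)
    # Bring the two window-grid axes together, then the two within-window axes.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, window_size, window_size, c)


def cyclic_shift(x: np.ndarray, shift: int) -> np.ndarray:
    """Roll the feature map up and to the left by `shift` positions."""
    return np.roll(x, shift=(-shift, -shift), axis=(1, 2))


# Toy 8x8 single-channel feature map whose values encode their position.
feature_map = np.arange(8 * 8, dtype=np.float32).reshape(1, 8, 8, 1)

# Regular windows: attention would be computed independently inside each one.
windows = window_partition(feature_map, window_size=4)

# Shifted windows: shift by half the window size, then partition again,
# so each new window mixes tokens from neighbouring regular windows.
shifted_windows = window_partition(cyclic_shift(feature_map, shift=2), window_size=4)

print(windows.shape, shifted_windows.shape)
```

Because attention is restricted to each window, the cost per window is fixed and the total cost scales with the number of windows. Alternating regular and shifted partitions is what lets information cross window boundaries.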

The ported models have been verified to ensure they match the reported performance:

innat · May 18, 2022, 7:51am

@Sayak_Paul
Great work. Could you please take a look at this request? It would be welcome.

github.com/keras-team/keras

Add Swin-Transformer to keras.applications

opened 10:51AM - 25 Oct 21 UTC

innat

type:feature stat:awaiting keras-team

If you open a GitHub issue, here is our policy: It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead). The form below must be filled out.

**Here's why we have that policy:** Keras developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.

**System information**

- TensorFlow version (you are using): 2.6
- Are you willing to contribute it (Yes/No): Yes

**Describe the feature and the current behavior/state**

Describe the feature clearly here. Be sure to convey here why the requested feature is needed. Any brief description of the use-case would help.

Paper: [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)
Original Code: https://github.com/microsoft/Swin-Transformer?utm_source=catalyzex.com

It's a variant of the transformer model and achieves state-of-the-art performance or comparable performance with the best CNN-based models. It also contains enough citations (~250 at this moment) for addition to the package. On ImageNet-1K and 22K, below is the comparable results with EfficientNet (CNN) models.

| - | Img Size | Top 1K acc | - | Img Size | Top 1K acc | Top 22K acc |
|---|---|---|---|---|---|---|
| E3 | 300 | 81.6 | EfficientNetV2-S | - | 83.9 | 84.9 |
| E5 | 456 | 83.6 | EfficientNetV2-M | - | 85.1 | 86.2 |
| E7 | 600 | 84.3 | EfficientNetV2-L | - | 85.7 | 86.8 |
| - | - | - | EfficientNetV2-XL | - | - | 87.3 |
| Swin-T | 224 | 81.3 | Swin-B | 224 | - | 85.2 |
| Swin-S | 224 | 83.0 | Swin-B | 384 | - | 86.4 |
| Swin-B | 224 | 83.5 | Swin-L | 384 | - | 87.3 |
| Swin-B | 384 | 84.5 | - | - | - | - |

**Will this change the current api? How?**

Yes. It will change as follows

```python
tensorflow.keras.applications.SwinT
tensorflow.keras.applications.SwinS
tensorflow.keras.applications.SwinB
tensorflow.keras.applications.SwinL
```

**Who will benefit from this feature?**

Keras users.

**[Contributing](https://github.com/keras-team/keras/blob/master/CONTRIBUTING.md)**

- Do you want to contribute a PR? (yes/no): yes.
- If yes, please read [this page](https://github.com/keras-team/keras/blob/master/CONTRIBUTING.md) for instructions
- Briefly describe your candidate solution(if contributing):

Sayak_Paul · July 1, 2022, 2:25am

KerasCV in the future I guess