Largest vision checkpoint in TensorFlow via 🤗 Transformers

The largest public vision model checkpoint in TensorFlow (10 billion parameters) just landed in Hugging Face Transformers.

The underlying model is RegNet, known for its ability to scale. A machine with 100 GB of RAM should be enough to load the checkpoint. Glad to have contributed the model in TF with @ariG23498.

Efficient sharding of model checkpoints is what makes this possible under the hood. Shoutout to the Hugging Face team for enabling this.
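For the curious, here is a minimal sketch of what that sharding looks like from the user side. The small facebook/regnet-y-040 checkpoint and the shard size are just for illustration:

from transformers import TFRegNetModel

# Load a small RegNet variant and re-save it in shards; from_pretrained()
# later reads the shard files plus their JSON index back transparently.
model = TFRegNetModel.from_pretrained("facebook/regnet-y-040")
model.save_pretrained("./regnet-sharded", max_shard_size="500MB")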

Check out the model documentation here: RegNet.

This doesn’t require any code change. You’d load the model the same way, and everything else is taken care of for you. Now that’s a monumental moment for TF folks who are into vision, isn’t it?
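A minimal sketch of that load path, assuming the 10B SEER checkpoint identifier below (check the Hub for the exact name):

from transformers import TFRegNetModel

# The sharded TF weights are downloaded and stitched together automatically.
model = TFRegNetModel.from_pretrained("facebook/regnet-y-10b-seer")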


Is this model added in the stable release, or is it currently in the nightly package? I tried to load it with transformers==4.20.1 but hit an ImportError.

Apart from that, do you happen to know if it’s possible to extract features from multiple branches of the model? Something like

model = TFViTModel.from_pretrained("..")
features_list = [layer.output for layer in model.layers]
new_model = Model(model.input, features_list)

It was merged yesterday, so you need to install it from source:

pip install git+https://github.com/huggingface/transformers

This is currently a limitation of the TF models in transformers. Since the blocks are coded as keras.layers.Layer subclasses, you won’t get much from the .layers attribute. And since the models don’t use the functional or the sequential API (they are implemented by overriding the call() method), you don’t get the acyclic graph.
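Here is a minimal sketch of what I mean, using TFViTModel (the checkpoint is just an example):

import tensorflow as tf
from transformers import TFViTModel

model = TFViTModel.from_pretrained("google/vit-base-patch16-224")
print([layer.name for layer in model.layers])  # a single custom main layer, not a full graph

# Accessing layer.output raises an AttributeError: the sub-layers are called
# imperatively inside call(), so Keras never records symbolic nodes for them.

# What does work is the library's own flag for intermediate features:
pixel_values = tf.random.uniform((1, 3, 224, 224))
outputs = model(pixel_values, output_hidden_states=True)
print(len(outputs.hidden_states))  # one tensor per transformer block, plus the embeddings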

The TF models are coded this way because otherwise the cross-loading automation (from_pretrained()) wouldn’t work. This is a restrictive factor.

Cc: @merve

So, this is kind of similar to TF-Hub models? It’s fine, though, until features from the intermediate layers are needed.


For the majority of the Hub models, yes. But the recent ones I contributed (Swin, DeiT, ConvNeXt, etc.) can be fully expanded.


Okay, first, thanks for the fantastic contribution. I did a quick test to see if we could extract features from intermediate layers with the recent additions (Swin, DeiT, ConvNeXt) but faced a few issues. Could you please check HERE? (Or I might have missed something; this could be the known issue with the current TF-Hub models.)


These models were coded using a combination of custom layers, custom blocks (via the functional API), and an overridden call() function, in order to concatenate the CLS token to the projected embeddings of the image patches and to support layer scaling. This means the models weren’t written purely with the functional or the sequential API, which again prevents you from building multi-feature branching models.
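To make the pattern concrete, here is a minimal sketch (a hypothetical layer, not the actual model code) of why that CLS concatenation blocks graph surgery:

import tensorflow as tf

class PatchEmbeddingsWithCLS(tf.keras.layers.Layer):
    # Hypothetical layer illustrating the pattern described above.
    def __init__(self, dim, **kwargs):
        super().__init__(**kwargs)
        self.cls_token = self.add_weight(
            name="cls_token", shape=(1, 1, dim), initializer="zeros"
        )

    def call(self, patch_embeddings):
        batch_size = tf.shape(patch_embeddings)[0]
        cls = tf.tile(self.cls_token, [batch_size, 1, 1])
        # This concat happens imperatively inside call(); Keras records no
        # symbolic node here, so no sub-graph can be carved out later.
        return tf.concat([cls, patch_embeddings], axis=1)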

To mitigate some of the pain, I decided to return the attention scores from each of the attention blocks (except for ConvNeXt). But I agree that this doesn’t fare well compared to the other Keras models that support multi-feature branching.
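(For the transformers ports of these architectures, a similar mechanism is exposed through the standard output_attentions flag; the checkpoint below is just an example:)

import tensorflow as tf
from transformers import TFSwinModel

model = TFSwinModel.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
pixel_values = tf.random.uniform((1, 3, 224, 224))
outputs = model(pixel_values, output_attentions=True)
print(len(outputs.attentions))  # one attention-score tensor per block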

keras.applications.ConvNeXt* (available via tf-nightly) should be able to support this, but with caution. Since the stages inside ConvNeXt have nested structures, one needs to be careful about how to extract a sub-graph from that structure.
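Something like the following should work there, with the caveat that the layer-name filter is an assumption; inspect model.summary() first, since stage naming can vary across Keras versions:

import tensorflow as tf

model = tf.keras.applications.ConvNeXtTiny(include_top=False)

# Top-level layers live in the outer functional graph, so their .output is
# symbolic; layers nested inside a stage block are not directly reachable.
stage_outputs = [l.output for l in model.layers if "stage" in l.name]
multi_feature_model = tf.keras.Model(model.input, stage_outputs)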

So, to allow users more flexibility and transparency, I kept my model code, conversion code, and evaluation code open; I believe the respective TF-Hub model pages link to the corresponding code repositories.

Sorry, I couldn’t be of more help.
