[Google Research 🎉] Self-supervised learning - Compressed SimCLR / BYOL with Conditional Entropy Bottleneck (with TensorFlow code)

8bitmp3 · December 9, 2021, 11:30pm

New self-supervised methods—Compressed SimCLR and Compressed BYOL with Conditional Entropy Bottleneck—for learning effective and robust visual representations, which enable learning visual classifiers with limited data.

arXiv: Compressive Visual Representations (Lee et al., 2021) (Google Research)

Learning effective visual representations that generalize well without human supervision is a fundamental problem in order to apply Machine Learning to a wide variety of tasks. Recently, two families of self-supervised methods, contrastive learning and latent bootstrapping, exemplified by SimCLR and BYOL respectively, have made significant progress. In this work, we hypothesize that adding explicit information compression to these algorithms yields better and more robust representations. We verify this by developing SimCLR and BYOL formulations compatible with the Conditional Entropy Bottleneck (CEB) objective, allowing us to both measure and control the amount of compression in the learned representation, and observe their impact on downstream tasks. Furthermore, we explore the relationship between Lipschitz continuity and compression, showing a tractable lower bound on the Lipschitz constant of the encoders we learn. As Lipschitz continuity is closely related to robustness, this provides a new explanation for why compressed models are more robust. Our experiments confirm that adding compression to SimCLR and BYOL significantly improves linear evaluation accuracies and model robustness across a wide range of domain shifts. In particular, the compressed version of BYOL achieves 76.0% Top-1 linear evaluation accuracy on ImageNet with ResNet-50, and 78.8% with ResNet-50 2x.

Recent contrastive approaches to self-supervised visual representation learning aim to learn representations that maximally capture the mutual information between two transformed views of an image… The primary idea of these approaches is that this mutual information corresponds to a general shared context that is invariant to various transformations of the input, and it is assumed that such invariant features will be effective for various downstream higher-level tasks. However, although existing contrastive approaches maximize mutual information between augmented views of the same input, they do not necessarily compress away the irrelevant information from these views… retaining irrelevant information often leads to less stable representations and to failures in robustness and generalization, hampering the efficacy of the learned representations. An alternative state-of-the-art self-supervised learning approach is BYOL [30], which uses a slow-moving average network to learn consistent, view-invariant representations of the inputs. However, it also does not explicitly capture relevant compression in its objective.
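To make the "maximize mutual information between augmented views" idea concrete, here is a minimal, dependency-free sketch of an InfoNCE-style contrastive loss in plain Python. It is not the code from the paper or the linked repo: the function names, the temperature value, and the toy vectors are assumptions for illustration, and it is a simplified one-directional variant (SimCLR's NT-Xent loss also draws negatives from within each view and is symmetrized).

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(views_a, views_b, temperature=0.1):
    """Average InfoNCE loss over a batch.

    views_a[i] and views_b[i] are embeddings of two augmented views of the
    same image (the positive pair). Every other views_b[j] acts as a negative.
    """
    a = [l2_normalize(v) for v in views_a]
    b = [l2_normalize(v) for v in views_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of this anchor to every candidate in the other view.
        logits = [sum(p * q for p, q in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_sum_exp - logits[i]
    return total / len(a)


# Toy batch: three embeddings, compared against themselves and a mismatched pairing.
aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
shuffled = [aligned[1], aligned[2], aligned[0]]
print(info_nce_loss(aligned, aligned))   # positives agree: small loss
print(info_nce_loss(aligned, shuffled))  # positives disagree: large loss
```

With a batch of N pairs, log(N) minus this loss is a lower bound on the mutual information between the two views, which is why minimizing it is described as maximizing mutual information. Note that nothing in this loss penalizes keeping extra, view-specific information, which is the gap the quoted passage points at.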

In this work, we modify SimCLR [12], a state-of-the-art contrastive representation method, by adding information compression using the Conditional Entropy Bottleneck (CEB) [27]. Similarly, we show how BYOL [30] representations can also be compressed using CEB. By using CEB we are able to measure and control the amount of information compression in the learned representation [26], and observe its impact on downstream tasks. We empirically demonstrate that our compressive variants of SimCLR and BYOL, which we name C-SimCLR and C-BYOL, significantly improve accuracy and robustness to domain shifts across a number of scenarios.
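As a rough illustration of what "adding information compression" can look like, the sketch below adds a CEB-style penalty to an existing prediction loss. The idea is that a forward encoder e(z|x) sees one view, a backward encoder b(z|y) sees the other, and the expected KL divergence between them upper-bounds the residual information I(X;Z|Y) that the representation keeps about one view beyond what the other view explains. Everything concrete here is an assumption of the sketch: the unit-variance Gaussian encoders, the function names, and the `beta` weight are stand-ins chosen for simplicity, not necessarily what the paper or the released code uses.

```python
def gaussian_kl_same_variance(mu_e, mu_b, sigma=1.0):
    """KL( N(mu_e, sigma^2 I) || N(mu_b, sigma^2 I) ).

    For two isotropic Gaussians with the same variance this reduces to
    ||mu_e - mu_b||^2 / (2 * sigma^2).
    """
    sq_dist = sum((p - q) ** 2 for p, q in zip(mu_e, mu_b))
    return sq_dist / (2.0 * sigma ** 2)


def compressed_loss(prediction_loss, mu_e_batch, mu_b_batch, beta=1.0):
    """Prediction loss plus beta times the mean compression penalty.

    mu_e_batch[i] is the forward-encoder mean for view x_i, and
    mu_b_batch[i] is the backward-encoder mean for the paired view y_i.
    beta = 0 recovers the uncompressed objective (hypothetical parameter name).
    """
    kls = [gaussian_kl_same_variance(e, b) for e, b in zip(mu_e_batch, mu_b_batch)]
    return prediction_loss + beta * sum(kls) / len(kls)


# Toy usage: the first pair agrees exactly, the second pair does not.
mu_e = [[0.0, 1.0], [1.0, 0.0]]
mu_b = [[0.0, 1.0], [0.0, 0.0]]
print(compressed_loss(2.0, mu_e, mu_b, beta=0.0))
print(compressed_loss(2.0, mu_e, mu_b, beta=2.0))
```

Because the penalty is an explicit term, its value can be logged during training (measuring compression) and its weight can be tuned (controlling compression), which matches the "measure and control" claim in the quoted passage.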

Code: GitHub - google-research/compressive-visual-representations: Tensorflow 2 implementations of the C-SimCLR and C-BYOL self-supervised visual representation methods from "Compressive Visual Representations" (NeurIPS 2021)

Related work:

SimCLR: A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020) (Google Research, Brain Team)
- SimCLRv2: Big Self-Supervised Models are Strong Semi-Supervised Learners (Chen et al., 2020) (Google Research, Brain Team)
- GitHub with TF 2 implementation
BYOL: Bootstrap your own latent: A new approach to self-supervised Learning (Grill et al., 2020) (DeepMind/Imperial College)

8bitmp3 · December 10, 2021, 12:25am

Interested in learning about self-supervised methods? Here are some resources:

Some code examples and other posts by ML community members:

Keras: Self-supervised contrastive learning with SimSiam (by @Sayak_Paul)
GitHub - sayakpaul/SimCLR-in-TensorFlow-2: (Minimally) implements SimCLR (https://arxiv.org/abs/2002.05709) in TensorFlow 2. (by @Sayak_Paul)
GitHub - ayulockin/SwAV-TF: TensorFlow implementation of "Unsupervised Learning of Visual Features by Contrasting Cluster Assignments". (by ayulockin and @Sayak_Paul)
GitHub - sayakpaul/SimSiam-TF: Minimal implementation of SimSiam (https://arxiv.org/abs/2011.10566) in TensorFlow 2. (by @Sayak_Paul)
Lilian Weng’s blog post: Self Supervised Learning (2019)
Lilian Weng’s blog post: Contrastive Representation Learning (2021)

Sayak_Paul · December 10, 2021, 2:57am

Thanks for sharing. If the sole purpose is to compress a bigger self-supervised model and have it perform well with limited labeled data, I think SimCLRv2 is by far the simplest approach.

Sayak_Paul · December 10, 2021, 3:03am

Thanks for sharing the links. Adding some of my own favorites and others:

NeurIPS tutorial on self-supervision by Lilian Weng and Jong Wook Kim: Self-Supervised Learning: Self-Prediction and Contrastive Learning (slides)
A better minimal implementation of SimCLR with lots of cool stuff: Semi-supervised image classification using contrastive pretraining with SimCLR
SimSiam blog: Self-supervised contrastive learning with SimSiam (originally from FAIR)
Masked Image Modeling with Autoencoders (arguably the simplest one): Masked image modeling with Autoencoders (with @ariG23498, originally from FAIR)
Interview with Ishan Misra: #55 Dr. ISHAN MISRA - Self-Supervised Vision Models - YouTube.

Using self-supervision in a supervised setting: