TY - JOUR
T1 - Hyperspherically Regularized Networks for Self-Supervision
AU - Durrant, Aiden
AU - Leontidis, Georgios
N1 - Acknowledgments
This work used the Cirrus UK National Tier-2 HPC Service at EPCC (http://www.cirrus.ac.uk). Access granted through the project: ec173 - Next gen self-supervised learning systems for vision tasks.
Open access via Elsevier Agreement
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Bootstrap Your Own Latent (BYOL) introduced an approach to self-supervised learning avoiding the contrastive paradigm and subsequently removing the computational burden of negative sampling associated with such methods. However, we empirically find that the image representations produced under BYOL's self-distillation paradigm are poorly distributed in representation space compared to contrastive methods. This work empirically demonstrates that feature diversity enforced by contrastive losses is beneficial to image representation uniformity when employed in BYOL, and as such, provides greater inter-class representation separability. Additionally, we explore and advocate the use of regularization methods, specifically the layer-wise minimization of hyperspherical energy (i.e. maximization of entropy) of network weights to encourage representation uniformity. We show that directly optimizing a measure of uniformity alongside the standard loss, or regularizing the networks of the BYOL architecture to minimize the hyperspherical energy of neurons, can produce more uniformly distributed and therefore better-performing representations for downstream tasks.
AB - Bootstrap Your Own Latent (BYOL) introduced an approach to self-supervised learning avoiding the contrastive paradigm and subsequently removing the computational burden of negative sampling associated with such methods. However, we empirically find that the image representations produced under BYOL's self-distillation paradigm are poorly distributed in representation space compared to contrastive methods. This work empirically demonstrates that feature diversity enforced by contrastive losses is beneficial to image representation uniformity when employed in BYOL, and as such, provides greater inter-class representation separability. Additionally, we explore and advocate the use of regularization methods, specifically the layer-wise minimization of hyperspherical energy (i.e. maximization of entropy) of network weights to encourage representation uniformity. We show that directly optimizing a measure of uniformity alongside the standard loss, or regularizing the networks of the BYOL architecture to minimize the hyperspherical energy of neurons, can produce more uniformly distributed and therefore better-performing representations for downstream tasks.
KW - Self-supervised learning
KW - Representation learning
KW - Representation separability
KW - Image classification
U2 - 10.1016/j.imavis.2022.104494
DO - 10.1016/j.imavis.2022.104494
M3 - Article
VL - 124
JO - Image and Vision Computing
JF - Image and Vision Computing
SN - 0262-8856
M1 - 104494
ER -