InfoNCE Induces Gaussian Distribution

ICLR 2026, Oral


Abstract

Contrastive learning has become a cornerstone of modern representation learning, enabling scalable training on massive unlabeled data. We show that the population InfoNCE objective induces asymptotically Gaussian structure in the learned representations, and we establish this through two complementary analytical routes. We support the analysis with experiments on synthetic data and CIFAR-10, demonstrating consistent Gaussian behavior across settings and architectures.
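For reference, and in generic notation of our own (which may differ slightly from the paper's), the population InfoNCE objective for an encoder f with temperature \tau, a positive pair (x, x^+), and negatives x_1^-, \dots, x_K^- is

\[
\mathcal{L}_{\mathrm{InfoNCE}}(f)
= -\,\mathbb{E}\left[\log
\frac{\exp\!\big(f(x)^{\top} f(x^{+})/\tau\big)}
     {\exp\!\big(f(x)^{\top} f(x^{+})/\tau\big)
      + \sum_{k=1}^{K}\exp\!\big(f(x)^{\top} f(x_{k}^{-})/\tau\big)}\right].
\]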

Poster

Key Insights

Route 1: Empirical idealization

[Figure: Route 1 diagram]

Assuming (i) an alignment plateau and (ii) thin-shell norm concentration, the population objective reduces to a uniformity term whose minimizer is the uniform distribution on the sphere. By the spherical CLT, fixed-dimensional projections of uniformly distributed spherical points become Gaussian as the embedding dimension grows.
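The spherical CLT step is easy to check numerically. Below is a minimal sketch (not the paper's code): sample points uniformly on the unit sphere in R^d and compare a rescaled coordinate to a standard normal; the Kolmogorov-Smirnov statistic shrinks as d grows.

# Numerical illustration of the spherical CLT: for z uniform on the unit
# sphere S^{d-1}, sqrt(d) * z_1 approaches N(0, 1) as d grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sphere_samples(n, d):
    """Draw n points uniformly at random on the unit sphere in R^d."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

for d in (4, 64, 1024):
    z = sphere_samples(20_000, d)
    coord = np.sqrt(d) * z[:, 0]            # rescaled first coordinate
    ks = stats.kstest(coord, "norm")        # distance to the standard normal
    print(f"d={d:5d}  KS statistic vs N(0,1): {ks.statistic:.4f}")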

Route 2: Regularized objective

[Figure: Regularized route diagram]

With a vanishing convex regularizer that promotes low norm and high entropy, the isotropic solution is selected at the population level. This yields the same asymptotic Gaussian structure without relying on training dynamics and under much milder assumptions.
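As a schematic only (the exact regularizer is specified in the paper; the form below is our illustrative stand-in consistent with the description above), one can picture the regularized objective as

\[
\mathcal{L}_{\lambda}(f)
= \mathcal{L}_{\mathrm{InfoNCE}}(f)
+ \lambda\Big(\mathbb{E}\,\|f(X)\|^{2} - H\big(f(X)\big)\Big),
\qquad \lambda \downarrow 0,
\]

where the norm term discourages large embeddings and the differential-entropy term H rewards spread. As \lambda \to 0 the regularizer acts only as a tie-breaker among population minimizers, and the Gaussian maximizes differential entropy under a second-moment constraint, which is the sense in which the isotropic solution is selected.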

Synthetic experiments

[Figure: Synthetic results]

Across Laplace, Gaussian-mixture, and discrete sparse-binary inputs, InfoNCE training drives norm concentration and coordinate-wise normality. In the binary setting there is no latent Gaussian to recover, so the observed Gaussianity must be induced by the objective rather than inherited from the data.
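As a rough sketch of how the two reported effects can be quantified (the metric choices here, coefficient of variation of the norm and per-coordinate Shapiro-Wilk tests, are ours and not necessarily the paper's):

# Diagnostics for (i) norm concentration and (ii) coordinate-wise normality
# of a learned representation matrix z of shape (n_samples, dim).
import numpy as np
from scipy import stats

def gaussianity_diagnostics(z):
    norms = np.linalg.norm(z, axis=1)
    norm_cv = norms.std() / norms.mean()        # thin-shell: small std-to-mean ratio
    zs = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-12)
    # Shapiro-Wilk per standardized coordinate (subsample rows to keep the test well-behaved).
    pvals = [stats.shapiro(zs[:500, j]).pvalue for j in range(z.shape[1])]
    return {"norm_cv": float(norm_cv), "median_shapiro_p": float(np.median(pvals))}

# Example on a placeholder Gaussian embedding (stand-in for a trained encoder's output).
z = np.random.default_rng(0).standard_normal((2000, 128))
print(gaussianity_diagnostics(z))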

CIFAR-10: supervised vs contrastive (ResNet-18)

[Figure: CIFAR comparison]

With the same ResNet-18 architecture, contrastive training yields near-Gaussian coordinates and tighter norm concentration, while supervised training shows substantial deviations from Gaussianity.
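Below is a minimal sketch of how one could extract penultimate-layer ResNet-18 features on the CIFAR-10 test set and feed them to the diagnostics above; it assumes torchvision, and the two checkpoint file names are hypothetical (the paper's training code is not reproduced here).

# Extract 512-d penultimate features from a ResNet-18 on the CIFAR-10 test set.
from typing import Optional
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

def feature_extractor(ckpt_path: Optional[str] = None) -> nn.Module:
    model = models.resnet18(num_classes=10)
    if ckpt_path is not None:
        state = torch.load(ckpt_path, map_location="cpu")
        model.load_state_dict(state, strict=False)   # tolerate e.g. projection-head keys
    model.fc = nn.Identity()                         # expose the 512-d penultimate layer
    return model.to(device).eval()

tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
loader = torch.utils.data.DataLoader(
    datasets.CIFAR10("./data", train=False, download=True, transform=tf),
    batch_size=256)

@torch.no_grad()
def extract(model):
    return torch.cat([model(x.to(device)).cpu() for x, _ in loader]).numpy()

# feats_sup = extract(feature_extractor("resnet18_supervised.pt"))    # hypothetical checkpoints
# feats_con = extract(feature_extractor("resnet18_contrastive.pt"))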

BibTeX

@inproceedings{betser2026infonce,
  title     = {InfoNCE Induces Gaussian Distribution},
  author    = {Betser, Roy and Gofer, Eyal and Levi, Meir Yossef and Gilboa, Guy},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}