The Universal Normal Embedding

CVPR 2026

* Equal contribution

Abstract

Generative models and visual encoders are typically studied separately, yet both appear to rely on latent spaces with approximately Gaussian structure. We hypothesize a shared latent source, the Universal Normal Embedding (UNE), from which encoder embeddings and generative latents arise as noisy linear projections. Across diffusion and encoder spaces, we find aligned linear semantics for attribute prediction and editing, and show that simple geometric operations can mitigate spurious entanglements.

Poster

Hypotheses

Hypothesis 1: Universal Normal Embedding (UNE)

Universal Normal Embedding illustration

We hypothesize that the data domain is linked to a latent Gaussian space, the Universal Normal Embedding (UNE), through an invertible, information-preserving mapping. In this space, semantic properties are geometrically simple: each attribute is linearly separable.

Hypothesis 2: Induced Normal Embeddings (INE)

Induced Normal Embeddings illustration

We hypothesize that encoder and generative-model latents are induced by the UNE: each latent code is approximately a noisy linear projection of the same underlying Gaussian variable. As a result, these latent spaces are themselves approximately Gaussian and inherit shared semantic structure.
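The induced-embedding hypothesis can be illustrated with a toy simulation (a minimal sketch with hypothetical dimensions, not the paper's setup): two latent spaces are generated as distinct noisy linear projections of one shared Gaussian source, so both are themselves approximately Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_une, d_latent = 1000, 16, 8

# Shared UNE source: one standard Gaussian variable per sample.
u = rng.standard_normal((n, d_une))

# Two induced spaces: different noisy linear projections of the same source.
A_enc = rng.standard_normal((d_latent, d_une)) / np.sqrt(d_une)
A_gen = rng.standard_normal((d_latent, d_une)) / np.sqrt(d_une)
z_enc = u @ A_enc.T + 0.1 * rng.standard_normal((n, d_latent))
z_gen = u @ A_gen.T + 0.1 * rng.standard_normal((n, d_latent))

# A linear map of a Gaussian plus Gaussian noise is again Gaussian, so both
# induced latent spaces inherit (approximately) Gaussian structure.
print(z_enc.shape, z_gen.shape, round(float(z_enc.mean()), 3))
```

Because both spaces are driven by the same underlying variable `u`, semantic structure in `u` reappears, linearly transformed, in each induced space.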

Key Insights

Linear classification in diffusion latent space

Linear classification illustration

Diffusion latents support strong linear prediction of semantic attributes, indicating that meaningful structure is directly accessible in the latent space. This mirrors the linear separability long observed in encoder representations.
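A linear probe of this kind can be sketched as follows, here on synthetic Gaussian codes with one linearly encoded attribute (a hypothetical stand-in for diffusion latents, fit with plain least squares rather than any specific probe from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 32

# Synthetic latents: Gaussian codes whose labels follow a linear rule.
w_true = rng.standard_normal(d)
z = rng.standard_normal((n, d))
y = (z @ w_true > 0).astype(float)

# Linear probe: least-squares regression onto {-1, +1} targets.
w_hat, *_ = np.linalg.lstsq(z, 2 * y - 1, rcond=None)
acc = float(((z @ w_hat > 0) == (y > 0.5)).mean())
print(f"linear-probe accuracy: {acc:.3f}")
```

When the attribute really is linearly encoded, such a probe recovers it with high accuracy; the paper's observation is that real diffusion latents behave this way for semantic attributes.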

Linear editing via latent directions

Linear editing illustration

Semantic directions in the latent space enable faithful and controllable edits such as age, smile, and gender. This suggests that diffusion latents do not merely reconstruct images, but also organize semantics in a geometrically simple way.
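One standard way to obtain such a direction, sketched here on synthetic latents with a hypothetical binary "smile" attribute, is the difference of class means; the paper's exact procedure may differ:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 16

# Hypothetical latents where a binary attribute shifts the mean along axis 0.
true_dir = np.zeros(d)
true_dir[0] = 1.0
labels = rng.integers(0, 2, n)
z = rng.standard_normal((n, d)) + labels[:, None] * 2.0 * true_dir

# Semantic direction: difference of class means, normalized.
direction = z[labels == 1].mean(0) - z[labels == 0].mean(0)
direction /= np.linalg.norm(direction)

# Edit: move a latent along the direction; a decoder (not shown) would then
# render the attribute change in the image.
z_edit = z[0] + 1.5 * direction
print(float(direction[0]))
```

Scaling the step size controls edit strength, which is what makes attributes like age or smile continuously adjustable.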

Mitigating spurious correlations

Spurious correlations mitigation illustration

Simple orthogonalization between semantic directions reduces unwanted entanglement and mitigates spurious correlations. This improves interpretability and helps isolate cleaner attribute manipulations in the shared latent geometry.
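The orthogonalization step amounts to a single Gram-Schmidt projection. A minimal sketch, with two made-up attribute directions standing in for entangled edits like "age" and "gender":

```python
import numpy as np

# Two hypothetical attribute directions that are spuriously correlated.
d_age = np.array([1.0, 0.3, 0.0])
d_gender = np.array([0.0, 1.0, 0.0])  # assumed unit-norm

# Remove the gender component from the age direction (Gram-Schmidt step),
# so editing along the cleaned direction leaves gender unchanged.
d_age_clean = d_age - (d_age @ d_gender) * d_gender
d_age_clean /= np.linalg.norm(d_age_clean)

print(float(d_age_clean @ d_gender))  # orthogonal: dot product is 0.0
```

After projection, the two directions are orthogonal, so a step along the cleaned age direction no longer drags the entangled attribute with it.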

BibTeX

@inproceedings{tasker2026universal,
  title     = {The Universal Normal Embedding},
  author    = {Tasker, Chen and Betser, Roy and Gofer, Eyal and Levi, Meir Yossef and Gilboa, Guy},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}