CLIP

CLIP (Contrastive Language-Image Pre-training) maps text and images into a shared representation space, so a caption and a matching picture end up close together and can be compared directly for similarity and retrieval. It powers capabilities such as Find Similar Designs, and its image encoder is commonly a Vision Transformer (ViT).
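To make the shared space concrete, here is a minimal sketch of how a text query could be scored against a few images once both are encoded. The three-dimensional vectors, the item names, the `rank_images` helper, and the temperature value are all illustrative assumptions; a real CLIP model produces the embeddings with trained text and image encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_images(text_embedding, image_embeddings, temperature=0.07):
    """Score every image against one text query and sort best-first."""
    t = normalize(text_embedding)
    names = list(image_embeddings)
    sims = []
    for name in names:
        i = normalize(image_embeddings[name])
        sims.append(sum(a * b for a, b in zip(t, i)))
    # Dividing by a small temperature sharpens the distribution (assumed value).
    probs = softmax([s / temperature for s in sims])
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings; real encoders emit hundreds of dimensions.
images = {
    "beach_poster": [0.9, 0.1, 0.2],
    "city_flyer": [0.1, 0.8, 0.3],
    "forest_card": [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.25]  # stands in for an encoded text prompt
ranking = rank_images(query, images)
```

Because text and images live in the same space, the same scoring works in either direction: an image embedding can serve as the query to find similar images.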

Related terms

Overflow

Layout

Overflow describes what happens when content exceeds its container's boundaries: the excess can be visible, hidden, scrollable, or clipped. Overflow settings determine whether a container scrolls and whether content bleeds outside it. Hidden overflow is useful for clipping decorative elements, while scrollable overflow turns a container into a scroll area.

Embeddings

Embeddings are numerical vectors that capture semantic meaning, enabling similarity search, clustering, and retrieval workflows.
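As an illustration of similarity search over embeddings, the sketch below compares vectors with cosine similarity and returns the closest matches. The vectors, item names, and the `most_similar` helper are made up for the example; production systems typically use a vector index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, index, k=2):
    """Return the k entries of `index` whose vectors are closest to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings: the first two items are semantically close.
index = {
    "invoice": [1.0, 0.0, 0.1],
    "receipt": [0.9, 0.1, 0.2],
    "sunset": [0.0, 1.0, 0.3],
}
neighbors = most_similar(index["invoice"], index, k=2)
```

Here `neighbors` lists "invoice" first (it matches itself exactly), followed by "receipt", while "sunset" is left out because its vector points in a different direction.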

Related AI terms: CLIP and Textual Inversion.

Vision Transformer (ViT)

Vision Transformer (ViT) splits an image into fixed-size patches and applies transformer attention to them for classification and representation learning. It is widely used in multimodal stacks with CLIP and in segmentation systems like the Segment Anything Model (SAM).
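The patch step can be sketched in a few lines. This is a simplified illustration on a single-channel image stored as nested lists; `split_into_patches` is a hypothetical helper, and the learned projection, position embeddings, and attention layers that follow in a real ViT are not shown.

```python
def split_into_patches(image, patch_size):
    """Cut a 2D image into non-overlapping square patches, each flattened
    into one list of pixel values (one future token per patch)."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# A 4x4 toy image split into 2x2 patches yields 4 patches of 4 values each.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
patches = split_into_patches(image, patch_size=2)
```

The same arithmetic scales up: a 224 by 224 image with 16 by 16 patches produces 14 x 14 = 196 patches, which the transformer then treats as a sequence.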
