This chapter covers multimodal AI systems that combine vision and language understanding.

Topics

VLM Overview

Introduction to vision-language models.

CLIP

Contrastive Language-Image Pre-training.

LLaVA

Large Language and Vision Assistant.

BLIP-2

Bootstrapping Language-Image Pre-training with frozen image encoders and large language models.