Multimodal Reasoning - aegean.ai

This chapter covers multimodal AI systems that combine vision and language understanding.

Topics

VLM Overview

Introduction to vision-language models (VLMs).

CLIP

Contrastive Language-Image Pre-training.

LLaVA

Large Language and Vision Assistant.

BLIP-2

Bootstrapping Language-Image Pre-training with frozen image encoders and large language models.
