Vision-Language Models
Multimodal Reasoning
Vision-language models and multimodal AI systems.
This chapter covers multimodal AI systems that combine vision and language understanding.
Topics
VLM Overview
Introduction to vision-language models.
CLIP
Contrastive Language-Image Pre-training.
LLaVA
Large Language and Vision Assistant.
BLIP-2
Bootstrapping Language-Image Pre-training with frozen image encoders and large language models.