This section is under construction.
Vision-language models (VLMs) extend LLM reasoning into the visual domain, enabling agents to interpret images, ground language in visual context, and reason across modalities.

Topics

Visual Chain-of-Thought

Applying chain-of-thought prompting to VLMs for multi-step visual reasoning tasks.
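A minimal sketch of what this can look like in practice: the prompt pairs the image with an explicit instruction to describe relevant regions and state intermediate conclusions before answering. The content-block schema below follows the common OpenAI-style `image_url` convention and is an assumption; adapt it to your VLM's API.

```python
import base64

def build_visual_cot_prompt(image_bytes: bytes, question: str) -> list[dict]:
    """Build a chat payload that pairs an image with a step-by-step
    visual reasoning instruction (schema is illustrative, not universal)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                # Image delivered inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                # The CoT instruction: reason over regions before answering.
                {"type": "text",
                 "text": (f"{question}\n"
                          "Reason step by step: first describe the relevant "
                          "regions of the image, then state intermediate "
                          "conclusions, and only then give the final answer.")},
            ],
        }
    ]

prompt = build_visual_cot_prompt(b"\x89PNG...", "How many mugs are on the desk?")
```

The instruction text is what turns an ordinary visual question into a chain-of-thought one; the payload itself is plain data and works with any client that accepts this message shape.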

Grounded Tool Use

Combining visual grounding with tool-calling to enable VLMs to interact with structured environments.
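One concrete pattern: the VLM emits a tool call grounded in a normalized bounding box over the image, and the agent converts that box into a pixel-space target for a UI-automation tool. The tool-call JSON shape below (`name` plus a `[x0, y0, x1, y1]` box in `[0, 1]`) is a hypothetical convention for illustration, not a standard.

```python
import json

def dispatch_grounded_call(tool_call_json: str,
                           image_w: int, image_h: int) -> tuple[str, int, int]:
    """Parse a (hypothetical) grounded tool call and resolve its normalized
    bounding box to a pixel-space click target (box center)."""
    call = json.loads(tool_call_json)
    x0, y0, x1, y1 = call["box"]  # normalized corners in [0, 1]
    cx = int((x0 + x1) / 2 * image_w)
    cy = int((y0 + y1) / 2 * image_h)
    return call["name"], cx, cy

# A model output grounding a "click" action in the image:
name, cx, cy = dispatch_grounded_call(
    '{"name": "click", "box": [0.1, 0.2, 0.3, 0.4]}', 1000, 800)
# -> ("click", 200, 240)
```

Keeping the model's boxes normalized and resolving to pixels only at dispatch time makes the same grounding work across screenshots of different resolutions.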