Vision-language models (VLMs) extend LLM reasoning into the visual domain. The labs in this section apply the same RL post-training tools used for LLM reasoning (GRPO and PPO via TRL) to two VLM settings: browser control with the compact 450M-parameter LFM2-VL-450M, and visual reasoning with Qwen3-VL-2B-Instruct.
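As a rough illustration of what GRPO contributes in either setting, the sketch below computes group-relative advantages: several completions are sampled for the same prompt (or the same image and question), each is scored by a reward function, and each reward is normalized against its own group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for this example; real implementations, including TRL, may differ in those details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one group of sampled completions.

    Hypothetical helper for illustration only, not TRL's API.
    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std: an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update favors the better answers within each group without needing a separate value model. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.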

Labs

Visual GRPO with TRL (Qwen3-VL-2B)

GRPO fine-tuning of Qwen3-VL-2B-Instruct on a visual reasoning task using the TRL library.
The browser-control GRPO lab on LFM2-VL-450M now lives under Computer Using Agents, where the broader CUA topic (OpenCUA, OS-Atlas, OSWorld, Mind2Web) is introduced.