RL-based reasoning for vision-language models — GRPO/PPO fine-tuning labs on browser control and visual math.
Vision-language models extend LLM reasoning into the visual domain. The labs in this section apply the same RL post-training tools used for LLM reasoning (GRPO, PPO via TRL) to two VLM settings: browser-control with a 450M-parameter compact VLM, and visual reasoning with Qwen3-VL-2B-Instruct.
GRPO fine-tuning of Qwen3-VL-2B-Instruct on a visual reasoning task using the TRL library.
The browser-control GRPO lab on LFM2-VL-450M now lives under Computer Using Agents, where the broader CUA topic (OpenCUA, OS-Atlas, OSWorld, Mind2Web) is introduced.