SmolVLA

SmolVLA is Hugging Face’s compact Vision-Language-Action model, built on top of the LeRobot library. It is designed to be small enough to fine-tune and deploy on a single consumer GPU, while remaining competitive with larger open VLAs on standard manipulation benchmarks.

Pointers

Paper: Shukor et al. (2025). SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Code and weights: huggingface/lerobot, the upstream library that hosts SmolVLA training, inference, and dataset utilities
LeRobot documentation: huggingface.co/docs/lerobot
Model hub: SmolVLA on Hugging Face
