SOP: A Scalable Online Post-Training System for Vision-Language-Action Models
Abstract
A scalable online post-training system enables real-world robot policy adaptation through distributed, multi-task learning that maintains generality while improving task proficiency.
Vision-language-action (VLA) models achieve strong generalization through large-scale pre-training, but real-world deployment requires expert-level task proficiency in addition to broad generality. Existing post-training approaches for VLA models are typically offline, single-robot, or task-specific, limiting effective on-policy adaptation and scalable learning from real-world interaction. We introduce a Scalable Online Post-training (SOP) system that enables online, distributed, multi-task post-training of generalist VLA models directly in the physical world. SOP tightly couples execution and learning through a closed-loop architecture in which a fleet of robots continuously streams on-policy experience and human intervention signals to a centralized cloud learner, and asynchronously receives updated policies. This design supports prompt on-policy correction, scales experience collection through parallel deployment, and preserves generality during adaptation. SOP is agnostic to the choice of post-training algorithm; we instantiate it with both interactive imitation learning (HG-DAgger) and reinforcement learning (RECAP). Across a range of real-world manipulation tasks including cloth folding, box assembly, and grocery restocking, we show that SOP substantially improves the performance of large pretrained VLA models while maintaining a single shared policy across tasks. Effective post-training can be achieved within hours of real-world interaction, and performance scales near-linearly with the number of robots in the fleet. These results suggest that tightly coupling online learning with fleet-scale deployment is instrumental in enabling efficient, reliable, and scalable post-training of generalist robot policies in the physical world.
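The closed-loop pattern described above can be illustrated with a minimal sketch: robot workers stream on-policy episodes, tagged with human-intervention signals, to a central learner and asynchronously pull back the latest policy version. This is not the authors' implementation; all names (CentralLearner, robot_worker, run_fleet) and the threading/queue setup are hypothetical stand-ins for the fleet-to-cloud architecture.

```python
# Hypothetical sketch of SOP's closed loop: fleet workers stream experience to a
# centralized learner and asynchronously receive updated policies.
import queue
import threading
import time
import random


class CentralLearner:
    """Aggregates streamed experience and publishes updated policy versions."""

    def __init__(self):
        self.experience = queue.Queue()
        self.policy_version = 0
        self._lock = threading.Lock()

    def learn_forever(self, steps=5, batch_size=4):
        for _ in range(steps):
            batch = [self.experience.get() for _ in range(batch_size)]
            # Placeholder for an HG-DAgger or RECAP update computed on `batch`.
            n_intervened = sum(ep["intervened"] for ep in batch)
            with self._lock:
                self.policy_version += 1
            print(f"learner: update -> v{self.policy_version} "
                  f"({n_intervened}/{batch_size} intervened episodes)")

    def latest_policy(self):
        with self._lock:
            return self.policy_version


def robot_worker(robot_id, learner, episodes=10):
    """One robot: execute with the latest policy, stream each episode to the learner."""
    for _ in range(episodes):
        version = learner.latest_policy()          # asynchronous policy pull
        time.sleep(0.01)                           # stand-in for real-world execution
        episode = {
            "robot": robot_id,
            "policy_version": version,
            "intervened": random.random() < 0.3,   # stand-in for a human correction signal
        }
        learner.experience.put(episode)            # stream on-policy experience


def run_fleet(num_robots=2):
    learner = CentralLearner()
    workers = [threading.Thread(target=robot_worker, args=(i, learner))
               for i in range(num_robots)]
    trainer = threading.Thread(target=learner.learn_forever)
    for t in workers + [trainer]:
        t.start()
    for t in workers + [trainer]:
        t.join()


if __name__ == "__main__":
    run_fleet()
```

Because execution and learning are decoupled through the experience queue and the versioned policy, adding robots increases the experience throughput without blocking the learner, which is the property the paper reports as near-linear scaling with fleet size.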
Community
Website: https://www.agibot.com/research/sop
We introduce SOP for online post-training of generalist VLAs in the real world, unlocking persistent, reliable deployment of generalist robots in physical environments.
36 hours of continuous cloth folding: video
36 hours of continuous cardboard box assembly: video
The following papers were recommended by the Semantic Scholar API
- $\pi^{*}_{0.6}$: a VLA That Learns From Experience (2025)
- EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models (2025)
- ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval (2025)
- MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent (2025)
- InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy (2025)
- STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models (2025)
- See Once, Then Act: Vision-Language-Action Model with Task Learning from One-Shot Video Demonstrations (2025)