
Experiments conducted entirely in the real world demonstrate that this paradigm is effective for learning image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or data of any kind beyond task observations. Moreover, the learned world-model representation supports online transfer learning in the real world from scratch. Compared to existing approaches, including IRL, RL, and Behavior Cloning (BC), all of which make more restrictive assumptions, our approach achieves significantly higher sample efficiency and success rates. This offers a practical path forward for world modeling and planning from online observation and interaction.
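To make the paradigm concrete, below is a minimal sketch of an online world-model-learning-and-planning loop. It is not the paper's implementation: the `ToyEnv` environment, the linear dynamics model, and the random-shooting planner are all illustrative stand-ins, and every name in it is hypothetical.

```python
# Minimal sketch: learn a world model online from interaction and plan with it.
# All components here are illustrative assumptions, not the paper's method.
import numpy as np

class ToyEnv:
    """Stand-in environment: the state drifts under the action plus noise."""
    def __init__(self, dim=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim = dim
        self.state = self.rng.normal(size=dim)

    def step(self, action):
        self.state = self.state + 0.1 * action + 0.01 * self.rng.normal(size=self.dim)
        return self.state.copy()

class LinearWorldModel:
    """Least-squares model of s' = A @ [s; a], refit from a replay buffer online."""
    def __init__(self, obs_dim, act_dim):
        self.A = np.zeros((obs_dim, obs_dim + act_dim))
        self.buffer = []  # (state, action, next_state) transitions

    def update(self, s, a, s_next):
        self.buffer.append((s, a, s_next))
        X = np.array([np.concatenate([si, ai]) for si, ai, _ in self.buffer])
        Y = np.array([sn for _, _, sn in self.buffer])
        # Ridge regularization keeps the fit stable with very few samples.
        self.A = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ Y).T

    def predict(self, s, a):
        return self.A @ np.concatenate([s, a])

def plan(model, state, goal, horizon=5, n_samples=128, act_dim=4, rng=None):
    """Random-shooting planner: sample action sequences, roll each out in the
    learned model, and return the first action of the best sequence."""
    rng = rng or np.random.default_rng()
    candidates = rng.uniform(-1, 1, size=(n_samples, horizon, act_dim))
    costs = np.empty(n_samples)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s = model.predict(s, a)
        costs[i] = np.linalg.norm(s - goal)
    return candidates[np.argmin(costs), 0]

env = ToyEnv()
model = LinearWorldModel(obs_dim=4, act_dim=4)
goal = np.ones(4)
state = env.state.copy()
for t in range(50):
    action = plan(model, state, goal) if model.buffer else np.zeros(4)
    next_state = env.step(action)
    model.update(state, action, next_state)  # learn the world model online
    state = next_state
print("final distance to goal:", np.linalg.norm(state - goal))
```

The essential structure is the interleaving: every real transition immediately improves the model, and every action is chosen by planning against the current model, so no prior data or pre-training is needed.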
Comparison with SOTA
(Trained with only the yellow cube in the scene)
(Demonstrations contain only the table camera; no proprioception or wrist-mounted camera data from the robot is included)