Planning from Observation and Interaction


Summary

Experiments conducted entirely in the real world demonstrate that this paradigm is effective for learning image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or data of any kind beyond task observations. Moreover, this work shows that the learned world-model representation supports online transfer learning in the real world from scratch. Compared to existing approaches with more restrictive assumptions, including Inverse Reinforcement Learning (IRL), Reinforcement Learning (RL), and Behavior Cloning (BC), our approach achieves significantly greater sample efficiency and higher success rates. This offers a practical path forward for world modeling and planning from online observation and interaction.
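To make the paradigm concrete, below is a minimal, illustrative sketch of a generic online world-model learning and planning loop; it is not this project's implementation. The toy point-mass environment, the linear WorldModel, the random-shooting plan() routine, and all hyperparameters are assumptions chosen only so the loop runs end to end, whereas the actual system learns from image observations with far richer models.

    # Illustrative sketch only (not this project's code): a generic online
    # world-model learning + planning loop on a toy 2-D point-mass task.
    import numpy as np

    rng = np.random.default_rng(0)

    def env_reset():
        # Random start state; the (hypothetical) goal is the origin.
        return rng.uniform(-1.0, 1.0, size=2)

    def env_step(state, action):
        # True dynamics, unknown to the agent a priori.
        next_state = state + 0.1 * action
        reward = -np.linalg.norm(next_state)
        return next_state, reward

    class WorldModel:
        """Stand-in world model: linear dynamics fit by least squares."""
        def __init__(self, state_dim, action_dim):
            self.W = np.zeros((state_dim, state_dim + action_dim))

        def fit(self, states, actions, next_states):
            X = np.hstack([states, actions])
            self.W = np.linalg.lstsq(X, next_states, rcond=None)[0].T

        def predict(self, state, action):
            return self.W @ np.concatenate([state, action])

    def plan(model, state, horizon=5, candidates=256):
        # Random-shooting MPC: score imagined rollouts in the learned model,
        # then execute the first action of the best candidate sequence.
        seqs = rng.uniform(-1.0, 1.0, size=(candidates, horizon, 2))
        returns = np.zeros(candidates)
        for i, seq in enumerate(seqs):
            s = state.copy()
            for a in seq:
                s = model.predict(s, a)
                returns[i] += -np.linalg.norm(s)  # imagined reward
        return seqs[np.argmax(returns)][0]

    model = WorldModel(state_dim=2, action_dim=2)
    buffer = {"s": [], "a": [], "s2": []}
    state = env_reset()
    for step in range(200):
        # Explore randomly at first, then plan with the current model.
        action = rng.uniform(-1.0, 1.0, size=2) if step < 20 else plan(model, state)
        next_state, _ = env_step(state, action)
        buffer["s"].append(state)
        buffer["a"].append(action)
        buffer["s2"].append(next_state)
        model.fit(np.array(buffer["s"]), np.array(buffer["a"]), np.array(buffer["s2"]))
        state = env_reset() if (step + 1) % 50 == 0 else next_state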

Sample Efficiency


Demonstrations

All demonstrations used in training

Training Time-lapse


Comparison with SOTA

MPAIL2 (full training video)

MPAIL2[-P] (full training video)
Transfer Learning


Demonstrations

All demonstrations used in initial training

Training Time-lapse


Transferred (full training video)

From Scratch (full training video)

Generalization

(Trained with only the yellow cube in the scene)

Novel Cube 1
Novel Cube 2
Multi-Cube Scene 1
Multi-Cube Scene 2

Minimal Observations

(Demonstrations contain only the table camera view; no proprioception or wrist-mounted camera from the robot is included)

Achieved in ~500 iterations. The ArUco marker on the cube is used solely for trajectory recording, not as part of the observation.