Online World Modeling Enables Real-World Inverse Reinforcement Learning from Observation

Meaning NO:

rewards
action supervision
pre-training
play data
failure examples
prior models
interventions
simulation

Only 15 observation-only demonstrations and < 40 minutes of real-world training from scratch


Citation

If you find this work useful, please cite:

@article{han2026mpail2,
  title   = {Online World Modeling Enables Real-World Inverse Reinforcement Learning from Observation},
  author  = {Han, Tyler and Nemekhbold, Bat and Shen, Siyang and Baijal, Rohan and
             Ebock, Richard and Ravichandiran, Harine and Jung, Sanghun and
             Huang, Kevin and Boots, Byron},
  year    = {2026},
}

Tyler Han (than123)
Bat Nemekhbold (bxtbold)
Siyang Shen (andyshen)
Rohan Baijal (rbaijal)
Richard Ebock (ebockr)
Harine Ravichandiran (harine)
Sanghun Jung (shjung)
Kevin Huang (kehuang)
Byron Boots (bboots)