RoboMimic — A Study of Imitation Learning from Observations

RoboMimic is an imitation learning benchmark dataset and framework developed by Stanford University, UT Austin, and NVIDIA. Released in 2021 under the MIT license, it contains 50,000 demonstration episodes for Franka Panda and Sawyer robots across manipulation tasks including lift, can picking, square peg insertion, and transport. RoboMimic was the first large-scale systematic study of how imitation learning algorithms perform across dataset quality levels: it introduced multi-quality datasets with proficient-human, multi-human (mixed-quality), and machine-generated demonstrations to study how data quality affects algorithm performance. The dataset and framework are widely used as a standard benchmark for comparing imitation learning algorithms, including BC, HBC, IRIS, TD3-BC, and diffusion-based methods. Its experimental design and reproducible evaluation protocol make it a common reference framework for algorithm development in robot imitation learning.

Dataset specifications
Year: 2021
Episodes: 50,000
Embodiments: Franka Panda, Sawyer
Modalities: RGB, proprioception
Task categories: manipulation, pick-and-place, dexterous manipulation
Data format: HDF5
License: MIT
Access: open, commercial use permitted
Maintainer: Stanford University, UT Austin, NVIDIA
Origin country: US
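
Since the data ships as HDF5, a quick sanity check after download is to count demonstrations and timesteps. The sketch below is a minimal example under stated assumptions, not the framework's own tooling: it assumes a top-level `data` group whose children (`demo_0`, `demo_1`, ...) each hold an `actions` dataset, which should be checked against the actual file. The helper names `summarize_demos` and `summarize_file` are hypothetical, and `h5py` is a third-party package imported only when a real file is opened.

```python
from typing import Any, Mapping, Optional


def summarize_demos(data_group: Mapping[str, Any]) -> dict:
    """Count demos and timesteps in a `data`-style group.

    Works on an h5py Group or on plain nested dicts, because it only
    relies on .items(), item access, and len() of the `actions` entry.
    """
    # Assumption: every demo has an `actions` entry with one row per timestep.
    lengths = {name: len(demo["actions"]) for name, demo in data_group.items()}
    longest: Optional[str] = max(lengths, key=lengths.get) if lengths else None
    return {
        "num_demos": len(lengths),
        "total_steps": sum(lengths.values()),
        "longest": longest,
    }


def summarize_file(path: str) -> dict:
    """Open an HDF5 file and summarize its (assumed) top-level `data` group."""
    import h5py  # third-party; imported lazily so the rest runs without it

    with h5py.File(path, "r") as f:
        return summarize_demos(f["data"])


# Stand-in for a real file: two fake demos with 3 and 5 seven-dimensional actions.
fake_data = {
    "demo_0": {"actions": [[0.0] * 7 for _ in range(3)]},
    "demo_1": {"actions": [[0.0] * 7 for _ in range(5)]},
}
summary = summarize_demos(fake_data)
print(summary)
```

With a downloaded file, `summarize_file("path/to/dataset.hdf5")` would return the same kind of summary; the path here is a placeholder.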

What is it?

RoboMimic is an imitation learning benchmark dataset and framework developed by Stanford University, UT Austin, and NVIDIA. Released in 2021 under the MIT license, it contains 50,000 demonstration episodes for Franka Panda and Sawyer robots across tasks including lift, can picking, square peg insertion, and transport. It was the first systematic study of imitation learning performance across dataset quality levels, introducing proficient-human, multi-human (mixed-quality), and machine-generated splits.

Who is it for?

Robot learning researchers benchmarking imitation learning algorithms. It is widely used in the robot learning community as a standard framework for comparing algorithms. Teams developing new robot learning methods often validate on RoboMimic before testing on larger real-robot datasets.

How it compares

RoboMimic is a benchmark framework rather than a pretraining dataset. PushT serves a similar role for 2D tasks; RoboMimic is the standard for 3D manipulation benchmarking, and its multiple data-quality levels are not covered as systematically by other benchmarks.

Limitations and access notes

The data is partly simulated and partly collected on real robots. The primary tasks are relatively simple compared with real-world deployment. The MIT license permits commercial use, provided the copyright and license notice are retained.

Frequently asked questions

What makes RoboMimic different from other manipulation datasets?

RoboMimic introduced multi-quality datasets, providing proficient-human, multi-human, and machine-generated demonstration splits for the same tasks. This enables systematic study of how data quality affects imitation learning algorithm performance.

What is the difference between PH, MH, and MG splits?

PH (Proficient-Human) contains expert demonstrations. MH (Multi-Human) contains demonstrations from operators with varying skill levels. MG (Machine-Generated) contains demonstrations from a trained RL policy. Comparing across splits reveals algorithm sensitivity to data quality.
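
A cross-split comparison can be sketched as follows: evaluate the same algorithm after training on each split, then report how far its success rate falls relative to PH. This is a minimal illustration, not RoboMimic's evaluation protocol. The function names are hypothetical, and the rollout outcomes are invented placeholders, not benchmark results.

```python
from statistics import mean
from typing import Dict, Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of evaluation rollouts that succeeded."""
    return mean(1.0 if ok else 0.0 for ok in outcomes)


def quality_sensitivity(
    rollouts_by_split: Dict[str, Sequence[bool]], reference: str = "ph"
) -> Dict[str, float]:
    """Drop in success rate on each split relative to the reference split.

    A larger positive value means the algorithm loses more performance
    when trained on that split instead of the reference one.
    """
    ref_rate = success_rate(rollouts_by_split[reference])
    return {
        split: ref_rate - success_rate(outcomes)
        for split, outcomes in rollouts_by_split.items()
        if split != reference
    }


# Placeholder outcomes invented for illustration only (NOT RoboMimic results):
# one boolean per evaluation rollout of a policy trained on that split.
placeholder_rollouts = {
    "ph": [True, True, True, False],
    "mh": [True, True, False, False],
    "mg": [True, False, False, False],
}
gaps = quality_sensitivity(placeholder_rollouts)
print(gaps)
```

Reporting the gap rather than the raw success rate makes it easier to compare algorithms whose absolute performance differs.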

Can RoboMimic be used commercially?

Yes. RoboMimic is MIT licensed, which permits commercial use provided the copyright and license notice are retained.

How do I access RoboMimic?

RoboMimic is available on Hugging Face and via the GitHub repository github.com/ARISE-Initiative/robomimic. No registration is required.

Which algorithms have been benchmarked on RoboMimic?

Algorithms benchmarked on RoboMimic include BC, HBC, IRIS, TD3-BC, IQL, and diffusion-based methods such as Diffusion Policy. Its tasks and results are a common point of comparison in robot imitation learning papers.