Open X-Embodiment — Robotic Learning Datasets
Open X-Embodiment is the largest cross-embodiment robot learning dataset assembled to date, developed by Google DeepMind in collaboration with 33 research institutions worldwide. Released in 2023 under the Apache 2.0 license, which permits commercial use, it aggregates over one million robot demonstration episodes spanning 22 robot embodiments and dozens of task types, including manipulation, pick-and-place, navigation, tool use, and household tasks. The collection unifies heterogeneous datasets from institutions including Stanford, UC Berkeley, Carnegie Mellon, Columbia, and Google under a common RT-X data format. It was used to train the RT-X family of models, which demonstrated positive transfer across robot platforms. Embodiments include Franka Panda, WidowX, Kuka iiwa, Universal Robots, Google RT-2, Stretch, and 16 additional platforms. Data is stored in TFRecord and Zarr formats. Open X-Embodiment serves as a foundational pretraining dataset for cross-embodiment generalisation research and as a reference dataset for evaluating whether robot policies transfer across hardware.
| Field | Value |
|---|---|
| Year | 2023 |
| Episodes | 1,000,000+ |
| Embodiments and source datasets | Franka Panda, Google RT-2, WidowX 250s, Kuka iiwa, Sawyer, Universal Robots UR5, ABB YuMi, Stretch RE1, Hello Robot Stretch, xArm, CMU Stretch, Berkeley Autolab UR5, Stanford ViperX, Google Screw Robot, Jaco, Kinova Gen3, Stanford Hydra Hand, Bridge, RT-1 Robot, Google Robot, Toto, Columbia Grasp Dataset, UCSD Kitchen Dataset |
| Modalities | rgb, proprioception, language |
| View type | third-person, wrist-cam, egocentric |
| Task categories | manipulation, pick-and-place, cleaning, cooking, human-robot-interaction, inspection, warehouse |
| Data format | zarr, tfrecord |
| License | Apache 2.0 |
| Access | open — commercial use permitted |
| Maintainer | Google DeepMind, 33 collaborating institutions |
| Origin country | US |
What is it?
Open X-Embodiment is the largest cross-embodiment robot learning dataset ever assembled, developed by Google DeepMind in collaboration with 33 research institutions. Released in 2023 under Apache 2.0, it aggregates over one million demonstration episodes across 22 robot embodiments spanning manipulation, navigation, tool use, and household tasks. The collection unifies datasets from Stanford, UC Berkeley, Carnegie Mellon, Columbia, and Google under a common RT-X data format, enabling training of the RT-X family of models that demonstrated positive transfer across robot platforms.
Who is it for?
Open X-Embodiment is aimed at researchers building foundation models for robotics: models intended to work across multiple robot types rather than a single platform. Teams developing generalised manipulation policies, cross-embodiment transfer learning systems, and large-scale robot learning models use it as a primary pretraining data source.
Key specifications
- Episodes: 1,000,000+ demonstrations
- Embodiments: 22 robot platforms including Franka Panda, WidowX, Kuka iiwa, Google RT-2 robot, Universal Robots UR5, Hello Robot Stretch, Sawyer, xArm, Jaco, and 13 others
- Tasks: Manipulation, pick-and-place, navigation, tool use, household tasks, cleaning
- Format: TFRecord, Zarr
- License: Apache 2.0 (commercial use permitted)
- Access: Open — Hugging Face and Google Cloud Storage
How it compares
Open X-Embodiment is the largest of these collections in total episodes and covers the most embodiments. DROID (76,000 episodes) and EgoVerse (57,761 episodes) offer more data per embodiment but cover fewer platforms. The key advantage of Open X-Embodiment is cross-embodiment coverage: a single policy training run can draw on data from 22 different robots at once. The tradeoff is data heterogeneity: the constituent datasets differ in quality, collection protocol, and task distribution.
Limitations and access notes
The dataset is highly heterogeneous: quality, task diversity, and annotation consistency vary significantly across the 33 contributing institutions. Some constituent datasets are small (under 1,000 episodes). Apache 2.0 permits commercial use, modification, and redistribution, provided the license text and existing notices are retained.
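Because some constituent datasets hold under 1,000 episodes while others are far larger, sampling in proportion to raw episode counts would let the largest sources dominate a training mixture. The sketch below shows one generic way to rebalance, using temperature-scaled sampling weights. It is illustrative only: the dataset names and episode counts are hypothetical, and this is not necessarily the mixture used to train RT-X.

```python
from typing import Dict


def mixture_weights(episode_counts: Dict[str, int],
                    temperature: float = 0.5) -> Dict[str, float]:
    """Return per-dataset sampling probabilities proportional to count ** temperature.

    temperature = 1.0 samples in proportion to dataset size;
    temperature = 0.0 samples every dataset equally often.
    """
    if not episode_counts:
        raise ValueError("episode_counts must not be empty")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must lie in [0, 1]")
    if any(count <= 0 for count in episode_counts.values()):
        raise ValueError("every dataset needs a positive episode count")

    scaled = {name: count ** temperature for name, count in episode_counts.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Hypothetical constituent datasets (names and sizes are placeholders,
# not real Open X-Embodiment statistics).
counts = {"large_lab_dataset": 76_000, "mid_lab_dataset": 10_000, "small_lab_dataset": 800}
weights = mixture_weights(counts, temperature=0.5)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Lower temperatures give small datasets more weight, at the cost of repeating their episodes more often, so the value is a tuning choice rather than a fixed rule.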
Linked professions
- Assembly Line Worker Repetitive
- Warehouse Picker Packer
- Hotel Housekeeper
- Fast Food Worker
- Commercial Floor Cleaner
Frequently asked questions
What is Open X-Embodiment?
Open X-Embodiment is a collection of over one million robot demonstration episodes from 22 different robot platforms, assembled by Google DeepMind and 33 collaborating institutions. It is the largest cross-embodiment robot learning dataset available and the foundation for training generalised robot policies that work across multiple hardware platforms.
How many robot types are in Open X-Embodiment?
Open X-Embodiment covers 22 robot embodiments including Franka Panda, WidowX 250s, Kuka iiwa, Google RT-2 robot, Universal Robots UR5, Hello Robot Stretch, Sawyer, xArm, Jaco, Kinova Gen3, and 12 additional platforms from contributing institutions.
Can Open X-Embodiment be used commercially?
Yes. Open X-Embodiment is licensed under Apache 2.0, which permits commercial use, modification, and redistribution. Redistributed copies must include the license text, retain existing copyright and attribution notices (including any NOTICE file), and mark modified files as changed.
How do I download Open X-Embodiment?
Open X-Embodiment is available on Hugging Face at huggingface.co/datasets/google-deepmind/open_x_embodiment and on Google Cloud Storage. The full dataset is very large — selective download by constituent dataset or embodiment is supported.
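A selective download from Hugging Face might look like the sketch below, which uses `snapshot_download` from the `huggingface_hub` package with `allow_patterns` to fetch only chosen subsets. The repository path is the one given above; the assumption that each constituent dataset sits in its own top-level directory, and the example directory names, are hypothetical and should be checked against the actual repository layout.

```python
from typing import List, Sequence

# Repository path as stated in the text above.
REPO_ID = "google-deepmind/open_x_embodiment"


def allow_patterns_for(subsets: Sequence[str]) -> List[str]:
    """Build glob patterns matching every file under each named directory.

    Assumes (hypothetically) one top-level directory per constituent dataset.
    """
    return [f"{name.strip('/')}/*" for name in subsets]


def download_subsets(subsets: Sequence[str], local_dir: str) -> str:
    """Download only the selected subsets and return the local folder path."""
    # Imported here so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=allow_patterns_for(subsets),
        local_dir=local_dir,
    )


# Example call (directory names are placeholders; requires network access):
# path = download_subsets(["bridge", "toto"], local_dir="./oxe_subset")
```

Restricting the download by pattern avoids pulling the full collection when only one or two embodiments are needed.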
What is RT-X?
RT-X (Robotics Transformer X) is a family of robot foundation models trained on the Open X-Embodiment dataset by Google DeepMind. The X refers to cross-embodiment training: RT-X models demonstrated positive transfer, meaning that training on data from multiple robot types can improve performance over training on a single embodiment alone.