Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D is the largest egocentric human activity video dataset assembled to date, developed by Meta AI Research in collaboration with over 50 institutions across 9 countries. Released in 2022 under the Ego4D Research License for non-commercial use, it contains 3,025 hours of egocentric video recorded by 931 camera wearers in 74 locations worldwide. The footage covers daily activities including cooking, construction, healthcare, sports, and social interaction. Annotations include narrations, temporal action segments, object interactions, and social interaction segments. For robot learning, Ego4D is one of the largest available sources of human demonstration data for pretraining, particularly for egocentric manipulation policies. It has been used to pretrain vision encoders and action models that transfer to robot manipulation tasks, helping bridge the gap between human and robot data. Access requires signing the Ego4D license agreement.
| Year | 2022 |
|---|---|
| Total hours | 3,025 |
| Frame rate | 30 fps |
| Embodiments | human (wearable camera) |
| Modalities | rgb, audio, language |
| View type | egocentric |
| Task categories | manipulation, cooking, cleaning, dishwashing, folding, construction, human-robot-interaction, long-horizon |
| Data format | mp4, json |
| License | Ego4D License — non-commercial research only |
| Access | gated |
| Maintainer | Meta AI Research, Indiana University, 50+ global institutions |
| Origin country | US |
What is it?
Ego4D is the largest egocentric human activity video dataset ever assembled, developed by Meta AI Research in collaboration with over 50 institutions across 9 countries. Released in 2022 under the Ego4D Research License, it contains 3,025 hours of egocentric video recorded by 931 camera wearers across 74 worldwide locations. Activities span cooking, construction, healthcare, sports, crafts, agriculture, and social interactions. The dataset includes rich annotations covering narrations, temporal action segments, object interactions, and social interaction labels.
Who is it for?
Ego4D is primarily used to pretrain vision encoders and action models for robot learning. Human egocentric video is a vastly larger and more diverse source of manipulation demonstrations than any robot-collected dataset: when people perform everyday tasks that a robot also needs to learn, their videos carry implicit training signal for that robot. Researchers use Ego4D to pretrain visual representations that then transfer to robot manipulation, reducing the amount of real-robot data needed for policy training.
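Time-contrastive learning is one way to pretrain an encoder on egocentric video (R3M, for example, used a time-contrastive objective on Ego4D): frames that are close in time should embed closer together than frames that are far apart. The minimal sketch below shows the two generic pieces, sampling (anchor, positive, negative) frame triplets from one video and an InfoNCE-style loss over embedding vectors. The function names, the 15-frame window and the temperature are illustrative assumptions, not R3M's exact formulation or any library's API; a real pipeline would produce the embeddings with a neural encoder instead of hand-written vectors.

```python
import math
import random


def sample_triplet(num_frames: int, rng: random.Random,
                   max_pos_gap: int = 15) -> tuple[int, int, int]:
    """Sample (anchor, positive, negative) frame indices from one video.

    The positive is a different frame within max_pos_gap frames of the anchor;
    the negative is any frame farther away than that.
    """
    if max_pos_gap < 1 or num_frames < 2 * max_pos_gap + 2:
        raise ValueError("video too short for the chosen gap")
    anchor = rng.randrange(num_frames)
    lo = max(0, anchor - max_pos_gap)
    hi = min(num_frames - 1, anchor + max_pos_gap)
    positive = rng.choice([i for i in range(lo, hi + 1) if i != anchor])
    negative = rng.choice([i for i in range(num_frames)
                           if abs(i - anchor) > max_pos_gap])
    return anchor, positive, negative


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: list[float], positive: list[float],
             negatives: list[list[float]], temperature: float = 0.1) -> float:
    """Negative log of the softmax probability assigned to the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# At 30 fps, a 15-frame gap means "within half a second" of the anchor.
rng = random.Random(0)
a, p, n = sample_triplet(num_frames=900, rng=rng)
print(a, p, n)
# Embeddings that already separate positive from negatives give a loss near zero.
print(info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
```

Training would push the encoder to lower this loss over many sampled triplets; the sampling window controls what the model treats as "the same moment".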
Key specifications
- Duration: 3,025 hours of egocentric video
- Camera wearers: 931 participants
- Locations: 74 worldwide locations across 9 countries
- Frame rate: 30 fps
- Annotations: Narrations, temporal actions, object interactions, social interactions
- Format: MP4, JSON annotations
- License: Ego4D License — non-commercial research only
- Access: Gated — requires signing Ego4D license agreement
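The headline figures translate into a rough sense of scale. The arithmetic below assumes every one of the 3,025 hours is stored at the listed 30 fps, so treat the frame count as an estimate rather than an exact figure.

```python
HOURS = 3_025            # total video hours (spec list above)
FPS = 30                 # frames per second (spec list above)
CAMERA_WEARERS = 931     # participants (spec list above)
SECONDS_PER_HOUR = 3_600


def total_frames(hours: int, fps: int) -> int:
    """Frame count if every hour of video were recorded at a constant fps."""
    return hours * SECONDS_PER_HOUR * fps


print(f"{total_frames(HOURS, FPS):,} frames")           # → 326,700,000 frames
print(f"{HOURS / CAMERA_WEARERS:.2f} hours per wearer")  # → 3.25 hours per wearer
```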
How it compares
Ego4D is by far the largest egocentric dataset available (3,025 hours vs. EgoVerse's 1,213 hours), but it records human activities rather than robot demonstrations. EgoVerse provides real robot demonstrations in egocentric format with task labels, which makes it more directly usable for robot policy training. Ego4D's value is scale for pretraining visual representations; EgoVerse's value is task-labelled robot demonstrations.
Limitations and access notes
The Ego4D license prohibits commercial use. Access requires signing the license agreement, which includes providing your institutional affiliation. Because the dataset covers human activities, transferring it to robot manipulation requires domain adaptation across the human-to-robot embodiment gap. Not all activities are relevant to robot manipulation tasks.
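Since not every clip is relevant to manipulation, one simple starting point is to filter clips by their narration text. The sketch below keeps narrations that contain a verb from a small, hand-picked manipulation vocabulary and converts their timestamps to frame indices at the 30 fps listed above. The JSON layout, the field names (`timestamp_sec`, `narration_text`), the verb list, the function name and the sample narrations are all simplified, hypothetical stand-ins; check the schema of the annotation files you actually download before adapting it.

```python
import json

# Hypothetical verb vocabulary chosen for illustration; tune it for your task.
MANIPULATION_VERBS = {
    "pick", "picks", "put", "puts", "open", "opens", "close", "closes",
    "cut", "cuts", "pour", "pours", "fold", "folds", "wipe", "wipes",
}
FPS = 30  # frame rate from the spec list


def manipulation_segments(narrations_json: str, fps: int = FPS) -> list[tuple[str, int, str]]:
    """Return (video_uid, frame_index, text) for narrations mentioning a manipulation verb.

    Assumes a simplified, hypothetical schema:
    {video_uid: [{"timestamp_sec": float, "narration_text": str}, ...]}
    """
    data = json.loads(narrations_json)
    hits = []
    for video_uid, entries in data.items():
        for entry in entries:
            words = entry["narration_text"].lower().replace(",", " ").split()
            if MANIPULATION_VERBS.intersection(words):
                frame = round(entry["timestamp_sec"] * fps)  # seconds -> frame index
                hits.append((video_uid, frame, entry["narration_text"]))
    return hits


# Made-up example input in the simplified schema above.
example = json.dumps({
    "vid_a": [
        {"timestamp_sec": 1.5, "narration_text": "#C C picks up a knife"},
        {"timestamp_sec": 4.0, "narration_text": "#C C looks around the room"},
    ],
    "vid_b": [
        {"timestamp_sec": 10.0, "narration_text": "#C C folds the towel"},
    ],
})
print(manipulation_segments(example))
# → [('vid_a', 45, '#C C picks up a knife'), ('vid_b', 300, '#C C folds the towel')]
```

Keyword matching is crude (it misses synonyms and catches false positives), but it is cheap to run over every narration before any heavier filtering.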
Linked professions
- Assembly Line Worker Repetitive
- Warehouse Picker Packer
- Hotel Housekeeper
- Fast Food Worker
- Commercial Floor Cleaner
- Laundry Worker Commercial
Frequently asked questions
How is Ego4D used for robot learning?
Ego4D is primarily used to pretrain visual representations (encoders) that are then fine-tuned on smaller robot datasets. The scale and diversity of human egocentric video allow models to learn rich visual features relevant to manipulation before they see any robot data.
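The "pretrain, then adapt on a small robot dataset" pattern can be illustrated with its simplest variant: keep the pretrained encoder frozen and train only a small head that maps its features to actions. In the minimal sketch below, `frozen_encoder` is a hypothetical stand-in for an Ego4D-pretrained network, and the 20-sample "robot dataset" is synthetic. Full fine-tuning would also update the encoder's weights; this sketch deliberately does not, to show how few parameters need robot data when the representation is already good.

```python
import random


def frozen_encoder(image: list[float]) -> list[float]:
    """Hypothetical stand-in for an encoder pretrained on Ego4D.

    A real encoder would be a large network; here it is a fixed function so the
    example runs without dependencies. It is never updated during training.
    """
    mean_intensity = sum(image) / len(image)
    return [mean_intensity, 1.0]  # the constant 1.0 lets the head learn a bias


def predict(weights: list[float], image: list[float]) -> float:
    """Linear head applied to frozen features."""
    return sum(w * f for w, f in zip(weights, frozen_encoder(image)))


def train_linear_head(robot_data: list[tuple[list[float], float]],
                      lr: float = 0.1, epochs: int = 500) -> list[float]:
    """Fit only the head weights with SGD on squared error; the encoder stays frozen."""
    weights = [0.0] * len(frozen_encoder(robot_data[0][0]))
    for _ in range(epochs):
        for image, action in robot_data:
            feats = frozen_encoder(image)
            error = predict(weights, image) - action
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
    return weights


# Synthetic "robot dataset": 20 fake 3-pixel images whose scalar action
# happens to be a linear function of mean intensity.
rng = random.Random(0)
robot_data = []
for _ in range(20):
    img = [rng.random() for _ in range(3)]
    robot_data.append((img, 2.0 * sum(img) / 3 + 0.5))

head = train_linear_head(robot_data)
print([round(w, 3) for w in head])  # should be close to [2.0, 0.5]
```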
Can Ego4D be used commercially?
No. The Ego4D Research License restricts use to non-commercial academic research only. Commercial use requires a separate arrangement with Meta AI Research.
How do I access Ego4D?
Access requires signing the Ego4D license agreement at ego4d-data.org. You must provide institutional affiliation and agree to non-commercial use restrictions. Once approved, data is available via the Ego4D downloader tool.
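If you script the download, building the command explicitly makes it easy to log and reproduce. This sketch assumes the downloader is the `ego4d` command installed via `pip install ego4d` and that it accepts `--output_directory` and `--datasets`; treat those flag names and the `annotations` dataset value as assumptions to confirm with `ego4d --help`. The helper function names are hypothetical.

```python
import shlex
import subprocess


def build_ego4d_download_cmd(output_dir: str, datasets: list[str]) -> list[str]:
    """Assemble an argv list for the Ego4D downloader (flag names assumed)."""
    return ["ego4d", f"--output_directory={output_dir}", "--datasets", *datasets]


def run_download(cmd: list[str]) -> None:
    """Run the CLI; requires the installed tool and approved license access."""
    subprocess.run(cmd, check=True)


cmd = build_ego4d_download_cmd("./ego4d_data", ["annotations"])
print(shlex.join(cmd))  # → ego4d --output_directory=./ego4d_data --datasets annotations
```

Passing a list to `subprocess.run` avoids shell quoting issues; note that a `~` in the path would not be expanded without a shell, which is why the example uses a relative path.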
How does Ego4D compare to EgoVerse for robot learning?
Ego4D (3,025 hours) is larger than EgoVerse (1,213 hours) but contains human activities rather than robot demonstrations. EgoVerse provides task-labelled robot demonstrations with known robot kinematics — more directly usable for robot policy training. Ego4D is better for visual pretraining; EgoVerse is better for imitation learning.
What countries contributed to Ego4D?
Ego4D was collected across 74 locations in 9 countries by over 50 institutions worldwide. Camera wearers performed daily activities in their local environments, providing geographic and cultural diversity in the dataset that single-country collections cannot match.