Ego4D — Around the World in 3,025 Hours of Egocentric Video

Ego4D is one of the largest egocentric (first-person) human activity video datasets assembled to date, developed by Meta AI Research in collaboration with over 50 institutions across 9 countries. Released in 2022 under the Ego4D Research License for non-commercial use, it contains 3,025 hours of video recorded by 931 camera wearers at 74 locations worldwide, covering daily activities such as cooking, construction, health activities, sports, and social interactions. Annotations include narrations, temporal action segments, object interactions, and social interaction segments. In robot learning, Ego4D serves as a large source of human demonstration video for pretraining vision encoders and action models that later transfer to robot manipulation tasks, which helps narrow the gap between human and robot data. Access requires signing the Ego4D license agreement.

Dataset specifications
Year	2022
Total hours	3,025
Frame rate	30 fps
Embodiments	human (wearable camera)
Modalities	rgb, audio, language
View type	egocentric
Task categories	manipulation, cooking, cleaning, dishwashing, folding, construction, human-robot-interaction, long-horizon
Data format	mp4, json
License	Ego4D License — non-commercial research only
Access	gated
Maintainer	Meta AI Research, Indiana University, 50+ global institutions
Origin country	US

Who is it for?

Ego4D is aimed at researchers who pretrain vision encoders and action models for robot learning. Human egocentric video is a larger and more diverse source of manipulation footage than robot-collected datasets, and where everyday human tasks overlap with robot tasks it provides an implicit training signal. Visual representations pretrained on Ego4D can then be transferred to robot manipulation, reducing the amount of real-robot data needed for policy training.

Key specifications

Duration: 3,025 hours of egocentric video
Camera wearers: 931 participants
Locations: 74 worldwide locations across 9 countries
Frame rate: 30 fps
Annotations: Narrations, temporal actions, object interactions, social interactions
Format: MP4, JSON annotations
License: Ego4D License — non-commercial research only
Access: Gated — requires signing Ego4D license agreement
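
To put these figures in perspective, the short calculation below derives a total frame count and the average footage per camera wearer from the numbers listed above. It is a back-of-the-envelope sketch that assumes every video is recorded at exactly 30 fps, which may not hold for every clip.

```python
# Figures taken from the specifications listed above.
TOTAL_HOURS = 3025
FPS = 30              # assumed constant across all videos
CAMERA_WEARERS = 931

total_seconds = TOTAL_HOURS * 3600
total_frames = total_seconds * FPS
hours_per_wearer = TOTAL_HOURS / CAMERA_WEARERS

print(f"Total frames: {total_frames:,}")                    # 326,700,000
print(f"Average hours per wearer: {hours_per_wearer:.2f}")  # 3.25
```

Roughly 327 million frames is the upper bound on what a frame-level pretraining pipeline would have to decode; in practice most pipelines subsample.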

How it compares

At 3,025 hours, Ego4D is considerably larger than EgoVerse (1,213 hours), but it covers human activities rather than robot demonstrations. EgoVerse provides real robot demonstrations in egocentric format with task labels, which makes it more directly usable for robot policy training. Ego4D's value is scale for pretraining visual representations; EgoVerse's value is task-labelled robot demonstrations.

Limitations and access notes

The Ego4D license prohibits commercial use. Access requires signing the license agreement, which includes providing an institutional affiliation. Because the dataset covers human activities, transferring what is learned to robot manipulation requires domain adaptation, and not all recorded activities are relevant to robot manipulation tasks.
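
Since only part of the footage is relevant to manipulation, a common first step is to filter videos by activity before pretraining. The sketch below shows one way this could look. The record layout (`video_uid`, `scenarios`), the sample values, and the keyword list are illustrative assumptions, not the official Ego4D metadata schema.

```python
# Hypothetical per-video metadata records; field names are illustrative only.
videos = [
    {"video_uid": "vid-001", "scenarios": ["Cooking"]},
    {"video_uid": "vid-002", "scenarios": ["Playing basketball"]},
    {"video_uid": "vid-003", "scenarios": ["Cleaning / laundry", "Cooking"]},
]

# Assumed keywords marking manipulation-heavy activities.
MANIPULATION_KEYWORDS = ("cooking", "cleaning", "construction")


def is_manipulation_relevant(video):
    """True if any scenario label contains one of the keywords."""
    return any(
        keyword in scenario.lower()
        for scenario in video["scenarios"]
        for keyword in MANIPULATION_KEYWORDS
    )


relevant_uids = [v["video_uid"] for v in videos if is_manipulation_relevant(v)]
print(relevant_uids)
```

Keyword matching is deliberately crude; it only narrows the candidate set and does not replace domain adaptation.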

Frequently asked questions

How is Ego4D used for robot learning?

Ego4D is primarily used to pretrain visual representations (encoders) that are then fine-tuned on smaller robot datasets. The scale and diversity of human egocentric video allows models to learn rich visual features for manipulation understanding before seeing any robot data.
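
One way to prepare pretraining data from narrated video is to cut a short clip around each narration timestamp and pair it with the narration text. The sketch below illustrates that idea only; the `Narration` record, the one-second half-width, and the sample narrations are assumptions made for the example and do not reflect the official annotation schema or any specific published method.

```python
from dataclasses import dataclass

FPS = 30  # frame rate stated for the dataset


@dataclass
class Narration:
    """Hypothetical, simplified narration record."""
    video_uid: str
    timestamp_sec: float
    text: str


def clip_window(narration, video_duration_sec, half_width_sec=1.0, fps=FPS):
    """Return (start_frame, end_frame, text) for a clip centred on the narration,
    clamped to the bounds of the video."""
    start = max(0.0, narration.timestamp_sec - half_width_sec)
    end = min(video_duration_sec, narration.timestamp_sec + half_width_sec)
    return round(start * fps), round(end * fps), narration.text


narrations = [
    Narration("vid-001", 0.5, "C picks up a knife"),
    Narration("vid-001", 12.5, "C cuts the onion"),
]
pairs = [clip_window(n, video_duration_sec=60.0) for n in narrations]
for start_frame, end_frame, text in pairs:
    print(start_frame, end_frame, text)
```

The resulting (frame range, text) pairs could feed a video-language pretraining objective, with the encoder then fine-tuned on a smaller robot dataset.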

Can Ego4D be used commercially?

No. The Ego4D Research License restricts use to non-commercial academic research only. Commercial use requires a separate arrangement with Meta AI Research.

How do I access Ego4D?

Access requires signing the Ego4D license agreement at ego4d-data.org. You must provide institutional affiliation and agree to non-commercial use restrictions. Once approved, data is available via the Ego4D downloader tool.
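
As a rough illustration of the last step, the sketch below assembles a downloader invocation as an argument list without running it. The command name `ego4d` and the `--output_directory` and `--datasets` flags are assumptions based on the project's public README and should be checked against the current downloader documentation; the output path and dataset name are placeholders.

```python
import shlex


def build_download_command(output_dir, datasets):
    """Build (but do not run) a downloader invocation as an argument list.

    Flag names are assumptions; verify them against the official docs.
    """
    if not datasets:
        raise ValueError("at least one dataset name is required")
    return ["ego4d", f"--output_directory={output_dir}", "--datasets", *datasets]


cmd = build_download_command("ego4d_data", ["annotations"])
print(shlex.join(cmd))
```

Once access has been approved and the issued credentials are configured, the list could be passed to `subprocess.run(cmd, check=True)`. Starting with annotations only keeps the first download small.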

How does Ego4D compare to EgoVerse for robot learning?

Ego4D (3,025 hours) is larger than EgoVerse (1,213 hours) but contains human activities rather than robot demonstrations. EgoVerse provides task-labelled robot demonstrations with known robot kinematics — more directly usable for robot policy training. Ego4D is better for visual pretraining; EgoVerse is better for imitation learning.

What countries contributed to Ego4D?

Ego4D was collected across 74 locations in 9 countries by over 50 institutions worldwide. Camera wearers performed daily activities in their local environments, providing geographic and cultural diversity in the dataset that single-country collections cannot match.