Egocentric-1M — The Internet for Physical AI

Egocentric-1M is the largest egocentric video dataset released to date: approximately 1 million hours of first-person factory and manual-labor footage from head-mounted fisheye cameras, released by Build AI on April 8, 2026 under Apache 2.0. It is the third and largest release in Build AI's scaling series, following Egocentric-10K (10,000 hours, November 2025) and Egocentric-100K (100,000+ hours, December 2025). Founder Eddy Xu described it as 'the internet for physical AI.' At that scale it is roughly 330 times the size of Ego4D (3,025 hours). The dataset focuses on real production environments (assembly lines, sorting, packaging, and machining), captured on custom head-mounted glasses and presented as state of the art in hand visibility and active manipulation density. Access requires accepting conditions on Hugging Face; the Apache 2.0 license permits commercial use.

Dataset specifications
Year: 2026
Total hours: 1,000,000
Embodiments: human (head-mounted fisheye camera)
Modalities: rgb, proprioception
Task categories: manipulation, pick-and-place, assembly, inspection, long-horizon
Data format: mp4, json, webdataset
License: Apache 2.0
Access: gated (commercial use permitted)
Maintainer: Build AI
Origin country: US
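
The spec lists WebDataset among the data formats. In that convention a shard is a tar archive, and files that share the same name up to the first dot of the basename form one sample. The Python sketch below illustrates that grouping with the standard library only. The file names (`clip_000.mp4`, `clip_000.json`) and the JSON field are hypothetical, not taken from the Egocentric-1M release, and the shard is built in memory purely for illustration.

```python
import io
import json
import tarfile


def split_key(name):
    """Split a tar member name into (sample key, extension) at the first dot of the basename."""
    directory, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def group_samples(fileobj):
    """Group the members of a WebDataset-style tar shard into {key: {extension: bytes}}."""
    samples = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


def make_demo_shard():
    """Build a tiny in-memory shard with hypothetical file names and placeholder bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        files = {
            "clip_000.mp4": b"placeholder video bytes",
            "clip_000.json": json.dumps({"fps": 30}).encode(),  # hypothetical metadata
        }
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


samples = group_samples(make_demo_shard())
metadata = json.loads(samples["clip_000"]["json"])
print(sorted(samples["clip_000"]), metadata)
```

Grouping by key keeps each video clip together with its JSON sidecar, so a loader can decode the two as one training sample.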

What is it?

Egocentric-1M is a collection of approximately 1 million hours of first-person recordings from head-mounted fisheye cameras, published by Build AI on April 8, 2026, and the largest egocentric video dataset released to date. It caps a rapid scaling series: Egocentric-10K (10,000 hours, 2,153 factory workers, 1.08 billion frames, November 2025), Egocentric-100K (100,000+ hours, December 2025), and Egocentric-1M (~1 million hours, April 2026). Build AI founder Eddy Xu described the dataset as 'the internet for physical AI', meaning a foundation-scale pretraining corpus for embodied intelligence.
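
The Egocentric-10K figures quoted above imply a frame rate and an average recording time per worker. The short Python check below uses only the numbers in this section. The extrapolation to Egocentric-1M assumes the same frame rate, which the source does not state.

```python
# Figures quoted for Egocentric-10K.
HOURS_10K = 10_000
FRAMES_10K = 1_080_000_000
WORKERS_10K = 2_153

seconds_10k = HOURS_10K * 3600
implied_fps = FRAMES_10K / seconds_10k
hours_per_worker = HOURS_10K / WORKERS_10K

# Assumption: Egocentric-1M keeps the same frame rate (not stated in the source).
HOURS_1M = 1_000_000
estimated_frames_1m = HOURS_1M * 3600 * implied_fps

print(f"implied frame rate: {implied_fps:.1f} fps")
print(f"average hours per worker: {hours_per_worker:.2f}")
print(f"estimated frames at 1M hours: {estimated_frames_1m:.3g}")
```

On these figures the 10K release works out to about 30 frames per second and about 4.6 hours per worker. At the same rate, 1 million hours would correspond to roughly 108 billion frames.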

Who is it for?

Egocentric-1M is aimed at teams pretraining robot foundation models, world models, and manipulation policies at scale. At 1M hours, it is positioned as the first egocentric dataset large enough to support foundation-model pretraining from a single source. Its focus on factory and manual labor makes it directly relevant to industrial manipulation and human-to-robot skill transfer.

How it compares

Egocentric-1M is roughly 330 times the size of Ego4D (3,025 hours) and roughly 10 times the size of its predecessor Egocentric-100K. Xperience-10M (10,000 hours) offers richer per-session modalities such as motion capture and depth, but at about one hundredth of the scale. The tradeoff is modality depth versus raw scale: Egocentric-1M is the scale play; Xperience-10M is the modality play.
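
The scale ratios in this comparison follow directly from the hour counts quoted on this page, as the small Python sketch below shows. The Egocentric-100K entry uses the lower bound of its "100,000+" figure, so that ratio is an upper bound.

```python
# Hour counts as quoted on this page.
HOURS = {
    "Egocentric-1M": 1_000_000,
    "Egocentric-100K": 100_000,  # quoted as "100,000+", so this is a lower bound
    "Xperience-10M": 10_000,
    "Ego4D": 3_025,
}


def scale_ratio(larger, smaller):
    """Return how many times more hours `larger` has than `smaller`."""
    return HOURS[larger] / HOURS[smaller]


for name in ("Ego4D", "Xperience-10M", "Egocentric-100K"):
    ratio = scale_ratio("Egocentric-1M", name)
    print(f"Egocentric-1M / {name}: {ratio:.1f}x")
```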

Limitations and access notes

The dataset contains RGB fisheye video only, with no depth, motion capture, or tactile modalities. Human data requires retargeting to robot kinematics. Fisheye distortion requires either correction or fisheye-aware model architectures. The domain is factory and manual labor rather than domestic settings. Access requires a Hugging Face login and acceptance of the dataset conditions; Apache 2.0 permits commercial use.
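
As one illustration of the correction step mentioned above, the Python sketch below maps a pixel of a rectified (pinhole) image back to its source position in a fisheye image. It assumes an ideal equidistant fisheye model (image radius equals focal length times the angle from the optical axis), a principal point shared by both images, and hypothetical focal lengths. The actual lens model and intrinsics of the Egocentric-1M cameras are not given here, so real use would need calibrated parameters.

```python
import math


def pinhole_to_fisheye(u, v, cx, cy, f_pinhole, f_fisheye):
    """Map a rectified (pinhole) pixel to its source pixel in an equidistant fisheye image.

    Assumes both images share the principal point (cx, cy).
    """
    dx, dy = u - cx, v - cy
    r_pin = math.hypot(dx, dy)
    if r_pin == 0.0:
        return cx, cy  # the optical axis maps to itself
    theta = math.atan(r_pin / f_pinhole)  # angle from the optical axis
    r_fish = f_fisheye * theta            # equidistant model: r = f * theta
    scale = r_fish / r_pin
    return cx + dx * scale, cy + dy * scale


# Hypothetical intrinsics, chosen only for illustration.
x, y = pinhole_to_fisheye(u=100.0, v=0.0, cx=0.0, cy=0.0,
                          f_pinhole=100.0, f_fisheye=100.0)
print(f"source pixel in fisheye image: ({x:.2f}, {y:.2f})")
```

Evaluating this mapping for every output pixel yields a lookup table that an image-remapping routine can use to produce an undistorted frame. The alternative named above, a fisheye-aware architecture, skips this step and trains on the raw frames.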

Frequently asked questions

How large is Egocentric-1M?

Approximately 1 million hours of egocentric video, making it the largest egocentric dataset released to date and roughly 330 times the size of Ego4D (3,025 hours). Build AI released it on April 8, 2026.

How does Egocentric-1M relate to Egocentric-100K and Egocentric-10K?

All three are from Build AI using the same head-mounted fisheye camera configuration and factory manual-labor focus. Egocentric-10K (10,000 hours) launched in November 2025, Egocentric-100K (100,000+ hours) in December 2025, and Egocentric-1M (~1 million hours) in April 2026. Each is roughly 10x the previous release.

Can Egocentric-1M be used commercially?

Yes. It is Apache 2.0 licensed, permitting commercial use, modification, and redistribution. Access requires accepting the dataset conditions on Hugging Face first.
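
For gated datasets on Hugging Face, programmatic access generally needs an access token from an account that has already accepted the conditions on the dataset page. The Python sketch below uses the `datasets` library in streaming mode so that nothing is downloaded up front. The repository ID in the usage comment is a placeholder, since the exact ID is not given here, and the helper name `open_gated_stream` is hypothetical.

```python
import os


def open_gated_stream(repo_id, split="train", token=None):
    """Return a streaming iterator over a gated Hugging Face dataset.

    `repo_id` must be the real repository ID; this page does not state it.
    """
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "No access token found: accept the dataset conditions on Hugging Face, "
            "then set HF_TOKEN or pass token=..."
        )
    # Imported lazily so the helper can be defined without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True, token=token)


# Usage (placeholder repository ID, not the real one):
# stream = open_gated_stream("example-org/example-dataset")
# first_sample = next(iter(stream))
```

Streaming suits a corpus of this size because samples are fetched as they are iterated rather than stored locally first.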

What domains does Egocentric-1M cover?

It covers real production environments (assembly lines, sorting, packaging, and machining), recorded from the first-person perspective of factory workers wearing head-mounted cameras. This industrial focus distinguishes it from daily-life datasets like Ego4D.

What does 'the internet for physical AI' mean?

Build AI founder Eddy Xu used this phrase to position Egocentric-1M as the foundation-scale pretraining corpus for embodied AI — playing the role for robot learning that web-scale text corpora played for large language models.