Physical AI · Startup Launch · Jun 18, 2026, 11:36 AM · 4 min read

Robotics Data Startup XDOF Secures $70 Million and Open-Sources Landmark Training Dataset

XDOF has emerged from stealth with $70 million in funding to tackle what it sees as the biggest bottleneck in physical AI: the scarcity of real-world training data for robots. Alongside the launch, the startup released what it describes as the world's largest open-source dataset for bimanual robot manipulation.

By Factlen Editorial Team

AI Infrastructure Providers 35% · Academic & Open-Source Community 35% · Frontier AI Developers 30%
AI Infrastructure Providers
Focus on the business model of providing 'pick-and-shovel' physical data services to AI labs.
Academic & Open-Source Community
Value the release of the ABC-130K dataset as a massive leap forward for reproducible, accessible robotics research.
Frontier AI Developers
View high-quality, scalable physical data as the final bottleneck to achieving general-purpose embodied AI.

What's not represented

  • Human Teleoperators
  • Industrial Automation Buyers

Why this matters

While language models like ChatGPT were trained on vast amounts of internet text, physical robots have had no comparable data source to learn from. XDOF's infrastructure and open-source dataset could significantly accelerate the development of general-purpose robots capable of performing complex physical tasks in homes and factories.

Key points

  • XDOF emerged from stealth with $70 million from investors including Thrive Capital and a16z.
  • The startup provides outsourced data pipelines and teleoperation infrastructure for training physical robots.
  • XDOF released ABC-130K, the world's largest open-source dataset for bimanual robot manipulation.
  • The company already serves roughly 20 customers, including major frontier AI laboratories.

  • $70 million: funding raised by XDOF
  • 130,000: robot manipulation trajectories in ABC-130K
  • 3,500: hours of real-world interaction data released
  • 195: distinct physical tasks covered

As the artificial intelligence industry races to build machines that can operate in the physical world, a critical bottleneck has emerged: the lack of high-quality training data. While large language models achieved breakthroughs by training on vast quantities of internet text, robots require precise, physical interaction data that simply does not exist online. To solve this "chicken-and-egg" problem, robotics data infrastructure startup XDOF officially emerged from stealth this week, announcing a $70 million funding round.[1][4]

The round drew participation from a roster of heavyweight venture capital firms, including Thrive Capital, Spark Capital, Andreessen Horowitz (a16z), Lux Capital, and WndrCo. Founded in October 2024 by Philipp Wu, Fred Shentu, and Nemo Jin—alumni of UC Berkeley, Tesla, and Meta—XDOF operates as an outsourced data factory for the robotics industry. The company builds the specialized data pipelines, collection hardware, and annotation systems required to train robots for real-world physical interaction.[2][3][4]

"We didn't have large-scale data to work with," CEO Philipp Wu explained, recalling his time as a Ph.D. student. "We first needed to actually collect data before we could even ask how to train a foundation model for robotics." To prove the efficacy of its infrastructure, XDOF simultaneously released ABC-130K, which it describes as the world's largest open-source bimanual robot manipulation dataset.[1][5]

The ABC-130K release represents the largest open-source collection of bimanual robot manipulation data to date.

Developed in collaboration with researchers from UC Berkeley, Carnegie Mellon University, MIT, and Amazon, the dataset gives the academic community a large, freely available foundation for training physical AI. The release includes 130,000 robot manipulation trajectories, representing 3,500 hours of real-world interaction data. The dataset spans 195 distinct physical tasks, capturing a wide spectrum of manipulation primitives such as pick-and-place actions, handovers, and tool use.[2][5][6]
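
The headline figures imply a few useful averages. The sketch below assumes the hours and trajectories are spread evenly, which is a simplification since the article gives no per-task or per-trajectory breakdown. Under that assumption, one trajectory lasts about 97 seconds on average, and each task has roughly 667 recorded demonstrations.

```python
# Back-of-the-envelope averages from the published ABC-130K headline figures.
# These are simple means; the real distribution across tasks is not reported.
TRAJECTORIES = 130_000
HOURS = 3_500
TASKS = 195


def mean_trajectory_seconds(hours: float, trajectories: int) -> float:
    """Average length of a single trajectory, in seconds."""
    return hours * 3600 / trajectories


def mean_trajectories_per_task(trajectories: int, tasks: int) -> float:
    """Average number of trajectories recorded per task."""
    return trajectories / tasks


if __name__ == "__main__":
    print(f"{mean_trajectory_seconds(HOURS, TRAJECTORIES):.1f} s per trajectory")
    print(f"{mean_trajectories_per_task(TRAJECTORIES, TASKS):.0f} trajectories per task")
```

Running it prints about 96.9 seconds per trajectory and 667 trajectories per task, which puts a typical demonstration in the range of a minute or two of continuous manipulation.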


Notably, the dataset includes highly dexterous behaviors that have traditionally confounded robotic systems, such as folding T-shirts, flattening cardboard boxes, and precisely placing AirPods into their plastic charging cases. By open-sourcing this corpus, XDOF aims to establish a common baseline that lets researchers iterate on model designs without spending millions of dollars to build their own data collection warehouses.[1][5][6]

Physical AI requires high-fidelity data to master dexterous tasks like folding clothes or assembling small parts.

Generating this caliber of data requires a massive, labor-intensive operation. Existing workarounds, such as scraping YouTube videos or using low-quality footage captured by gig workers, have proven inadequate because they lack the precise spatial fidelity and control inputs needed for effective robotic learning. Instead, XDOF employs a three-tier data acquisition strategy. The primary method involves direct teleoperation, where trained human operators use specialized rigs to remotely control robotic arms, effectively demonstrating the exact physical movements the AI needs to mimic.[1][2][4]
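
The difference between teleoperation data and scraped video is that each teleoperated timestep pairs what the robot observed with the command the operator actually issued. The minimal sketch below shows such a record. Its field names (`Step`, `Demonstration`, `camera_frame_id`, and so on) and the 7-joint arm are hypothetical, since the article does not describe ABC-130K's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record layout for one teleoperated demonstration.
# Field names are illustrative only, not ABC-130K's real schema.


@dataclass
class Step:
    t: float                      # seconds since the episode started
    camera_frame_id: str          # pointer to the image captured at this step
    joint_positions: List[float]  # measured robot joint angles (the observation)
    action: List[float]           # joint targets commanded by the operator (the label)


@dataclass
class Demonstration:
    task: str
    steps: List[Step] = field(default_factory=list)

    def record(self, t: float, frame_id: str, joints: List[float], action: List[float]) -> None:
        """Append one synchronized observation/command sample."""
        self.steps.append(Step(t, frame_id, list(joints), list(action)))

    def duration(self) -> float:
        """Elapsed time between the first and last recorded samples."""
        if len(self.steps) < 2:
            return 0.0
        return self.steps[-1].t - self.steps[0].t

    def training_pairs(self) -> List[Tuple[List[float], List[float]]]:
        """(observation, action) pairs of the kind a behavior-cloning model learns from."""
        return [(s.joint_positions, s.action) for s in self.steps]


if __name__ == "__main__":
    demo = Demonstration(task="fold_tshirt")
    for i in range(3):
        # Assumes a 7-joint arm; the values are placeholders.
        demo.record(t=i * 0.1, frame_id=f"frame_{i}", joints=[0.0] * 7, action=[0.01 * i] * 7)
    print(len(demo.training_pairs()), round(demo.duration(), 3))
```

Footage scraped from the web would populate something like `camera_frame_id` but leave `action` empty, which is the missing control signal the article describes.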

The company's secondary and tertiary data collection tiers involve lower-cost teleoperation devices—stemming from an open-source project called GELLO—and egocentric wearable sensors that capture everyday human movements from a first-person perspective. This layered approach allows XDOF to scale its data production efficiently, transforming a bespoke research experiment into a standardized, industrial-grade infrastructure business.[1][2]
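
GELLO-style devices are built around a low-cost "leader" arm whose joints mirror those of the real "follower" robot. The operator moves the leader, and its joint angles are streamed to the follower as targets. The sketch below shows one plausible way to do that mapping: a per-joint sign flip and offset, followed by clipping to the follower's joint limits. This calibration model and every parameter name in it are assumptions for illustration, not GELLO's actual code.

```python
from typing import List, Sequence


def leader_to_follower(
    leader_angles: Sequence[float],
    offsets: Sequence[float],
    signs: Sequence[int],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Map leader-arm joint readings (radians) to follower joint targets.

    Assumed calibration: target = sign * reading + offset, clipped to the
    follower's [lower, upper] joint limits so bad readings cannot command
    an unreachable pose.
    """
    n = len(leader_angles)
    if not (len(offsets) == len(signs) == len(lower) == len(upper) == n):
        raise ValueError("all per-joint arrays must have the same length")
    targets = []
    for q, off, s, lo, hi in zip(leader_angles, offsets, signs, lower, upper):
        target = s * q + off
        targets.append(min(max(target, lo), hi))
    return targets


if __name__ == "__main__":
    # Hypothetical 3-joint example; the third reading exceeds the limit and is clipped.
    print(leader_to_follower([0.5, -0.25, 3.5], [0.0, 0.5, 0.0], [1, -1, 1],
                             [-3.0] * 3, [3.0] * 3))
```

For a bimanual setup, the same function would simply run once per arm at each control tick, with each arm's own calibration values.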

XDOF utilizes a layered approach to capture diverse, high-quality physical interaction data at scale.

The market demand for this infrastructure is already evident. Despite operating under the radar for nearly two years, XDOF has grown to 60 employees and secured approximately 20 active customers, including several of the world's leading frontier AI laboratories. That these well-funded labs are choosing to pay XDOF rather than build their own internal pipelines suggests a deliberate strategic choice. By outsourcing data collection, AI developers can keep the operational complexity of maintaining warehouses, calibrating hundreds of robots, and managing teleoperators off their balance sheets.[1][2][4]

XDOF's launch arrives at a pivotal moment for the broader technology sector. Just weeks ago, OpenAI announced the revival of its internal robotics training program, an initiative it had shuttered in 2021 to focus exclusively on software models. This pivot signals a growing consensus that "physical AI" is the next major frontier. With its large open-source contribution and commercial customer base, XDOF is positioning itself as a foundational data supplier for the emerging robotics industry.[1][4][5]

How we got here

  1. 2021

    OpenAI shuts down its initial robotics research program to focus entirely on software-based models.

  2. October 2024

    XDOF is founded by researchers from UC Berkeley, Meta, and Tesla to solve the robotics data bottleneck.

  3. May 2026

    OpenAI announces the revival of its robotics training program, signaling renewed industry focus on physical AI.

  4. June 2026

    XDOF emerges from stealth, announcing $70 million in funding and releasing the ABC-130K dataset.

Viewpoints in depth

Frontier AI Laboratories

View outsourced data infrastructure as a strategic necessity.

For the world's leading AI labs, building a massive physical data operation is a distraction from their core competency: designing neural networks. By outsourcing to companies like XDOF, these labs can keep the immense operational complexity of maintaining warehouses, calibrating hundreds of robots, and managing global teams of teleoperators off their balance sheets. This allows them to scale their robotics programs rapidly without diluting their focus on model architecture.

Academic Researchers

Celebrate the open-source ABC-130K dataset as a democratizing force.

Historically, state-of-the-art robotic systems have been developed behind closed doors in well-funded corporate labs, leaving university researchers without the resources to compete. The academic community views the release of the ABC-130K dataset—and its accompanying simulation pipelines—as a leveling of the playing field. By providing 3,500 hours of high-quality interaction data for free, researchers can now test new algorithms and architectural designs without needing millions of dollars for hardware and data collection.

Robotics Infrastructure Founders

Argue that the next defensible layer in the AI boom is deeply physical.

Founders and investors in the embodied AI space believe that the era of purely software-based AI moats is ending. They argue that the next massive value creation will happen in the physical realm. Because collecting real-world interaction data requires specialized hardware, massive physical footprints, and trained human operators, it creates a highly defensible business model that cannot be easily replicated by simply renting more cloud computing power.

What we don't know

  • Which frontier AI laboratories are among XDOF's roughly 20 active customers.
  • How quickly the open-source ABC-130K dataset will translate into commercially viable robotic capabilities.
  • Whether the cost of human teleoperation can be driven down enough to make physical data collection as scalable as web scraping.

Key terms

Physical AI
Artificial intelligence systems designed to operate and interact within the physical world, typically embodied in robots.
Teleoperation
The remote control of a machine or robot by a human operator, often used to demonstrate tasks so the robot can learn them.
Behavior Cloning
A machine learning method where an AI model learns to perform a task by mimicking the recorded actions of a human expert.
Bimanual Manipulation
The ability of a robot to use two arms or hands in coordination to perform complex tasks, such as folding clothes.
Degrees of Freedom (DOF)
The number of independent parameters that define the configuration or state of a mechanical system, such as the joints in a robotic arm.
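
Behavior cloning, in its simplest form, is ordinary supervised regression: fit a function from observations to the expert's recorded actions, then use that function as the robot's policy. The toy sketch below fits a one-dimensional linear policy, action = w * observation + b, by least squares. Production systems use neural networks over camera images and many joint dimensions; the linear model and the function names here are illustrative assumptions, not any lab's actual method.

```python
from typing import Callable, Sequence, Tuple


def fit_linear_policy(observations: Sequence[float], actions: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of action = w * obs + b to expert (observation, action) pairs."""
    n = len(observations)
    if n != len(actions) or n < 2:
        raise ValueError("need at least two matching (observation, action) pairs")
    mean_o = sum(observations) / n
    mean_a = sum(actions) / n
    var_o = sum((o - mean_o) ** 2 for o in observations)
    if var_o == 0:
        raise ValueError("observations must not all be identical")
    cov = sum((o - mean_o) * (a - mean_a) for o, a in zip(observations, actions))
    w = cov / var_o
    b = mean_a - w * mean_o
    return w, b


def make_policy(w: float, b: float) -> Callable[[float], float]:
    """Turn the fitted parameters into a callable policy that imitates the expert."""
    return lambda obs: w * obs + b


if __name__ == "__main__":
    # Toy demonstrations in which the expert always acts as 2 * obs + 1.
    w, b = fit_linear_policy([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    policy = make_policy(w, b)
    print(w, b, policy(4.0))  # 2.0 1.0 9.0
```

The policy reproduces the expert's behavior on an observation it never saw (4.0), which is the core idea behind training robots from teleoperated demonstrations.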

Frequently asked

Why can't robots just learn from YouTube videos?

YouTube videos and gig-worker footage lack the precise, multi-dimensional spatial data and control inputs required to effectively train a robot for complex physical manipulation.

What exactly is the ABC-130K dataset?

XDOF describes it as the world's largest open-source dataset for bimanual robot manipulation. It contains 130,000 trajectories and 3,500 hours of real-world interaction data.

Who are XDOF's customers?

While specific names are confidential, XDOF already serves about 20 customers, including several leading frontier AI laboratories racing to develop general-purpose robots.

Sources

Source coverage: 6 outlets · 3 viewpoints surfaced

  1. [1] SiliconANGLE (Frontier AI Developers): Robotic teleoperation data startup XDOF launches with $70M in funding
  2. [2] AI Weekly (AI Infrastructure Providers): XDOF Lands $70M to Build Robot Training Data Pipelines
  3. [3] The SaaS News (AI Infrastructure Providers): XDOF Raises $70M in Funding
  4. [4] Hyper AI (Frontier AI Developers): XDOF Raises $70M to Build Data Pipelines for Robot Training
  5. [5] Digg (Academic & Open-Source Community): Robotics startup XDOF raises $70M and releases ABC-130K, the largest open-source bimanual teleoperation dataset
  6. [6] ABC Bot Research (Academic & Open-Source Community): ABC: A fully open-source stack for manipulation with behavior cloning