How Humanoid Robots Are Learning to Move by Watching Factory Workers
To solve the massive data bottleneck in physical AI, robotics companies are paying thousands of workers worldwide to record their daily tasks from a first-person perspective, sparking a new frontier in the gig economy and a debate over who owns 'bodily knowledge.'
By Factlen Editorial Team
- Data Aggregators & Robotics Firms
- Argue that massive human data collection is the only viable path to solving the physical AI bottleneck and bringing useful robots to market.
- Labor & Tech Ethicists
- Raise concerns about the uncompensated extraction of 'bodily knowledge' and the paradox of workers training their own automated replacements.
- Data Contributors
- View the data collection boom as a lucrative, accessible gig-economy opportunity that pays significantly above local minimum wages for simple tasks.
What's not represented
- Legal scholars specializing in biometric copyright
- Traditional manufacturing labor unions
Why this matters
The race to build useful humanoid robots has quietly spawned a massive new global gig economy. How society chooses to value and compensate the human 'bodily knowledge' powering these machines will set the precedent for labor rights and data ownership in the physical AI era.
Key points
- Robotics companies are paying workers to record first-person video of daily tasks to train physical AI.
- This 'egocentric data' teaches robots how to understand depth, grip strength, and spatial reasoning.
- The practice aims to digitize 'bodily knowledge'—the human physical intuition that cannot be easily coded.
- Ethicists warn of a paradox where workers are actively generating the data that will automate their jobs.
- The data collection boom has created a lucrative new gig economy, particularly in the Global South.
The artificial intelligence industry has encountered a formidable physical wall. While large language models like ChatGPT achieved remarkable fluency by ingesting trillions of words from the internet, humanoid robots have no equivalent "internet of movement" to learn from. This absence of physical training data has become the single largest bottleneck in the projected $38 billion robotics industry. To solve it, technology companies are quietly building a massive, hidden supply chain of human motion, turning everyday physical labor into the raw material for the next generation of autonomous machines.[1][2]
Across the globe, thousands of factory workers, gig economy contractors, and homemakers are being paid to strap cameras to their heads and record themselves performing mundane tasks. From folding hand towels in Chennai to slicing mangoes in domestic kitchens, every human movement is being meticulously cataloged. This first-person footage, known in the industry as "egocentric data," provides the contact-rich demonstrations that physical AI systems desperately need. Robots require these intimate viewpoints to understand depth perception, grip strength, and the spatial reasoning required to navigate unpredictable environments.[2][4]
The scale of the data collection effort is staggering, reflecting the immense complexity of the physical world. Startups like Objectways, Micro1, and EgoLab are aggregating millions of hours of human activity each month, feeding the data pipelines of robotics giants like Tesla, Figure AI, and AgiBot. These data aggregators act as the crucial middlemen, bridging the gap between the high-tech laboratories of Silicon Valley and the factory floors of the Global South, where human motion is abundant and relatively inexpensive to capture.[1][7]
When a worker in India folds a towel while wearing a head-mounted GoPro, the camera captures more than a visual sequence of events: once annotated, the footage becomes a record of how a human body handles a soft, shifting object. Data engineers annotate this footage to map how the worker's fingers grip and respond to the weight of the fabric, how the wrist flips at the precise moment the fabric pulls taut, and how the eyes track the edges of the material throughout the fold.[1][4]
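One way to picture the annotation step is as turning each video frame into a structured record that a model can read. The sketch below is purely illustrative: the `FrameAnnotation` fields, the `wrist_flip_time` helper, and the "fastest wrist rotation while gripping" heuristic are hypothetical and are not drawn from any aggregator's actual labeling schema.

```python
from dataclasses import dataclass


@dataclass
class FrameAnnotation:
    """One annotated frame of egocentric video (hypothetical schema)."""
    t: float            # timestamp in seconds
    wrist_angle: float  # wrist rotation in degrees, as labeled by an annotator
    gripping: bool      # True if the fingers are in contact with the fabric
    gaze_x: float       # normalized horizontal gaze point in the frame (0..1)
    gaze_y: float       # normalized vertical gaze point in the frame (0..1)


def wrist_flip_time(frames: list[FrameAnnotation]) -> float:
    """Return the timestamp at which the wrist rotates fastest while gripping.

    A hypothetical stand-in for "the moment the wrist flips". Assumes frames
    are sorted by time; raises ValueError if no two consecutive frames are
    both labeled as gripping.
    """
    best_t, best_rate = None, -1.0
    for prev, cur in zip(frames, frames[1:]):
        # Only consider intervals where the hand holds the fabric throughout.
        if not (prev.gripping and cur.gripping):
            continue
        dt = cur.t - prev.t
        if dt <= 0:
            continue
        # Angular speed in degrees per second over this interval.
        rate = abs(cur.wrist_angle - prev.wrist_angle) / dt
        if rate > best_rate:
            best_t, best_rate = cur.t, rate
    if best_t is None:
        raise ValueError("need at least two consecutive gripping frames")
    return best_t
```

For example, given frames where the labeled wrist angle jumps from 5 to 60 degrees between 0.1 s and 0.2 s, the helper returns 0.2, marking that interval as the flip.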

This process represents a technological attempt to digitize what philosopher Michael Polanyi famously called "tacit knowledge"—the profound reality that humans know far more than they can articulate. You cannot write a line of code that perfectly describes how to balance a bicycle, or exactly how much pressure to apply when holding a ripe piece of fruit without bruising it. These are embodied instincts, learned through years of physical interaction with the world, which have historically been impossible to transfer to a machine.[1]
By capturing egocentric video at scale, AI researchers are attempting to externalize this "bodily knowledge." They are converting human physical intuition into a mathematical dataset that neural networks can process, imitate, and eventually master. This marks a fundamental shift in how artificial intelligence is developed: moving away from programming explicit rules and toward a paradigm of pure observation and imitation, where the machine learns by watching humans navigate the friction and gravity of reality.[1][2]
Beyond passive video recording, the industry is also heavily utilizing active "teleoperation" to gather high-fidelity data, placing humans directly in the loop of robotic action. In massive data-collection facilities in Shanghai and Shenzhen, operators wear virtual reality headsets and haptic gloves to remotely pilot humanoid robots through simulated environments. This allows the AI to learn not just from human video, but from the robot's own mechanical sensors as it is guided through a task by a human mind.[3][6]
As the human operator guides the robot to pick up a cup, organize a shelf, or pack a box, the machine records, from its own sensors, the exact torque, force, and joint movements required to complete the task. Every success and every failure becomes training data. Through imitation learning, often refined with reinforcement learning, the AI model gradually learns to associate specific visual inputs with the correct motor outputs, tightening the feedback loop until it can perform the task without human intervention.[6][7]
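Learning to map inputs to the correct motor outputs from human-guided examples is most simply illustrated by behavior cloning, a supervised form of imitation learning (reinforcement learning, by contrast, learns from a reward signal rather than from labeled actions). The sketch below assumes each teleoperated demonstration has already been reduced to a tiny feature vector, here a hypothetical object width and mass, paired with the torque the operator applied. Real systems train deep networks on camera images and joint states; this two-feature linear model only shows the shape of the idea.

```python
def fit_linear_policy(demos, lr=0.05, epochs=2000):
    """Behavior cloning sketch: fit action ≈ w·obs + b to (obs, action) pairs.

    `demos` is a non-empty list of (observation_vector, action) tuples, e.g.
    hypothetical ([object_width, object_mass], operator_torque) pairs.
    Minimizes mean squared error with plain batch gradient descent.
    """
    n = len(demos)
    n_features = len(demos[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for obs, action in demos:
            pred = sum(wi * xi for wi, xi in zip(w, obs)) + b
            err = pred - action  # how far the policy is from the human's action
            for i, xi in enumerate(obs):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        # Step every parameter against its gradient simultaneously.
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def policy(w, b, obs):
    """Predict a motor command for a new observation with the learned weights."""
    return sum(wi * xi for wi, xi in zip(w, obs)) + b
```

If the demonstrations follow a consistent rule (say, torque = 2.0 × width + 0.5 × mass + 0.1), the fitted weights converge toward those coefficients, and `policy` can then propose a torque for an object the operator never handled.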

However, this massive extraction of human motion has sparked a complex ethical debate regarding the ownership of bodily knowledge. Labor advocates and tech ethicists point out a profound paradox at the heart of the industry: workers in the Global South are actively generating the training data that will eventually power the autonomous machines explicitly designed to replace them on the factory floor. The very skills that make these workers employable are being systematically downloaded into their mechanical successors.[1][4]
Furthermore, unlike traditional workplace data such as keystrokes, login times, or email metadata, egocentric data is deeply intimate. It captures a worker's reflexes, their distinctive movement cadence, and the specialized tactile skills they have honed over a lifetime of manual labor. Once this bodily knowledge is extracted, digitized, and sold into the global AI supply chain, it is permanently separated from the worker who generated it, leaving them with no ongoing stake in the value their physical intuition creates.[1]
This dynamic raises unprecedented legal and philosophical questions about whether physical intuition can or should be treated as intellectual property, and what rights a worker retains over their own digitized reflexes once they are uploaded to a corporate server. For now, the gig-to-robotics pipeline is creating a lucrative new micro-economy. In regions where traditional wages are low, the opportunity to earn money simply by recording daily chores is highly appealing, with some workers earning $2 to $3 an hour for their footage—a significant premium over local minimum wages.[1][4]

Meanwhile, the appetite for this data is nearly insatiable. Industry executives estimate that achieving generalized robotic autonomy will require billions of hours of real-world footage, spanning every conceivable human environment and edge case. As humanoid robots move closer to commercial deployment, the invisible workforce of human trainers will remain the crucial bridge between the digital realm of artificial intelligence and the messy, unpredictable physics of the real world, and how that workforce is treated will set a precedent for labor rights and data ownership in the coming automated era.[1][2][5]
How we got here
Early 2024
Robotics companies identify the lack of physical training data as the primary bottleneck for humanoid development.
Late 2024
Startups begin deploying teleoperation rigs to remotely control robots and gather initial movement datasets.
Mid 2025
Data aggregation firms launch massive global campaigns, paying gig workers to record first-person video of daily chores.
June 2026
The scale of human data collection reaches millions of hours monthly, sparking debates over the ownership of 'bodily knowledge.'
Viewpoints in depth
Robotics Developers
Focus on overcoming the physical AI data bottleneck.
For robotics engineers, the physical world presents a long tail of edge cases that hand-written code alone cannot cover. They argue that 'egocentric data' and teleoperation are the only ways to teach a neural network the intuitive physics of gravity, friction, and fragile objects. From their perspective, paying humans to demonstrate these tasks is a necessary transitional phase on the way to robots that can operate safely in human-centric environments like homes and hospitals.
Labor Rights Advocates
Focus on the extraction and ownership of human physical intuition.
Ethicists and labor researchers warn of a new form of digital extraction. They argue that a worker's 'bodily knowledge'—the muscle memory and physical intuition honed over years—is being harvested without long-term compensation or royalties. They highlight the stark paradox of the current dynamic: low-wage workers in the Global South are actively generating the training data that will eventually power the autonomous machines designed to replace them on the factory floor.
The Gig Workforce
Focus on the immediate economic benefits of data collection.
For many of the workers participating in these programs, the ethical debates are secondary to immediate economic realities. Earning $2 to $3 an hour to film household chores or factory tasks often represents a significant premium over traditional local wages. Many view the work as physically less demanding than standard manual labor, treating it as a lucrative, albeit potentially temporary, micro-tasking opportunity in the expanding global gig economy.
What we don't know
- Whether synthetic data and simulation will eventually replace the need for human-generated physical data.
- How labor laws will adapt to address the ownership and copyright of a worker's physical reflexes and bodily knowledge.
- When, or if, the volume of collected data will be sufficient to achieve generalized autonomy in unpredictable home environments.
Key terms
- Egocentric Data
- First-person video and sensor recordings captured from the perspective of the person performing an action.
- Bodily Knowledge
- The physical intuition, muscle memory, and reflexes a human uses to interact with the world, which cannot be easily explained in words.
- Teleoperation
- The remote control of a robot by a human operator, often used to generate precise training data for autonomous systems.
- Physical AI
- Artificial intelligence systems designed to operate in and interact with the physical world, rather than just processing digital information.
Frequently asked
What is egocentric data?
Egocentric data is first-person video and sensor data captured from the perspective of a human performing a task, used to teach robots how to interact with the physical world.
Why can't robots learn from the internet like ChatGPT?
While text models learn from written language, physical robots need to understand gravity, friction, and spatial reasoning—information that isn't captured in text and must be demonstrated physically.
How much are workers paid to collect this data?
Compensation varies globally, but many workers in countries like India earn between $2 and $3 an hour to record themselves performing routine tasks.
What is teleoperation?
Teleoperation involves a human operator using VR headsets and haptic controls to remotely guide a robot through a task, generating precise training data from the robot's own sensors.
Sources
[1] The Guardian (Labor & Tech Ethicists): The Indian factory workers told to film themselves for AI
[2] Los Angeles Times (Data Aggregators & Robotics Firms): In an Indian town, workers fold towels while wearing cameras, providing data to teach AI robots
[3] Sixth Tone (Data Contributors): Inside China's Robot Training Factory: Where Humanoids Learn to Work
[4] Ynetnews (Labor & Tech Ethicists): Smartphone strapped to head, mango in hand, and $2 per hour wage: Dark side of the robot revolution
[5] Singularity Hub (Data Aggregators & Robotics Firms): Teleoperation provides training data for robots and is needed to help them deal with unexpected events
[6] Click Petróleo e Gás (Data Contributors): In India, workers film household tasks to train AI robots in a billion-dollar market
[7] AI Certs (Data Aggregators & Robotics Firms): Humanoid Robot Teleoperation in China's vast manufacturing engine