Over the past two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to compile the largest-ever dataset of first-person video—specifically to train deep-learning image recognition models. AI systems trained on the dataset will be better at controlling robots that interact with people, or at interpreting images from smart glasses. “Machines will only be able to help us in our daily lives if they truly understand the world through our eyes,” says Kristen Grauman of FAIR, who leads the project.
This technology could support people who need help around the house, or guide people through tasks they are learning to complete. “The video in this dataset is a lot closer to how humans observe the world,” says Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who is not involved in Ego4D.
But the potential for abuse is obvious and alarming. The research was funded by Facebook, the social media giant that was recently accused in the US Senate of putting profits above people’s welfare—as corroborated by MIT Technology Review’s own investigations.
The business model of Facebook and other big tech companies is to extract as much data as possible from people’s online behavior and sell it to advertisers. The artificial intelligence demonstrated in the project could extend that reach to people’s everyday offline behavior, revealing what objects are around your home, which activities you enjoy, whom you spend time with, and even where your gaze lingers—an unprecedented degree of personal information.
“There is privacy work to be done as you take this out of the realm of exploratory research and into something that is a product,” Grauman says. “That work could even be inspired by this project.”
The largest previous dataset of first-person video consisted of 100 hours of footage of people in their kitchens. The Ego4D dataset consists of 3,025 hours of video recorded by 855 people in 73 different locations across nine countries (the US, the UK, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda).
The participants were of different ages and backgrounds. Some were recruited for their visually interesting occupations, such as bakers, mechanics, carpenters, and landscapers.
Previous datasets usually consisted of semi-scripted video clips only a few seconds long. For Ego4D, participants wore head-mounted cameras for up to 10 hours at a time and captured first-person video of unscripted everyday activities, including walking along the street, reading, doing laundry, shopping, playing with pets, playing board games, and interacting with other people. Some footage also includes audio, data on where participants’ gaze was focused, and multiple perspectives on the same scene. It’s the first dataset of its kind, Ryoo says.