In a chilly basement room at Carnegie Mellon University sits a giant dome that looks like part physics experiment, part that chamber Darth Vader kicks back in. Wires and electronic boxes stud the walls, which curve nearly 14 feet in the air. But this space wasn’t built for subatomic particles, and it wasn’t built for space villains. It was built for the betterment of robots.
This is the so-called Panoptic Studio. The wires and electronics are actually an elaborate system of 500 cameras, both 2-D and 3-D, that capture groups of people inside the dome—none of those silly ball-covered suits required. What you get is a glimpse of how robots will one day watch us, picking out subtle gestures that will be pivotal to the interaction between humans and machines. To do that, these researchers are collecting a mountain of data to train 3-D machine vision algorithms, a lot like how visual recognition already works in 2-D.
At the moment, robots aren’t half bad at verbally communicating with humans. A companion robot like Kuri, for example, is more or less a digital assistant on wheels that responds to your commands. But really, there’s so much more to communication than words. As robots grow more sophisticated, so will our relationship with them. They’ll need to recognize our gestures if we want them to bring us things we point to. More subtly, something like a security robot would do well to analyze postures to detect aggression. And a healthcare robot should be able to spot facial expressions that might indicate pain.
Robots are miles away from being able to do all that, but the Panoptic Studio is an intriguing bridge to that future. It works by marrying feeds from that mess of cameras: mostly low-resolution ones, plus 31 high-resolution cameras and 10 Kinect 3-D cameras. The system computes the location of each camera in space, so when all the feeds are fused, it can accurately overlay a skeleton on the subjects within the dome, showing limbs and even individual fingers.
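To see why knowing each camera's position matters, consider a toy version of the fusion step. The article does not say which algorithm the studio uses, so the sketch below swaps in a textbook stand-in: midpoint triangulation of two viewing rays. The function name `triangulate_midpoint`, the camera positions, and the joint location are all hypothetical, chosen only for illustration.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Estimate a 3-D point from two viewing rays.

    Each ray starts at a known camera center (c1, c2) and points along
    the direction in which that camera sees the joint (d1, d2). The
    estimate is the midpoint of the shortest segment joining the rays.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be located")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)


# Hypothetical setup: two cameras four meters apart, both seeing one wrist.
wrist = (1.0, 2.0, 1.5)
cam_a, cam_b = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)
estimate = triangulate_midpoint(cam_a, sub(wrist, cam_a), cam_b, sub(wrist, cam_b))
print(estimate)
```

With perfect rays, the two lines cross at the joint and the estimate matches it. Real camera measurements are noisy, which is one plausible reason to fuse hundreds of views rather than two.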
All those feeds mean a whole lot of data coming in (600 gigabytes a minute), so the researchers use an army of GPUs to crunch it all. To date, they’ve collected more than a petabyte of data, or one million gigabytes, showing everything from people shaking hands to dancing to playing musical instruments. If you were trying to teach an algorithm to recognize photos of people doing those activities, you could pretty easily assemble a training set by combing through stock photos online. But no such data set exists for 3-D imagery. “That’s why we need this studio, because at this moment this is maybe the only easy way to collect such data,” says Carnegie Mellon roboticist Hanbyul Joo, who helped develop the system.
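For a sense of scale, the two figures quoted above can be combined with a little arithmetic. This is a back-of-the-envelope sketch, assuming the dome always records at the full 600 gigabytes a minute and that a petabyte means exactly one million gigabytes; the constant and function names are made up for illustration.

```python
GB_PER_MINUTE = 600          # data rate quoted in the article
GB_PER_PETABYTE = 1_000_000  # decimal units, as the article uses them


def capture_minutes(total_gb: float, rate_gb_per_min: float = GB_PER_MINUTE) -> float:
    """Minutes of recording needed to produce total_gb at a constant rate."""
    return total_gb / rate_gb_per_min


minutes = capture_minutes(GB_PER_PETABYTE)
print(f"{GB_PER_MINUTE / 60:.0f} GB per second")                     # 10 GB per second
print(f"{minutes:.0f} minutes, about {minutes / 60:.1f} hours")     # 1667 minutes, about 27.8 hours
```

Under those assumptions, a petabyte corresponds to a little over a day of continuous footage. Since the team has collected more than a petabyte, that is a lower bound.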
Problem, though: Most robots don’t envelop their subjects and point 500 cameras at them to monitor their movements. It’s not like this system would do you any good in the wild. (Your average robot out there has one or two cameras, in addition to a lidar system that maps the world in lasers.) Which is why Joo and his colleagues have taken what they’ve learned with the Panoptic Studio and developed a program called OpenPose, which can model your movements as a skeleton in real time, right through your webcam. And if it can run on your webcam, it can run on a robot one day. Go ahead, try it here.
It’s this kind of lightweight system that could be combined with voice recognition in a robot to facilitate more natural communication. “In social interaction, the small details are also very important,” says Joo. “Small facial expressions and small hand gestures that dramatically affect our interaction. We can directly catch the meaning of this details.”
This will be pivotal in a wide range of interactions with machines. Robot arms in the workplace have to be able to detect our presence if they’re going to be truly collaborative. You might even imagine therapy robots that interact with autistic children, precisely monitoring their facial expressions. We’re already seeing this kind of sensing technology in a robot that’s helping deaf children communicate.
So what began in a camera-laden dome in a basement could one day bring us closer to the machines. Darth Vader would be so proud.