Movies and scifi books inspire roboticists to push the envelope, but they've also skewed the public's perception of robot capabilities. This problem is being exacerbated by researchers. In the last three months, I've had to shatter a few dreams: "Your $300 AR.Drone or $150 Ladybird will not be able to perform insane autonomous aerial maneuvers (yet). The UPenn quadrotors rely on $20k-$50k camera-based (Vicon) motion capture systems, which provide global pose estimation of each UAV at millimeter accuracy at up to 1kHz (and the setup often uses an external, centralized motion-planning computer too)." When this crucial aspect of the videos fails to register even with intelligent people, researchers are being disingenuous and violating their duty to the public -- which sucks, because their projects and research are awesome! And this is just the example that happens to be most salient to me at the moment. In this post I'd like to explore some "best practices" for robot videos so that we can quit misleading one another.
Every roboticist I know has encountered someone who insists, "Robots can already do that. I've seen it in the movies." Real robots are not as capable as Rosie (The Jetsons), Sonny (I, Robot), or the mechs in Avatar / District 9 / The Matrix.
I forgive Hollywood (and scifi authors). Their job is to make entertaining fiction. Besides, their work is inspirational. I also forgive robots built as "art" and robot performances, like the PR2 robot dancing at its launch party and the recent quadrotor light show. No one attending these events cared that the robots were scripted and used external localization, respectively.
But research videos (naturally) need to be held to a higher standard. They must represent the work honestly.
I'm going to explore four factors "by example" that could be addressed through simple watermarks on the videos. I deeply respect all the research in these videos and the researchers who made them. I'm using you guys out of respect -- I love you all. ;-)
Let me start by saying that excuses like "we mention it in the audio" or "it's in the video's title" are insufficient: the video needs to stand alone.
But what you may have missed... the inconspicuous infrared cameras that form the backbone of a "20-camera Vicon motion capture system" that costs between $20k and $50k:
Clearly the UPenn researchers aren't omitting this maliciously... In the "aggressive maneuvers" video, they explicitly mention the Vicon in the audio. But in the formation-flying swarm video, there is no mention of camera localization. This is problematic. Some of the early maneuvers (e.g., single flips) do not require the Vicon, whereas flying in formation does. I am aware of these subtleties... but many (most?) people watching the video will not be!
"External Camera Localization" watermarked in the bottom right corner of the video when appropriate would really help.
Here's another pet peeve. Back before the days of the PR2, researchers at Stanford built the PR1:
The PR1 was an amazing piece of hardware that ultimately led to the PR2 robot by Willow Garage. But unbeknownst to the viewer: this video was 100% teleoperated and appears to have been sped up many times over. While the PR1 video demonstrates the hardware's capability, we are still many years away from robots operating in such unstructured environments with this level of proficiency. And yet any sane person watching this video is apt to think we've already achieved those capabilities. The same can be said for robots performing scripted actions.
"Under Teleoperation" or "Scripted Motions" or "Autonomous" (for bragging rights) watermarked in the bottom right corner of the video when appropriate would really help.
We kinda touched on this already, but this example is just too apt. Pieter Abbeel (and crew) from UC Berkeley enabled a PR2 robot to fold towels. This is awesome:
Just watching that video, it's tough to tell that it is 50x realtime. It's mentioned in the video's title (especially if you click through to YouTube), but it's not readily apparent from the video itself. And here's the thing: who cares that the robot can only fold one towel every 20-25 minutes! The robot could fold towels all day while I'm away at work.
The problem is public perception. Here's what CNET has to say:
"It can fold up to 25 towels per minute." Oops. No, it really can't. But now the public won't understand the monumental progress required to make the PR2 fold a towel in just 5 minutes (aka, ongoing research). That's bad.
"50 x Realtime" or "Sped-Up 5000%" watermarked in the bottom right corner of the video when appropriate would really help.
This one is a little dicier (for reasons I'll get to later). But look at PETMAN:
That's one of the most compelling videos from 2011. And yet, one of the big challenges with walking robots right now is finding lightweight, high-density power supplies (and quiet ones, in the case of internal combustion engines). There are ongoing, multi-million-dollar grants to address this issue. In some cases, it probably makes sense to include an additional disclosure that the robot is tethered. For other domains, communication tethers are a crucial distinguishing characteristic (e.g., for latency).
"Tethered" watermarked in the bottom right corner of the video when appropriate would really help.
Disclosing caveats via watermarks would go a long way toward informing the public (and other researchers). This helps keep the work honest and paves the way for truly groundbreaking improvements: the KMel robots operating without Vicon, the PR2 tidying up a room autonomously, towel folding at 25 towels per minute, and an untethered PETMAN. But for those giant leaps to be recognized as such, we need to communicate the key limitations of existing robots -- especially via the videos.
But how much is too much? I'm not sure. Clearly, not all caveats should be reported via watermarks on the video. We understand that cameras don't usually function too well in bright sunlight. We know that these small UAVs don't work outdoors. Ultimately, it falls on the research community to establish sound practices, and I hope that this blog post will (at least!) get the conversation started.
I'd like to thank my advisor (Dr. Charlie Kemp at Georgia Tech's Healthcare Robotics Lab) for instilling in me a sense of "robot video ethics." I didn't always share his perspective, and I probably even rebelled against it a bit in grad school. But I've come to appreciate his wisdom!