Machines are becoming increasingly adept at creating content. Whether it be news articles, poetry, or visual art, computers are learning how to mimic human creativity in novel — and sometimes disturbing — ways.
Text-based content is fairly easy for computers to generate. Anyone who has texted on a smartphone knows how savvy keyboards are at predicting what we’ll type next. But video and other visual media are a little more challenging: not only does a computer need to predict a logical thought, it also needs to visualize that thought in a coherent way.
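That kind of next-word prediction can be illustrated with a toy bigram model. This is a deliberately tiny sketch; real keyboard predictors use far richer language models, and the training sentence here is invented for illustration:

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word tends to follow which in the training text."""
    model = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict(model, word):
    """Suggest the most frequent follower of `word`, if any was seen."""
    counts = model.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

model = train_bigrams("see you soon and see you later and see you soon")
print(predict(model, "see"))  # "you"
```

The same counting trick, scaled up to billions of sentences and longer contexts, is roughly what makes phone keyboards feel so prescient.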
It’s a challenge that came to light last week with the revelation that YouTube is home to some decidedly unsettling children’s videos. They feature popular characters like Elsa from “Frozen” or Spider-Man, along with the kind of simple songs and colorful graphics every parent is familiar with. Watch these videos for more than a few seconds, though, and it’s hard not to feel creeped out.
Though some feature scenes of explicit violence, there’s a certain “wrongness” to most of them, as if they were alien content attempting to masquerade as “human” creations. Which, essentially, is what some of them are.
Writer James Bridle recently touched on the topic in a popular Medium article. With so many kids watching YouTube videos, he explains, certain channels are pumping out auto-generated content to earn advertising dollars. Some videos seem to have benefited from human input, but others are clearly automated jumbles.
It’s about as far as you can get from the dedicated (and human) teams crafting beloved children’s movies at Disney and Pixar. It’s also the result of an emerging effort to shift some of the burden of video production onto computers, one that has attracted the attention of both artists and researchers. Whether it’s recreating a deceased “Star Wars” character or churning out children’s videos for a quick buck, the industry is still in its infancy, and we’re sure to see more of it in the future.
One way that computers can “cheat” in creating believable visual content is by extrapolating from an already existing image or video. The combination of an existing starting point and a bit of training allows the computer to create video.
In the world of auto-generated visual content, that training usually comes from absorbing content from other videos — lots of them. In one study out of MIT and the University of Maryland, Baltimore County, the system was trained on a year’s worth of video content.
In that case, a still image was used to generate short videos predicting what would happen next in the scene. For example, images of beaches result in crashing waves, and photos of people become videos of walking or running. Due to the shaky, low-resolution quality of the videos, they’re all pretty creepy (especially the babies), but the study is promising.
“In the future, we will be able to generate longer and higher resolution videos,” says the video associated with the study.
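The core idea — take a still frame, then synthesize the frames that might follow — can be caricatured in a few lines. The sketch below is purely illustrative: it “predicts” future frames by applying a fixed pixel shift, whereas the actual study trained a generative model on that year of video to learn plausible motion.

```python
import numpy as np

def extrapolate_clip(still, n_frames=8, dx=1, dy=0):
    """Toy stand-in for a learned video generator: produce n_frames
    by repeatedly shifting the input image. A real model would learn
    the motion from training video rather than use a fixed shift."""
    frames = [still]
    for _ in range(n_frames - 1):
        frames.append(np.roll(frames[-1], shift=(dy, dx), axis=(0, 1)))
    return np.stack(frames)  # shape: (n_frames, height, width)

# A single bright pixel "drifts" rightward across the generated clip.
image = np.zeros((4, 4))
image[1, 1] = 1.0
clip = extrapolate_clip(image, n_frames=3)
```

Swapping the fixed shift for a network that maps an input frame to a distribution over next frames is, in broad strokes, what the researchers did.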
In some ways, training a computer to create animated videos is a lot easier than extrapolating from photos, although the sense of uncanniness often remains. An animator can create characters, scenes, and movements, and then simply give the computer a set of broad instructions for what to do with them. Once the computer has all the inputs, it can create a wide array of animated outputs.
Using those inputs, videos are assembled based on a variety of tags and themes. As the themes begin to stack, the plot of the videos becomes a strange game of content telephone. What may once have been a coherent, harmless video undergoes iteration after iteration until it becomes a meaningless assembly of random characters and plot points.
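A minimal sketch of how this kind of tag-driven assembly can drift into nonsense follows. The clip names, slots, and tags are invented for illustration; no real channel’s pipeline is shown here:

```python
import random

# Hypothetical clip library: each pre-made segment carries theme tags.
CLIPS = {
    "intro":  [("elsa_waves", {"elsa"}), ("spiderman_lands", {"spiderman"})],
    "middle": [("learn_colors", {"colors"}), ("doctor_visit", {"doctor"})],
    "outro":  [("happy_dance", {"elsa", "spiderman"}), ("sing_song", {"colors"})],
}

def generate_video(seed=None):
    """Assemble a 'plot' by picking one clip per slot at random.
    Nothing enforces that adjacent clips make narrative sense,
    which is how incoherent mash-ups emerge at scale."""
    rng = random.Random(seed)
    return [rng.choice(CLIPS[slot])[0] for slot in ("intro", "middle", "outro")]

print(generate_video(seed=0))
```

Add thousands of clips and keyword-chasing tags, and the combinatorics alone guarantee that most outputs will be strange.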
Some of these videos are normal and tame; others become a deeply disconcerting mash-up of inputs. Such videos were likely able to fly under the radar for so long simply because children aren’t very picky about what they watch.
But not all auto-generated animation is so off-putting. One of the most mainstream (and profitable) applications for automated animation is video games. Much like children’s videos, games can frequently get away with less-than-perfect animation, and given their length and the immense amount of animation work they require, it’s sometimes better to let an algorithm shoulder the load.
In the open-world video game “The Witcher 3,” developers created an algorithm to generate dialogue scenes with characters throughout the game. Piotr Tomsiński, an animator on the project, explained the system to PC Gamer.
“It sounds crazy, especially for the artist, but we do generate dialogues by code,” he says. “The generator’s purpose is to fill the timeline with basic units. It creates the first pass of the dialogue loop. We found out it’s much faster to fix or modify existing events than to preset every event every time for every character. The generator works so well that some less important dialogues will be untouched by the human hand.”
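The quote describes a generator that lays basic units onto a timeline as a first pass for animators to fix or modify. A toy sketch of that idea might look like this; the event types, names, and API are invented for illustration, not CD Projekt Red’s actual tooling:

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float  # seconds into the scene
    actor: str
    action: str   # e.g. a talking gesture or a look-at

def generate_dialogue_pass(lines, unit_len=2.0):
    """First-pass generator in the spirit of the quote above: fill the
    timeline with basic units (speaker gestures, listener look-ats)
    that animators can then hand-tune or leave as-is."""
    timeline, t = [], 0.0
    for speaker, listener in lines:
        timeline.append(Event(t, speaker, "gesture_talk"))
        timeline.append(Event(t, listener, "look_at:" + speaker))
        t += unit_len
    return timeline

events = generate_dialogue_pass([("Geralt", "Yennefer"), ("Yennefer", "Geralt")])
```

The payoff is exactly the one in the quote: fixing a pre-filled timeline is far faster than authoring every event for every character from scratch.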
An Awkward Future?
Of course, all of this is a little clumsy right now: you wouldn’t confuse these videos or animations with something a real, skilled human created. And even the algorithms that help create content still require some human finessing. But machine learning has progressed by leaps and bounds in the past five years, enough to suggest that fully computer-generated imagery could play a vital role in the future of movies and animation.
Powerhouse companies like Disney and Google are investing in computer-generated animations: Disney through research into text-to-speech animation systems, and Google through its DeepMind AI animation projects. With so many varied approaches to auto-generating animation and movies, the future seems promising. Watch your backs, animators.