The Uncanny Valley Nobody's Talking About: Eerie Robot Voices

Call it the Great Convergence of Creepiness. The first bit, the uncanny valley, we’re all familiar with by now: If a humanoid robot looks super realistic, but not quite realistic enough, it freaks us out. So far that idea has been applied almost entirely to robot faces and bodies, but it’s less known as a phenomenon in robot voices.

Except, that is, to Kozminski University roboticist Aleksandra Przegalinska, also a research fellow at MIT. Przegalinska is bringing a scientific ear to the booming economy of chatbots and voice assistants like Alexa. WIRED sat down with Przegalinska at SXSW this week to talk about the monumental challenge of replicating human intonation, why the future of humanoid robots may not be particularly bright, and what happens when you let students teach a chatbot how to talk.

This conversation has been edited for length and clarity.

WIRED: So why study robot voices, of all things?

When you think about robots, the creepiness is not only in the face and in the gaze, although that’s very powerful. It’s also very often in the voice, the way it speaks. The tonality itself is a very important thing here. That’s why we got interested in chatbots and so we built our own.

The chatbot was talking to my students for a whole year, mainly learning from them, so you can gather what kind of knowledge it got in the end! (How many curse words!) They were humiliating it constantly. Which is part perhaps of the uncanny valley, because when you think about it, why are they being so nasty to chatbots? Maybe they’re nasty because the chatbot is just a chatbot, or maybe they’re nasty because they’re insecure—is there a human inside that thing, what’s going on with that?

Or even to physical robots. There was a study in Japan where they put a robot in a mall and let children go up to it and see what the kids would do, and they ended up kicking it and punching it and calling it names.

With kids—I have a 6-year-old—it’s a jungle. They are on that level where nature is still strong and culture is not so strong. When you create a very open system that is going to learn from you, what do you want it to learn? My students always talk to that chatbot and they’re so hateful.

Maybe it’s cathartic for them. Maybe it’s like therapy.

Maybe it’s therapy related to the fact that you’re processing these uncanny-valley feelings. So you’re angry and you’re not sure what it is that you’re interacting with. These relationships with chatbot assistants are strange: the bots are super polite, and people just throw garbage at them, as if they were some lower-level humans.

These chatbots can take different forms, right? So it might be just text-based, or it might come with a digital avatar.

We found that the chatbot that also had an avatar was very annoying to people. In most cases it gave the same responses as the text one, but the differences were huge. In the case of the text chatbot, the participants found it very competent to talk to about various topics. Then there was another group that had to interact with one that had the face and gaze. In terms of the affective response, it was very negative. People were constantly stressed out. The group that was talking to the text-based chatbot usually had conversations twice as long.

What about how your chatbot behaved? Was it a good conversationalist?

Whenever you had a conversation the chatbot would try to mirror what the other person was saying. For instance, if you said you hated sports, and the conversation was long enough, the chatbot would say, “I hate sports too.”

So it could be lying to you.

Of course. Constantly. It was also flipping a lot. So for instance, you had one interaction where it presented itself as Republican, and you had another interaction where it presented itself as a Democrat and a very progressive person. Hating sports and loving sports. Hating certain nationalities. It was interesting to see, but it was signaling certain potential dangers related to these interactions. When you think of yourself as a company, let’s say you build yourself a chatbot. You’re Nike, and then the chatbot says it hates sports. What would you do about that?

Or worse, it gets racist.

Which happens, actually. I think our chatbot was still very controllable in many ways, and it was surprising to us to see how frequently it was flipping, because we did curate some of the content. It was presenting the content we curated, and then it diverged from that so easily through interactions with other people.

Beyond the semantics, when it comes to current robot voices, what is throwing people off specifically?

Even if it’s a short sentence, bots finish it as if it were a long one. It’s so conclusive, in a way: it sounds like you should expect a long statement, and then the sentence ends. So there’s a problem with understanding the tonality and context of what you’re saying. Linking the semantics with the tonality, that’s the part that goes wrong.

What about the extra level of complexity when embodying that level of intelligence in a physical robot like Sophia, which most people know from the talk show circuit?

Maybe the problem is integrating it all together. We know that systems like that are very modular, in the sense that there’s a system responsible for moving the head and another one for smiling, and all these modules are sometimes poorly integrated in a way that never happens with humans, or at least very rarely. I think that’s the uncanny valley, the delays in the responses. It requires really big computational power. But I have no doubt that that’s the future. Maybe not for this company, maybe not with this particular case. Unless humanoid robots get abandoned altogether. That’s also an option. I think it’s possible.

Really? Why would you say that?

Because I think that if you have some sort of system that’s easily classifiable as a machine, but is still super smart and responsive, perhaps that’s enough. Why would you care? It could even be a box that leans forward or backward, making those little gestures that indicate that it knows what kind of emotion that is. Perhaps people want something that looks like a vacuum cleaner and speaks to them, rather than a Sophia, which is already so disturbing.
