Throat cancer, stroke and paralysis can rob people of their voices, stripping away their ability to speak. Now, researchers have developed a decoder that translates brain activity into a synthetic voice. The new technology is a significant step toward restoring lost speech.
“We want to create technologies that can reproduce speech directly from human brain activity,” Edward Chang, a neurosurgeon at the University of California San Francisco, who led the new research, said in a press briefing. “This study provides a proof of principle that this is possible.”
People who have lost the ability to speak currently rely on brain-computer interfaces or devices that track eye or head movements to communicate. The late physicist Stephen Hawking, for example, used his cheek muscle to control a cursor that would slowly spell out words.
These technologies move a cursor to spell out words letter by letter. Though such tools enable communication, they are slow, stringing together five to 10 words per minute. But people talk much faster — human speech clips along at 120 to 150 words per minute. Chang and colleagues wanted to create a device that could speed up communication.
The scientists’ solution is a speech decoder that uses patients’ brain activity to control a simulated version of their vocal tract, which includes the lips, tongue, jaw and voice box.
“The brain translates… thoughts of what you want to say into movement of the vocal tract and that’s what we’re trying to decode,” explained Chang.
The researchers placed electrodes directly on the surface of participants’ brains in areas that control movements of the vocal tract. The participants, who did not have communication disorders and were able to speak, then read several hundred simple sentences out loud. The electrodes recorded the participants’ brain activity as they spoke. Then the researchers used machine learning to decode brain activity that controls the movements of the vocal tract. They could then synthesize speech from these decoded movements, Chang and colleagues report today in the journal Nature.
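The study's decoder works in two learned stages: brain activity is first mapped to vocal-tract movements, and those movements are then mapped to acoustics. The real system used recurrent neural networks trained on electrocorticography recordings; the sketch below substitutes simple linear maps fit to simulated data, purely to illustrate the two-stage structure. All variable names, dimensions and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; not the study's actual channel counts).
n_samples, n_electrodes, n_articulators, n_acoustic = 200, 16, 6, 8

# Simulated stand-ins for recorded brain activity, the vocal-tract
# movements made while speaking, and the resulting acoustic features.
brain = rng.normal(size=(n_samples, n_electrodes))
true_b2a = rng.normal(size=(n_electrodes, n_articulators))
articulation = brain @ true_b2a + 0.01 * rng.normal(size=(n_samples, n_articulators))
true_a2s = rng.normal(size=(n_articulators, n_acoustic))
acoustics = articulation @ true_a2s + 0.01 * rng.normal(size=(n_samples, n_acoustic))

# Stage 1: learn a mapping from brain activity to vocal-tract movement.
b2a, *_ = np.linalg.lstsq(brain, articulation, rcond=None)

# Stage 2: learn a mapping from vocal-tract movement to acoustic features.
a2s, *_ = np.linalg.lstsq(articulation, acoustics, rcond=None)

# Decoding new brain activity: brain -> decoded movements -> synthesized
# acoustic features, which a vocoder would then turn into audio.
new_brain = rng.normal(size=(1, n_electrodes))
decoded = (new_brain @ b2a) @ a2s
```

The intermediate articulatory stage is the study's key design choice: rather than jumping straight from neural signals to sound, the decoder passes through the physical movements the brain actually encodes.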
The device decoded some sounds, like the "shh" in the word "ship," very well. But others, such as the "b" sound in "Bob," need improvement. Still, listeners were able to transcribe about 70 percent of the synthesized speech correctly. Often, the mistaken words were similar in sound to the original words, the researchers said, so that in many cases the gist of the sentence was preserved.
The original sentence might have read, “Those thieves stole thirty jewels,” but the listener heard the synthetic voice say, “Thirty thieves stole thirty jewels,” for example.
To get a better grasp on how the technology might help people with communication disabilities, the researchers repeated the experiment. But this time, instead of speaking the sentences aloud, one participant silently mouthed the sentences. The decoder was able to produce speech from this participant’s brain activity even though the participant never produced any sound.
“It was really remarkable to find we could still generate an audio signal from an act that did not generate audio at all,” Josh Chartier, a neuroscientist at UCSF who led the work with Chang, said in the briefing.
The researchers have not yet tested the decoder on patients with speech disabilities, so it’s unknown whether the same algorithms will work in a population that cannot speak.
Some outstanding questions, such as how the decoder might work in someone who is paralyzed, linger. “That may only really be able to be figured out through further steps in terms of clinical trial,” Chang said. For now, the researchers’ next step is to improve speech quality.
“We want to make the technology better. We’ve got to make it more natural, more intelligible,” Chang said. “This is really the first proof of principle that’s out there in this really rapidly developing field,” he added.