The popular YouTube channel Bad Lip Reading has given us another gem: a hilarious reading of the first Republican debate that has gotten over 4.5 million hits in its first few days. There is possibly nothing funnier than Dr. Ben Carson playing with puzzles, Sen. Marco Rubio (R-FL) singing about bald tigers, and Sen. Rand Paul (R-KY) interjecting with angry cries of “genital warts!” (Unless it’s the channel’s previous clips, “Redneck Avengers: Tulsa Nights” and “Herman Cain: a BLR soundbite.”)
But in addition to being almost universally hysterical, Bad Lip Reading’s newest hit shows us some interesting things about how our brains work. ThinkProgress talked to Dr. Jenny Roche, a cognitive psychologist in the Speech Pathology and Audiology department at Kent State, about bad lip reading, communication, and humor.
How The Illusion Works
What we eventually recognize as words and sentences is actually a jumbled, noisy stream of sound. To make sense of it, our brains use whatever information is available, including what we see, what we hear, and what we expect to be logical in a given situation. Verbal communication is what researchers call multimodal: it draws on several senses at once.
One way to demonstrate the multimodal nature of communication is with an illusion known as the McGurk effect, which is shown clearly in this video from the BBC. In it, a man repeats “ba, ba, ba.” When the sound is paired with an image of the man clearly mouthing a “ba” sound, that’s what you hear. But when an image of the man clearly mouthing a “va” sound is shown instead, what you hear changes to “va, va, va.” The key is that the sound never changes: if you close your eyes, it goes back to “ba.” You’re switching back and forth between unimodal and multimodal interpretations, and in this case, the multimodal interpretation is tricking you.
In a natural environment, what you hear is usually what you see. Put another way — if you’re actually talking to a man who says “ba ba ba,” that sound is reflected on his lips; he physically can’t make a “va” image and a “ba” sound without the help of a camera.
“Auditory and visual speech information necessarily occur at an interactive level. Because when you’re talking to someone you usually see them. As we were evolving we didn’t have cell phones or telephones; the way that we processed language or speech information was typically face to face,” said Roche.
Aggregating the image and the sound gives your brain more information to work with. Since they usually match up, it makes evolutionary sense to use them both when they’re available. They can, however, be separated.
“The research does say when you integrate auditory and visual information together it boosts perception,” said Roche. For example, she said, if you’re hiking and someone you can’t see screams “BEAR!” you need to understand them from the sound alone. Likewise, if you’re at a loud party and your friend asks you to dance, you probably rely more on your vision, because the party sounds interfere with your hearing. But together, the two make for a stronger signal.
Illusions typically take advantage of cases where natural logic breaks down. With Bad Lip Reading, your brain wants the visual and the auditory signals to match up, because that’s what we would normally predict, and it wants to use all the information available. As with the McGurk effect, said Roche, “What is happening is kind of like this averaging of the auditory and visual information.” Yet in the case of the McGurk effect, the clear visual information influences how you interpret the auditory information, while with Bad Lip Reading, the clear auditory information overrides the visual.
“What is happening with the visual signal is that the visual signal is not crisp enough or salient enough for us to recognize that the auditory information doesn’t actually fit,” said Roche. The Bad Lip Reader isn’t really saying what the candidates are saying, but the image matches the sounds closely enough that our brains just go with it.
If you watch the video closely, the illusion falls apart and you can see it doesn’t always match up exactly.
When that happens, “there’s something kind of weird about it because it violates our expectations and things don’t go how they’re supposed to,” said Roche. This is also why watching badly dubbed foreign films is uncomfortable. They violate our predictions about how the world works, and our brains have a hard time reconciling that. After watching the film for a while, you’ll notice the feeling subsides: you’ve stopped paying attention to the image and are subconsciously relying more heavily on the sound.
How Does The Bad Lip Reader Do It?
This wouldn’t work with just any words. They have to match the visual signal enough to be plausible. If someone says “carrot,” and the dubbed word is “onomatopoeia,” it’s going to be a very bad lip reading, and it’s not going to sync.
In an interview with the Washington Post in 2011, the anonymous figure behind Bad Lip Reading said that he started by trying to lip-read a video of a talk radio host mouthing words to himself.
“My brain kept coming up with completely random, strange interpretations. They were mainly random word combinations like ‘Bacon Hobbit’ and ‘Moose potion, poke me,’ things like that. So I grabbed my microphone and recorded these phrases into the computer, and when I played that back in sync with the video, it really looked like the guy was saying it,” he said.
How was he able to pull out random words that synced so well? According to Roche, when we hear a sound, we make predictions about what the mouth usually looks like when it creates that sound. So when the Bad Lip Reader’s brain is pulling out words, it’s working in the other direction, making predictions about the sounds that usually correspond to the image.
Roche said: “He’s really good at making predictions of where the mouth is moving as it corresponds to a number of different sounds. So, because of the way he is representing both auditory information and visual information and integrating them together, he’s able to find constructions that fit with the visual production.”
However, one of the reasons lip reading is so hard is that so much of sound production occurs inside our mouths. One lip movement may correspond to a number of sounds, posing a serious challenge. The Bad Lip Reader is actually a decent lip reader: he’s finding words that match really well, just the wrong ones.
Yet despite the inherent ridiculousness of the sentences, the video has a sort of logic. When Mike Huckabee turns to Bret Baier and makes a comment about his “pretty gelled head,” the moderator replies “thanks, I’m getting it permed.” It’s hilariously weird for the context, but it’s still on the subject of hair. Likewise, when Megyn Kelly asks Chris Christie about his favorite snack, he replies with something about potatoes.
This is because of the way we pick which words we’re going to use next, said Roche. “It is based on priming…I would imagine that if he tries to deviate from the topic of that current speech segment it would be much harder to do.”
If the topic at the moment is hair, we’re likely to keep talking about hair, so we “activate” words related to hair and make them easier to produce. In that case, “permed” is actually a likely choice.
Why Is Bad Lip Reading So Funny?
According to Roche, the humor comes from the direct juxtaposition of what we would expect to be said in a political debate and what is dubbed in.
“We have a social register for how a Republican debate should go. We know what’s supposed to happen. So if you violate that register, that’s where it says oh, that’s indirect language, that’s irony, that’s sarcasm, that’s humor.”
“This guy who makes these videos, he is not only manipulating the auditory visual channel of speech, he’s doing it at so many levels. He’s doing it from low level up to high level,” she said.
So on one hand, the dubbing works because it fits with our expectations of vision and sound going together. On the other, it’s almost universally hilarious because it so directly violates our expectation of what should happen at a debate, though given the actual content of the debate, perhaps the lip-reading isn’t too far-fetched.