
Sharpening and vowel shifts


Look at these two pictures. They’re the same photo, of course. Do you detect a slight difference? Does the second one seem somehow… sharper than the first?

It’s had some sharpening applied to it. Not a massive amount, but enough to make a difference. It’s something that I often do after resizing photos, since sharpness is often lost in the process. And it’s something that a lot of digital cameras do automatically to their JPEGs so they’ll look, well, sharper.

How does it work? Here are close-ups (500% magnification) of details from the two. What do you see?


I’ll tell you what you see: increased contrast, especially at edges – that is to say, places where there is already some contrast. It’s not that every last dark is darker and every last light is lighter; it’s that near the places where dark and light, or two different colours, come together, the difference is increased slightly.
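For the curious, here is a minimal sketch of one common sharpening technique, unsharp masking. It works in three steps: blur a copy of the image, subtract that copy from the original to isolate the edges, and add a scaled amount of the difference back. The sketch assumes a single row of greyscale pixel values from 0 to 255. It uses a simple box blur where real photo editors typically use a Gaussian one. The function names and the `amount` and `radius` parameters are illustrative, not taken from any particular program.

```python
def box_blur(row, radius=1):
    """Average each pixel with its neighbours within `radius` (edge pixels are repeated)."""
    n = len(row)
    blurred = []
    for i in range(n):
        # Clamp neighbour indices so pixels at the ends reuse the edge value.
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask(row, amount=0.5, radius=1):
    """Sharpen a row of 0-255 greyscale values by adding back (original - blurred)."""
    blurred = box_blur(row, radius)
    sharpened = []
    for original, soft in zip(row, blurred):
        # The difference is zero in flat areas and large only near edges.
        value = original + amount * (original - soft)
        # Clip to the displayable range, as an image file would.
        sharpened.append(max(0, min(255, value)))
    return sharpened


# A dark region meeting a light region, like one row of pixels across an edge.
row = [50, 50, 50, 50, 200, 200, 200, 200]
print(unsharp_mask(row, amount=0.5))
# [50.0, 50.0, 50.0, 25.0, 225.0, 200.0, 200.0, 200.0]
```

In the flat stretches the blurred copy matches the original, so nothing changes. Only right at the boundary does the dark side get darker (50 to 25) and the light side get lighter (200 to 225). Raise `amount` to 3.0 and those two pixels are pushed all the way to pure black and pure white, which is roughly what oversharpening looks like up close.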

If you oversharpen a photo, it can look pretty frickin’ bad. Like someone wearing really excessive lipliner, heavy eyeliner with heavy highlighter right next to it…

It’s just gone too far. But you know, when it works, it works for the same reason that lipliner and eyeliner work: our eyes (and brains) love not just contrast but edges.

Look at kids’ drawings (or the average adult’s, for that matter). If they draw someone in solid clothing on a solid background, do they just make two fields of colour? Or do they draw outlines (and sometimes just lines)? (Answer: the latter, natch.)

When the light comes into our eyes, and when our eyes send it to the brain, what we’re seeing is just colour next to colour. But we look for edges. We even fill in edges in places where we don’t actually see them. Part of that is coloured by real-world experience – we can identify a figure even when the contrast within the figure is greater than the contrast at the edges, because we have expectations regarding the shape of the figure. But part of it is just that we are made to find edges and we like contrast. Clarity. It’s well adapted. It makes it easier to deal with the real world. We see what we see, but we think of it how we think of it.

This also applies to sounds. We hear a continuous flow of sound, but we are able to parse it into separate phonemes when we know the language. We also perceive different sounds as being the same if they fit into the same expected phoneme – and we can hear the same sound as different if it is representing different phonemes (for instance, many people will say both vowels in kitchen the same, but hearers will still perceive them as different). I talk about this phenomenon – categorical perception – in “Nothing to chauffeur a classiomatic” and “oot & aboot.”

It also plays a role in another phonological process, one that happens not in the moment of production and reception but over time and across large areas: vowel shifts.

Vowel shifts are when some of the vowels (anywhere from one to all, but usually a certain set, in a mutually affecting way) in a language, or at least one dialect of a language, come to be pronounced differently from how they had been before. Many languages have undergone vowel shifts, and they are still taking place – a thing called the Northern Cities Shift has been going on in northern US cities around the Great Lakes for several decades, resulting in Buffalonians sounding to Torontonians as though they’re saying “Ian has gan to the affice” when they’re saying “Ann has gone to the office.”

The causes of vowel shifts are much argued over and certainly not exceedingly clear to anyone. Some people even argue that what we think of as shifts are often not shifts but mergers and other similar movements. I’m not going to hazard a guess as to why shifts happen. But there is one thing that vowels in shifts do, not always but with a certain frequency: diphthongize. They become a movement from one vowel sound to another.

Some examples: A sound like the a in father may become like the a in fate. A sound like the o in toll may become like the o a in to all. A sound like the oo in loot may become like the ou in lout. A sound like the e in ell may become like the ye in yell. A sound like the i in machine may become like the i in mine. A sound like the a in bat may become like the i in bite.

Not all of these happen in the same language – some are not too likely to happen together in the same language, in fact. Not all of these are found in English. But what they all have in common is that they heighten the contrast. They use a glide (“w”, “y”) or contrasting vowel sound to make the original sound stand out more, and they may also move the original sound farther in the other direction from the glide. A high and tight sound (“ee”, “oo”) may get a leap into it from a lower, more open sound (becoming “ay”, “ow”). It may happen the opposite way: a glide opens into the sound (“et” becomes “yet”). Or the sound releases out (“toll” to “to all”). Or it becomes two sounds on opposite sides of the original (“bat” to “bite”).

In a way it’s similar to what we do to some consonants when we emphasize them: add an “uh” after them, or at least a strong puff of air. Think of the Barbara Woodhouse style of dog training: “Sit-tuh!”

These are certainly not the only kinds of vowel shifts. Sometimes a vowel simply moves in one direction or another. In English, as I discuss in “An appreciation of English: A language in motion,” [a:] moved to [eɪ], [e:] moved to [i:], and [i:] moved to [aɪ], while [o:] moved to [u:] and [u:] moved to [aʊ]. The vowels at the top, not being able to move farther in the same direction as the others, added a contrast element to make them stand out. They emphasized their position at the top by the addition of a contrast from the bottom. The others just moved, maybe adding just a little bit of diphthongization.

It can go the other way, too. Sometimes a diphthong is even smoothed out into a single sound. Think of how southern Americans often say I: “Ah” – something that had become a diphthong has stopped being one, but by deletion of exactly the part that was the original sound. There are always two opposing forces operating: ease of saying and clarity of hearing. The contrast effect wins out when there is need for a greater distinction of the vowel. Other vowels may have come to have sounds that are a bit too similar, for instance, so this vowel takes on a bit of sharpening. It’s sort of like a backswing that allows you to deliver a stronger blow. In golf, I mean, of course.

I won’t go into whether similar effects can also be discerned in other sensory input. But I have suddenly developed a strange craving for salty caramel…

Nothing to chauffeur a classiomatic

One of my favourite records (now CDs) of all time is Duran Duran’s Rio. I’ve listened to it countless times, and almost all of those times on speakers, not headphones, until recently, when I started listening to music at work in the afternoon to keep from getting drowsy.

Towards the end of the last track, “The Chauffeur,” there’s some speech and other sounds. The speech is in a resonant male voice with a somewhat toasty British accent. For years I really didn’t know what the voice was saying. You can’t tell that well over speakers, especially with the pan pipes, synthesizer and, above all, drums all going at the same time. I amused myself imagining the most audible bit was “It’s Maury Niska-Nagay, and Maury’s… covered in shit.” I knew, of course, that that certainly wasn’t it, though there were sounds of that general order.

But recently, listening to it on headphones, I thought, “No, really, what is that dude saying?”