A Word Taster’s Companion: Wow! Yay! Glides!

Today: the thirteenth installment of my how-to guide for word tasting, A Word Taster’s Companion.

Wow! Yay! Glides!

Glide. Come glide with me. You’ll get the hang of it. In fact, you already have the hang of it. You may never have been on a hang glider, but you have certainly glided smoothly on open air. If you’re flying a hang glider you may say “Wow! Yay!” But any time you say “Wow! Yay!” you’re gliding, no matter where you are and what you’re doing.

A glide is really a high and tight vowel sound serving as a consonant, the open air flowing smoothly but somehow making a consonant. In English, we have two glides: /j/ and /w/, the first sounds in yay and wow. You know (if you’ve been paying attention and have read “The vowel circle”) that the ay in yay and the ow in wow are diphthongs: vowel sounds that involve a movement. These ones in particular move to narrower vowels, [ɪ] and [ʊ]. But you can also hear, especially if you say “wow wow wow wow” and “yay yay yay yay,” or if you hold the opening sound (“wwwwwow” and “yyyyay”) that the opening sounds are pretty much the same as the final sounds of the diphthongs.

Glides illustrate even more clearly than liquids the fact that what is a consonant is often a matter of how it is used and thought of as much as of its characteristics. This is not true of all sounds; /a/ will never be a consonant, and /t/ will never be a vowel. But there is a grey area where consonants and vowels blur together, and the glides are in it (although I think glides sound more blue and yellow than grey).

This is not to say that the glides are absolutely identical with the vowels except for how they’re used. They may or may not be. Say “ye ye ye ye ye woo woo woo woo woo.” Notice how you can tell where the glide stops and the vowel starts. In these words, the glides have to be tighter than the vowels in order to be distinguished from them. Watch how you say ya and you and we and wa. See if they’re as tight.

But now say “ow ow ow ow a wa wa wa wa” and “ay ay ay ay a ya ya ya ya.” Watch how you say them. How closed are the glides? What else are you doing to make the distinction so it doesn’t just sound like “owowowow” and “ayayayayay”?

Glides are voiced. They don’t have to be. But we no longer have phonemic voiceless glides in English. We almost still do: if you want to distinguish which clearly from witch, you may devoice the /w/ – or just say a /h/ before it that spreads the devoicing onto the /w/, which is not quite the same thing. A similar effect can happen in words such as human and humour. Glides are also susceptible to the same devoicing caused by aspiration that affects liquids: try pure twit. Say that slowly, perhaps as if you’re describing someone with great disdain. Listen to the glides: /pjur twɪt/ – the aspiration from the /p/ and /t/ spreads onto the /j/ and /w/ and devoices them.

Glides can also be nasal or non-nasal (oral), just like the vowels they resemble – and, as with those vowels, this variation is allophonic but not phonemic in English. It spreads from a nearby nasal: compare mute (/mjut/) with beauty (/bjuti/). You may find it hard to hear the difference, but it’s there.

What do glides feel like to say? They’re sort of like the yo-yos of the mouth, perhaps in part because yo-yo uses them. The tongue (and, in /w/, the lips) swings in close and then pulls back, like an upside-down bungee jump. These are also things your mouth does while tasting – tasting wine, for instance. A wine taster will have a sip of wine and, holding it in the mouth, inhale on a [w] gesture to aerate it. Then, lips closed, the taster may make a series of [j] gestures ([jajajaja]) with the tongue to swish the taste in the mouth and get it into the nose. When held, glides can have a sense similar to that of nasals: they can express hesitation (“Yyyyyyyeah… wwwwwell…”) or enthusiasm (“Yyyyess! Wwwwwow!”).

Next: Huh. Is that all? Uh-uh.

A Word Taster’s Companion: Ah, frick it

Today: the eleventh installment of my how-to guide for word tasting, A Word Taster’s Companion.

Ah, frick it

Affricate. I do like this word, affricate, though it actually doesn’t contain the sound it names. “Affricate” is not “African” said with a cold and laryngitis, nor is it an expression of dismay or frustration (“I forget!” “Ah, frick it!”). Well, some affricates may be expressions of dismay – [ts] gets used for this at times – but it’s not essential to their nature. An affricate is a stop that releases to a fricative: a single gesture of the tongue, thought of by the speaker as a single sound, but made of two parts: the tongue moves, making a sort of breaking sound. It’s a consonant equivalent of a diphthong. Judge for yourself: Say “judge” and listen to the consonants in the word – is there more to them than in “dud” or “shush”?

We don’t have a lot of affricates in English. If you look at the consonant list in “Sushi thief!” you’ll see a reason why: an affricate requires a stop and a fricative in the same place, and we don’t have that many pairs like that. Actually, we have even fewer than we could. Our only affricate phonemes in English are /tʃ/ and /dʒ/: “ch” and “j.”

We may occasionally say the available other stop-fricative combinations – [ts] and [dz] – and sometimes we may even say them so they’re not across syllable boundaries (as what’s up sometimes becomes ’tsup, for instance). But we don’t think of them as single sounds. In fact, many people will have a resistance to saying them where we can say /tʃ/ and /dʒ/, or will even think they can’t say them because we don’t start syllables with a stop followed by a fricative. Many English speakers have problems saying something like “tsump” and “dzump” – or tsar, or tsunami. But we have no problem saying “chump” and “jump,” or “char” (or “chunami,” if that were a word), even though they’re also a stop plus a fricative in a very similar place in the mouth. This is because we see them – and perform them – as one gesture. You’re saying char, not tshar. It’s the difference between courtship and core chip, for instance. To show in phonetic transcriptions that they’re a single phoneme, sometimes a joining line is written under the two letters. But that’s not supported by many character sets, so you don’t see it all the time.

We also say affricates as versions of stops. For instance, say choo-choo train. You may have noticed that you make the t as the same sound as the ch. You’ll find the same thing, but voiced, in juju drain. In many places where [t] and [d] release with the tongue flexing towards the palate – nature, gradual, dread – the gesture results in affrication: as you release the stop you make a fricative on the way to the next sound. So our target phoneme is /t/ or /d/ and we have it in mind to say that sound and we hear it as a version of that sound, but it actually comes out as [tʃ] or [dʒ].

But those aren’t quite the only affricates we have as allophones. Say cute. Now say it with emphasis, especially on the start – draw it out: Cute! Notice how the hump of your tongue is actually fairly far forward in your mouth when you say the [k]? And how air escapes past it as it releases to the vowel? Congratulations. You’ve just made an affricate that most Anglophones can hardly even conceive of existing – even though they make it: a voiceless palatal affricate. (The International Phonetic Alphabet way of writing it is [cç].)

It’s the further progress of that movement, by the way, that led Latin c, originally [k] in all positions, to become [tʃ] before [e] and [i], as it is in Italian and as one hears it in church music. It’s very easy to move [cç] forward just a little more to [tʃ]. (The process was a little different with [sk]: it dropped the stop as it softened up and it became [ʃ] without passing through [stʃ] – which is why excelsis is “ek-shell-cease” and not “ex-chell-cease,” and prosciutto is “pro-shoot-toe” and not “pros-choo-toe.”) That movement, from [k] to [tʃ], is also one way English came to have these affricates; cheap, for instance, is related to words and roots in other Germanic languages that start with [k] – German kauf, for instance.

It also goes in the other direction: the “y” sound as in yes and yellow – written as [j] in IPA – can be made so narrow that it touches the palate and makes an affricate. You can hear this in some dialects of Spanish: llave, [jave], has moved to [dʒave] in some South American versions, and the same accent can cause its speakers to pronounce English with the same effect: for instance, your sounding like jor. This same process is in fact a way that Latin words with j, which was really i in Latin, came to be said with [dʒ] in English.

What do affricates feel and sound like to say? [tʃ] can have a kind of mechanical or metallic crispness, which shows up in chug, cha-ching, and similar words. It sounds like bells, small change, machines… That effect is softened when you add voice, but there can still be a certain sturdiness, as for instance in Jack and jug. I’d say this also draws on the effect produced by a sense of jutting jaw and meeting teeth, which can be a movement you make when you say these sounds. On the other hand, the crispness of the release and the involvement of the most delicate of our stops, [t] and [d], can make these seem light and pretty in the right context, for instance Chelsea and Jennifer.

Consider the different sound effects between guy and chap, or coffee and java. Try swapping in affricates for stops, or vice versa: choffee? Gava, dava? Does it make it feel sturdier or more delicate, or something else entirely? One thing’s sure: that extra little break does add a little more richness to the flavour.

Next: Lovely, lyrical liquids

A Word Taster’s Companion: The vowel circle

Today: the fifth installment of my how-to guide for word tasting, A Word Taster’s Companion.

The vowel circle

Vowels are the blood of words. They’re what allow words to move, to project, to be sung.

As I’ve explained in “The world speaks in harmony,” what vowel you’re saying is determined by where your tongue constricts the airflow in your mouth. That can be anywhere in your mouth that allows air to pass through the middle. But, in practice, languages have typically between five and twelve sounds that are recognized as distinct vowel sounds, and as long as a sound is close enough to one of those, it will be interpreted as that sound. And the acceptable sounds – the phonemes – are, depending on the language, mostly or entirely in a somewhat circular arrangement around the mouth.

The single-sound vowel phonemes we have in English are these:

/u/ as in boot

/ʊ/ as in put

/o/ as in boat (actually a slight diphthong in most kinds of English – see below)

/ɔ/ as in bore

/ɑ/ as in bop

/a/ as in bar

/æ/ as in bat

/ɛ/ as in bet

/e/ as in bait (actually a slight diphthong in most kinds of English – see below)

/ɪ/ as in bit

/i/ as in beat

/ə/ as in but (when it’s said in a stressed syllable it’s a little different and is often written as /ʌ/) – our one vowel that’s right in the middle of the mouth

The letters in slashes like /e/ are the International Phonetic Alphabet symbols for the sounds. Slashes mean we’re talking about a phoneme – a sound that’s a recognized distinct sound in a language. When we’re talking about the actual sound that’s made, whether it’s the same as the phoneme or not, we use brackets, like [e].

Those single-sound vowels are called monophthongs by people who really want to or have to call them that. (Take a moment to taste that word, monophthong.) We also make a number of diphthongs – vowel sounds that move from one part of the mouth to another. They’re not two vowels, one said after another; a diphthong is a single phoneme, but it’s one that starts in one place and ends in another. You might call them vowel movements.

Here are diphthongs we make in standard Canadian English:

/ɔɪ/ as in boy

/aɪ/ as in by – Canadians often say it like [ʌɪ] before a voiceless consonant, as in bite

/eɪ/ as in bay (we tend to think of it as just /e/ – see above)

/aʊ/ as in how – Canadians often say it like [ʌʊ] before a voiceless consonant, as in bout

/ɪʊ/ as in hew (also said as /ju/ – j is the IPA symbol for the “y” sound)

/oʊ/ as in hoe (we tend to think of it as just /o/ – see above)
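The Canadian pattern noted for /aɪ/ and /aʊ/ in the list above is regular enough to write down as a rule. Here is a minimal Python sketch of it, offered as an illustration only: the function name `realize`, the `RAISED` table, and the short list of voiceless consonants are my own assumptions, and real speech is messier than any three-line rule.

```python
# A toy rule for the raising pattern: /aɪ/ and /aʊ/ start higher
# when the next consonant is voiceless. All names here are hypothetical.

# A partial, assumed inventory of voiceless consonants.
VOICELESS = {"p", "t", "k", "f", "s", "θ", "ʃ", "tʃ"}

# Phoneme -> raised allophone, using the symbols from the list above.
RAISED = {"aɪ": "ʌɪ", "aʊ": "ʌʊ"}


def realize(diphthong, following=None):
    """Return the phone a Canadian speaker is likely to produce.

    `following` is the consonant right after the diphthong,
    or None if the diphthong ends the word.
    """
    if diphthong in RAISED and following in VOICELESS:
        return RAISED[diphthong]
    return diphthong


examples = [
    ("bite", "aɪ", "t"),
    ("by", "aɪ", None),
    ("bout", "aʊ", "t"),
    ("how", "aʊ", None),
]
for word, diphthong, following in examples:
    print(f"{word}: /{diphthong}/ -> [{realize(diphthong, following)}]")
```

The rule only looks one sound ahead, which is all the description above asks of it; the other diphthongs pass through unchanged.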

You’ll get some other diphthongs in some other dialects of English. Some even have triphthongs – a three-vowel movement, as in some southern US versions of words like man: [aɪə]. But let’s not go crazy here just yet. You’re best off tasting words in your own dialect, so if that sound’s not in your dialect, let’s not worry too much about it now. (Oh, by the way: all versions of English are dialects, and everyone has an accent. Dialects are not just what other people speak, and accents are not just what other people have.)

OK, enough with the technical basics for a moment. Let’s do some more tasting. You already know, if you’ve read “The world speaks in harmony,” that speech sounds are what they are because of harmonics. And you almost certainly know intuitively that some sounds seem higher or lighter and others seem lower or heavier. Those impressions have a lot to do with the second formant – the space in the mouth in front of the tongue. A sound like [o] or [u] tends more often to go with low, heavy, dark things; a sound like [i] goes more with high, light things. This doesn’t mean that all words with [o] and [u] must be for big things, et cetera, but if you’re using the sound for effect, that’s where you’re likely to head.

So… if I say I heard two things hit the floor and one went “plunk” and the other went “plink,” what do you assume about them?

If there are two characters in a children’s book and one is named Bobo and one is named Titi, what might your initial expectations be of them?

When you taste a word, you have to be aware of the vowels you’re using. But you also have to watch your impressions of the sound and feel and taste.

Let’s circle around your mouth with vowels. Start at [u] and move gradually and smoothly through [i], through [e], through [æ], through [a], through [o], to [u]. Then circle back in the other direction. Do it as smoothly as you can. Pay attention to what your tongue and your lips are doing.

Do you notice your lips rounding at [o] and [u] and unrounding as you go to the front? We do that in English. It’s a very normal contrast in languages the world over: round the back, unround the front. This heightens the contrast between the harmonics.

But it’s not a universal thing to round the back and unround the front. Many languages also have rounded front vowels and even unrounded back ones. (In fact, we have an unrounded low back vowel in English: /ɑ/.)

So now repeat the tongue circle exercise starting at [u], but this time keep your lips rounded as you move your tongue through the front vowels and back to [u]. Try both directions. It may help to pay more attention to what you’re doing and less to what you’re hearing. Unfocus, like when you’re watching fence posts go by on the highway and you go from counting them to watching them blur together.

Now start the loop at [i] and keep your lips unrounded all the way around, both directions.

Congratulations. You have, in the course of doing this, made several vowel sounds that never show up in English, including some that bedevil Anglophones trying to learn Turkish or Russian. You won’t need these sounds for tasting common English words, but the more you can do with your mouth, and the more you try to do with your mouth, the more fun you’re going to have. (I’m talking about language. Stop that.)

There are two other differences in vowel quality that you can make, neither of which makes a phonemic difference in English. One is what’s different between French beau and bon: whether the vowel is nasal or not – in other words, whether any air is passing through your nose while you’re saying it. In English, we do make some vowels nasal, but just when they’re before nasal consonants, as in some, sun, and sung. Sometimes the nasal consonant is dropped in casual speech and indicated just by the nasalization of the vowel, especially if there’s another consonant after the nasal – you might say [bõz] rather than [bonz] for bones, for instance.

The other difference is length. You can hold a vowel sound for a longer or shorter period of time. This is important in languages such as Finnish and Hindi. Contrary to what “everyone knows,” we don’t have an actual length distinction in English. We do not actually have long and short versions of vowels. We just have a distinction that we call long versus short. Read “The long and short of it,” next, for the low-down and dirty.

A Word Taster’s Companion: Horseshoes, hand grenades… and phonemes

Today: the fourth installment of my how-to guide for word tasting, A Word Taster’s Companion.

Horseshoes, hand grenades… and phonemes

They say close only counts in horseshoes and hand grenades (and nuclear warfare). Well, there’s somewhere else it counts: phonemes.

As I explained in “The world speaks in harmony,” phonemes are target sounds that we get variously close to. To put it another way, they’re the sounds we think we’re saying.

Say Yeah really slowly, moving your tongue down and lowering your jaw gradually and smoothly. You have just moved quite smoothly through sounds with no sharp border between them, but though you can hear that, you will probably have a sense more of fading from one distinct sound to another than of moving through sounds that are not quite one or the other. This is because you unlearned all those intermediate sounds when you were first learning English, and you learned targets – phonemes – that you’re matching what you hear and say to.

Different languages have different sets of phonemes, and may draw different boundaries between the same phonemes. Think of your mouth as a big lot of land divided by fences into smaller parts. Everyone has the same size and shape of lot, but different languages put the fences in different places. If you’re learning a different language, you have to learn new sound boundaries. For example, our vowels in beat and bit are fixed in our minds as two different sounds, but they register as the same phoneme to speakers of Spanish, Russian, and quite a few other languages. They don’t have the fence between those two sounds that we have.

The same goes with consonants. For instance, several South Asian languages have a distinction between aspirated and unaspirated voiceless stops. We make both kinds of sounds in English, but most of us don’t even notice – consciously – that we do. Put your hand just a short distance in front of your mouth. Say spit (don’t spit it, say it). Now say pit. Did you feel a puff of air on the p in pit? We aspirate /p/ when it’s the first consonant in a word but not when it’s the second – in other words, as linguists would write it, the phoneme /p/ is realized as the phones [p] and [pʰ] in different contexts. In Hindi and Thai, both versions of the sound are used in the same contexts and they’re considered as different as, for instance, b and p. On the other hand, in some languages, such as Spanish, /p/ is never aspirated – one of the factors that make a Spanish accent sound different from a standard Anglophone one.

Of course, there are different accents within a language, too. English has a large number of dialects, each with its own accent. Not everyone can learn to produce the accent of a different dialect, but most of us can get used to hearing the sounds done differently. Try saying (or imagining) the sentence “That’s a rather good bit of tea” in as many accents as you can imitate: east coast US, southern US, upper-crust British, working-class British, versions of Scottish and Irish, whatever else you want to try. Some sounds will vary quite a bit – compare them word by word. And yet somehow, because you know what the targets are in those accents for those phonemes in those contexts, you can understand it.

There are some snags, of course. If we hear rather in another accent there aren’t any other words it could be mistaken for – if a South African sounds like he’s saying “retha” we can mentally adjust the targets to fit it to the expected phonemes without wondering if he was saying something else. But when there are other things the word could sound like, confusion may ensue. A woman named Anne from Buffalo may risk having her name written down as Ian by someone from elsewhere hearing it over the phone. For that matter, if the sound is too different from what we expect, we may not recognize it even if there aren’t alternatives. One time when I was working in a bookstore a British bloke asked for the “hudda” section. At first I couldn’t at all understand what he wanted. He was looking for the horror section, as it turned out.

There is also the issue that we don’t all have exactly the same set of phonemes, even among English speakers. Get people from different places in Canada, the US, and England to say cot, caught, court, and you will find that most Canadians say the first two the same, most Brits (the r-dropping ones at least) say the last two the same, and many Americans say all three differently. Canadian English has merged the two vowel phonemes we hear in cot and caught. The Brits use the same vowel phoneme for caught as for court, and in court the r is dropped.

By the way, the vowel Canadians and Americans use in court is different from the one in cot, but most Canadians and many Americans may think of it as the same vowel – the same phoneme, in other words. The key is that that sound is only used before /r/, and the other one is never used before /r/. They’re in what’s called complementary distribution, which doesn’t mean they’re being handed out for free (though they are). Since they’re different sounds but are thought of as the same sounds, they’re what are called allophones of the same phoneme.
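Complementary distribution comes down to a simple check: list the environments each sound shows up in and see whether the two lists ever overlap. Here is a minimal Python sketch of that check. The function name and the environment labels are hypothetical, and the tiny environment sets are made-up toy data, not a survey of any dialect.

```python
def in_complementary_distribution(envs_a, envs_b):
    """Return True if the two sounds never occur in the same environment."""
    return set(envs_a).isdisjoint(envs_b)


# Toy data (assumed for illustration): the "court" vowel occurs only
# before /r/, and the "cot" vowel never does.
court_vowel = {"before /r/"}
cot_vowel = {"before /t/", "before /p/", "before /d/"}

# For contrast: the vowels of "beat" and "bit" both occur before /t/,
# so they can distinguish words and count as separate phonemes in English.
beat_vowel = {"before /t/", "before /d/"}
bit_vowel = {"before /t/", "before /d/"}

print(in_complementary_distribution(court_vowel, cot_vowel))
print(in_complementary_distribution(beat_vowel, bit_vowel))
```

Passing this check is necessary for two sounds to be allophones of one phoneme, but it isn’t the whole story: speakers also have to think of them as the same sound, which no set comparison can tell you.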

By now you should have a clear sense that phonemes often have different allophones that we may not realize are different. And yet somehow we maintain those differences. You can even have an allophone difference in one dialect that other dialects don’t have, and the speakers of the dialect with the difference may not notice that there’s a difference – and yet still maintain the difference.

For one example, most Canadians say the vowel in ice a little higher than the one in eyes, while few other English speakers do the same, and even though Canadians think of the sounds as the same and may not be consciously aware of the difference, it nonetheless persists. Many Canadians also say the vowel in out differently from the one in loud. As with eyes/ice, it’s because the consonant after is voiceless in one case and voiced in another. (I’ll get to consonants soon enough, don’t worry.) But that out vowel that sounds the same as the loud vowel to Canadians trespasses on the territory of a different phoneme for Americans: the vowel in loot. This is why Canadians can say out and hear out while Americans hear the same thing and hear it as oot: for them, it’s on a different phoneme’s turf – it’s on the other side of the fence.

It gets even better, though: we actually make an at least slightly different sound each time we say a given phoneme, even in the same word repeated. Linguists draw diagrams showing the entire area in which a phoneme is made at different times by a speaker or by speakers of a specific dialect, with dots on them like holes on a dart board. But we are still able to match the sounds to what they’re intended to be. (This is helped by the fact that the fences aren’t really so much fences as fuzzy boundaries – what you hear a sound as is affected by what sound you expect to hear.)

It’s like having hand grenades going off in your mouth. They may not hit their targets right on, but they get close enough.

Next: The vowel circle

A Word Taster’s Companion: The world speaks in harmony

Today: the third installment of my how-to guide for word tasting, A Word Taster’s Companion.

The world speaks in harmony

It’s our ability to parse the flow of sound into separate sounds that makes language work. We have a conceptual understanding of the different sounds we make – ideal sounds, targets that we aim for and come variously close to when we actually speak. When the sounds are strung together, we still think of them as independent units. It’s like handwriting: the letters may flow together so you can’t say exactly where one ends and the next one starts, but you can see the different letters.

Now, when we hear someone talking, how do we know what different movements their mouth is making, what targets they’re shooting for? It’s all to do with the harmonics.

When you make a vocalization, your vocal cords are vibrating at a certain frequency – which, if you’re singing, is the note you’re singing – but they’re also echoing in your vocal tract at various frequencies that are multiples of the base frequency (two, three, four or more waves for every one of the base frequency). If you sing an A at 440 Hertz (vibrations per second), there are also echoes of that at, for instance, 880 Hertz and 1760 Hertz, among others.
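The arithmetic here is just multiplication, and a few lines of Python make it concrete. This is a minimal sketch; the function name `harmonics` is a hypothetical helper invented for the illustration.

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics of a fundamental frequency.

    The first harmonic is the fundamental itself; the rest are its
    whole-number multiples (2x, 3x, 4x, ...).
    """
    return [fundamental_hz * n for n in range(1, count + 1)]


# An A sung at 440 Hz, with the next three multiples above it.
for n, freq in enumerate(harmonics(440, 4), start=1):
    print(f"harmonic {n}: {freq} Hz")
```

The 880 and 1760 Hertz echoes mentioned above are the second and fourth entries in that list; 1320 Hertz sits between them.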

Now, which harmonics sound louder and which sound quieter will be determined by the shape of the resonating space in your mouth. There’s a resonating space at the back of your mouth, from your larynx to the top of your tongue, and the higher your tongue is, the longer that space and the lower the frequency of the harmonics that stand out. There’s also a space between the front of your mouth and the closest point your tongue comes to your palate, and the smaller that space is, the higher the resonance. The stand-out harmonics those spaces engender are called formants: the one at the back is the first formant, and the one at the front is the second formant. (There are third and fourth formants that play smaller roles.)

Thus, [u] – “oo” as in “boot” – is heard as it is because it has lower harmonics coming out in both formants: the back of the tongue is high, making a big space between it and the larynx, and it’s also far back, making a big space between it and the front of the mouth. On the other hand, [æ] – “a” as in “cat” – is heard as it is because both formants are higher; the tongue is low and towards the front. And [i] – “ee” as in “beet” – has low resonances in the first set, and higher ones in the second set. The second set are always at least a little higher than the first, even when saying the low back vowel [a], as in “bother.”

We also recognize consonants this way. If they’re consonants that stop the flow of air, we recognize them by what the tongue is doing immediately before and after. If they let just a little air through, we also get the sound of the air as it hisses or buzzes. I’ll go into close-up details of the vowels and consonants in coming chapters.

So we hear these sounds, and we have a sense of where in the mouth they’re coming from, and we also have an idea of what sound could come next in any given word – by the time you’re a couple of sounds into a word, the possibilities are narrowed down quite a bit. We can also hear the effect of the tongue moving and changing the shape of the resonating space in the mouth. And we have learned a repertory of different sounds that we recognize as distinct speech sounds (I won’t say “letters”; those are what we write to represent the sounds). The actual sounds won’t always be exactly identical, but as long as they’re close enough to a target, an identifiable known speech sound, they will be identified as it, especially if the sounds around it lead us to expect it.
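“Close enough to a target” can be pictured as a nearest-neighbour lookup on the first two formants. The Python sketch below is only an illustration under stated assumptions: the formant figures are rough values of the kind often quoted for adult male speakers, not measurements of anyone in particular, the three-vowel inventory is deliberately tiny, and the name `nearest_vowel` is hypothetical. Real listeners also use context and expectation, which this ignores.

```python
import math

# Assumed, approximate (first formant, second formant) targets in Hz.
TARGETS = {
    "i": (270, 2290),  # "beet": low first formant, high second
    "u": (300, 870),   # "boot": both formants low
    "æ": (660, 1720),  # "cat": both formants higher
}


def nearest_vowel(f1, f2):
    """Return the target vowel whose formants are closest to (f1, f2)."""
    return min(TARGETS, key=lambda vowel: math.dist((f1, f2), TARGETS[vowel]))


# A sound that misses the [i] target slightly still lands nearest to it.
print(nearest_vowel(310, 2100))
```

Plain straight-line distance in Hertz is the simplest possible measure and is used here only to keep the sketch short.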

These target sounds – sounds that we recognize as separate speech sounds – are called phonemes. If you meet someone who speaks another language who can’t manage to differentiate “bit” from “beat,” that’s because their native language doesn’t have a distinction between those two vowel sounds, so they’re not used to making the distinction when speaking. They may even believe they can’t. They might have a heck of a hard time telling them apart when listening, too, because they both land close enough to the same target in the set of sounds they’re used to. It’s the same with English speakers hearing and making sounds from some other languages: we may not be able to tell apart sounds that, to the language’s native speakers, are obviously different. After all, learning language is also a process of unlearning: in order to have separate sounds, you not only have to treat similar sounds as completely different; you also have to forget that some sounds are different because you need to treat them as the same in order for your language to make sense.

Next: Horseshoes, hand grenades… and phonemes

The madder matter of t’s and d’s

One of the most common “have you ever noticed” things people like to make mention of in English pronunciation – especially North American English pronunciation – is how, in many words, such as matter and betting, “we say ‘t’ as ‘d’.”

I put that in quotes because that’s what people say.

It’s not really true.

Actually, we say them both as a third sound. It just happens that this third sound, to our ears, sounds more like [d] than like [t]. (By the way: I’m using the linguistic standard of putting a sound in brackets, [t], if it’s the sound we’re actually making, and between slashes, /t/, if it’s the sound we believe ourselves to be making whether or not we actually are making exactly that sound. So “hit it” will always be /hɪt ɪt/ but not always [hɪt ɪt].)

Here, I’ll prove that we don’t say it as [d]. Say the following, slowly and carefully, perhaps as though you’re speaking to someone who is hard of hearing:

I’m not kidding about the reckless betting.

No problem making /t/ and /d/ different there, right?

Now say it quickly, as quickly as you reasonably can, maybe two or three times in a row.

Those d’s and t’s seem to be pretty much the same sound now, right? All d’s, perhaps?

No, not all d’s. Say this slowly and carefully, perhaps as to someone who is having a hard time hearing you:

I’m not kidding about the reckless bedding.

Before, when you said “reckless betting” quickly, there was no problem with a hearer knowing you were talking about gambling. But when you say the [d] clearly, that’s out the window; you’re now talking about crazy quilts and sheets. You can’t say “bedding” clearly and be taken for saying “betting” under normal circumstances.

We tend to think that we’re saying it as [d] because most of us don’t have a letter to associate with what we are saying it as. But I’ll tell you what we’re really saying it as: a thing linguists call a tap. The tongue just taps the alveolar ridge without really stopping the airflow. We sometimes make a flap, which is when the tongue taps on the way past rather than bouncing off. A tap is like in “better” (said quickly and casually); a flap is like in “editing” (said quickly and casually). The International Phonetic Alphabet symbol for a tap or a flap is [ɾ].

Does that look like a partly-formed r? As well it might. Some speakers – particularly those with accents we might think of as “proper” British – will use it for /r/ in the middle of words, as in “very horrifying.” North Americans, who aren’t used to saying /r/ that way, often represent this as a d as in “veddy British.” But it’s not [d]. It’s [ɾ].

Here’s how sounds work in language: Every language has a set of sounds that are considered to be distinctive – swap in a different one and you have a different word (or a non-existent word). These distinctive sounds are called phonemes. Do not confuse these with the letters of the alphabet. For instance, c is a letter that can stand for the phoneme /k/ as in can, /s/ as in ice, or even /tʃ/ in some loan words such as ciao. On the other hand, /k/ is a phoneme that can be represented by c as in can, k as in kill, ck as in kick, ch as in school, q as in question, even que as in unique.

But a sound that is considered to be distinctive may have several different ways of being produced, depending on where it shows up. We just happen to hear them all as versions of the same sound and thus interpret them all as the same sound by habit without generally noticing that there is any difference. Take /t/, for instance. Say the following words:

ting sting matting mattress mat mitten

Each one has a different version of that /t/. Linguists call these different versions phones (as if that word didn’t have enough meanings already). The system of phones is phonetics, while the system of phonemes is phonemics. (Phonics is not a word linguists use.)

Put your hand in front of your mouth and say “ting sting.” You might feel an extra puff of breath on “ting.” If you say “pill spill” you will feel much more of a puff on “pill.” We put those puffs on voiceless stops (/k, t, p/) when they’re at the very beginning of a syllable – but not if there’s /s/ before them at the start of the syllable. Those puffs are called aspiration.

That’s two of the six different variations on /t/ – what linguists call allophones of /t/. I’m sure you can hear the different allophones in “matting” (with the tap) and “mattress” (where the /tr/ together sounds like “ch” plus “r”). Now how about “mat”? The difference with that one is that we don’t release /t/ when there isn’t another vowel or liquid after it – we just hold it closed. Usually we just close our throat (glottal stop) and sometimes we don’t even entirely touch the tongue to the roof of the mouth. If you have /n/ after it, as in “mitten,” just the nasal passage releases, unless you’re speaking carefully or formally.

All of these are thought of as /t/. All of them are heard as /t/. But they really do differ. In some languages some of them are treated as distinct sounds. You know how speakers of some languages can’t say “beat” and “bit” differently? That’s because those two vowels are allophones – different phone realizations – of the same phoneme in those languages. Well, we’re like that with things like the difference between aspirated and unaspirated stops.

Why do we do this? Economy of effort. A /t/ is a voiceless alveolar stop. We don’t always retain all those characteristics of voice (voiceless), place (alveolar), and manner (stop); we’ll stick with whichever is sufficient to make the sound recognizable while not having to make too much effort to say it, and sometimes we’ll add a little more distinction where needed. So at the start of a word, we add that puff of air to make it clearer that it’s not /d/. We don’t need to do that after /s/ because we never say /sd/ at the start of a word. In the middle of a word like matter, we just keep the place and a similar manner, but we don’t stick too closely to the voicelessness or the hard stop. At the end, as in “mat,” or before a nasal, as in “mitten,” we reduce it to a different stop (glottal) that takes less effort to say. That’s also what some people (notably some British people) do when they use a glottal stop between two vowels, as though “matter” were “ma’er” (or “ma’ah”). The quality of being a voiceless stop is enough; the other two voiceless stops (/k, p/) don’t reduce to a glottal stop in English.
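
If you like to see rules written down, the six cases behave a bit like a small lookup. Here is a minimal sketch in Python, not a real phonological model: the function name, the labels, the rough vowel list, and the idea of describing a context by just the neighbouring sounds and a stress flag are all my own simplifying assumptions.

```python
# Rough vowel inventory for this sketch only (an assumption, not a full list).
VOWELS = set("aeiouæɪɛəʌ")

def t_allophone(prev, nxt, stressed_onset=False):
    """Pick a label for how /t/ is realized, given its neighbours.

    prev / nxt are the neighbouring sounds (None at a word edge);
    stressed_onset is True when /t/ begins a stressed syllable.
    """
    if prev == "s":
        return "unaspirated"      # sting
    if nxt == "r":
        return "affricated"       # mattress: /tr/ like "ch" plus "r"
    if nxt is None:
        return "unreleased"       # mat: held closed, often glottal
    if nxt == "n":
        return "nasal release"    # mitten
    if prev in VOWELS and nxt in VOWELS and not stressed_onset:
        return "tap"              # matting
    return "aspirated"            # ting

# Hand-written contexts for the six example words (hypothetical encoding).
examples = {
    "ting": (None, "ɪ", True),
    "sting": ("s", "ɪ", False),
    "matting": ("æ", "ɪ", False),
    "mattress": ("æ", "r", False),
    "mat": ("æ", None, False),
    "mitten": ("ɪ", "n", False),
}
for word, context in examples.items():
    print(word, "->", t_allophone(*context))
```

Real speech is messier than six if-statements, of course; the point is only that each version of /t/ is predictable from where it sits.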

So those are the allophones of /t/. What you need to know is that sometimes two different phonemes have, in some contexts, the same phone as an allophone. Most “short” vowels in English reduce to a neutral unstressed vowel [ə], for instance. The case in point today is [ɾ], which can be a version of /t/ or /d/ (or, in some kinds of English, /r/).

We think of voice as the difference between /t/ and /d/. But they’re stops – how do you voice a consonant when your air flow is stopped? You don’t, really. You know the difference between /t/ and /d/ mainly by how the sounds before and after behave. Say this:

mad mat

In “mad” your voice keeps going right up until you say the [d], but in “mat” you cut off a moment sooner. You also say the vowel a bit shorter.

Now say this:

The madder matter

The difference is very subtle, isn’t it? But you may say the [æ] before the /d/ a little longer than before the /t/, and you may cut the voice out just a little for the /t/ version. It’s not really enough to be sure about when you’re listening, but there may be that small effect of the sound you’re thinking about when you say it.

On the other hand, you might really say them both the same way.

It just happens that that way will not be with [d]. It will be with [ɾ].

cepstrum, quefrency, rahmonic

“By applying a low-pass lifter to the cepstrum in Figure 2 to extract the low quefrency components below the first rahmonic peak, the slowly varying curve (in red, upper graph) results.”

I read that to my wife and her eyes turned into a pair of shirred eggs. She was, for a time, speechless – a condition that, incidentally, the process described in the quotation would have been helpless in the face of.

Make no mistake: what Al Oppenheim and Ron Schafer are describing in their article (“From frequency to quefrency: a history of the cepstrum,” IEEE Signal Processing Magazine 21 (5), September 2004, 95–106) is freakin’ hard for most people to wrap their minds around. But while it might seem as dry as dust to you, that passage actually evinces a fundamental fact of true nerds: a sense of humour and playfulness.

There are four words in there that you need to look at: lifter, cepstrum, quefrency, and rahmonic. They are terms that apply to this specific mathematical process. The process itself is a little quirky, and applies to things that themselves require a bit of explanation to have real meaning – a bit more than I have space for here. But here’s a very short run-down – if your eyes start to glaze, skip to the paragraph that starts “So anyways.”

Sounds such as human speech are actually very complex, made up of a lot of different harmonic resonances on top of a basic sound frequency. It’s these resonances that allow people to discern the difference between different speech sounds: the position of your tongue in your mouth (among other things) changes the shape of resonating chambers and makes certain bunches of harmonics, called formants, stronger – you might say the formants are the informants of what speech sound you’re hearing.

When linguists – acoustic phoneticians in particular – and engineers and physicists analyze sound waves, they use a wonderful mathematical function called a Fourier transform to identify the different resonance frequencies in the sound waves, what is called the spectrum, a perfectly appropriate term since the spectrum of light is also the different frequencies. (Think about if someone were tapping 9 beats a second and someone else 12 beats a second and someone else 36 beats a second. If you graphed the sound waves, you would have something looking like :,..;..,:.,.:,..;..,:.,.:,.. and on and on. A Fourier transform would just show a graph plotting frequencies with one mark at 9 per second, one at 12 per second, and one at 36 per second.)
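
For the curious, here is a minimal sketch of that idea in Python. It assumes each of the three “tappers” is idealized as a pure sine wave (real taps would also show overtones), and it uses a slow, naive transform written out by hand rather than a library routine; the sample rate and the function name are my own choices for illustration.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns the magnitude at each bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2):  # only the bins below half the sample rate
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total) / n)
    return mags

SAMPLE_RATE = 100  # samples per second (assumed)
DURATION = 1       # one second of signal, so bin k means k cycles per second

# Three idealized "tappers" at 9, 12, and 36 cycles per second, added together.
signal = [
    sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in (9, 12, 36))
    for t in range(SAMPLE_RATE * DURATION)
]

mags = dft_magnitudes(signal)
peaks = [k for k, m in enumerate(mags) if m > 0.1]
print(peaks)  # [9, 12, 36]
```

The jumble in the time-domain list turns into three clean marks, one per rhythm.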

Well, if you treat the Fourier transform graph as though it were a graph of sound waves and perform a Fourier transform on it (it’s just slightly twickier than that, but that’s the general concept), you are performing a curious but useful inversion. You can identify how close together the harmonics are, and how close together the formants are; it tells you how frequent the strong frequencies are on the graph, so to speak. Believe me when I say this is useful, and not just in speech analysis: it makes cleaning up the sound on old recordings a lot easier, for instance – you can filter out unwanted resonances from the original sound-capture device.
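
Here is a minimal sketch of the second transform in Python, under some assumptions of mine. The trickier part, in the usual definition, is that you take the logarithm of the spectrum's magnitudes before transforming again (and the second step is an inverse transform). The toy signal is a single click followed by a half-strength echo 20 samples later; an echo puts evenly spaced ripples into the spectrum, and the result tells you how closely spaced they are. The function names and the signal are illustrative, not taken from the article.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def real_cepstrum(signal):
    """Inverse transform of the log of the magnitude spectrum."""
    log_spectrum = [math.log(abs(x)) for x in dft(signal)]
    return [c.real for c in dft(log_spectrum, inverse=True)]

N = 128
ECHO_DELAY = 20           # assumed delay, in samples
signal = [0.0] * N
signal[0] = 1.0           # the click
signal[ECHO_DELAY] = 0.5  # its quieter echo

cepstrum = real_cepstrum(signal)
# Look for the strongest value, skipping position 0.
peak = max(range(1, N // 2), key=lambda q: cepstrum[q])
print(peak)  # 20
```

The peak lands at 20, the echo delay: the horizontal axis of the result is measured in time units again, even though we got there by way of frequency.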

So anyways, when you do this process, you get something that looks like a spectrum but is really a spectrum looked at the other way around, and you get what looks like frequency but is really frequency looked at the other way, and harmonics that aren’t actually harmonics, and you can apply filtering processes on the data that aren’t filters like the normal data filters are. You’re treating frequency as though it were time and time as though it were frequency.

So what do you do? You come up with new words for what you’re talking about. And if you’re a nerd, you may take this opportunity to be a little playful. (Businessmen would use wanton sesquipedalianisms and initialisms to try to sound impressive. Nerds don’t feel a need to try to sound impressive because they actually know what they’re talking about.)

That playfulness actually tells us some interesting things about language, too: not the way we perceive sounds (which is what the data that all this analyzes help us to understand), but the way we think of and group sounds and how we perceive the structure of words. You see, the guys who came up with this – Bogert, Healy, and Tukey, three engineers back in the early 1960s – wanted to signify the inversion by inverting the words. But you will notice they only inverted parts of the words, in order to maintain comprehension I suppose – in the process producing pseudomorphemes (I’ll explain, hold on) – and they did it in some particular ways:

spectrum –> cepstrum
frequency –> quefrency
filter –> lifter
harmonic –> rahmonic

In all of the words, they only inverted the first part of the word, thereby treating the front end of the word as the significant part and the remainder as a sort of tail (a common enough thing for people to do – go to SoHo and ask JLo), and also treating them as separate bits of the word, like tweet plus ed in tweeted – meaning-bearing building-blocks called morphemes. Except that trum, ency, ter, and monic actually are not morphemes; they have no meanings of their own – they’re just phonological divisions.

And the way they inverted the first half is notable: in three of the four, they just reversed the letters in the first syllable, which in all cases also reversed the sounds (you should know from this that the original pronunciation of cepstrum is with a /k/ at the start). It was always the syllable, not any other division: not rtcepsum or nomrahic, which would be morphologically appropriate but phonologically and orthographically problematic. As usual, the sound patterns of the words guide how they’re treated – when you turn it around, it’s the sound you’re turning around (this is standard in most playful things we do with words, and it’s how we treat helicopter as heli plus copter rather than the original helico plus pter, and why we say a whole nother thing, and also why people asked to say my backwards will probably say “I’m” – reversing the phonemes – rather than like “yam” – reversing the actual sounds).

In the other word, that wasn’t possible – /rf/ and /wk/ aren’t acceptable syllable onsets. So the syllable onsets, /fr/ and /kw/, were simply swapped to make quefrency. The vowel sounds were not swapped: it’s just not comfortable in English to say /’kwε frin si/. But when you look at that word on paper, do you want to pronounce it with a “long e” on the first syllable? To me, thanks to other words starting with que, it looks first like the que is said like the one in question, making both vowels /ε/ and conforming the word to expected sound patterns rather than to the original sounds.
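
The two kinds of inversion are simple enough to write out as a minimal sketch in Python. It works on letters only, and it assumes you tell it by hand where the first syllable ends and which letter groups are the onsets; the function names are made up for this illustration.

```python
def reverse_first_syllable(word, syllable_length):
    """Reverse the letters of the first syllable and keep the tail as is."""
    head, tail = word[:syllable_length], word[syllable_length:]
    return head[::-1] + tail

def swap_onsets(word, first_onset, second_onset):
    """Swap the first two syllable onsets, leaving the vowels in place.

    Assumes the word starts with first_onset and that second_onset
    appears somewhere later in the word.
    """
    rest = word[len(first_onset):]
    before, _, after = rest.partition(second_onset)
    return second_onset + before + first_onset + after

print(reverse_first_syllable("spectrum", 4))  # cepstrum
print(reverse_first_syllable("filter", 3))    # lifter
print(reverse_first_syllable("harmonic", 3))  # rahmonic
print(swap_onsets("frequency", "fr", "qu"))   # quefrency
```

Notice that the code needs to be handed the syllable boundary; the letters alone don't say where it is, which is rather the point about sound patterns guiding the game.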

But at least quefrency looks like a swapping-up of frequency. When I first saw cepstrum, I didn’t see spectrum in it at all (obviously I wasn’t swirling and sniffing it at that point). It looked more like it was just some other Latin word I hadn’t seen, joining the long list of neuter nouns like rostrum and plectrum. And rahmonic, aside from making me think of Rahm Emanuel (and maybe rah-rah-rah), had a taste of rampike and mnemonic but took a moment to show its harmonic resonance. (Lifter happens to be an English word in its own right, and thereby carries unbidden resonances. Ironically.)

However, the resemblance of cepstrum to spectrum is not lost on those who are expecting to see spectrum. And the hazards of such wordplay showed up in an early publication by Oppenheim and Schafer on the topic – and make for a cautionary tale for editors and authors alike. I’ll quote directly from the same article I started with:

throughout the various stages of proofreading of this book, we constantly had to maintain vigilance to be certain that this “strange” term cepstrum wasn’t inadvertently “corrected” to what seemed to be more appropriate. . . . We breathed a sigh of relief when the last page proofs were returned to the publisher. When the first printing of the book appeared, it was clear that a particularly diligent proofreader at the publisher had caught the “error” at the last instant and cepstrum had been reversed to spectrum throughout.

Well, not entirely reversed – but run through a transformation aimed at making the strange look normal again. Ah, but too late – and sometimes you want to see things strangely.

Thanks to Colleen Kavanagh (@CanuckWordNerd) for drawing my attention to this whole sandbox of words.

the long and short of English vowels

A colleague mentioned an exercise she was editing in which adult ESL students are asked to sort words according to whether the vowel sound is short or long. She asked, “Where did this terminology come from, anyway? And is there any other way of effectively describing this sound-spelling relationship?”