
A Word Taster’s Companion: The world speaks in harmony

Today: the third installment of my how-to guide for word tasting, A Word Taster’s Companion.

The world speaks in harmony

It’s our ability to parse the flow of sound into separate sounds that makes language work. We have a conceptual understanding of the different sounds we make – ideal sounds, targets that we aim for and come variously close to when we actually speak. When the sounds are strung together, we still think of them as independent units. It’s like handwriting: the letters may flow together so you can’t say exactly where one ends and the next one starts, but you can see the different letters.

Now, when we hear someone talking, how do we know what different movements their mouth is making, what targets they’re shooting for? It’s all to do with the harmonics.

When you make a vocalization, your vocal cords are vibrating at a certain frequency – which, if you’re singing, is the note you’re singing – but they’re also echoing in your vocal tract at various frequencies that are multiples of the base frequency (two, three, four or more waves for every one of the base frequency). If you sing an A at 440 Hertz (vibrations per second), there are also echoes of that at, for instance, 880 Hertz and 1760 Hertz, among others.
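The harmonic series is easy to see by listing whole-number multiples of the fundamental. This small Python sketch uses an illustrative helper, `harmonics`, that is not from any library. It shows that 440 Hz has harmonics at 880, 1320 and 1760 Hz, and 1320 Hz falls between the two examples above.

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics: whole-number multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

print(harmonics(440, 4))  # [440, 880, 1320, 1760]
```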

Now, which harmonics sound louder and which sound quieter will be determined by the shape of the resonating space in your mouth. There’s a resonating space at the back of your mouth, from your larynx to the top of your tongue, and the higher your tongue is, the longer that space and the lower the frequency of the harmonics that stand out. There’s also a space between the front of your mouth and the closest point your tongue comes to your palate, and the smaller that space is, the higher the resonance. The stand-out harmonics those spaces engender are called formants: the one at the back is the first formant, and the one at the front is the second formant. (There are third and fourth formants that play smaller roles.)

Thus, [u] – “oo” as in “boot” – is heard as it is because it has lower harmonics coming out in both formants: the back of the tongue is high, making a big space between it and the larynx, and it’s also far back, making a big space between it and the front of the mouth. On the other hand, [æ] – “a” as in “cat” – is heard as it is because both formants are higher; the tongue is low and towards the front. And [i] – “ee” as in “beet” – has low resonances in the first set, and higher ones in the second set. The second set are always at least a little higher than the first, even when saying the low back vowel [a], as in “bother.”
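If pairs of formants identify vowels, a listener is effectively matching a measured (first formant, second formant) pair to the nearest known target. This Python sketch makes that concrete. The formant values are approximate adult male averages as commonly cited from Peterson and Barney's 1952 measurements of American English vowels, and [ɑ] stands for the vowel of "bother." Treat them as illustrative, since real formants vary widely by speaker and accent. The `nearest_vowel` helper is hypothetical.

```python
import math

# Approximate (F1, F2) averages in Hz for adult male American English speakers,
# as commonly cited from Peterson and Barney (1952). Illustrative only.
VOWEL_FORMANTS = {
    "i": (270, 2290),  # "beet": low first formant, high second formant
    "u": (300, 870),   # "boot": both formants low
    "æ": (660, 1720),  # "cat": both formants higher
    "ɑ": (730, 1090),  # "bother": low back vowel; F2 still above F1
}

def nearest_vowel(f1, f2):
    """Return the vowel whose (F1, F2) target is closest to the measured pair."""
    return min(VOWEL_FORMANTS, key=lambda v: math.dist((f1, f2), VOWEL_FORMANTS[v]))

print(nearest_vowel(320, 950))   # u
print(nearest_vowel(290, 2100))  # i
```

Plain Euclidean distance in hertz is a simplification. Perceptual scales such as Bark or mel are often used instead, because the ear does not hear equal steps in hertz as equal.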

We also recognize consonants this way. If they’re consonants that stop the flow of air, we recognize them by what the tongue is doing immediately before and after. If they let just a little air through, we also get the sound of the air as it hisses or buzzes. I’ll go into close-up details of the vowels and consonants in coming chapters.

So we hear these sounds, and we have a sense of where in the mouth they’re coming from, and we also have an idea of what sound could come next in any given word – by the time you’re a couple of sounds into a word, the possibilities are narrowed down quite a bit. We can also hear the effect of the tongue moving and changing the shape of the resonating space in the mouth. And we have learned a repertory of different sounds that we recognize as distinct speech sounds (I won’t say “letters”; those are what we write to represent the sounds). The actual sounds won’t always be exactly identical, but as long as they’re close enough to a target, an identifiable known speech sound, they will be identified as it, especially if the sounds around it lead us to expect it.
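The narrowing can be illustrated with a toy Python sketch that filters a tiny, hypothetical lexicon by the sounds heard so far. The entries are rough broad phonemic transcriptions rather than spellings. Both the word list and the transcriptions are assumptions made for illustration, and a real listener's vocabulary is vastly larger.

```python
# A tiny, hypothetical lexicon: one tuple of phonemes per word.
LEXICON = {
    "start": ("s", "t", "ɑ", "r", "t"),
    "star":  ("s", "t", "ɑ", "r"),
    "stop":  ("s", "t", "ɑ", "p"),
    "step":  ("s", "t", "ɛ", "p"),
    "sting": ("s", "t", "ɪ", "ŋ"),
    "sit":   ("s", "ɪ", "t"),
    "spill": ("s", "p", "ɪ", "l"),
    "tar":   ("t", "ɑ", "r"),
    "cat":   ("k", "æ", "t"),
}

def candidates(heard):
    """Words whose pronunciation begins with the phonemes heard so far."""
    heard = tuple(heard)
    return sorted(w for w, p in LEXICON.items() if p[:len(heard)] == heard)

# Feed in "start" one sound at a time and watch the candidate set shrink.
target = LEXICON["start"]
for i in range(1, len(target) + 1):
    print("/" + " ".join(target[:i]) + "/", candidates(target[:i]))
```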

These target sounds – sounds that we recognize as separate speech sounds – are called phonemes. If you meet a speaker of another language who can’t manage to differentiate “bit” from “beat,” that’s because their native language doesn’t have a distinction between those two vowel sounds, so they’re not used to making the distinction when speaking. They may even believe they can’t. They might have a heck of a hard time telling them apart when listening, too, because they both land close enough to the same target in the set of sounds they’re used to. It’s the same with English speakers hearing and making sounds from some other languages: we may not be able to tell apart sounds that, to the language’s native speakers, are obviously different. After all, learning language is also a process of unlearning: in order to have separate sounds, you not only have to treat similar sounds as completely different; you also have to forget that some sounds are different because you need to treat them as the same in order for your language to make sense.

Next: Horseshoes, hand grenades… and phonemes

A Word Taster’s Companion: What makes a word

Today: the second installment of my how-to guide for word tasting, A Word Taster’s Companion.

What makes a word

Let us start by looking at the parts of words. Take a word. In fact, let’s start with start. Here’s a simple question: what is this word, start, made of?

Did someone say five letters?

Oops.

No, words are not made of letters.

That’s right: one of the first things just about anyone knows about words is the first thing they’re going to have to unlearn.

Tell me, what did you do first, when you were a very small child: write or speak?

You almost certainly learned to speak a few years before you learned to write. You knew the sounds long before you knew the symbols used to represent them on paper.

But aren’t those sounds letters?

They sure aren’t. Letters came along to represent sounds many thousands of years after humans started speaking. And anyone who can write English knows that the same letter is often used to represent several different sounds – for instance, fat, make, above – and the same sound can be represented by different letters – hay, hey, weigh.

Words are made up of quite a few different things, actually – and we’ll get to them all by the time I’m done with you – but on the most basic level of expressive form, words are made up of sounds (unless you are deaf and speak sign language).

And those sounds are made by the physical movements of your vocal tract. (If you speak sign language, they’re made up of movements of your hands and other body parts.) So when you say a word, you feel it. And when you hear a word, you know what it feels like.

So feel it. Feel this word: “Start.” Say it.

What do you feel your tongue doing? First the tip is up near the front of your mouth, behind the teeth and ahead of the ridge (that ridge is called the alveolar ridge). It’s letting some air through, making a hissing noise. Your voice is not activated: you could only whisper, not sing, while saying [s].

Then your tongue closes off the airflow. For a moment no air gets out of your mouth, because your nose is closed too (by means of a flap at the back of your mouth). Then you release it, and the tongue drops down and sits flat on the bottom of your mouth, and your voice starts up: [a].

Then, if you’re among those who say the [r], the tongue humps up like a cat stretching. It makes a narrower passage between itself and the roof of your mouth (your palate).

Finally, the tip of the tongue touches again and blocks the airflow as the voice stops – but you may find that even before the tongue gets all the way there the airflow has stopped; many people will make this stop using the closing point in the throat, the glottis, which is what you use to stop the air when you swallow or hold your breath.

So there you have it. One continuous movement of the tongue, with the voice engaged just in the middle. A continuous flow of physical movement and a continuous flow of sound. But we hear it as five sounds, because we have learned to divide the sound stream we hear into those sounds.

Next: The world speaks in harmony

voice

I always enjoy the beginning of a word tasting course. All those new faces – some eager, some dragged there by their girlfriends or significant others – ready for what they hope will be an enjoyable experience but, at the very least, will leave them somehow more cultured.

Of course you get all sorts. People who have retired and now have the time to enjoy words. Young couples, the guy nearly always trying to impress the girl with what he knows about words, even if he seems only to know things that aren’t so (“Oh, no, no one likes adverbs anymore…” “There was a fashion for Scandinavian loans a while ago, but it’s East Asian ones that are pretty much where it’s at now…” “Yes, you see, you know it’s voiceless because it’s spelled with an s. A z means it’s voiced.”). Businessmen who get exposed to some pretty expensive words while out with clients but who never really get to appreciate them, now wanting to learn how to really enjoy them. Groups of women who couldn’t persuade their associated males to come – or didn’t want them to anyway.

I like to start them off with a few words right off the bat, just to have a sense of what level they’re all at and to give them a starting point. I’ll give them words they probably haven’t heard before and ask them to write down what the words make them think of, what they feel like to say. I insist that they write down the first ten things they think of. Of course the results often begin to sound a bit like a psychotherapy session.

Then, having made words a little strange, I give them some words they know quite well. I like the reactions to dog and then hound. Eyes begin to open: what different tastes, feelings, and images for words with pretty much the same objects. I used to use cat and pussy, but some of the responses were sometimes a bit much for some of those present to take graciously.

Then we dive into the exploration of the basic sound-generating organism of the body. I usually start with the voiced/voiceless distinction. This can sometimes be surprisingly unfamiliar. It can also be an occasion for some good partner work for those who have come with others, as in the case of one young lady who was in the class with her boyfriend and didn’t quite cotton to it.

“I don’t get what you mean,” she protested. “Every time I speak I’m using my voice.”

“Every time you speak you’re using your vocal tract, but your voice turns on and off.”

“If my voice was off you couldn’t hear me.”

“Say your name,” I said.

“Why?”

“Or anything. Just say a word.”

“Malcolm.” She cast a sideways glance at her boyfriend.

“Now whisper it.”

She leaned up to him, cupped her hands around his ear, and whispered it into his ear. I think she licked his ear slightly, too, but her hands were in the way.

“Whisper it in this direction, loudly enough that I can hear it,” I suggested.

“Malcolm,” she obligingly whispered, reasonably loudly.

“OK, great. You whispered it. You weren’t using your voice, but I could hear it.”

“Of course I was using my voice! I was using my whispering voice!” she insisted.

“Which isn’t actually voice, because your vocal cords don’t vibrate.”

“Well, I know what my English teacher, Ms. Van Tilt, said. ‘Use your whispering voice.’”

I sighed. There are a lot of unfortunate things that get said in English classes.

“She should have been… more careful in her choice of words. If you have laryngitis, you lose your voice, right?”

“Well, yeah, but that’s just a figure of speech.”

“Actually, it’s the same use of the word voice. The technical use.” Technical usually seals it. And of course her boyfriend was forced to nod sagely. Guys always want to seem like they know something if it’s technical. “If your vocal cords are vibrating – your voice box – then a sound is voiced. If they’re not, it’s unvoiced. Put your hand on your neck and say missing slowly.” I demonstrated.

She tried. “Mmmiiiiisssssssiiiiinnnggg.”

“You feel how it’s not vibrating during the s, the ‘ssss’?” I turned to the rest of the class. “Everybody try this. Try a few words. Try some of the ones we started with.” They obliged. The air was filled with slowly echoing words, people speaking slowly with their hands on their throats – like a scene from some sci-fi movie (“Time… warp… losing… air…”).

The girl’s boyfriend, Malcolm, took this occasion to improve their partner work. “You can also feel it in the chest,” he said, putting his hand on her chest. She said “Shampoo.”

“Wait,” he said, “your shirt is damping the vibration.” He worked his hand underneath it.

“Thixotropic,” she said, and smiled. “Woo!”

I tried not to roll my eyes. “Yes, quite a lot of your body resonates with sound. That’s what helps produce the sound quality. You’ll feel it on the top of your head, too.”

Malcolm grabbed her butt. “Say it now.”

“Hey,” she said, smiling, and smacked his hand.

“It doesn’t usually make it all the way down there,” I said. “Unless you’re an opera singer.”

I moved on to the shape of the vocal tract. I showed the class the diagram of the mouth and started talking about the parts. I always encourage people to explore the insides of their mouths with their tongues.

I hadn’t really thought of this part as so much of an occasion for partner work.

But as I had the class making as many different variants of /l/ as they could, sweeping their tongues back and forth over their palates, I turned and saw Malcolm and the girl playing championship tonsil hockey.

“Now, I know that words are stimulating and can be romantic…” I said.

“Oh,” she said, pulling away, “sorry, we were just curious whether we could make sounds with each other’s tongues. Like, my tongue in his mouth. And vice versa.”

She was looking like an altogether more promising student than I had first anticipated. I glanced around the class. “Try it at home,” I said. “And report back.”

The madder matter of t’s and d’s

One of the most common “have you ever noticed” things people like to make mention of in English pronunciation – especially North American English pronunciation – is how, in many words, such as matter and betting, “we say ‘t’ as ‘d’.”

I put that in quotes because that’s what people say.

It’s not really true.

Actually, we say them both as a third sound. It just happens that this third sound, to our ears, sounds more like [d] than like [t]. (By the way: I’m using the linguistic standard of putting a sound in brackets, [t], if it’s the sound we’re actually making, and between slashes, /t/, if it’s the sound we believe ourselves to be making whether or not we actually are making exactly that sound. So “hit it” will always be /hɪt ɪt/ but not always [hɪt ɪt].)

Here, I’ll prove that we don’t say it as [d]. Say the following, slowly and carefully, perhaps as though you’re speaking to someone who is hard of hearing:

I’m not kidding about the reckless betting.

No problem making /t/ and /d/ different there, right?

Now say it quickly, as quickly as you reasonably can, maybe two or three times in a row.

Those d’s and t’s seem to be pretty much the same sound now, right? All d’s, perhaps?

No, not all d’s. Say this slowly and carefully, perhaps as to someone who is having a hard time hearing you:

I’m not kidding about the reckless bedding.

Before, when you said “reckless betting” quickly, there was no problem with a hearer knowing you were talking about gambling. But when you say the [d] clearly, that’s out the window; you’re now talking about crazy quilts and sheets. You can’t say “bedding” clearly and be taken for saying “betting” under normal circumstances.

We tend to think that we’re saying it as [d] because most of us don’t have a letter to associate with what we are saying it as. But I’ll tell you what we’re really saying it as: a thing linguists call a tap. The tongue just taps the alveolar ridge without really stopping the airflow. We sometimes make a flap, which is when the tongue taps on the way past rather than bouncing off. You hear a tap in “better” (said quickly and casually) and a flap in “editing” (said quickly and casually). The International Phonetic Alphabet symbol for a tap or a flap is [ɾ].

Does that look like a partly-formed r? As well it might. Some speakers – particularly those with accents we might think of as “proper” British – will use it for /r/ in the middle of words, as in “very horrifying.” North Americans, who aren’t used to saying /r/ that way, often represent this as a d as in “veddy British.” But it’s not [d]. It’s [ɾ].

Here’s how sounds work in language: Every language has a set of sounds that are considered to be distinctive – swap in a different one and you have a different word (or a non-existent word). These distinctive sounds are called phonemes. Do not confuse these with the letters of the alphabet. For instance, c is a letter that can stand for the phoneme /k/ as in can, /s/ as in ice, or even /tʃ/ in some loan words such as ciao. On the other hand, /k/ is a phoneme that can be represented by c as in can, k as in kill, ck as in kick, ch as in school, q as in question, even que as in unique.
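The many-to-many relationship between letters and phonemes can be sketched as a pair of lookup tables built from the examples in this paragraph. This is a toy Python illustration, not a spelling system, and the table names are made up.

```python
from collections import defaultdict

# (word, spelling, phoneme) examples taken from the paragraph above.
EXAMPLES = [
    ("can", "c", "k"), ("ice", "c", "s"), ("ciao", "c", "tʃ"),
    ("kill", "k", "k"), ("kick", "ck", "k"), ("school", "ch", "k"),
    ("question", "q", "k"), ("unique", "que", "k"),
]

sounds_for_spelling = defaultdict(set)  # letters -> phonemes they can stand for
spellings_for_sound = defaultdict(set)  # phoneme -> letters that can spell it
for _word, spelling, phoneme in EXAMPLES:
    sounds_for_spelling[spelling].add(phoneme)
    spellings_for_sound[phoneme].add(spelling)

print(sorted(sounds_for_spelling["c"]))  # ['k', 's', 'tʃ']
print(sorted(spellings_for_sound["k"]))  # ['c', 'ch', 'ck', 'k', 'q', 'que']
```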

But a sound that is considered to be distinctive may have several different ways of being produced, depending on where it shows up. We hear them all as versions of the same sound and, out of habit, generally don’t notice that there is any difference. Take /t/, for instance. Say the following words:

ting sting matting mattress mat mitten

Each one has a different version of that /t/. Linguists call these different versions phones (as if that word didn’t have enough meanings already). The system of phones is phonetics, while the system of phonemes is phonemics. (Phonics is not a word linguists use.)

Put your hand in front of your mouth and say “ting sting.” You might feel an extra puff of breath on “ting.” If you say “pill spill” you will feel much more of a puff on “pill.” We put those puffs on voiceless stops (/k, t, p/) when they’re at the very beginning of a syllable – but not if there’s /s/ before them at the start of the syllable. Those puffs are called aspiration.

That’s two of the six different variations on /t/ – what linguists call allophones of /t/. I’m sure you can hear the different allophones in “matting” (with the tap) and “mattress” (where /tr/ together sounds like “ch” plus “r”). Now how about “mat”? The difference with that one is that we don’t release /t/ when there isn’t another vowel or liquid after it – we just hold it closed. Usually we just close our throat (glottal stop), and sometimes we don’t even entirely touch the tongue to the roof of the mouth. If you have /n/ after it, as in “mitten,” just the nasal passage releases, unless you’re speaking carefully or formally.
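The contexts above can be written as a rough rule set. This Python sketch is a simplification. It covers only the six example words, assumes a broadly North American accent, and relies on a made-up `t_allophone` helper. Real allophony depends on stress, speech rate and accent in ways these few rules ignore. The glottal stop [ʔ] stands in for the unreleased /t/ of “mat,” and [tʃ] approximates the /t/ of “mattress.”

```python
# Simplified vowel and liquid sets in rough broad transcription.
VOWELS = {"i", "ɪ", "e", "ɛ", "æ", "ɑ", "ɔ", "o", "ʊ", "u", "ʌ", "ə"}
LIQUIDS = {"l", "r"}

def t_allophone(prev, nxt, next_stressed=True):
    """Pick a (simplified) North American allophone of /t/ from its neighbours.

    prev, nxt: neighbouring phonemes, or None at a word edge.
    next_stressed: whether the following vowel carries stress.
    """
    if prev == "s":
        return "t"   # unaspirated after /s/: "sting"
    if nxt == "r":
        return "tʃ"  # affricated before /r/: "mattress" (roughly "ch" plus r)
    if nxt == "n":
        return "tⁿ"  # released through the nose: "mitten"
    if nxt is None or (nxt not in VOWELS and nxt not in LIQUIDS):
        return "ʔ"   # unreleased, often a glottal stop: "mat"
    if prev in VOWELS and nxt in VOWELS and not next_stressed:
        return "ɾ"   # tap between vowels before an unstressed one: "matting"
    return "tʰ"      # aspirated at the start of a syllable: "ting"

examples = {
    "ting":     (None, "ɪ", True),
    "sting":    ("s", "ɪ", True),
    "matting":  ("æ", "ɪ", False),
    "mattress": ("æ", "r", False),
    "mat":      ("æ", None, False),
    "mitten":   ("ɪ", "n", False),
}
for word, (prev, nxt, stressed) in examples.items():
    print(word, t_allophone(prev, nxt, stressed))
```

Note that the tap rule checks stress. In a word like “attack,” the /t/ sits between vowels but starts a stressed syllable, so these rules give it aspiration rather than a tap.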

All of these are thought of as /t/. All of them are heard as /t/. But they really do differ. In some languages some of them are treated as distinct sounds. You know how speakers of some languages can’t say “beat” and “bit” differently? That’s because those two vowels are allophones – different phone realizations – of the same phoneme in those languages. Well, we’re like that with things like the difference between aspirated and unaspirated stops.

Why do we do this? Economy of effort. A /t/ is a voiceless alveolar stop. We don’t always retain all those characteristics of voice (voiceless), place (alveolar), and manner (stop); we’ll stick with whichever is sufficient to make the sound recognizable while not having to make too much effort to say it, and sometimes we’ll add a little more distinction where needed. So at the start of a word, we add that puff of air to make it clearer that it’s not /d/. We don’t need to do that after /s/ because we never say /sd/ at the start of a word. In the middle of a word like matter, we just keep the place and a similar manner, but we don’t stick too closely to the voicelessness or the hard stop. At the end, as in “mat,” or before a nasal, as in “mitten,” we reduce it to a different stop (glottal) that takes less effort to say. That’s also what some people (notably some British people) do when they use a glottal stop between two vowels, as though “matter” were “ma’er” (or “ma’ah”). The quality of being a voiceless stop is enough; the other two voiceless stops (/k, p/) don’t reduce to a glottal stop in English.

So those are the allophones of /t/. What you need to know is that sometimes two different phonemes have, in some contexts, the same phone as an allophone. Most “short” vowels in English reduce to a neutral unstressed vowel [ə], for instance. The case in point today is [ɾ], which can be a version of /t/ or /d/ (or, in some kinds of English, /r/).
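Overlapping allophones can be pictured by inverting a phoneme-to-phone table. Once [ɾ] shows up under more than one phoneme, hearing it alone no longer tells you which phoneme was meant. The table in this Python sketch is deliberately incomplete and lists only phones discussed in this post. [ɹ] stands in for the usual English r, and the /r/ entry reflects only the accents that tap it.

```python
from collections import defaultdict

# Simplified, deliberately incomplete allophone sets.
ALLOPHONES = {
    "t": {"tʰ", "t", "ɾ", "tʃ", "ʔ", "tⁿ"},
    "d": {"d", "ɾ"},
    "r": {"ɹ", "ɾ"},  # [ɾ] for /r/ only in some accents ("very")
}

# Invert the table: for each phone, which phonemes could it be a version of?
phonemes_for_phone = defaultdict(set)
for phoneme, phones in ALLOPHONES.items():
    for phone in phones:
        phonemes_for_phone[phone].add(phoneme)

print(sorted(phonemes_for_phone["ɾ"]))  # ['d', 'r', 't']
print(sorted(phonemes_for_phone["ʔ"]))  # ['t']
```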

We think of voice as the difference between /t/ and /d/. But they’re stops – how do you voice a consonant when your air flow is stopped? You don’t, really. You know the difference between /t/ and /d/ mainly by how the sounds before and after behave. Say this:

mad mat

In “mad” your voice keeps going right up until you say the [d], but in “mat” you cut off a moment sooner. You also say the vowel a bit shorter.

Now say this:

The madder matter

The difference is very subtle, isn’t it? But you may say the [æ] before the /d/ a little longer than before the /t/, and you may cut the voice out just a little for the /t/ version. It’s not really enough to be sure about when you’re listening, but there may be that small effect of the sound you’re thinking about when you say it.

On the other hand, you might really say them both the same way.

It just happens that that way will not be with [d]. It will be with [ɾ].

cepstrum, quefrency, rahmonic

“By applying a low-pass lifter to the cepstrum in Figure 2 to extract the low quefrency components below the first rahmonic peak, the slowly varying curve (in red, upper graph) results.”

I read that to my wife and her eyes turned into a pair of shirred eggs. She was, for a time, speechless – a condition that, incidentally, the process described in the quotation would have been helpless in the face of.

Make no mistake: what Al Oppenheim and Ron Schafer are describing in their article (“From frequency to quefrency: a history of the cepstrum,” IEEE Signal Processing Magazine, vol. 21, no. 5, September 2004, pp. 95–106) is freakin’ hard for most people to wrap their minds around. But while it might seem as dry as dust to you, that passage actually evinces a fundamental fact of true nerds: a sense of humour and playfulness.

There are four words in there that you need to look at: lifter, cepstrum, quefrency, and rahmonic. They are terms that apply to this specific mathematical process. The process itself is a little quirky, and applies to things that themselves require a bit of explanation to have real meaning – a bit more than I have space for here. But here’s a very short run-down – if your eyes start to glaze, skip to the paragraph that starts “So anyways.”

Sounds such as human speech are actually very complex, made up of a lot of different harmonic resonances on top of a basic sound frequency. It’s these resonances that allow people to discern the difference between different speech sounds: the position of your tongue in your mouth (among other things) changes the shape of resonating chambers and makes certain bunches of harmonics, called formants, stronger – you might say the formants are the informants of what speech sound you’re hearing.

When linguists – acoustic phoneticians in particular – and engineers and physicists analyze sound waves, they use a wonderful mathematical function called a Fourier transform to identify the different resonance frequencies in the sound waves, what is called the spectrum, a perfectly appropriate term since the spectrum of light is also the different frequencies. (Think about if someone were tapping 9 beats a second and someone else 12 beats a second and someone else 36 beats a second. If you graphed the sound waves, you would have something looking like :,..;..,:.,.:,..;..,:.,.:,.. and on and on. A Fourier transform would just show a graph plotting frequencies with one mark at 9 per second, one at 12 per second, and one at 36 per second.)
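Here is a small Python sketch of that tapping example. Each tapper is simplified to a pure sine wave, although real taps are sharp clicks with many harmonics of their own. To avoid depending on a numerical library, the sketch uses a naive discrete Fourier transform. The sample rate is an assumed value, chosen so that each output bin lines up with one whole frequency in hertz.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n-1."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]

RATE = 100  # samples per second (assumed)
N = 100     # one second of signal, so bin k corresponds to k Hz

# Three "tappers" modelled as pure tones at 9, 12 and 36 beats per second.
signal = [
    math.sin(2 * math.pi * 9 * t / RATE)
    + math.sin(2 * math.pi * 12 * t / RATE)
    + math.sin(2 * math.pi * 36 * t / RATE)
    for t in range(N)
]

mags = dft_magnitudes(signal)
# Only bins below RATE / 2 are meaningful; the upper half mirrors them.
peaks = [k for k in range(1, N // 2) if mags[k] > 0.25 * max(mags)]
print(peaks)  # [9, 12, 36]
```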

Well, if you treat the Fourier transform graph as though it were a graph of sound waves and perform a Fourier transform on it (it’s just slightly trickier than that, but that’s the general concept), you are performing a curious but useful inversion. You can identify how close together the harmonics are, and how close together the formants are; it tells you how frequent the strong frequencies are on the graph, so to speak. Believe me when I say this is useful, and not just in speech analysis: it makes cleaning up the sound on old recordings a lot easier, for instance – you can filter out unwanted resonances from the original sound-capture device.
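In practice, much of the "slightly trickier" part is a logarithm. One common definition, the real cepstrum, takes the inverse Fourier transform of the log of the spectrum's magnitude. The Python sketch below applies that to a made-up signal: a decaying resonance plus a half-strength echo of itself 40 samples later. The echo puts evenly spaced ripples in the spectrum, and the cepstrum reports how far apart they are as a spike at quefrency 40. The signal, the delay and the cutoff for "low quefrencies" are assumptions for the demo. A naive DFT keeps the sketch dependency-free.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n) for t, v in enumerate(x))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

N = 256
DELAY = 40  # echo delay in samples (assumed for the demo)
# A decaying "resonance" plus a half-strength echo of itself DELAY samples later.
source = [0.8 ** t if t < 64 else 0.0 for t in range(N)]
signal = [source[t] + (0.5 * source[t - DELAY] if t >= DELAY else 0.0) for t in range(N)]

ceps = real_cepstrum(signal)
# Skip the low quefrencies, which hold the smooth spectral envelope.
peak = max(range(10, N // 2), key=lambda q: abs(ceps[q]))
print(peak)  # 40
```

Zeroing everything above a low cutoff (a low-pass lifter, in the jargon of the passage quoted at the top) and transforming back would leave a smoothed log spectrum, the slowly varying envelope. In voiced speech, the comparable spike sits at the pitch period; that is the first rahmonic peak the quotation mentions.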

So anyways, when you do this process, you get something that looks like a spectrum but is really a spectrum looked at the other way around, and you get what looks like frequency but is really frequency looked at the other way, and harmonics that aren’t actually harmonics, and you can apply filtering processes on the data that aren’t filters like the normal data filters are. You’re treating frequency as though it were time and time as though it were frequency.

So what do you do? You come up with new words for what you’re talking about. And if you’re a nerd, you may take this opportunity to be a little playful. (Businessmen would use wanton sesquipedalianisms and initialisms to try to sound impressive. Nerds don’t feel a need to try to sound impressive because they actually know what they’re talking about.)

That playfulness actually tells us some interesting things about language, too: not the way we perceive sounds (which is what the data that all this analyzes help us to understand), but the way we think of and group sounds and how we perceive the structure of words. You see, the guys who came up with this – Bogert, Healy, and Tukey, three researchers back in the early 1960s – wanted to signify the inversion by inverting the words. But you will notice they only inverted parts of the words, in order to maintain comprehension, I suppose – in the process producing pseudomorphemes (I’ll explain, hold on) – and they did it in some particular ways:

spectrum –> cepstrum
frequency –> quefrency
filter –> lifter
harmonic –> rahmonic

In all of the words, they only inverted the first part of the word, thereby treating the front end of the word as the significant part and the remainder as a sort of tail (a common enough thing for people to do – go to SoHo and ask JLo), and also treating them as separate bits of the word, like tweet plus ed in tweeted – meaning-bearing building-blocks called morphemes. Except that trum, ency, ter, and monic actually are not morphemes; they have no meanings of their own – they’re just phonological divisions.

And the way they inverted the first half is notable: in three of the four, they just reversed the letters in the first syllable, which in all cases also reversed the sounds (you should know from this that the original pronunciation of cepstrum is with a /k/ at the start). It was always the syllable, not any other division: not rtcepsum or nomrahic, which would be morphologically appropriate but phonologically and orthographically problematic. As usual, the sound patterns of the words guide how they’re treated – when you turn it around, it’s the sound you’re turning around (this is standard in most playful things we do with words, and it’s how we treat helicopter as heli plus copter rather than the original helico plus pter, and why we say a whole nother thing, and also why people asked to say my backwards will probably say “I’m” – reversing the phonemes – rather than like “yam” – reversing the actual sounds).
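The reversal itself is easy to mimic on spelling. In this Python sketch, the first-syllable boundaries are supplied by hand, following the divisions described above (spec-trum, fil-ter, har-monic). The `reverse_prefix` helper is hypothetical. The last line reverses the longer morpheme-style chunks instead, which produces the awkward alternatives.

```python
def reverse_prefix(word, prefix):
    """Reverse the spelling of `prefix` (which must start `word`) and keep the rest."""
    if not word.startswith(prefix):
        raise ValueError(f"{word!r} does not start with {prefix!r}")
    return prefix[::-1] + word[len(prefix):]

# First syllables marked by hand, following the post's divisions.
for word, first_syllable in [("spectrum", "spec"), ("filter", "fil"), ("harmonic", "har")]:
    print(word, "->", reverse_prefix(word, first_syllable))

# Reversing the morpheme-style stem instead gives the unpronounceable versions.
print(reverse_prefix("spectrum", "spectr"), reverse_prefix("harmonic", "harmon"))
```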

In the other word, that wasn’t possible – /rf/ and /wk/ aren’t acceptable syllable onsets. So the syllable onsets, /fr/ and /kw/, were simply swapped to make quefrency. The vowel sounds were not swapped: it’s just not comfortable in English to say /’kwε frin si/. But when you look at that word on paper, do you want to pronounce it with a “long e” on the first syllable? To me, thanks to other words starting with que, it looks first like the que is said like the one in question, making both vowels /ε/ and conforming the word to expected sound patterns rather than to the original sounds.
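The onset swap can be sketched the same way. This Python example works on spelling rather than sound, with syllable breaks supplied by hand (fre-quen-cy). It also treats “qu” as a single onset, much as English spelling does. Both choices are assumptions made to keep the sketch short, and the helper names are hypothetical.

```python
VOWEL_LETTERS = "aeiou"

def split_onset(syllable):
    """Split a written syllable into (onset, rest) at its first vowel letter."""
    i = 0
    while i < len(syllable) and syllable[i] not in VOWEL_LETTERS:
        i += 1
    # Treat "qu" as one onset unit: the u belongs with the q.
    if syllable[:i] == "q" and syllable[i:i + 1] == "u":
        i += 1
    return syllable[:i], syllable[i:]

def swap_first_onsets(syllables):
    """Swap the onsets of the first two syllables, keep their vowels, and rejoin."""
    (on1, rest1), (on2, rest2) = split_onset(syllables[0]), split_onset(syllables[1])
    return on2 + rest1 + on1 + rest2 + "".join(syllables[2:])

print(swap_first_onsets(["fre", "quen", "cy"]))  # quefrency
```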

But at least quefrency looks like a swapping-up of frequency. When I first saw cepstrum, I didn’t see spectrum in it at all (obviously I wasn’t swirling and sniffing it at that point). It looked more like it was just some other Latin word I hadn’t seen, joining the long list of neuter nouns like rostrum and plectrum. And rahmonic, aside from making me think of Rahm Emanuel (and maybe rah-rah-rah), had a taste of rampike and mnemonic but took a moment to show its harmonic resonance. (Lifter happens to be an English word in its own right, and thereby carries unbidden resonances. Ironically.)

However, the resemblance of cepstrum to spectrum is not lost on those who are expecting to see spectrum. And the hazards of such wordplay showed up in an early publication by Oppenheim and Schafer on the topic – and make for a cautionary tale for editors and authors alike. I’ll quote directly from the same article I started with:

throughout the various stages of proofreading of this book, we constantly had to maintain vigilance to be certain that this “strange” term cepstrum wasn’t inadvertently “corrected” to what seemed to be more appropriate. . . . We breathed a sigh of relief when the last page proofs were returned to the publisher. When the first printing of the book appeared, it was clear that a particularly diligent proofreader at the publisher had caught the “error” at the last instant and cepstrum had been reversed to spectrum throughout.

Well, not entirely reversed – but run through a transformation aimed at making the strange look normal again. Ah, but too late – and sometimes you want to see things strangely.

Thanks to Colleen Kavanagh (@CanuckWordNerd) for drawing my attention to this whole sandbox of words.

An Introduction to Sclgnqi: Pronunciation Guide

Nearly a decade ago, as an exercise in what my wife would undoubtedly call “geek humour,” I began writing an introduction to an invented language, Sclgnqi. I didn’t get very far, but I did complete the pronunciation guide. I dug it up to quote from for my word tasting note on sternutatory. Herewith I present it in its entirety, for those whose sense of humour is as frankly odd and language-geeky as mine can be. It’s not polished or revised. So what. You paid how much to read this?

Before you have a klagnat’s hope of speaking the most beautiful, profound and logical language in the world, you must learn how to pronounce it. As you have been all your life speaking this flabby worm of a language English, this will take practice. You will never be able to walk down the street in Qhalgnna unless you practice the following sounds for three hours a day for at least two years: