Micro-prosody mastery

At C1 you learned that stress moves with information structure: the nuclear tone falls on the new element, not the last content word by reflex. At C2 the work goes a layer deeper. Native speakers do not merely place stress correctly; they shape every syllable around the tonic — pre-tonic syllables tilt upward in micro-steps, post-tonic syllables decay along a precise curve, and the entire phrase is timed against an internal metronome that you have to install in your own speech. Acoustic-phonetic studies (Beckman, Pierrehumbert, Ladd) show that listeners decode meaning from millisecond-level pitch and duration cues that L2 speakers consistently miss.

A C1 Russian speaker can place the nuclear stress on the right word and still sound non-native, because the shape of the contour around the stress is wrong: pre-tonic syllables are flat instead of slightly rising, the tonic peaks early, on the consonantal onset, instead of on the stressed vowel, and the tail drops in a single step rather than a smooth descent. This lesson is about those micro-features: the differences that separate “excellent foreigner” from “American”.

The model in your head should be: every prosodic phrase is a small wave with a clearly marked head (first stressed syllable), body (everything from head to tonic), nucleus (the tonic syllable carrying the main pitch movement), and tail (everything after). At C2 you control all four.

Micro-prosody fundamentals (C1)

1. The intonation phrase as the unit of timing

Native English speech is parsed into intonation phrases (IPs) of roughly 1.5 to 3 seconds — a length tied to the breath group and the limits of working memory. Each IP contains exactly one nuclear (tonic) accent. Boundaries between IPs are signaled by lengthening of the final syllable (pre-boundary lengthening) plus a small pause and pitch reset.

Honestly, ‖ when she said that, ‖ I just couldn’t believe it. (three IPs)
The problem isn’t the cost ‖ — it’s the timeline. (two IPs, contrastive)
Look, ‖ at the end of the day, ‖ what really matters ‖ is the result. (four IPs)

The Russian L1 default is to push longer chunks into a single contour with no pre-boundary lengthening. Result: the sentence sounds rushed and one-piece, and listeners lose the argument structure. Train your ear by listening to NPR hosts — they parse heavily, often into three- and four-IP sentences.

IP-counting drill

Practice parsing the following sentences into IPs. Native speakers will typically render them in the number of IPs indicated; you should match this rendering.

Look, the bottom line is, we just don’t have the budget for it this quarter. (4 IPs)
And then she said, completely out of nowhere, that she was moving to Portland. (3 IPs)
I mean, technically, you could argue that, but honestly, it’s a stretch. (4-5 IPs)
Between you and me, I think he’s going to quit, probably by Friday. (3 IPs)

The key marker of a fluent C2 speaker is dense parsing — more, shorter IPs than a C1 speaker would produce for the same content.

2. Pre-tonic shaping: the gentle climb

In the body of the IP, pre-tonic syllables are not flat. They form a low-rising staircase toward the nucleus, with each pitch accent slightly higher than the last (or slightly lower, in a declining “hat” pattern). This is called the prenuclear contour.

I just / cannot / believe / what she / SAID to me. — each chunk a small rising step, nucleus on SAID, then drop.

Native speakers fluctuate pitch by 30-60 Hz across pre-tonic syllables. Russian speakers tend to flatten the body to under 15 Hz of movement, then “spike” the nuclear syllable — producing an unmistakably non-native stair-and-cliff shape.

Practice with the stepping head: start mid-low, step up on each stressed syllable through the body, peak on the nucleus.

Stepping-head drill

Read the following sentences with measured pitch steps through the pre-tonic syllables. Each stressed syllable listed in parentheses should be slightly higher than the previous one:

I just don’t think we should TRY it. (steps on just, don’t think, should; peak on TRY)
She has never once complained about ANYthing. (steps on never, once, complained; peak on ANYthing)
We can probably figure this out together TOday. (steps on can, probably, figure, together; peak on TOday)
They wouldn’t possibly know how to fix the PROblem. (steps on wouldn’t, possibly, know, fix; peak on PROblem)

If your speech is recorded and you see a flat pre-tonic body, focus shadowing on this specific feature. NPR hosts (Robert Siegel archives, Audie Cornish) are excellent models for stepping-head training.
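
If you read F0 values off a pitch tracker (Praat's pitch display is one option), the flat-body check can be made concrete. The sketch below is only an illustration: `pretonic_profile` is a hypothetical helper, the readings are invented, and the "shallow" band between 15 and 30 Hz is an assumption that fills the gap between the two figures quoted in this section.

```python
# Hypothetical helper: not part of any pitch-analysis library.
def pretonic_profile(f0_hz):
    """Summarize the pitch shape of the pre-tonic body.

    f0_hz: F0 in Hz at each stressed syllable of the body, in spoken
    order, with the nucleus itself left out.
    """
    if not f0_hz:
        raise ValueError("need at least one F0 measurement")
    span = max(f0_hz) - min(f0_hz)
    steps = [later - earlier for earlier, later in zip(f0_hz, f0_hz[1:])]
    # Thresholds follow the lesson: under 15 Hz is the flat body,
    # 30 Hz and up is the range quoted for native speakers.
    # The "shallow" label for 15-30 Hz is an assumption of this sketch.
    if span < 15:
        label = "flat"
    elif span < 30:
        label = "shallow"
    else:
        label = "native-like"
    return {
        "span_hz": span,
        "steps_hz": steps,
        "stepping_up": all(step > 0 for step in steps),
        "label": label,
    }


# Invented readings for "I just don't think we should TRY it"
# (stressed syllables: just, don't think, should).
print(pretonic_profile([112, 128, 147]))
print(pretonic_profile([118, 121, 119]))
```

A `stepping_up` value of False is not automatically an error, since a declining pattern is also possible; the span is the figure to watch for the flat-body problem.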

3. Tonic placement: late vs early peak

The autosegmental-metrical literature (Pierrehumbert, Ladd) describes both late-peak accents (L+H*, L*+H — F0 maximum at or after the stressed vowel) and early-peak accents (H+!H* — F0 maximum before the stressed vowel followed by a fall) in English. AmE uses both, with the choice cued by accent type and information structure: many AmE nuclear accents on focused new information take late-peak L+H* contours, while many declarative-final fall accents take early-peak H+!H* contours. The relevant target for non-natives producing emphatic new-information focus is the late peak — F0 maximum riding on the stressed vowel or just after, not before. Russian L1 often produces a sharper, earlier peak on the consonantal onset, which sounds clipped to American ears in focused contexts.

I want the ROUND one. — the F0 maximum should land on /aʊ/, not on the /r/.
That was AMAZING. — peak on /eɪ/, sustained through /z/, decay into /ɪŋ/.
She’s REALLY good. — peak on /iː/ of REALLY, held through the second syllable.
I had no IDEA. — peak on /iː/ of IDEA, late and held.

In practical terms: lengthen the stressed vowel and let the pitch peak ride on the vowel nucleus. The stressed vowel in AmE lasts roughly 1.5 to 2.0 times as long as the unstressed vowels around it; in Russian-accented English the ratio tends to be 1.1 to 1.3, which is too short.
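
To check your own duration ratio, measure the stressed vowel and its unstressed neighbours in a recording and divide. The following is a minimal sketch under stated assumptions: both function names are hypothetical, the durations are invented, and the bands simply restate the rough figures quoted above rather than any published norm.

```python
from statistics import mean


def stress_duration_ratio(stressed_ms, neighbours_ms):
    """Ratio of the stressed vowel's duration to the mean duration of
    the unstressed vowels around it (all values in milliseconds)."""
    if not neighbours_ms:
        raise ValueError("need at least one unstressed vowel duration")
    return stressed_ms / mean(neighbours_ms)


def describe_ratio(ratio):
    # Bands restate the rough figures from the lesson; treat them as
    # guidance for practice, not as measured norms.
    if ratio >= 1.5:
        return "AmE-like (1.5 or more)"
    if ratio <= 1.3:
        return "Russian-accented range (1.3 or less)"
    return "in between"


# Invented measurements for "amazing": stressed /eɪ/ = 190 ms,
# unstressed vowels = 80 ms and 110 ms.
ratio = stress_duration_ratio(190, [80, 110])
print(round(ratio, 2), describe_ratio(ratio))
```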

Late-peak production drill

Place a hand on your throat (over the larynx) and produce each word. You should feel the pitch rise into the stressed vowel and peak as you sustain the vowel:

Amazing — peak on /eɪ/, held 250 ms minimum.
Incredible — peak on /ɛ/ of CRED, held with sustained pitch.
Phenomenal — peak on /ɑ/ of NOM, with broad pitch span.
Unbelievable — peak on /iː/ of LIEV, with the longest hold of any vowel in the word.
Astonishing — peak on /ɑ/ of TON, with the slowest tempo.

If the peak arrives before the vowel begins, the sound is clipped. If it arrives in the middle of the vowel and decays smoothly, the sound is American.

4. Post-tonic decay: the long tail

After the nucleus, English does not simply drop to silence. It produces a controlled declination — a smooth, audible decay across the tail syllables, often ending with a slight final lengthening on the last syllable of the IP.

I told her about the PROject yesterday. — nucleus PROject, then “yesterday” decays smoothly with terminal lengthening on -day.
He’s the one who CALLED me last night. — nucleus CALLED, then “me last night” forms the tail.
That’s the BOOK I was telling you about. — nucleus BOOK, then a seven-syllable tail decaying smoothly to about.
I just got OFF the phone with her. — nucleus OFF, then the tail decays through the phone with her.

Russian L1 either drops abruptly after the nucleus (loses the tail entirely) or maintains a flat post-tonic plateau (no decay at all). Both betray non-nativeness immediately.

Tail-decay drill

Read each sentence with strong nuclear stress on the capitalized word and a smooth controlled decay through the remainder. The pitch should fall gradually, not step-drop or stay flat.

I just FINished the report this morning.
She really LOVED the dinner you cooked for her.
We were just TALKing about that yesterday.
He’s never going to ADmit that he was wrong.
I haven’t SEEN her in years and years.

The tail in each case is 5-7 syllables long and must decay smoothly. Record yourself and listen specifically for whether the post-nuclear material is dropped, flat, or properly decaying.
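
If you also have F0 readings for the tail, the three outcomes (dropped, flat, properly decaying) can be told apart mechanically. This is a rough sketch, not an established metric: `classify_tail` is a hypothetical helper, and both thresholds (a total fall under 10 Hz counts as flat; 80% or more of the fall on the first post-tonic syllable counts as dropped) are assumptions chosen for illustration.

```python
# Hypothetical classifier; thresholds are illustrative assumptions.
def classify_tail(nucleus_hz, tail_hz, flat_hz=10.0, cliff_share=0.8):
    """Label the post-tonic contour as 'flat', 'dropped' or 'decaying'.

    nucleus_hz: F0 at the nuclear peak.
    tail_hz:    F0 at each post-tonic syllable, in spoken order.
    """
    if len(tail_hz) < 2:
        raise ValueError("need at least two tail syllables to judge decay")
    total_fall = nucleus_hz - tail_hz[-1]
    if total_fall < flat_hz:
        return "flat"      # pitch barely moves after the nucleus
    first_fall = nucleus_hz - tail_hz[0]
    if first_fall / total_fall >= cliff_share:
        return "dropped"   # nearly the whole fall happens at once
    return "decaying"      # the fall is spread across the tail


# Invented readings for the tail of "I just FINished the report this morning".
print(classify_tail(180, [165, 152, 141, 132, 125, 120, 118]))
print(classify_tail(180, [122, 121, 120, 120, 119, 119, 118]))
print(classify_tail(180, [178, 177, 177, 176, 176, 175, 175]))
```

The numbers only back up what your ear should already be telling you on playback.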

5. Deaccenting given information

A defining feature of AmE prosody at C2 level is radical deaccenting of given information. Once a referent is established, native speakers strip all pitch movement from it — it becomes prosodically invisible.

A: Did you see the new Wes Anderson film?
B: I SAW the Wes Anderson film. I didn’t LIKE it.

In B’s response, “Wes Anderson film” is fully deaccented — flat low pitch, often with the entire phrase compressed in duration. Russian L1 tends to keep some accent on these phrases out of politeness or because Russian prosody is less aggressive about deaccenting. Result: the contrast between SAW and LIKE is muddled.

Deaccenting drill

Practice the following exchanges with full deaccenting of repeated material. The capitalized word receives nuclear stress; everything else should be low and flat.

A: Have you read the latest Pinker book? B: I STARTED the latest Pinker book.
A: Did Jenny call about the meeting? B: Jenny EMAILED about the meeting.
A: Was the conference in San Diego? B: The conference was in San FRANCISCO.
A: Did you finish the quarterly report? B: I SUBMITTED the quarterly report.
A: Are they hiring for the marketing role? B: They FROZE the marketing role.

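In each case, the repeated material is flattened, whether it follows the nucleus or, as in the San Francisco example, precedes it: same low pitch, low volume, compressed timing. It is not literally whispered, only stripped of pitch movement.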

6. Phrase-final lengthening and the boundary cue

The final syllable of an intonation phrase is lengthened by 30-50% relative to a non-final occurrence of the same syllable. This pre-boundary lengthening is the listener’s main cue that a phrase boundary is coming.

When she finished talking, ‖ everyone stood up. — “talking” carries the lengthening on /-ɪŋ/; the listener uses this duration cue to parse the comma.
I was wondering, ‖ now that you mention it, ‖ whether we should reconsider. — lengthening on wondering, mention it, reconsider.

Without pre-boundary lengthening, even grammatically clear sentences become hard to parse. Russian speakers often produce uniform syllable duration throughout, which forces American listeners to do extra parsing work — a recognizable “foreign cadence” cue.

Pre-boundary lengthening drill

Produce each sentence with deliberate 30-50% lengthening on the marked syllables. The slow-then-pause sequence should feel almost exaggerated initially; over time it normalizes.

I told her the truuuth, ‖ but she didn’t beLIEVE me.
He’s not just stuuubborn, ‖ he’s COMpletely irrational.
We tried everyTHIIIING, ‖ and nothing worked.

If your speech reads as “fluent but slightly off,” lack of pre-boundary lengthening is the most common single cause.
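
The 30-50% target is easy to verify once you have two measurements of the same syllable: one in phrase-final position and one mid-phrase. A minimal sketch, assuming you measure the durations by hand; the function names and the sample values are hypothetical.

```python
def lengthening_percent(final_ms, nonfinal_ms):
    """How much longer (in percent) a syllable is in IP-final position
    than the same syllable measured away from a boundary."""
    if nonfinal_ms <= 0:
        raise ValueError("non-final duration must be positive")
    return (final_ms - nonfinal_ms) * 100.0 / nonfinal_ms


def within_target(percent, low=30.0, high=50.0):
    # Default window is the 30-50% range given in this section.
    return low <= percent <= high


# Invented durations for "truth": 310 ms before the boundary in
# "I told her the truth, ‖ ...", 220 ms in mid-phrase position.
pct = lengthening_percent(310, 220)
print(round(pct, 1), within_target(pct))
```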

7. Micro-pauses: the held silence

Native speakers insert sub-300-millisecond micro-pauses at clause boundaries and before emphatic items. These are not “thinking pauses” — they are structural. They signal: “what comes next is heavy; brace for it.”

And then ‖ she just ‖ walked out. — micro-pause before “walked out” lifts the verb to dramatic prominence.
I told him ‖ in no uncertain terms ‖ that the deal was off. — two micro-pauses bracketing the parenthetical.
He’s not — ‖ how should I put this — ‖ the most reliable person. — micro-pauses bracketing the self-edit.

C1 Russian speakers often eliminate micro-pauses (push through the IP without break) or replace them with filled hesitations (uh, um). At C2, the goal is the held silence: a clean, brief gap with no filler.

Held-silence drill

Record yourself producing the following with measured 200-400 ms held silences at the marked positions. Listen back and confirm the silences are truly silent — no breath, no filler, no lip-smack.

Here’s the thing. ‖ I can’t help you. ‖ Not because I don’t want to. ‖ Because I can’t. (three full silences)
And then she said — ‖ and I’ll never forget this — ‖ “I never wanted any of it.” (two silences bracketing the parenthetical)
The answer, ‖ in short, ‖ is no. (two short silences around the parenthetical)
We tried everything. ‖ Everything. ‖ And nothing worked. (silence as repetition device)
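
If you label word start and end times in your recording, the silences can be measured rather than estimated. The sketch below assumes hand-labelled timings in milliseconds; `held_silences` is a hypothetical helper, the default window is the drill's 200-400 ms target, and the 50 ms cut-off for ignoring ordinary word joins is an assumption.

```python
# Hypothetical helper for checking held silences in a labelled recording.
def held_silences(words, min_ms=200, max_ms=400, ignore_below_ms=50):
    """List the silent gaps between consecutive words.

    words: (label, start_ms, end_ms) tuples in spoken order.
    Gaps shorter than ignore_below_ms are treated as ordinary word
    joins and skipped; that cut-off is an assumption of this sketch.
    """
    report = []
    for (left, _, left_end), (right, right_start, _) in zip(words, words[1:]):
        gap = right_start - left_end
        if gap < ignore_below_ms:
            continue
        if gap < min_ms:
            verdict = "too short"
        elif gap > max_ms:
            verdict = "too long"
        else:
            verdict = "on target"
        report.append((left, right, gap, verdict))
    return report


# Invented timings for "The answer, ‖ in short, ‖ is no."
timings = [
    ("The", 0, 120), ("answer", 120, 520),
    ("in", 780, 880), ("short", 880, 1250),
    ("is", 1500, 1620), ("no", 1620, 2000),
]
for row in held_silences(timings):
    print(row)
```

Timing alone cannot tell a clean silence from a breath or a lip-smack, so the listening check in the drill still applies.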

8. The AmE-specific cadence: looser than RP, tighter than spontaneous

Compared to British RP, AmE has:

Less aggressive pitch range in the head (RP can swing 100+ Hz; AmE typically 60-80 Hz).
Longer stressed vowels (1.7x ratio in AmE vs 1.4x in RP, roughly).
More uptalk (high rising terminal /↗/) on declaratives, especially in West Coast and younger speakers.
Flatter post-tonic decay — the tail is less steeply falling than RP.

Listen for the AmE cadence in: 60 Minutes interviews, Terry Gross on Fresh Air, Anderson Cooper on CNN, presidential addresses, and Atul Gawande’s New Yorker pieces read aloud. Avoid using BBC newsreader prosody as a model — it produces a recognizably British shape that AmE listeners flag as foreign.

9. Voice quality and the micro-timbre dimension

Beyond pitch and duration, native AmE speech uses voice quality (phonation type) as a meaning-bearing dimension. Three modes are common:

Modal voice — the default, balanced phonation. Used for neutral assertions.
Breathy voice — air leaks through partially adducted vocal folds. Used for intimacy, vulnerability, sincerity.
Creaky voice (vocal fry) — slow, irregular vibration of the folds. Used at phrase ends, for casual or laid-back register, and in some younger speakers as a phrase-final default.

Creaky voice in particular is widely adopted by younger AmE speakers (especially women on West Coast podcasts) and is no longer a marked feature. It has become part of the default phrase-final cue in many varieties. A C2 speaker should recognize all three and produce at least modal and a controlled creaky.

Russian L1 speakers tend to produce uniform modal voice throughout, with no voice-quality variation. Adding breathy voice on intimate or emotionally weighted phrases, and creaky voice at phrase ends, is a fast path to AmE-native sound.

10. Sentence-medial accent: the secondary peak

In longer IPs, AmE allows multiple pitch accents before the nuclear accent. A typical pattern in a 3-second IP:

Pre-head: low.
Head: first stressed syllable, mid-rise.
Body: stair-step or fall-rise across additional stressed syllables.
Pre-nuclear: stepping up toward the nucleus.
Nucleus: peak with late alignment.
Tail: smooth decay.

Each stressed syllable in the body may receive a small pitch movement (a secondary accent). The nucleus carries the largest movement; the secondaries are smaller but present. Russian L1 production tends to flatten all body accents and reserve all pitch movement for the nucleus, producing the characteristic “flat-then-spike” shape.

Secondary-accent drill

Produce the following with small pitch movements on each stressed syllable and the main movement on the capitalized word. Each syllable listed in parentheses should receive a small rise; the nucleus a larger rise-fall.

I really thought she would CALL me back. (small rises on really, thought, would; peak on CALL)
They told us not to worry about the COST. (small rises on told, worry; peak on COST)
He’s been working on this project for MONTHS. (small rises on working, project; peak on MONTHS)

11. Shadowing methodology for prosodic acquisition

The single most effective method for installing AmE micro-prosody is structured shadowing — repeating audio in tight synchrony with the original, paying attention to timing rather than meaning.

Protocol

Select a 30-second clip from a native AmE speaker (NPR host, podcaster, audiobook narrator). Recommended: any opening of Fresh Air with Terry Gross, any 30 seconds of The Daily, audiobook samples by John le Carré read by an American narrator.
Listen to the clip three times without speaking. Map the IP boundaries mentally.
Listen and shadow at the same time, lagging by roughly 200 ms. Match the timing exactly — do not produce the words from your own rhythm.
Record yourself and play back side-by-side with the original. Identify mismatches.
Re-shadow with attention to the specific mismatched feature (peak placement, terminal lengthening, deaccenting).
Repeat with the same clip 5-10 times across a week. Then move to a new clip.

After 8-12 weeks of regular shadowing, micro-prosodic features begin to install in your spontaneous speech. The improvement is not in vocabulary or grammar but in the dimensions that no textbook explicitly teaches.

12. Useful AmE prosody resources

For sustained input training, the following sources reliably reward attention to micro-prosody:

NPR programming — Fresh Air (Terry Gross), All Things Considered, Morning Edition. Hosts trained to model AmE prosody for general audiences.
The New York Times audio — The Daily (Michael Barbaro), The Ezra Klein Show, Hard Fork. Distinct cadences worth comparing.
Audiobooks read by American narrators — Atul Gawande reading his own essays, Tara Westover reading Educated, Michelle Obama reading Becoming.
Standup comedy specials — John Mulaney’s New in Town and Kid Gorgeous, Mike Birbiglia’s Thank God for Jokes. Comedians are masters of micro-timing.
Interview podcasts — WTF with Marc Maron, Conan O’Brien Needs a Friend, The Tim Ferriss Show. Conversational AmE at native speed.
Audio narratives — This American Life, Radiolab, Snap Judgment. Storytelling masters with composed cadences.

Spend at least 30 minutes per week of focused listening (not background) with shadowing on a 30-second extract. This is the practice that converts a fluent C1 speaker into a near-native C2 speaker on the prosodic dimension.

13. The Russian L1 prosodic signature

Even at C2, Russian-trained speakers leak the following micro-features:

Flat pre-tonic body instead of stepping head.
Early peak on the nucleus (peak on consonant onset, not vowel).
Truncated tail — the post-tonic syllables are dropped to near-silence too fast.
No pre-boundary lengthening — uniform syllable duration through the IP.
Sentence-final rise that is too high, too late, sounding either patronizing or interrogative when it shouldn’t.
Compressed pitch range overall — under 50 Hz where AmE uses 70-90 Hz.

The cure is targeted shadowing of 30-second NPR or podcast clips with attention to body shape, peak placement, and tail decay — not just word-level stress.

Knowledge check

An American friend and a C1 Russian speaker each say the same sentence: *I really thought she would call me back yesterday.* The American produces a stepping head through *really thought she would*, a late peak on the stressed vowel of CALL, a smooth decay across *me back yesterday* with terminal lengthening on *-day*. The Russian produces a flat body, a peak on /k/ of *call*, an abrupt drop after, and no terminal lengthening. Both speakers placed the nuclear stress on the same word. Why does the Russian still sound foreign, and what specifically does the listener decode from the American version that they cannot from the Russian one?

Answer

The Russian speaker has correctly identified the nuclear placement (CALL) — the macro-level prosody is right. But native listeners parse meaning from micro-prosody: the stepping head signals 'I am building toward something; pay attention'; the late peak on the stressed vowel of CALL marks the word as the new, emotionally weighted information; the smooth decay through the tail confirms the IP is closing in an orderly way; and the terminal lengthening on *-day* tells the listener 'this phrase is complete; I am ready to yield the floor or continue.' Without these micro-cues, the American hears the right word stressed but cannot decode the speaker's stance (frustrated? disappointed? matter-of-fact?), cannot parse phrase boundaries cleanly, and cannot predict whether the speaker is about to continue. The result is the well-documented 'great English but I can tell they're not American' impression — the macro is right, the micro is wrong, and listeners decode personhood from the micro.

Common Russian-speaker mistakes

Wrong: Flat pre-tonic body, then nuclear spike. Right: Stepping head with 30-60 Hz of pitch movement before the nucleus. Why: AmE prosody is built on a gradual rise to the tonic, not a flat-line-then-cliff shape.
Wrong: Pitch peak on the consonantal onset of the stressed syllable (Cá-ll with peak on /k/). Right: Late peak on the vowel nucleus (Caaá-ll with peak riding /ɔ/). Why: AmE has a documented late-peak alignment that Russian L1 systematically anticipates.
Wrong: Truncated tail, where pitch drops to the bottom of your range immediately after the nucleus. Right: Smooth decay across all post-tonic syllables. Why: English uses declination as a phrase-closing cue; abrupt drop sounds clipped or angry to American ears.
Wrong: Uniform syllable duration throughout the IP, no pre-boundary lengthening. Right: Final syllable of each IP lengthened by 30-50%. Why: Pre-boundary lengthening is the listener’s main cue to phrase boundaries; without it, parsing takes extra effort.
Wrong: Keeping pitch accent on given information out of politeness. Right: Radically deaccent given referents — flat low pitch, compressed duration. Why: AmE relies on deaccenting to mark information status; leaving given referents accented obscures the new-information contrast.
Wrong: Filled hesitations (uh, um) at structural boundaries. Right: Clean micro-pauses of 150-300 ms before emphatic items. Why: At C2, fluency means being comfortable with silence; held silence is a prosodic device, filled hesitation breaks the cadence.
Wrong: Sentence-final rise (uptalk) used reflexively on all statements. Right: Uptalk used sparingly and intentionally for stance (uncertainty, list-not-complete, soliciting agreement). Why: Overused uptalk reads as adolescent or insecure in older AmE registers; used selectively, it is a powerful stance marker.

Summary

AmE speech is parsed into intonation phrases of 1.5-3 seconds, each with exactly one nuclear accent and clear boundary cues.
The stepping head produces gradual pitch rise through the body; the late peak places F0 maximum on the stressed vowel itself.
Post-tonic decay must be smooth, with terminal lengthening on the final syllable of each IP.
Given information is radically deaccented — flat low pitch — to make new information prosodically prominent.
Micro-pauses replace filled hesitations at structural boundaries; held silence is a C2-level prosodic device.
The Russian L1 signature shows in flat body, early peak, truncated tail, no pre-boundary lengthening, and compressed pitch range.

Next lesson: intonation for irony, sarcasm, and deadpan mastery — the production-level contours that make verbal irony register as irony and not as straight assertion.
