Speech

n    Speech production – air from lungs is blocked and released by vocal cords, then passes through series of cavities (vocal tract) that change in size and shape

n    Phoneme – shortest segment of speech that, if changed, changes meaning of word

q    English has 13 vowel sounds and 24 consonant sounds

 

How consonants are produced

n    Manner of articulation – way in which airstream is blocked, how much blockage

q    Stop (complete), fricative (partial), nasal (nose)

n    Place of articulation – position of obstruction as air flows from lungs, where

q    Lips, dental, palate

n    Voicing – when vocal cords vibrate in relation to when sound is emitted

q    Voiced (simultaneous), voiceless (small delay)
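
The three consonant features above can be treated as a small lookup table. A minimal sketch, assuming a hand-picked subset of eight English consonants; the place label "alveolar ridge" is not in the notes and is added here only so that t, d, s, z and n can be included. The names Consonant, CONSONANTS and differing_features are hypothetical.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    manner: str   # how the airstream is blocked (stop, fricative, nasal)
    place: str    # where the obstruction occurs
    voiced: bool  # True if vocal cords vibrate as the sound is emitted


# Illustrative subset only, not the full set of 24 consonants.
CONSONANTS = {
    "p": Consonant("stop", "lips", False),
    "b": Consonant("stop", "lips", True),
    "t": Consonant("stop", "alveolar ridge", False),
    "d": Consonant("stop", "alveolar ridge", True),
    "s": Consonant("fricative", "alveolar ridge", False),
    "z": Consonant("fricative", "alveolar ridge", True),
    "m": Consonant("nasal", "lips", True),
    "n": Consonant("nasal", "alveolar ridge", True),
}


def differing_features(a: str, b: str) -> list:
    """Return the names of the features on which two consonants differ."""
    ca, cb = CONSONANTS[a], CONSONANTS[b]
    return [f for f in Consonant._fields if getattr(ca, f) != getattr(cb, f)]


print(differing_features("b", "p"))  # ['voiced']
print(differing_features("b", "m"))  # ['manner']
print(differing_features("d", "b"))  # ['place']
```

Each pair differs on exactly one feature, which is enough to change the phoneme.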

 

How vowels are produced

n    Height of tongue – high (tree) to low (Bob)

q    Coded on first formant (F1) – 300 Hz to 1000 Hz; the lower the frequency, the closer the tongue is to roof of mouth

n    Location of curve of tongue – front (tree) to back (root)

q    Coded on second formant (F2) – 850 Hz to 2500 Hz; the higher the frequency, the closer the tongue is to front of mouth (lip rounding also lowers F2)
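
The F1 and F2 ranges above can be turned into a rough tongue-position estimate. A minimal sketch, assuming a linear mapping over each range and a midpoint cutoff between "high"/"low" and "front"/"back"; both are simplifications made for illustration, and the formant values in the examples are hypothetical.

```python
F1_RANGE = (300.0, 1000.0)  # Hz, from the notes: low F1 = tongue near roof of mouth
F2_RANGE = (850.0, 2500.0)  # Hz, from the notes: high F2 = tongue near front of mouth


def _scale(value, lo, hi):
    """Map value linearly onto 0..1, clamping values outside the range."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def tongue_position(f1_hz, f2_hz):
    """Return (height, frontness), each between 0 and 1.

    height 1 = tongue high in the mouth; frontness 1 = tongue at the front.
    """
    height = 1.0 - _scale(f1_hz, *F1_RANGE)   # F1 runs opposite to height
    frontness = _scale(f2_hz, *F2_RANGE)
    return height, frontness


def describe(f1_hz, f2_hz):
    """Label a vowel as high/low and front/back using a midpoint cutoff."""
    height, frontness = tongue_position(f1_hz, f2_hz)
    vertical = "high" if height >= 0.5 else "low"
    horizontal = "front" if frontness >= 0.5 else "back"
    return f"{vertical} {horizontal}"


# Hypothetical formant values, chosen only to illustrate the mapping.
print(describe(300, 2300))  # high front
print(describe(350, 900))   # high back
print(describe(800, 1100))  # low back
```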

Spectrogram

n    Visual picture of acoustic event

n    Frequency (Y), time (X), intensity (darkness) – formants show up as dark bands (usually 3, up to 5)

n    Steady states – vowels

n    Transitions – consonants – formant either rising (increase in frequency) or falling (decrease in frequency)

n    Vertical lines – pressure oscillations from vibration of vocal cords
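
The frequency-by-time layout of a spectrogram can be sketched with a short-time Fourier transform. A minimal sketch in pure Python, assuming two synthetic tones in place of real speech; the sample rate, frame length and tone frequencies are arbitrary choices, and a real spectrogram would show several formants per frame rather than a single peak.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_LEN = 200     # 25 msec per frame, so bins are 8000 / 200 = 40 Hz apart


def spectrogram(samples, frame_len, hop):
    """Return one list of magnitudes per frame (time axis),
    indexed by frequency bin (frequency axis)."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window reduces leakage from the frame edges.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))  # intensity at this time and frequency
        frames.append(mags)
    return frames


def tone(freq_hz, n_samples):
    """Pure sine tone, standing in for a single steady band of energy."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def peak_hz(mags):
    """Frequency of the strongest bin in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * SAMPLE_RATE / FRAME_LEN


signal = tone(520, 400) + tone(1480, 400)  # 50 msec at each frequency
spec = spectrogram(signal, FRAME_LEN, FRAME_LEN)
peaks = [peak_hz(m) for m in spec]
print(peaks)  # [520.0, 520.0, 1480.0, 1480.0]
```

Each frame is one column of the picture; the peak stays put while the tone is steady and jumps when the tone changes.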

Problems with speech

n    Segmentation problem – acoustic signal is continuous but we perceive separate words

n    Variability problem – formants for same phoneme change with context, since each phoneme is modified by surrounding phonemes (coarticulation), yet we hear same phoneme in each context

n    Variation from different speakers – high or low pitch, rate of speech, sloppy pronunciation

Is speech special?

n    Separate, special-purpose, innate neural mechanism – speech processed and perceived unlike other sounds

n    Perception mediated by production – motor theory – acoustic syllables are taken in chunks and decoded by asking what motor behavior is necessary to produce that sound

n    Others believe speech not special – just well learned (see some of the evidence from animals and music)

 

McGurk Effect

n    Film – see person speaking

n    Audio – hear person speaking

q    See ba, hear ba → perceive ba

q    See ga, hear ga → perceive ga

q    See ga, hear ba → perceive da (if eyes are open)

n    Influence of vision on speech perception – audiovisual speech perception – taken as support for motor theory

Categorical perception

n    Voice onset time (VOT) – delay between start of sound and start of vocal cord vibration – distinguishes da from ta

n    d – VOT of 15 msec, t – VOT of 90 msec

n    Set up continuum – vary VOT in small steps from short to long

n    When played to participants, they report hearing either da or ta – even though large number of stimuli with different VOTs presented

n    Phonetic boundary – around 30-50 msec, see shift in what is perceived

q    Play 10 and 30 msec – hear da both times; 60 and 80 msec – hear ta both times; 30 and 50 msec – hear da at one and ta at other

Categorical perception (continued)

n    Fact that all stimuli on same side of phonetic boundary are perceived as same suggests speech is special (since would not hear this with nonspeech sounds)

n    Can also shift boundary by fatiguing (play da over and over, boundary shifts so hear more ta sounds)
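
The labeling, discrimination and fatigue results above can be sketched with a single threshold on VOT. A minimal sketch, assuming an idealized listener with a sharp boundary at 40 msec (the notes only place it around 30-50 msec) and a hypothetical 10 msec shift after fatigue; real labeling is more graded near the boundary.

```python
BOUNDARY_MSEC = 40     # assumed; the notes only place it around 30-50 msec
ADAPTATION_SHIFT = 10  # hypothetical size of the shift after repeated "da"


def label(vot_msec, boundary=BOUNDARY_MSEC):
    """Idealized listener: every VOT is heard as one of two categories."""
    return "da" if vot_msec < boundary else "ta"


def heard_as_different(vot_a, vot_b, boundary=BOUNDARY_MSEC):
    """Two stimuli sound different only if they get different labels."""
    return label(vot_a, boundary) != label(vot_b, boundary)


continuum = list(range(0, 100, 10))  # VOT from 0 to 90 msec in 10 msec steps
print([label(v) for v in continuum])
# ['da', 'da', 'da', 'da', 'ta', 'ta', 'ta', 'ta', 'ta', 'ta']

# Same 20 msec difference in every pair, but only one pair crosses the boundary.
for pair in [(10, 30), (60, 80), (30, 50)]:
    print(pair, heard_as_different(*pair))
# (10, 30) False
# (60, 80) False
# (30, 50) True

# Fatiguing "da" moves the boundary toward the "da" end, so more stimuli are "ta".
adapted = BOUNDARY_MSEC - ADAPTATION_SHIFT
ta_before = sum(label(v) == "ta" for v in continuum)
ta_after = sum(label(v, adapted) == "ta" for v in continuum)
print(ta_before, ta_after)  # 6 7
```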

Duplex perception

n    Split speech stimulus so one part (F3 transition) goes to one ear and other part (base) goes to other ear

n    Hear chirp in ear receiving transition, and complete/combined syllable in other ear (either da or ga, depending on transition)

n    Can even get effect if transition is played just below threshold

n    Interpretation – specialized speech module in brain is combining the 2 signals

Evidence against speech is special

n    McGurk effect – also get with cello string being plucked or bowed

q    If see bowing but hear plucking, more likely to rate sound heard as a bow moving on strings

n    Categorical Perception – can get with different notes

n    Duplex Perception – can get with chords

q    Play part of chord in 1 ear (C,G) and other part in other ear (E or Eb) – hear complete chord
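
The chord version of the duplex stimulus can be checked with semitone arithmetic. A minimal sketch, assuming the root is C and covering only the four notes named above; it models just the combination step (neither ear alone carries a full triad), not what listeners actually report. The names SEMITONE and chord_quality are hypothetical.

```python
SEMITONE = {"C": 0, "Eb": 3, "E": 4, "G": 7}  # semitones above C


def chord_quality(notes, root="C"):
    """Name the triad formed by the notes, or None if it is incomplete."""
    intervals = {(SEMITONE[n] - SEMITONE[root]) % 12 for n in notes}
    if intervals == {0, 4, 7}:
        return "major"
    if intervals == {0, 3, 7}:
        return "minor"
    return None


left_ear = ["C", "G"]  # shared by both chords, so ambiguous on its own
for right_ear in (["E"], ["Eb"]):
    print(right_ear,
          chord_quality(left_ear),              # None
          chord_quality(right_ear),             # None
          chord_quality(left_ear + right_ear))  # major for E, minor for Eb
```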

 

Top down processing in speech

n    Phonemic restoration – remove phoneme, people fill in the blank with what fits in context

n    Listeners perceive better if know will be hearing speech

n    Perceive better if phoneme appears in word and if words appear in phrase or sentence

n    Perceive better if know topic of conversation and sentence is meaningful

n    Perceive better if see lip movements (McGurk effect)

n    Indexical characteristics – speaker's gender, age, where from, emotional state, sarcastic or serious