Speech
- Produced by blockage and release of air at the vocal cords – sound then passes through a series of cavities that change in size and shape
- Phoneme – shortest segment of speech that, if changed, changes the meaning of a word
  - English: 13 vowels, 24 consonants
How consonants are produced
- Manner of articulation – the way in which the airstream is blocked, and how much blockage
  - Stop (complete), fricative (partial), nasal (through the nose)
- Place of articulation – where the obstruction occurs as air flows from the lungs
  - Lips, teeth (dental), palate
- Voicing – when the vocal cords vibrate relative to when the sound is emitted
  - Voiced (simultaneous), voiceless (small delay)
How vowels are produced
- Height of tongue – high (tree) to low (Bob)
  - Coded on first formant (F1) – 300 Hz to 1000 Hz; the lower the frequency, the closer the tongue is to the roof of the mouth
- Location of curve of tongue – front (tree) to back (root)
  - Coded on second formant (F2) – 850 Hz to 2500 Hz; the higher the frequency, the closer the curve is to the front of the mouth (lip rounding also decreases the frequency)
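
The F1/F2 coding above can be sketched as a small lookup. This is only an illustration: the frequency ranges come from the notes, but the linear mapping, the 0-to-1 scales, and the function names are assumptions made for the example, not a model of real vowel acoustics.

```python
# Formant ranges taken from the notes (Hz).
F1_RANGE = (300.0, 1000.0)   # low F1 = tongue high, high F1 = tongue low
F2_RANGE = (850.0, 2500.0)   # low F2 = tongue back, high F2 = tongue front


def _fraction(value, low, high):
    """Position of value within [low, high], clamped to the range 0..1."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def tongue_position(f1_hz, f2_hz):
    """Return (height, frontness), each on an assumed 0..1 scale.

    height 1.0    = tongue close to the roof of the mouth (as in "tree")
    frontness 1.0 = curve of the tongue at the front (as in "tree")
    The linear interpolation is an assumption made for illustration.
    """
    height = 1.0 - _fraction(f1_hz, *F1_RANGE)   # lower F1 -> higher tongue
    frontness = _fraction(f2_hz, *F2_RANGE)      # higher F2 -> further front
    return height, frontness


print(tongue_position(300, 2500))   # high front vowel
print(tongue_position(1000, 850))   # low back vowel
```
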
Spectrogram
- Visual picture of an acoustic event
- Frequency (Y axis), time (X axis), intensity (darkness); formants show up as dark bands – usually 3, up to 5
- Steady states – vowels
- Transitions – consonants – either rising (increasing frequency) or falling (decreasing frequency)
- Vertical lines – pressure oscillations from vibration of the vocal cords
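
To make the time/frequency/intensity layout concrete, here is a minimal sketch of how a spectrogram can be computed: cut the signal into short overlapping frames and measure the intensity at each frequency in each frame. The function name, frame length, hop size, and Hann window are choices made for this example; real tools use a fast Fourier transform instead of the slow direct sum shown here.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=64, hop=32):
    """Return (times_s, freqs_hz, mags).

    mags[t][k] is the intensity at freqs_hz[k] in the frame that starts
    at times_s[t]: time on one axis, frequency on the other.
    """
    n_bins = frame_len // 2 + 1
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    # Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    times, mags = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct discrete Fourier transform of one frame at bin k.
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(total))
        times.append(start / sample_rate)
        mags.append(row)
    return times, freqs, mags


# A steady 500 Hz tone: the same frequency band is strongest in every frame,
# which is what a "steady state" looks like on a spectrogram.
rate = 8000
tone = [math.sin(2 * math.pi * 500 * n / rate) for n in range(256)]
times, freqs, mags = spectrogram(tone, rate)
peak_bin = max(range(len(freqs)), key=lambda k: mags[0][k])
print(freqs[peak_bin])  # 500.0
```
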
Problems with speech
- Segmentation problem – the acoustic signal is continuous, but we perceive separate signals
- Variability problem – formants change even for the same phoneme – each phoneme is modified by surrounding phonemes (coarticulation), yet we hear the same phoneme in each condition
- Variation across speakers – high or low pitch, rate of speech, sloppy pronunciation
Is speech special?
- Separate, special-purpose innate neural mechanism – speech is processed and perceived unlike other sounds
- Perception mediated by production (motor theory) – acoustic syllables are taken in chunks and decoded: what motor behavior is necessary to produce that sound?
- Others believe speech is not special, just well learned (see some of the evidence from animals and music)
McGurk Effect
- Film – see person speaking
- Audio – hear person speaking
  - Film ba, audio ba – perceive ba
  - Film ga, audio ga – perceive ga
  - Film ga, audio ba – perceive da (if eyes are open)
- Influence of vision on speech perception – audiovisual speech perception – motor theory
Categorical perception
- Voice onset time (VOT) – da and ta
- d – 15 msec, t – 90 msec
- Set up a continuum, varying VOT in small steps from short to long
- When played to participants, they report hearing either da or ta, even though a large number of stimuli with different VOTs are presented
- Phonetic boundary – around 30-50 msec, see a shift in what is perceived
  - Play 10 and 30 msec, hear da both times; play 60 and 80 msec, hear ta both times; play 30 and 50 msec, hear da at one and ta at the other
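
The pattern above can be sketched as a hard threshold on VOT. The notes only place the boundary at around 30-50 msec; the 40 msec value below is an assumed midpoint, and the function names are made up for this example. Real listeners show a steep but not perfectly sharp boundary.

```python
BOUNDARY_MS = 40.0  # assumed midpoint of the 30-50 msec range in the notes


def label(vot_ms, boundary_ms=BOUNDARY_MS):
    """Category a listener reports for a given voice onset time."""
    return "da" if vot_ms < boundary_ms else "ta"


def heard_as_different(vot_a_ms, vot_b_ms, boundary_ms=BOUNDARY_MS):
    """Two stimuli sound different only if they fall on opposite sides
    of the phonetic boundary, however far apart their VOTs are."""
    return label(vot_a_ms, boundary_ms) != label(vot_b_ms, boundary_ms)


# Each pair differs by 20 msec, but only the pair that straddles the
# boundary is heard as two different sounds.
for pair in [(10, 30), (60, 80), (30, 50)]:
    print(pair, label(pair[0]), label(pair[1]), heard_as_different(*pair))
```
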
Categorical perception (continued)
- The fact that all stimuli on the same side of the phonetic boundary are perceived as the same suggests speech is special (since we would not hear this with nonspeech sounds)
- Can also shift the boundary by fatiguing (play da over and over, and the boundary shifts so more ta sounds are heard)
Duplex perception
- Split the speech stimulus so one part (the F3 transition) goes to one ear and the other part (the base) goes to the other ear
- Hear a chirp from the transition in one ear and the complete, combined speech sound in the other ear (either da or ga)
- Can even get the effect if the transition is played just below threshold
- Suggests a specialized module is combining the 2 signals in the brain
Evidence against "speech is special"
- McGurk effect – cello string plucked or bowed
  - If see bowing but hear plucking, more likely to rate the sound heard as a bow moving on strings
- Categorical perception – can also get with different musical notes
- Duplex perception – can also get with chords
  - Play part of a chord in one ear (C, G) and the other part in the other ear (E or Eb) – hear the complete chord
Top-down processing in speech
- Phonemic restoration – remove a phoneme, and people fill in the blank with what fits the context
- Listeners perceive better if they know they will be hearing speech
- Perceive better if the phoneme appears in a word and if the words appear in a phrase or sentence
- Perceive better if they know the topic of conversation and the sentence is meaningful
- Perceive better if they see lip movements (McGurk effect)
- Indexical characteristics – gender, age, where from, emotional state, sarcastic or serious