Perception of Estonian word prosody in whispered speech
In Nordic Prosody: Proceedings of the VIIIth Conference, Trondheim, August 19-21, 2000, Wim A. van Dommelen and Thorstein Fretheim (eds.), Peter Lang GmbH, Frankfurt/M etc.: 153-164.
Of the three distinctive degrees of quantity in Estonian, the contrast between short (Q1) and long (Q2) is perceived with the help of temporal characteristics alone. To distinguish overlong (Q3) from long, however, a falling F0 contour is considered to be important.
This paper addresses the following question: can listeners distinguish Q2 and Q3 even when F0 is absent, as in whispered speech? If they can, the phonetic basis of the contrast in whispered speech must be explained.
A listening test with 13 male and 12 female students was carried out at the Institute of Cybernetics in Tallinn, Estonia. The stimuli consisted of words of the form CVCV (Q1), CVVCV (Q2), CVVVCV (Q3), CVCCV (Q2) and CVCCCV (Q3), built on the CVCV sequences /kasi/, /koli/ and /laki/. The resulting 3 word groups, each consisting of 5 semantically meaningful words, were read by one male (TL) and two female (KL, MR) Estonian speakers, first normally and then in whisper, each word 5 times, in random order within each word group. A CD with the stimuli was presented to the listeners through earphones.
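As a rough illustration of the size of the recorded material, the design described above can be enumerated as in the sketch below. It assumes that every speaker read every word 5 times in each speaking mode. The abstract does not say how many of these tokens were actually presented to the listeners, and the names `BASES`, `PATTERNS` and `recorded_tokens` are purely illustrative.

```python
from itertools import product

# Design factors taken from the description above.
BASES = ["kasi", "koli", "laki"]            # CVCV sequences underlying the 3 word groups
PATTERNS = ["CVCV (Q1)", "CVVCV (Q2)", "CVVVCV (Q3)", "CVCCV (Q2)", "CVCCCV (Q3)"]
SPEAKERS = ["TL", "KL", "MR"]               # one male, two female speakers
MODES = ["normal", "whisper"]
REPETITIONS = 5                             # assumption: 5 readings per speaker and mode


def recorded_tokens():
    """Enumerate every (base, pattern, speaker, mode, repetition) combination."""
    return list(product(BASES, PATTERNS, SPEAKERS, MODES, range(1, REPETITIONS + 1)))


if __name__ == "__main__":
    tokens = recorded_tokens()
    whispered = [t for t in tokens if t[3] == "whisper"]
    print(len(tokens), len(whispered))
```

Under these assumptions the full crossing gives 3 x 5 x 3 x 2 x 5 recorded tokens, half of them whispered.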
The results showed only a marginal difference between normal and whispered speech in the listeners' recognition of Q3: 96.0% in whispered speech, 96.5% in normal speech, all words and speakers pooled. A lower recognition rate might have been expected for whispered "vowel type" stimuli, i.e. words where the quantity distinction is carried by the vowel, but none was found.
Acoustic analyses (using Kay MultiSpeech 3700) of V1 of the vowel type stimuli showed that, in normal speech, there was a clear difference in the F0 contour for all three speakers: TL and MR had a falling F0 contour in Q3 and a flat or rising one in Q2; KL had a flat F0 contour in Q3 and a rising one in Q2. The sound pressure level (SPL) roughly followed the F0 contours.
In whispered speech, the differences in duration between different degrees of quantity were slightly enhanced in comparison with normal speech. In vowel type words, the mean SPL in V1 of the Q3 stimuli fell slightly, while the SPL curve of the vowel in Q2 words seemed to be a truncated version of Q3. Moreover, the noise of V1 gave an impression of becoming darker in Q3 and lighter in Q2. This impression was corroborated by using a high-pass filter (cut-off frequency at 1.76 kHz for the female speakers, 1.54 kHz for the male): the filtered vowel showed a strongly decreasing SPL in connection with Q3 and an increasing SPL with Q2. Higher SPL in this case indicated more high frequency energy and vice versa.
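The high-pass check described above can be sketched as follows. This is only a minimal illustration, not the procedure used in the study: the abstract does not name the filter type or order, so a first-order RC high-pass is assumed here, and the levels are uncalibrated RMS values in dB rather than true SPL. The function names `high_pass`, `frame_levels_db` and `level_trend` are hypothetical.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order RC high-pass filter (assumed; the source does not specify the filter)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def frame_levels_db(samples, frame_len):
    """Relative RMS level in dB for each non-overlapping frame (not calibrated SPL)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-12)))  # floor avoids log10(0)
    return levels


def level_trend(levels):
    """Level of the last frame minus level of the first frame, in dB."""
    return levels[-1] - levels[0]
```

Applied to the samples of a whispered V1 with the cut-off set to 1540 Hz (male speaker) or 1760 Hz (female speakers), a negative `level_trend` would correspond to the decreasing filtered level reported for Q3 and a positive one to the increasing level reported for Q2.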
For consonant type words, there was no difference between whispered and normal speech when the consonant was voiceless, e.g. in kassi; a voiced consonant like /l/ in kolli, on the other hand, behaved similarly to V1 in vowel type words.
The conclusion is that a difference between Q2 and Q3 in the frequency domain exists in whispered speech as well. Instead of F0, it is characterized by a lightening or darkening of the color of the whisper noise.