The role of F0 in vowel perception

Speech has three types of quality with different functions:

Linguistic phonetic quality - carries the conventional linguistic information - is distinctive among different speech sounds. Example: [i] as opposed to [e].

Expressive quality - carries paralinguistic information - serves basic and primitive communicative purposes. Example: Variation in vocal effort.

Organic quality - reflects the size of the organs of speech. Example: A speaker 7 as opposed to 12 years of age.

The appended auditory demonstrations illustrate the importance of F0, the frequency of the voice fundamental, for the perception of these qualities in vowels [1, 2, 3, 4]. They are offered in the formats Apple 16-bit PCM (.AIF), CCITT 8-bit mu-law PCM (.AU), and Microsoft 4-bit ADPCM (.WAV).

A natural vowel was analyzed by inverse filtering to obtain the waveform of the voice source. This was imitated in synthesis with a 4-parameter voice source model. Synthetic vowels were generated using cascade synthesis with eight formants. While the voice source parameters varied over the duration of the synthetic vowel, the formants (Fn) were stationary, with bandwidths Bn = 50 + 0.05 Fn (both in Hz). The demonstration consists of several series of such vowels.
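The bandwidth rule above is simple enough to evaluate directly. The following is a minimal sketch in Python, not the synthesis code behind the demonstrations; the function name and the example formant frequencies are assumptions made only for illustration.

```python
def formant_bandwidth(fn_hz: float) -> float:
    """Bandwidth in Hz of a formant at fn_hz, following Bn = 50 + 0.05 * Fn."""
    return 50.0 + 0.05 * fn_hz


# Hypothetical formant frequencies in Hz (not the values used in the demonstration).
example_formants = [300.0, 2500.0, 3000.0, 4000.0]
bandwidths = [formant_bandwidth(f) for f in example_formants]

for f, b in zip(example_formants, bandwidths):
    print(f"Fn = {f:6.0f} Hz  ->  Bn = {b:5.1f} Hz")
```

Under this rule the bandwidth grows linearly with formant frequency, so higher formants are always broader than lower ones.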

The first series contains six vowels that differ only in the frequency position of the first formant (F1), which is successively moved upwards in steps of one critical band (1 Bark, about 100 Hz).

                         __
                     __
                 __
             __
         __
F1   __
F0   __  __  __  __  __  __        .AIF .AU .WAV

These vowels are heard as successively more open ('lower'). Although the difference resides in F1 alone, while the higher formants are those of a female [i], the series extends over the whole range of front vowels. This shows that F1 overshadows the higher formants in their contribution to perceived openness or vowel height.
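To make the step size of the first series concrete, the sketch below moves a starting F1 upwards in 1-Bark steps. It assumes a commonly used analytic approximation of the Bark scale, z = 26.81 f / (1960 + f) - 0.53, together with its inverse; this page does not say which conversion was used. The starting F1 of 300 Hz and all function names are hypothetical, and the corrections usually applied at the extreme ends of the scale are omitted.

```python
def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz (assumed formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark: float) -> float:
    """Inverse of hz_to_bark: frequency in Hz for a given Bark value."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def bark_steps(start_hz: float, n_vowels: int, step_bark: float = 1.0) -> list:
    """Return n_vowels frequencies in Hz spaced step_bark apart on the Bark scale."""
    z_start = hz_to_bark(start_hz)
    return [bark_to_hz(z_start + k * step_bark) for k in range(n_vowels)]


F1_START_HZ = 300.0  # hypothetical starting value, not taken from the demonstration
f1_series = bark_steps(F1_START_HZ, n_vowels=6)

for k, f1 in enumerate(f1_series, start=1):
    print(f"vowel {k}: F1 = {f1:6.1f} Hz ({hz_to_bark(f1):.2f} Bark)")
```

With these assumptions the first step is about 100 Hz, and the steps in Hz widen as F1 rises.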

The second series differs from the first only in F0. As F1 increases, F0 is also increased in steps of 1 Bark, so that the tonotopic distance between F1 and the average F0 of each vowel does not increase.

                         __
                     __  __
                 __  __
             __  __
         __  __
F1   __  __
F0   __                            .AIF .AU .WAV

For most listeners, the perceived degree of openness of these vowels increases only marginally, and in any case clearly less than in the first series. Instead, there is a successive change in expressive quality: the vowels are heard as produced with increasing vocal tension.

In the third series, all formants remain fixed while F0 is decreased successively in steps of 1 Bark.

F1   __  __  __  __  __  __
F0   __
         __
             __
                 __
                     __
                         __       .AIF .AU .WAV

These vowels are heard as successively more open and produced with successively less vocal tension. As for the perception of openness ('vowel height'), we have to conclude that the dominant cue to this dimension resides in the tonotopic position of F1 in relation to F0.
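The conclusion can be checked numerically by tracking the F1 - F0 distance in Bark across the first three series. This is a sketch under stated assumptions: the Bark approximation z = 26.81 f / (1960 + f) - 0.53 is assumed rather than taken from this page, the starting values of 220 Hz (F0) and 350 Hz (F1) are hypothetical, and the third series is assumed to start from the last vowel of the second series.

```python
def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz (assumed formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def f1_f0_distances(series: list) -> list:
    """F1 - F0 distance in Bark for each (F0, F1) pair of a series."""
    return [z_f1 - z_f0 for z_f0, z_f1 in series]


N_VOWELS = 6
z_f0 = hz_to_bark(220.0)  # hypothetical starting F0
z_f1 = hz_to_bark(350.0)  # hypothetical starting F1

# Each vowel is an (F0, F1) pair in Bark.
series1 = [(z_f0, z_f1 + k) for k in range(N_VOWELS)]      # F1 rises, F0 fixed
series2 = [(z_f0 + k, z_f1 + k) for k in range(N_VOWELS)]  # F1 and F0 rise together
z_f0_top, z_f1_top = series2[-1]                           # assumed start of series 3
series3 = [(z_f0_top - k, z_f1_top) for k in range(N_VOWELS)]  # F0 falls, F1 fixed

for name, series in (("series 1", series1), ("series 2", series2), ("series 3", series3)):
    distances = ", ".join(f"{d:.2f}" for d in f1_f0_distances(series))
    print(f"{name}: F1 - F0 (Bark) = {distances}")
```

In this sketch the distance grows by 1 Bark per vowel in series 1 and 3 and stays constant in series 2, which mirrors the pattern of perceived openness described above.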

The fourth series contains the first four vowels of the second series, presented as a background to the fifth series, in which not only F1 but all formants together are successively moved upwards in steps of 1 Bark while F0 is also moved upwards to the same extent.

Listen to series 4 and 5: .AIF .AU .WAV  

While the linguistic quality of the vowels in the fifth series is approximately the same as that in the fourth series, they show a change in organic quality as if the vowels were produced by successively smaller or younger speakers. The change in expressive quality heard in the fourth series (vocal tension) is only marginally present in the fifth.

The results suggest the hypothesis that the linguistic quality of vowels might be given by the tonotopic distances between the prominent peaks in their spectra, shaped by the formants and F0. This is, however, not immediately compatible with all aspects of series 2. While the distance between F1 and F0 is always the same in this series, that between F1 and the constant F2 decreases successively. The vowels should then be heard as more and more rounded, but listeners show only a slight tendency in this direction. The observations are, however, compatible with a modified version of this hypothesis, according to which the perceptual weight of between-peak distances decreases as a function of increasing distance [5, 6]. This perceptual weight is, then, much lower for front vowels than for back vowels [6] with their smaller distance between F1 and F2.
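The between-peak distances discussed here can be tabulated for the second and fifth series. The sketch below is illustrative only: it assumes the Bark approximation z = 26.81 f / (1960 + f) - 0.53, hypothetical starting values of 220 Hz (F0), 350 Hz (F1) and 2500 Hz (F2), and four vowels per series. It does not model the distance-dependent perceptual weighting, for which no formula is given here.

```python
def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz (assumed formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def peak_distances(peaks_bark: list) -> list:
    """Distances in Bark between adjacent spectral peaks (F0, F1, F2, ...)."""
    return [hi - lo for lo, hi in zip(peaks_bark, peaks_bark[1:])]


N_VOWELS = 4  # hypothetical number of vowels compared
# Hypothetical starting peaks F0, F1, F2 in Hz, converted to Bark.
start = [hz_to_bark(f) for f in (220.0, 350.0, 2500.0)]

# Series 2: F0 and F1 rise together, F2 stays fixed.
series2 = [[start[0] + k, start[1] + k, start[2]] for k in range(N_VOWELS)]
# Series 5: F0 and all formants rise together.
series5 = [[z + k for z in start] for k in range(N_VOWELS)]

dist2 = [peak_distances(v) for v in series2]
dist5 = [peak_distances(v) for v in series5]

for k in range(N_VOWELS):
    print(f"vowel {k + 1}: series 2 F1-F0 = {dist2[k][0]:.2f}, F2-F1 = {dist2[k][1]:.2f} | "
          f"series 5 F1-F0 = {dist5[k][0]:.2f}, F2-F1 = {dist5[k][1]:.2f}")
```

In series 2 only the F2 - F1 distance shrinks, by 1 Bark per vowel, while in series 5 every between-peak distance is preserved.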

The F0 value to which listeners relate F1 is not necessarily represented by an actual peak in the spectrum [7]. The F0 mark can be provided by a 'virtual peak' when the first partial is suppressed, and in the context of preceding speech, listeners appear to relate F1 to a 'base value' of F0 rather than to its instantaneous value [3, 4].

Organic and expressive variations in F0 and in F1 as large as in these synthetic vowels have also been observed in natural speech [8].

The Modulation Theory of Speech provides a framework within which the phenomena demonstrated here can be understood.


[1] H. Traunmüller (1981) "Perceptual dimension of openness in vowels", J. Acoust. Soc. Am. 69: 1465–1475, especially Exp. 2–4, pp. 1469–1472.

[2] H. Traunmüller (1985) "The role of the fundamental and the higher formants in the perception of speaker size, vocal effort, and vowel openness", in B. Guerin, R. Carre (eds.) Actes du Symposium Franco-Suédois sur la Parole, pp. 209–219. (Also in PERILUS IV: 92–102.)

[3] H. Traunmüller (1991) "The context sensitivity of the perceptual interaction between F0 and F1", Actes du XIIème Congrès International des Sciences Phonétiques, Aix-en-Provence, vol. 5, pp. 62–65.

[4] H. Traunmüller (1994) "Conventional, biological and environmental factors in speech communication: A modulation theory", Phonetica 51: 170–183.

[5] H. Traunmüller (1984) "Articulatory and perceptual factors controlling the age- and sex-conditioned variability in formant frequencies of vowels", Speech Comm. 3: 49–61.

[6] R.P. Fahey, R.L. Diehl and H. Traunmüller (1996) "Perception of back vowels: Effects of varying F1 - F0 Bark distance", J. Acoust. Soc. Am. 99: 2350–2357.

[7] R.P. Fahey and R.L. Diehl (1996) "The missing fundamental in vowel height perception", Perc. & Psychophys. 58: 725–733.

[8] A. Klinkert and D. Maurer (1997) "Fourier spectra and formant patterns of German vowels produced at F0 of 70–850 Hz", J. Acoust. Soc. Am. 101: 3112 (A).

The Modulation Theory | Hartmut Traunmüller | Phonetics Lab | Stockholm University
Last modified in October 1998, references completed later