Perception of Quantity in Estonian

Diana Krull and Hartmut Traunmüller
Dept. of Linguistics, Stockholm University

This is a contribution to Fonetik-2000


An experiment is described in which the speech rate of a short preceding or following context was manipulated in addition to that of a vowel or a consonant that carried a quantity distinction. The results showed the durations of these segments and the speech rate of their left and right context to be crucial for quantity perception.


From investigations of Estonian speech (Lehiste 1960; Eek 1980a, b) it is known that the three-way quantity opposition, which shows itself in both vowels and consonants in accented positions within a rhythmic foot, is characterized by certain duration ratios between the accented first and the second syllable. Engstrand and Krull (1994) observed these ratios to be very stable in spontaneous speech. As for speech perception, the importance of the duration of the first, stressed syllable has never been questioned. However, views differ concerning the role of the second syllable, whose duration is inversely related to that of the first. According to one account, the duration ratio between the two syllables is important for the quantity distinction (e.g. Lehiste 1960, 1997; Eek 1980a, b). According to a different account (Hint 1998), it is the first syllable alone that determines quantity. Although an unstressed second syllable is necessary - quantity oppositions do not occur in monosyllables - it is said not to contribute to the quantity distinction.

A reanalysis of data obtained by Lehiste (1965-1968, 1997) in a perception experiment in which segment durations had been manipulated experimentally failed to produce strong evidence for either of these accounts. The second syllable did contribute to the quantity distinction, but not as much as implied by the first account, and the possible contribution of a syllable-initial consonant remained unknown. According to an alternative view that we wish to consider, quantity distinctions are determined by the duration of a segment measured with a clock that runs in synchrony with the speech (Traunmüller, 1994). In this case, each segment should contribute to quantity perception as much as it contributes to the local speech rate, and a preceding context should influence perception in the same sense as a following one.

In order to learn in which sense and how much peripheral segments contribute to the perception of quantity in Estonian, we conducted an experiment in which the speech rate of a short preceding or following context was manipulated in addition to that of a vowel or a consonant that carried a quantity distinction.



The stimuli were obtained by manipulating the durations of selected sections of recordings of the words saagi [sa:ki] 'crop, catch' (gen.) and satu [sat:u] 'get into' (imp., 2.p.s.) produced by a female speaker, preceded by ja [ja] 'and', in isolation, and followed by ka [ka] 'also', with list reading intonation. These words had been chosen bearing in mind that [saki] and [sa::ki] as well as [satu] and [sat::u] are also common words.

The durations of the [a:] in [sa:ki] and of the [t:] in [sat:u] (occlusion + burst) were modified in steps of, nominally, a factor of 2^(n/8) with -9 < n < 9, with 15 consecutive values of n used in each context. In order to obtain phase-clean joints for the [a:], a deviation within ± ½ pitch period from the nominal duration was tolerated. The durations of the parts of the utterances that either preceded or followed the vowel or consonant in focus were modified similarly, with n taking the values -2, 0, and +2.
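The nominal step scheme can be illustrated with a short sketch. It assumes that the scaling factor is 2 raised to the power n/8, which is consistent with the context durations listed in Table 1 (180 ms for the unmodified [ja]). The function and variable names are illustrative only, and the tolerance of half a pitch period mentioned above is not modelled.

```python
# Nominal duration factors, assuming a factor of 2**(n/8) per step n.
def duration_factor(n: int) -> float:
    """Return the nominal duration scaling factor for step n."""
    return 2.0 ** (n / 8)

# All integer steps with -9 < n < 9; 15 consecutive values of these
# were used for the segment in focus in each context.
focus_steps = list(range(-8, 9))
focus_factors = [duration_factor(n) for n in focus_steps]

# The context (head or tail) was scaled with n = -2, 0 or +2.
context_steps = (-2, 0, 2)
context_factors = {n: duration_factor(n) for n in context_steps}

# Cross-check against Table 1: the unmodified [ja] lasts 180 ms.
ja_unmodified_ms = 180
ja_scaled_ms = {n: round(ja_unmodified_ms * f) for n, f in context_factors.items()}

if __name__ == "__main__":
    for n, ms in ja_scaled_ms.items():
        print(f"n = {n:+d}: factor {context_factors[n]:.3f}, [ja] = {ms} ms")
```

With n = -2 and n = +2, the scaled [ja] durations round to 151 ms and 214 ms, the values given in Table 1 for increased and decreased speech rate.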

Listeners and Procedure

The stimuli, which had been recorded on a CD, were presented through headphones to three groups of university students, 13 male and 12 female in all. There were 480 stimuli separated by 0.6 s of silence and arranged in 32 blocks with additional pauses of 1 s in between. Each stimulus was presented once in a block with successively increasing duration of the segment in focus. Those beginning with "ja" were also presented in reverse order.


The responses are shown in Figure 1 for one of the blocks of 15 stimuli in the two orders of presentation. For lack of space, we cannot show all the results, but the segment durations at the quantity boundaries between Q1 and Q>1 and between Q3 and Q<3 in the utterances in which vowel duration was varied are all included in Table 1. In Figure 1, the boundaries are where the lines cross between Q1 and Q2 and between Q2 and Q3.

Based on the inverse of the cumulative normal distribution function, the effect of a small increase in duration was calculated for the segment in focus ([a] in [saki] and [t] in [satu]) as well as for its preceding and following context (head and tail) where this had been varied in duration. In Table 2, the effects of small increases in head and tail duration are expressed as a percentage of the effect of an equal increase in the duration of the segment in focus.
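The following is a minimal sketch of one way such an analysis could be set up. The exact fitting procedure is not specified here, so everything in the sketch is an assumption: response proportions are converted to z-scores with the inverse cumulative normal distribution, a straight line is fitted to the z-scores against log2 duration, and the boundary is taken where z = 0 (50 % responses). The data are synthetic, and the names probit_fit and boundary_ms are hypothetical.

```python
from math import log2
from statistics import NormalDist

def probit_fit(durations_ms, proportions, eps=0.01):
    """Least-squares fit of z = intercept + slope * log2(duration).

    z is the inverse cumulative normal (probit) of each response proportion.
    Proportions are clipped to [eps, 1 - eps], where the probit is finite.
    """
    std = NormalDist()
    xs = [log2(d) for d in durations_ms]
    zs = [std.inv_cdf(min(max(p, eps), 1.0 - eps)) for p in proportions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope

def boundary_ms(intercept, slope):
    """Duration at which the fitted z-score is 0, i.e. 50 % responses."""
    return 2.0 ** (-intercept / slope)

# Synthetic illustration (not data from the experiment): proportions of
# "longer quantity" responses generated from a known boundary and spread.
true_boundary_ms = 150.0
spread_log2 = 0.2
durations = [110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
proportions = [
    NormalDist().cdf((log2(d) - log2(true_boundary_ms)) / spread_log2)
    for d in durations
]

intercept, slope = probit_fit(durations, proportions)
estimated_boundary = boundary_ms(intercept, slope)

if __name__ == "__main__":
    print(f"boundary: {estimated_boundary:.1f} ms, slope: {slope:.2f} z per log2 unit")
```

In such a setup, the slope expresses how much a small relative increase in duration moves the responses, so the effects of head and tail could be compared with that of the segment in focus as ratios of slopes.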

Figure 1. Responses to stimuli derived from "ja saagi" with unmodified durations of "ja s" and "gi", plotted against the duration (logarithmically scaled) of the "aa" for the two orders of stimulus presentation. Increasing duration: solid lines, decreasing duration: dashed lines.


If the perception of quantity distinctions were based on the duration ratios of the syllables involved, then a lengthening of an initial consonant should have the same effect as an equal lengthening of the vowel. We would have a value of +100 for the [s] of saagi and we would expect positive values everywhere in the two "head" columns of Table 2. It has been observed before that an initial consonant does not actually have this effect. Lehiste (1960) claimed initial consonants to lack any effect, although this is at variance with the claim that the duration of the syllable is relevant. Now we see that an initial consonant does have some effect, but it works in the direction opposite to the one expected on the basis of the syllable ratio hypothesis. The tails of the utterances with saagi work in the expected direction, although not as strongly as expected on the basis of this hypothesis. According to Hint's hypothesis, we would expect the tails of these utterances to show no effect at all. When we try to apply these hypotheses to intervocalic consonants as in satu, we encounter difficulties since the syllable boundary cannot be objectively located with any greater precision than "somewhere within the [t]".

If the perception of quantity distinctions is based on segment durations measured by a clock that runs in pace with the local speech rate, we should expect negative values everywhere in Table 2, and this is, without exception, what we observe. If we accept this hypothesis, the present data allow us to say something about the domain of speech rate perception.

Table 1. Quantity boundaries between Q1 and Q>1 and between Q3 and Q<3 of the vowel [a] observed with speech that had been manipulated in segment durations. All durations are listed in ms. The [ja], [s], [ki] and [ka] columns list given durations (0 where the word is absent); the [a] and [a:] columns list the Q1 and Q3 boundaries obtained by interpolation, in some cases for two orders of stimulus presentation: (1) successively increasing duration of the [a], (2) successively decreasing duration of the [a]. Rate structure: ">" increased, "=" unmodified, "<" decreased speech rate of the segments before or after "x", the segment in focus.

Utterance  Rate      [ja] [s]  [a]  [a] [a:] [a:]  [ki] [ka]
           structure           (1)  (2)  (1)  (2)           
ja saagi   > x >     151  119  137  140  215  248   259    0
ja saagi   = x =     180  142  153  158  261  290   308    0
ja saagi   < x <     214  169  177  184  305  326   366    0

ja saagi   > x =     151  119  149  154  239  264   308    0
ja saagi   = x =     180  142  153  158  261  290   308    0
ja saagi   < x =     214  169  162  167  256  331   308    0

saagi      > x =       0  130  135       215        275    0
saagi      = x =       0  155  134       227        275    0
saagi      < x =       0  184  145       239        275    0

saagi ka   = x >       0  153  124       204        193  345
saagi ka   = x =       0  153  141       226        230  410
saagi ka   = x <       0  153  148       257        274  488

Table 2. The effect of a small uniform increase in the duration of the whole preceding context (the "head") and/or the whole following context (the "tail"), expressed as a percentage of the effect of the same increase in the duration of the segment in focus ("aa" or "t").

Utterance      Q1          Q3    
           Head  Tail  Head  Tail
ja saagi    -25   -38   -63   -77
saagi       -24         -61      
saagi ka          -10         -24

ja satu     -25   -56   -41  -107
satu        -39         -33      
satu ka           -12         -31

Previous descriptions of Estonian suggest that the domain of speech rate perception might comprise exactly one rhythmic foot. Our data show it to be slightly larger. In most cases, the introductory ja had a sizable net effect in addition to that of the initial s, with the possible exception of the Q1 boundary in ja satu, and the final ka appears to have some effect on the Q3 boundary in utterances with satu. This can be seen by comparing the effects obtained with and without ja and ka.

Although we have to reject the hypotheses that ascribe decisive perceptual importance to the duration ratios between the first and the second syllable of a rhythmic foot, the present results do not give us any reason to question the existence of such invariances. Such invariant ratios will result if speakers keep their speech rate constant within a rhythmic foot. However, this regularity appears to result primarily from requirements of speech production rather than perception.

The boundary shifts obtained for uniform changes in rate (Table 1) were somewhat smaller than proportional to the change in rate, except for the Q3-boundary obtained with successively increasing segment duration in ja saagi and ja satu, where they were larger.
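The comparison with proportionality can be retraced for the "ja saagi" rows of Table 1 with a short sketch. It assumes that both the boundary shift and the rate change are measured in log2 units, so that a ratio of 1.0 means a shift exactly proportional to the change in context duration. The names are illustrative, and the corresponding values for satu are not listed in Table 1.

```python
from math import log2

# Boundaries in ms for "ja saagi" with uniform rate changes (Table 1):
# (increased rate, unmodified, decreased rate).
boundaries_ms = {
    "Q1 (1)": (137, 153, 177),
    "Q1 (2)": (140, 158, 184),
    "Q3 (1)": (215, 261, 305),
    "Q3 (2)": (248, 290, 326),
}

def relative_shift(shifted_ms: float, reference_ms: float, n: int) -> float:
    """Boundary shift in log2 units divided by the context change n/8.

    1.0 means the boundary moved in proportion to the context durations;
    values below 1.0 mean a smaller shift, values above 1.0 a larger one.
    """
    return log2(shifted_ms / reference_ms) / (n / 8)

# Increased rate corresponds to n = -2, decreased rate to n = +2.
relative = {
    label: (relative_shift(fast, ref, -2), relative_shift(slow, ref, 2))
    for label, (fast, ref, slow) in boundaries_ms.items()
}

if __name__ == "__main__":
    for label, (fast, slow) in relative.items():
        print(f"{label}: increased rate {fast:.2f}, decreased rate {slow:.2f}")
```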

As for the effect of presentation order, we obtained a mean shift of 5.0 ms of the Q1 boundary for [a] and of 6.2 ms for [t], while the effect on the Q3 boundary was substantially larger, 36.6 and 37.8 ms, respectively. This shift can be understood as due to adaptation or contrast, but we do not have a ready explanation for the greater susceptibility of the Q3 boundary to any contextual influences. Vowels in Q3 are typically, but far from always, produced with a falling intonation, which is known to contribute to quantity perception (Lehiste, 1970–75, 1997). However, the absence of this additional cue cannot be responsible, since a similar discrepancy emerged with [t:] as well.
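The mean shifts reported for [a] can be retraced from Table 1. The sketch below assumes that the means were taken over the five distinct "ja saagi" conditions for which both presentation orders are listed (the "= x =" row appears twice in the table and is counted once). The variable names are illustrative.

```python
# Boundaries in ms from Table 1 for "ja saagi":
# rate structure -> (Q1 order 1, Q1 order 2, Q3 order 1, Q3 order 2).
ja_saagi = {
    "> x >": (137, 140, 215, 248),
    "= x =": (153, 158, 261, 290),
    "< x <": (177, 184, 305, 326),
    "> x =": (149, 154, 239, 264),
    "< x =": (162, 167, 256, 331),
}

# Order effect: boundary with decreasing duration (2) minus
# boundary with increasing duration (1).
q1_shifts = [q1_dec - q1_inc for q1_inc, q1_dec, _, _ in ja_saagi.values()]
q3_shifts = [q3_dec - q3_inc for _, _, q3_inc, q3_dec in ja_saagi.values()]

mean_q1_shift = sum(q1_shifts) / len(q1_shifts)
mean_q3_shift = sum(q3_shifts) / len(q3_shifts)

if __name__ == "__main__":
    print(f"mean Q1 shift: {mean_q1_shift:.1f} ms")
    print(f"mean Q3 shift: {mean_q3_shift:.1f} ms")
```

Under this assumption the means come out as 5.0 ms for the Q1 boundary and 36.6 ms for the Q3 boundary, matching the values stated for [a].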


This research has been supported, in part, by grant F0558/1998 from HSFR. Our special thanks are due to Einar Meister for his help with the arrangement of the listening sessions at the Institute of Cybernetics, Tallinn Technical University.


Eek A. 1980a. Estonian quantity: notes on the perception of duration. Estonian Papers in Phonetics, 5-29. Academy of Sciences of the Estonian S.S.R.

Eek A. 1980b. Further information on the perception of Estonian quantity. Estonian Papers in Phonetics, 31-56. Academy of Sciences of the Estonian S.S.R.

Engstrand O. and Krull D. 1994. Durational correlates of quantity in Swedish, Finnish and Estonian: Cross language evidence for a theory of adaptive dispersion. Phonetica 51, 80-91.

Hint M. 1998. Why syllabic quantity? Why not the foot? Linguistica Uralica 34, 172-177.

Lehiste I. 1960. Segmental and syllabic quantity in Estonian. American Studies in Uralic Linguistics, vol. 1. Bloomington: Indiana University.

Lehiste I. 1970-75. Experiments with synthetic speech concerning quantity in Estonian. In Hallap V. (ed.) Congressus Tertius Internationalis Fenno-Ugristarum, Tallinnae habitus 17.-23. VIII 1970. Pars I: Acta Linguistica, 254-269. Tallinn: Valgus.

Lehiste I. 1997. Search for phonetic correlates in Estonian prosody. In Lehiste I. and Ross J. (eds.) Estonian Prosody: Papers from a Symposium, 11-33. Tallinn: Inst. of Estonian Language.

Traunmüller H. 1994. Conventional, biological, and environmental factors in speech communication: A modulation theory. Phonetica 51, 170-183.

On the Web 2000-04-05