Audiovisual perception of Swedish vowels with and without conflicting cues  

Niklas Öhrström & Hartmut Traunmüller

In the Proceedings of Fonetik 2004, the XVII Swedish Phonetics Conference, Stockholm, May 26-28, 2004: 40-43.


The Swedish nonsense syllables /gig/, /gyg/, /geg/ and /gøg/, produced by two men and two women, were presented to male and female subjects for identification. The stimuli were presented in auditory mode alone, in visual mode alone, and in audiovisual mode. The audiovisual mode included, besides the congruent combination, the three incongruent audiovisual combinations of these stimuli as produced by each speaker. Most listeners (16 of 21) perceived roundedness almost exclusively by eye rather than by ear, while a minority (4 men and 1 woman), consisting mainly of male listeners with superior auditory but inferior visual speech perception, relied less on vision. This is in line with previous reports of women outperforming men in lip-reading tasks. In contrast to roundedness, openness was perceived almost exclusively by ear by all listeners. Consequently, an auditory [e] paired with a visual [y] is typically perceived as [ø]. This is analogous to the fusions observed in the perception of incongruent audiovisual stop consonants, known as the McGurk effect. The results also revealed that listeners notice visibly rounded lips as incompatible with the presence of an unrounded vowel, whereas the incompatibility of unrounded lips with the presence of a rounded vowel more often goes unnoticed.
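The fusion pattern reported for the majority of listeners can be illustrated with a minimal sketch: openness is taken from the auditory vowel and roundedness from the visual vowel. The feature table and the function name `predicted_percept` are assumptions introduced here for illustration. They idealize the typical response pattern and are not the authors' model; the sketch does not cover the minority of listeners who relied less on vision.

```python
# Idealized feature table for the four vowels used in the stimuli.
# Assumption for illustration: each vowel is described by (openness, rounded).
VOWELS = {
    "i": ("close", False),
    "y": ("close", True),
    "e": ("mid", False),
    "ø": ("mid", True),
}

# Reverse lookup from a feature pair back to the vowel symbol.
FEATURES_TO_VOWEL = {features: vowel for vowel, features in VOWELS.items()}


def predicted_percept(auditory: str, visual: str) -> str:
    """Combine openness from the auditory vowel with roundedness from the visual vowel."""
    openness, _ = VOWELS[auditory]
    _, rounded = VOWELS[visual]
    return FEATURES_TO_VOWEL[(openness, rounded)]


if __name__ == "__main__":
    # The example given in the text: auditory [e] paired with visual [y].
    print(predicted_percept("e", "y"))
```

Under this idealization, a congruent stimulus maps to itself, and each incongruent combination maps to the vowel that shares openness with the auditory component and roundedness with the visual component.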


