While speech research has often focused on aspects with a linguistic function, speech necessarily also conveys several kinds of paralinguistic information: expressive (attitudes and emotions) and organic (reflecting the speaker’s age, sex, etc.). In addition, there is perspectival information. No absolute acoustic or optic properties of speech convey any one of these kinds of information invariantly. Their interplay can be understood if speech is considered as voice modulated by speech gestures. This is formalized in the Modulation Theory, a new and comprehensive theory of speech. It requires listeners to “tune in” to a speech signal and to evaluate the deviations of its properties from those expected of a linguistically neutral vocalization with the same paralinguistic quality. Most organic and much expressive information is conveyed in the properties of the carrier, but expressive factors also affect the amplitude and rate of linguistic modulations. The theory also describes the neural linkage between perceptual demodulation and speech motor control that is required for speech acquisition (an imitative behavior) and is realized by echo neurons in the human brain. The imitation of bodily postures and gestures requires analogous structures, evidenced in mirror neurons. It remains to gain a better understanding of variation in speaking rate and to incorporate audiovisual integration into this framework, which also needs to be made more widely known.
Hartmut Traunmüller, "Speech considered as modulated voice" (manuscript). Phonetics, Department of Linguistics, Stockholm University, October 2006.