Auditory scales of frequency representation

Conversion equations (Hz, semi-tones, mel, bark, ERB).

Basic auditory processes

The hearing system performs a spectrographic analysis of any auditory stimulus. The cochlea can be regarded as a bank of filters whose outputs are ordered tonotopically, so that a frequency-to-place transformation is performed. The filters closest to the cochlear base respond maximally to the highest frequencies and those closest to its apex respond maximally to the lowest.

The hearing system can also be said to perform a temporal oscillographic analysis of the set of neural signals that originate in the cochlea in response to an auditory stimulus. This process is important for frequencies below 500 Hz and it contributes to frequency resolution up to about 1.5 kHz. (Listeners who depend on a cochlear implant device with only one electrode have only this process available.)

In order to understand the perception of pitch, it is also necessary to consider the harmonic structure of many sounds, such as the voiced sounds of speech [5]. We learn early in our lives where the partials of such sounds are to be expected, since we are exposed predominantly to harmonic sounds already in utero. This knowledge enables us to judge musical pitch intervals, and it explains why the fundamental pitch of a harmonic sound is not necessarily altered when the lowest partial is removed. A demonstration consists of a tone composed of the three lowest partials of a 440 Hz fundamental (all partials at the same level), then the same tone with its first partial eliminated, and finally the tone with the frequencies of all partials doubled.
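The three tones described above can be synthesized with a few lines of Python. This is only a minimal sketch: the function names, the sample rate, the duration and the output file names are assumptions made for illustration, and sine phase is used for all partials. It also shows that the waveform of the tone without its first partial still repeats with the period of the 440 Hz fundamental, which is consistent with the unchanged pitch but does not by itself explain it.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz; an arbitrary choice for this sketch


def harmonic_tone(f0_hz, partial_numbers, duration_s=0.5, sample_rate=SAMPLE_RATE):
    """Sum equal-level sine partials k * f0_hz and scale the sum into [-1, 1]."""
    n_samples = round(duration_s * sample_rate)
    scale = 1.0 / len(partial_numbers)
    return [
        scale * sum(
            math.sin(2.0 * math.pi * k * f0_hz * i / sample_rate)
            for k in partial_numbers
        )
        for i in range(n_samples)
    ]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write samples in [-1, 1] to a 16-bit mono WAV file."""
    frames = b"".join(struct.pack("<h", int(32767 * s)) for s in samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


# The three stimuli: partials 440, 880, 1320 Hz; then 880, 1320 Hz;
# then all partial frequencies doubled (880, 1760, 2640 Hz).
full_tone = harmonic_tone(440.0, [1, 2, 3])
without_first_partial = harmonic_tone(440.0, [2, 3])
doubled_tone = harmonic_tone(880.0, [1, 2, 3])

if __name__ == "__main__":
    write_wav("full_tone.wav", full_tone)
    write_wav("without_first_partial.wav", without_first_partial)
    write_wav("doubled_tone.wav", doubled_tone)
```

Running the script writes three short WAV files that can be compared by ear.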

The perceived pitch of a sinusoidal tone is not strictly given by its frequency, but it is also marginally influenced by its intensity level: An increase in level produces a slightly more extreme sensation of pitch, that is, low tones tend to sound slightly lower and high tones slightly higher.

For more on this topic, consult the references, especially [1] to [6].

Physical scales

It is often convenient to measure just frequencies when pitch perception is studied. Thus, the pitch of a sound may be specified by the frequency of a pure tone whose pitch is judged to be the same as the pitch of that sound, but auditory scales of frequency representation are required in models of auditory perception.
Period p (in s or ms)
is more directly relevant than frequency in describing the temporal or oscillographic analysis in hearing.
Frequency f (in Hz or kHz)
is a slightly more abstract notion than period, since it is defined as the number of periods per time unit. Frequency is preferred over period as the default choice in describing acoustic phenomena, sometimes only for reasons of tradition.
log(p) and log(f)
These logarithmic measures are fully equivalent for most applications, since they have the same absolute value, log(f) = -log(p). A musical octave scale can be obtained by choosing 2 as the base of log.
Auditory scales
Musical pitch (in octaves, semi-tones or cents)
The perceived musical pitch of complex tones is generally proportional to the logarithm of frequency, with only minor deviations. This is true over a wide range of frequencies up to about 5 kHz. For complex tones, the just noticeable difference (jnd) for frequency is approximately constant on this scale.
Ratio pitch (in mel)
The mel scale of auditory pitch was established on the basis of experiments with simple tones (sinusoids) in which subjects were required to divide given frequency ranges into four perceptually equal intervals or to adjust the frequency of a tone to be half as high as that of a tone given for comparison [7]. One mel was defined as one thousandth of the pitch of a 1 kHz tone. The mel scale is now used mainly because of its historical priority. It is closely related to the critical band rate scale.
Critical band rate z (in bark)
Measurement of the classical "critical bandwidth" (CB) [9, 10] typically involves loudness summation experiments. Different summation rules have been found to hold for auditory stimuli, depending on whether their frequency components are separated by more or less than the CB. The critical band rate scale differs from Stevens' mel scale mainly in that it uses the CB as a natural scale unit. The relation between frequency f and CB-rate z has been described by Zwicker in the form of a table, but for most applications it is more convenient to use the conversion equations listed below.
Equivalent rectangular bandwidth rate (ERB rate)
The "notch-noise method" involves the determination of the detection threshold for a sinusoid, centered in a spectral notch of a noise, as a function of the width of the notch. On the basis of results obtained with this method, auditory frequency selectivity can be described in terms of an "equivalent rectangular bandwidth" (ERB) as a function of center frequency [11]. Both spectral and temporal analysis contribute to the detection of the sinusoid.

The CB and the ERB have been found to be proportional for center-frequencies above 500 Hz. For lower frequencies, the ERB decreases with decreasing center-frequency, while the CB remains close to constant (see Figure 1). The discrepancy can be explained by the assumption that the temporal fine structure of the signal is not resolved in loudness summation, while it contributes substantially to frequency resolution for f < 500 Hz. Due to differences in bandwidth definition, the ERB is narrower than the classical critical band at all frequencies.
Conversion equations

Standard musical octave = log2(f / 127.09), with frequency f in Hz.
Here, -0.nn represents the 'large octave' and +0.nn the 'small octave', etc. Multiply this by 12 to obtain semi-tones and by 1200 to obtain cents.
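The octave equation above translates directly into code. The following is a minimal sketch; the function names and the constant name REF_HZ are chosen here for illustration and are not part of the original text.

```python
import math

REF_HZ = 127.09  # reference frequency of the octave equation above, in Hz


def hz_to_octaves(f_hz):
    """Musical pitch in octaves relative to REF_HZ."""
    return math.log2(f_hz / REF_HZ)


def hz_to_semitones(f_hz):
    """Musical pitch in semi-tones (12 per octave)."""
    return 12.0 * hz_to_octaves(f_hz)


def hz_to_cents(f_hz):
    """Musical pitch in cents (1200 per octave)."""
    return 1200.0 * hz_to_octaves(f_hz)


def interval_semitones(f1_hz, f2_hz):
    """Size of the interval from f1_hz up to f2_hz in semi-tones."""
    return 12.0 * math.log2(f2_hz / f1_hz)
```

For example, interval_semitones(220.0, 440.0) gives one octave, that is 12 semi-tones, independently of the reference frequency.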

Ratio pitch m (in mel): m = 1127 ln (1 + f / 700)
This is an approximation based on data tabulated by Beranek [8].
Inverse: f = 700 [exp(m / 1127) - 1].
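The mel equation and its inverse can be sketched as follows. The function names are assumptions made for this illustration; the constants are those of the equations above.

```python
import math


def hz_to_mel(f_hz):
    """Ratio pitch in mel: m = 1127 ln(1 + f / 700), with f in Hz."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse: f = 700 [exp(m / 1127) - 1], with m in mel."""
    return 700.0 * (math.exp(m / 1127.0) - 1.0)
```

As a plausibility check, hz_to_mel(1000.0) comes out very close to 1000 mel, in line with the definition of one mel as one thousandth of the pitch of a 1 kHz tone.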

Critical band rate z (in bark): z = [26.81 / (1 + 1960 / f )] - 0.53, with f in Hz [12].
This agrees with the experimental values [9, 10] to within +/- 0.05 bark within the frequency range from 200 Hz to 6.7 kHz. This agreement is much better than that of alternative equations that have been suggested, and for f < 200 Hz it is probably more correct than the rounded values in [10]. For a comparison see [12] and P. Carter.
Inverse: f = 1960 / [26.81 / (z + 0.53) - 1].
Critical bandwidth (in Hz): Bc = 52548 / (z^2 - 52.56 z + 690.39), with z in bark.
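The three bark equations above (conversion, inverse and critical bandwidth, where z2 in the bandwidth formula is read as z squared) can be sketched in Python as follows. The function names are assumptions made for this illustration, and the forward conversion requires f > 0.

```python
def hz_to_bark(f_hz):
    """Critical band rate: z = 26.81 / (1 + 1960 / f) - 0.53, with f > 0 in Hz."""
    return 26.81 / (1.0 + 1960.0 / f_hz) - 0.53


def bark_to_hz(z):
    """Inverse: f = 1960 / [26.81 / (z + 0.53) - 1], with z in bark."""
    return 1960.0 / (26.81 / (z + 0.53) - 1.0)


def critical_bandwidth_hz(z):
    """Critical bandwidth in Hz as a function of critical band rate z in bark."""
    return 52548.0 / (z * z - 52.56 * z + 690.39)


def critical_bandwidth_at_hz(f_hz):
    """Critical bandwidth in Hz at centre frequency f_hz (via z)."""
    return critical_bandwidth_hz(hz_to_bark(f_hz))
```

For a 1 kHz tone this gives a critical band rate of about 8.5 bark and a critical bandwidth of roughly 167 Hz.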

Equivalent rectangular bandwidth (in Hz): Be = 6.23 × 10^-6 f^2 + 9.339 × 10^-2 f + 28.52, with f in Hz.
ERB-rate E (in ERB units) = 11.17 ln[(f + 312) / (f + 14675)] + 43.0, with f in Hz.
This is valid within the frequency range from 0.1 to 6.5 kHz [11].
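The ERB equations can be sketched in the same way, with f in Hz and the garbled exponents read as 10^-6 f^2 and 10^-2 f. The function names are assumptions made for this illustration. The inverse of the ERB-rate equation is not given in the text above; it is obtained here by solving that equation algebraically for f and should be treated as a derived convenience, valid only where the forward equation is.

```python
import math


def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz (in Hz)."""
    return 6.23e-6 * f_hz ** 2 + 9.339e-2 * f_hz + 28.52


def hz_to_erb_rate(f_hz):
    """ERB-rate: E = 11.17 ln[(f + 312) / (f + 14675)] + 43.0, with f in Hz."""
    return 11.17 * math.log((f_hz + 312.0) / (f_hz + 14675.0)) + 43.0


def erb_rate_to_hz(e):
    """Algebraic inverse of hz_to_erb_rate (derived here, not from the source)."""
    r = math.exp((e - 43.0) / 11.17)  # r = (f + 312) / (f + 14675)
    return (14675.0 * r - 312.0) / (1.0 - r)
```

At 1 kHz, erb_bandwidth_hz returns about 128 Hz, which is narrower than the critical bandwidth of roughly 167 Hz obtained from the bark equations at the same frequency.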
Notes on application

Critical bandwidth Bc is a measure of tonotopic resolution in audition. Critical band rate z can be considered a measure of tonotopic position that is useful in models of hearing and for showing excitation patterns and auditory spectrograms of sounds (level by place by time). However, since the hearing system also performs a temporal analysis that contributes to frequency resolution for low frequencies, auditory frequency resolution cannot be represented on the basis of z alone.

Auditory frequency resolution is better described by the equivalent rectangular bandwidth (ERB) [11], but since ERB-rate does not represent the tonotopic dimension alone, it cannot be used to show auditory excitation patterns without distortion. Using it as a place coordinate in models of hearing, in addition to a time coordinate, implies representing the temporal contribution to frequency resolution twice, but spectra of the type level by ERB-rate, without time and without a true place coordinate, may still be useful for certain purposes.

In order to visualize pitch contours in speech, a semi-tone scale or a logarithmic scaling of frequency (or period) is suggested [13].

References:

[1] J.O. Pickles (1988) An Introduction to the Physiology of Hearing, London: Academic (2nd ed.).
[2] R. Plomp (1976) Aspects of Tone Sensation: A Psychophysical Study, London: Academic.
[3] B.C.J. Moore (1989) An Introduction to the Psychology of Hearing, London: Academic (3rd ed.).
[4] E. Zwicker and H. Fastl (1990) Psychoacoustics: Facts and Models, Berlin, New York: Springer.
[5] E. Terhardt (1972) "Zur Tonhöhenwahrnehmung von Klängen" Acustica 26: 173-199.
[6] E. Terhardt (1974) "Pitch, consonance, and harmony" J. Acoust. Soc. Am. 55: 1061-1069.
[7] S.S. Stevens and J. Volkmann (1940) "The relation of pitch to frequency: A revised scale" Am. J. Psychol. 53: 329-353.
[8] L.L. Beranek (1949) Acoustic Measurements, New York: Wiley.
[9] E. Zwicker, G. Flottorp and S.S. Stevens (1957) "Critical bandwidth in loudness summation" J. Acoust. Soc. Am. 29: 548-557.
[10] E. Zwicker and R. Feldtkeller (1967) Das Ohr als Nachrichtenempfänger, Stuttgart: Hirzel (2nd ed.).
[11] B.C.J. Moore and B.R. Glasberg (1983) "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns" J. Acoust. Soc. Am. 74: 750-753.
[12] H. Traunmüller (1990) "Analytical expressions for the tonotopic sensory scale" J. Acoust. Soc. Am. 88: 97-100.
[13] H. Traunmüller and A. Eriksson (1995) "The perceptual evaluation of F0-excursions in speech as evidenced in liveliness estimations" J. Acoust. Soc. Am. 97: 1905-1915.

Hartmut Traunmüller | Phonetics at Stockholm University
Last modified in August 1997, links refreshed in May 2000, link to converter added in March 2005