Task 4

Name: Correlation between objective acoustic features of the singing voice and voice disorders in singing

Coordination: FMUP/FEUP

Duration: 6 months

Task description

TASK4 will run in parallel with TASK1. Its objective is to identify which singing disorders are typical and which perceptual classification parameters are associated with them, and to investigate which acoustic features correlate well with those parameters. This information is an essential input to TASK6, whose objective is to use the acoustic signal to detect, as early as possible (i.e., from a preventive perspective), risk factors (e.g., stress, tension) that could give rise to voice disorders.

Databases of both healthy singing voices and singing voices exhibiting pathologies such as dysphonia or laryngeal lesions are instrumental to this task. Since such databases are expected to be difficult to find, contacts will be established with other research groups or organizations working in related areas (e.g., members of COST Action 2103 "Advanced Voice Function Assessment", or members of the European Laryngological Research Group - http://www.elsoc.org/).

To understand the challenges involved and to devise possible new approaches to the problem, it is useful to briefly review the history of acoustic feature extraction. In this discussion, features are objective characteristics computed from an acoustic signal (spoken or singing voice) using digital signal processing techniques, after the signal has been captured by a microphone and converted to digital format.

For at least 50 years, signal processing techniques have been extensively investigated and optimized in three main areas concerning the spoken voice: speech coding/compression, speech recognition and speech synthesis. Automatic speaker recognition has also received considerable attention in recent years but, most frequently, the acoustic features used in this context are the same as those used in speech recognition [Sha99].

By comparison, acoustic feature extraction for voice quality assessment has received little attention in recent years. Either the acoustic features developed for speech coding or recognition have been reused for voice quality evaluation, with little success, or specific voice measures have been adopted with considerably more success [TIT94, Rei04]. Among the latter, three measures (jitter, shimmer and HNR) enjoy the broadest consensus in the scientific community, because they correlate consistently with subjective parameters of sustained phonation such as roughness, breathiness, asthenia and tension.

Jitter refers to short-term (cycle-to-cycle) perturbation in the period of the glottal pulses (i.e., in the fundamental frequency of the voice) during sustained phonation of a vowel, typically /a/. Shimmer refers to short-term (cycle-to-cycle) perturbation in the amplitude of the glottal pulses during sustained phonation of a vowel. The harmonics-to-noise ratio (HNR) is a quality measure defined as the ratio between the energy of the harmonic components and the energy of the noise component of a voiced vowel [TIT94]. Other acoustic features can be found in the literature, but their relevance is not generally acknowledged by the scientific community and thus needs to be confirmed in the context of this project, and of this task in particular.
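The three standard measures can be sketched as follows. This is a minimal illustration, assuming that glottal cycle boundaries and per-cycle peak amplitudes have already been extracted from a sustained vowel (that step is not shown). It uses the "local" variants of jitter and shimmer (mean absolute difference between consecutive cycles, normalized by the mean), as found in tools such as Praat; other definitions (e.g., RAP, PPQ, APQ) also exist. HNR is expressed in dB from given harmonic and noise energies; how those energies are estimated (e.g., by autocorrelation) is left out. All function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def _local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    mean_value = sum(values) / len(values)
    if mean_value <= 0:
        raise ValueError("values must be positive")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def local_jitter(periods: Sequence[float]) -> float:
    """Local jitter as a fraction (x100 for %), from consecutive glottal periods."""
    return _local_perturbation(periods)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Local shimmer as a fraction, from the peak amplitudes of consecutive cycles."""
    return _local_perturbation(amplitudes)


def hnr_db(harmonic_energy: float, noise_energy: float) -> float:
    """Harmonics-to-noise ratio in dB: 10*log10(E_harmonic / E_noise)."""
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10.0 * math.log10(harmonic_energy / noise_energy)


if __name__ == "__main__":
    periods = [0.0100, 0.0102, 0.0099, 0.0101]  # hypothetical periods (s) of a sustained /a/
    amplitudes = [0.80, 0.78, 0.81, 0.79]       # hypothetical per-cycle peak amplitudes
    print(f"jitter  = {100 * local_jitter(periods):.2f} %")
    print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
    print(f"HNR     = {hnr_db(50.0, 1.0):.1f} dB")
```

Because both perturbation measures are normalized by the mean, they are dimensionless and comparable across voices with different fundamental frequencies and recording levels.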

Therefore, in this task, acoustic features generally accepted by the scientific community as meaningful will first be tested, and their correlations with perceptual parameters will be established using the available databases. Then, new features such as harmonic irregularity/extension and the closing/open coefficients of the glottal pulse [Leh07] (a research topic that will be addressed in TASK5) will be investigated.
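The correlation step can be sketched as follows. Perceptual ratings such as the GRBAS grades (0 to 3) are ordinal, so a rank correlation is a reasonable choice; here Spearman's rho is computed as the Pearson correlation of average ranks, which handles tied ratings. The choice of Spearman's rho, the function names and the sample values are assumptions for illustration only, not methods or data from this project.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    jitter = [0.004, 0.006, 0.012, 0.020, 0.009]  # hypothetical local jitter per voice
    roughness = [0, 1, 2, 3, 1]                   # hypothetical roughness ratings (0-3)
    print(f"Spearman rho = {spearman(jitter, roughness):.3f}")
```

With real databases, the same function would be applied to each (feature, perceptual parameter) pair, ideally together with a significance test and a correction for multiple comparisons.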

Correlating acoustic data with electroglottographic, laryngoscopic and stroboscopic information will also be important (as it is in TASK5), in order to establish the functional/biomechanical profiles that characterize normal and abnormal voicing.
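As one example of an electroglottographic (EGG) measure that could be correlated with acoustic features, the contact quotient (the fraction of a glottal cycle during which the vocal folds are in contact) can be estimated with a criterion-level method. This is a minimal sketch under stated assumptions: a single, already-segmented cycle; "contact up" polarity (larger values mean more contact, which varies between devices); and a threshold level that is a free parameter (values around 25% to 35% of the peak-to-peak amplitude are commonly reported). It is related to, but not the same as, the closing/open coefficients of the glottal pulse studied in TASK5. The function name is hypothetical.

```python
from typing import Sequence


def contact_quotient(egg_cycle: Sequence[float], level: float = 0.25) -> float:
    """Criterion-level contact quotient of one EGG cycle.

    Assumes 'contact up' polarity and that egg_cycle spans exactly one
    glottal cycle. The threshold sits at `level` of the cycle's peak-to-peak
    range above its minimum; the CQ is the fraction of samples above it.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if len(egg_cycle) == 0:
        raise ValueError("empty cycle")
    lo, hi = min(egg_cycle), max(egg_cycle)
    if hi == lo:
        raise ValueError("flat cycle: no contact information")
    threshold = lo + level * (hi - lo)
    in_contact = sum(1 for v in egg_cycle if v > threshold)
    return in_contact / len(egg_cycle)


if __name__ == "__main__":
    cycle = [0.0, 0.1, 0.7, 1.0, 0.9, 0.6, 0.3, 0.1, 0.0, 0.0]  # hypothetical EGG samples
    cq = contact_quotient(cycle)
    print(f"CQ = {cq:.2f}, open-phase fraction = {1 - cq:.2f}")
```

Averaging the CQ over many cycles of a sustained vowel gives a per-recording value that can then be correlated with jitter, shimmer, HNR or perceptual ratings in the same way as any other feature.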

Otorhinolaryngologists (ORL doctors) from FMUP and engineers from FEUP will collaborate on this task. PhD students who already have significant experience in the acoustic-perceptual evaluation of voices will also be involved.

Expected results

The main expected results of this task are one report, one journal paper, and software implementations of acoustic feature estimation techniques.

Human resources: 10.9 person-months.