Progress report on the first year of activities of the project

(period from October 2010 to November 2011)

Date of this report: December 2011

1.0 Introduction

This research project focuses on the singing voice and brings together institutions, professionals and researchers in three complementary domains: singing pedagogy, engineering/signal processing, and medicine/laryngology. The objective is to combine expertise and know-how from these domains so as to design, develop and validate innovative computer-assisted methodologies and technologies in three main application areas: i) real-time visual feedback of relevant quality parameters of the singing voice, ii) new technology-assisted pedagogic methodologies for singing teaching/learning, and iii) real-time monitoring and assessment of the singing voice in order to prevent voice disorders. In short, this project aims to provide singing students, teachers and professionals with solutions that help them optimize singing teaching, learning and training, and perform safely. This vision underlies the seven tasks of the project:

  • TASK1-correlation between subjective quality parameters of the singing voice and objective acoustic features,

  • TASK2-new technology-assisted methodologies in singing teaching/learning,

  • TASK3-singing to musical score transcription and music composition,

  • TASK4-correlation between objective acoustic features of the singing voice and voice disorders in singing,

  • TASK5-robust real-time glottal pulse estimation from running singing,

  • TASK6-real-time preventive assessment of the singing voice,

  • TASK7-management.

A general overview of the project can be found in the document FCT104995_oct10.pdf, which describes the objectives of each task, identifies the team (institutions and researchers) associated with each task, specifies the man-month effort, provides the timeline of each task, and describes the overall realization goals.

The above-mentioned PDF document was prepared from the description in the initial project proposal, but it also includes additional information on relevant progress made after the project approval (first semester of 2010) and before the first project meeting, which took place on November 20th 2010 (shortly after the official start of the project activities on October 1st 2010).

The following sections briefly describe the progress made towards the objectives of each individual task during the first year of the project.

2.0 Progress during the first year of the project

2.1 Task 1

Name: Correlation between subjective quality parameters of the singing voice and objective acoustic features

Coordination: ESMAE-IPP/FEUP

Duration: 6 months

Task description:

The objective of this task is to identify and characterize the most important quality and stylistic/expressive perceptual parameters in singing, to investigate what objective acoustic features correlate well with those parameters, and to develop efficient algorithms that are able to estimate them reliably. This information is of paramount importance for TASK2 since the right features must be known and the right estimation algorithms must be implemented before a meaningful and useful visual representation is given to the associated perceptual parameters.

Activity report:

The 6-month research grant assigned to this task was widely advertised, including on the EraCareers website, and six candidates applied. On April 4th 2011, the Jury completed the ranking of all the candidates and selected one of them. The selected candidate, João Terleira, is a postgraduate student preparing a Masters in Artistic Interpretation in Singing, and he started his research activities in May 2011. The subject of his dissertation is related to the topic of Task 1 of the project. So far, he has identified, structured and defined a list of relevant quality and stylistic/expressive perceptual parameters in singing, and he is currently recording professional voices illustrating each of these parameters.

These recordings will then be analyzed with the help of engineers in order to extract from the singing signals the most important acoustic features that correspond well to those quality and stylistic/expressive perceptual parameters. This challenge has in fact been proposed as an MSc dissertation to an ECE student at FEUP: the topic "Relation between objective characteristics of the singing voice and its aesthetical attributes" has been taken up by the MSc student Vitor Almeida. In the context of his work, Vitor Almeida has already produced a monograph (identified below under "outcomes") and is currently working in close cooperation with João Terleira, focusing on formant analysis of the singing voice.

Main researchers involved:

João Terleira, MSc degree (Artistic Interpretation - ESMAE/IPP) expected by July 2012

Vitor Almeida, MSc degree (Electrical and Computer Engineering - FEUP) expected by March 2012

Rui Taveira (ESMAE), PhD Student

Ricardo Sousa (FEUP), PhD

Aníbal Ferreira (FEUP), PhD

Outcomes:

Relevant facts or remarks related to this task:

The PhD dissertation of Susana Freitas, another researcher in the project, is strongly related to this task and to TASK4, as it involves the correspondence between acoustic (objective) features of normal/pathological voices and perceptual (subjective) quality parameters. Extensive testing has already been carried out, and the research results obtained are being used in the preparation of two journal papers and one PhD dissertation.

Next steps:

Finalization of a database of singing voice recordings (male/female) illustrating the most relevant quality and stylistic/expressive perceptual parameters that are amenable to visual feedback on a computer screen. Development of a robust algorithm for estimating formants in singing and, in particular, the "singer formant". Both LPC-based and cepstral-based techniques are used on the spectral envelope as well as on the noise floor of singing voice signals. Porting of the formant estimation algorithm to the SingingStudio C/C++ platform.
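As an illustration of the LPC-based side of this work, the following Python sketch estimates formant candidates as the peaks of the LPC spectral envelope of a single frame. It is not the algorithm developed in the project: the pre-emphasis coefficient (0.97), the Hamming window, the model order (10 at 8 kHz, following the common rule of thumb of roughly fs/1000 + 2) and the 5 Hz envelope grid are illustrative assumptions, and neither the cepstral variant nor the noise-floor analysis is shown. The helper synthesize_resonances() is a hypothetical test-signal generator (white noise through two-pole resonators).

```python
import math
import random


def synthesize_resonances(formants_hz, bandwidth_hz, fs, n, seed=0):
    """Hypothetical test signal: white noise through a cascade of two-pole resonators."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius for the given bandwidth
    for f in formants_hz:
        a1, a2 = -2.0 * r * math.cos(2.0 * math.pi * f / fs), r * r
        out, y1, y2 = [], 0.0, 0.0
        for v in y:
            s = v - a1 * y1 - a2 * y2
            out.append(s)
            y2, y1 = y1, s
        y = out
    return y


def lpc(frame, order):
    """Autocorrelation-method LPC (Levinson-Durbin); returns [1, a1, ..., a_order]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the order-i model.
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k  # prediction error power shrinks at each order
    return a


def lpc_formants(x, fs, order=10, pre_emphasis=0.97, step_hz=5.0):
    """Return frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^jw)|^2."""
    # Pre-emphasis flattens the spectral tilt before modelling.
    x = [x[0]] + [x[i] - pre_emphasis * x[i - 1] for i in range(1, len(x))]
    n = len(x)
    frame = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))) for i, v in enumerate(x)]
    a = lpc(frame, order)
    freqs, env_db = [], []
    for m in range(int(fs / 2 / step_hz) + 1):
        w = 2.0 * math.pi * m * step_hz / fs
        re = sum(c * math.cos(w * k) for k, c in enumerate(a))
        im = sum(c * math.sin(w * k) for k, c in enumerate(a))
        freqs.append(m * step_hz)
        env_db.append(-10.0 * math.log10(re * re + im * im))
    return [freqs[i] for i in range(1, len(env_db) - 1)
            if env_db[i - 1] < env_db[i] >= env_db[i + 1]]
```

For example, lpc_formants(synthesize_resonances([700.0, 1800.0], 80.0, 8000, 2048), 8000) should return candidates close to 700 Hz and 1800 Hz, possibly alongside weaker spurious peaks. In a real-time setting the analysis would run on short frames (a few tens of milliseconds); the long frame here only keeps the demonstration stable.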

2.2 Task 2

Name: New technology-assisted methodologies in singing teaching/learning

Coordination: FEUP/ESMAE-IPP

Duration: 25 months

Task description:

The objective of this task is the design, realization and validation of interactive visual feedback technologies in singing, as well as of technology-assisted teaching, learning and training methodologies, particularly at beginner level. Singing students and teachers from ESMAE (Prof. Rui Taveira) and from UCP (Prof. Sofia Serra), together with engineers, will collaborate closely to enhance current methodologies for learning, teaching and practicing singing with technologies providing objective visual feedback of singing expression, in addition to the natural auditory feedback. This combined, richer feedback of the singing voice will help students better grasp the subjective and objective dimensions of their singing exercises, making learning faster and more effective. As a strategy, this task will improve the design and functionality of the technology according to pertinence and usability criteria suggested by singing teaching and learning, and will innovate in pedagogical methodologies (teaching/learning) as a result of real use of, and experimentation with, the technology in class, taking advantage of its capabilities.

Activity report:

The SingingStudio C/C++ platform has been extended with two main functionalities: a real-time spectrogram and touch-screen interactivity. This work has been performed mainly by Ricardo Sousa; some fine-tuning of the touch-screen interactivity is still ongoing, and new functionalities are also being added to the platform. Several important contributions resulting from the research carried out by Ricardo Sousa in the context of his PhD dissertation have been included in the SingingStudio software platform. Ricardo Sousa successfully defended his PhD dissertation, entitled "Metodologias de Avaliação Perceptiva e Acústica do Sinal de Voz em Aplicações de Ensino do Canto e Diagnóstico/Reabilitação da Fala" (Perceptual and acoustic assessment methodologies for the voice signal in singing teaching and speech diagnosis/rehabilitation applications), on October 19th 2011.

A professor of singing from Escola Superior de Música e Artes do Espectáculo of the Polytechnic Institute of Porto (ESMAE/IPP), who is a member of the project team, has taken up the topic of this task and in 2010 started a PhD Programme in "Digital Media" at FEUP, with a dissertation addressing the topic of this task. A first monograph on the theme has already been completed (see outcomes, below); it not only reviews the current state of the art but also presents an in-depth discussion of the evolution of the area.

An MSc student has also taken up this task and has carried out research and development in the context of his MSc dissertation (Integrated Masters in Electrical and Computer Engineering), entitled "Biofeedback of the Singing Voice - automatic identification and parameterization of vibrato in singing". The dissertation was successfully defended and approved on July 11th 2011. The developed algorithm (Matlab-based) is currently being adapted by Ricardo Sousa for inclusion in the C/C++ SingingStudio platform.
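The Matlab-based vibrato algorithm is not reproduced in this report. As a rough, hypothetical illustration of what a statistical parameterization of vibrato can involve, the Python sketch below takes the f0 contour of a single sustained, fully voiced note (an assumption) and estimates the vibrato rate from the upward zero crossings of the de-meaned pitch curve expressed in cents, and the extent as the average half peak-to-peak deviation per cycle. The function name and these choices are illustrative only.

```python
import math


def vibrato_parameters(f0_hz, frame_rate_hz):
    """Estimate (rate_hz, extent_cents) of vibrato from the f0 contour of one
    sustained, fully voiced note. Returns None if fewer than two cycles are found."""
    cents = [1200.0 * math.log2(f) for f in f0_hz]  # pitch on a log scale
    mean = sum(cents) / len(cents)
    c = [v - mean for v in cents]                  # de-meaned pitch deviation
    # Upward zero crossings, refined by linear interpolation between frames.
    ups = [i - 1 + (-c[i - 1]) / (c[i] - c[i - 1])
           for i in range(1, len(c)) if c[i - 1] < 0.0 <= c[i]]
    if len(ups) < 2:
        return None
    rate_hz = (len(ups) - 1) * frame_rate_hz / (ups[-1] - ups[0])
    # Extent: half peak-to-peak deviation within each complete cycle, averaged.
    extents = []
    for start, end in zip(ups, ups[1:]):
        cycle = c[math.ceil(start):math.floor(end) + 1]
        extents.append((max(cycle) - min(cycle)) / 2.0)
    return rate_hz, sum(extents) / len(extents)
```

For instance, a contour sampled every 10 ms (frame_rate_hz=100) of a 440 Hz tone modulated by ±50 cents at 5.5 Hz yields a rate close to 5.5 Hz and an extent of at most 50 cents, since the sampled frames can miss the true pitch extremes.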

During 2010 and 2011, two grants funded by the project, specifically in the context of this task, were widely publicized, including on the EraCareers website. Following the first grant announcement and the ranking of the candidates, Ricardo Sousa was selected; due to FCT requirements, this implied requesting his removal from the initial project team, where he had been listed as a PhD student not supported by the project budget. As a grant holder, he started his activities on December 10th 2010. The second grant announcement took place in two phases. In the first phase, the announcement (deadline: February 2011) attracted three candidates, but after evaluating them the Jury concluded that none qualified and that a new call should be issued. After obtaining permission from FCT, a second announcement was publicized (deadline: July 2011) and eight candidates applied. After ranking the candidates, the Jury selected Tiago Campos, holding an MSc degree, as the best candidate. Tiago Campos started his activities on the project, namely in the context of Tasks 2 and 3, on November 7th 2011. His current assignment covers the touch-screen interactivity of SingingStudio, a revision/redesign of the searchtonal() algorithm in SingingStudio for better melody line estimation, particularly at high fundamental frequencies of the singing voice, and the inclusion of the formant estimation functionality in the spectrogram window of SingingStudio.

Main researchers involved:

Rui Taveira (ESMAE), Professor of Singing and PhD Student

Sofia Serra (UCP), PhD, Professor of Singing

José Ventura, MSc degree (Electrical and Computer Engineering - FEUP) awarded in July 2011

Tiago Campos (FEUP), MSc

Ricardo Sousa (FEUP), PhD

Aníbal Ferreira (FEUP), PhD

Outcomes:

Monograph by Prof. Rui Taveira.

MSc dissertation by José Ventura, defended and approved in July 2011.

Extension of the SingingStudio processing and interactive capabilities. This work has been carried out mainly by Ricardo Sousa and Tiago Campos.

PhD dissertation by Ricardo Sousa defended and approved on October 19th 2011.

Next steps:

Finalization, in the SingingStudio environment, of the automatic identification and statistical parameterization of vibrato in singing. Adaptation of the formant estimation algorithm (Matlab-based) developed by Vitor Almeida to the C/C++ platform; this algorithm is to be used in the spectrogram window of SingingStudio. Fine-tuning of the touch-screen interactivity in SingingStudio. Revision/redesign of the searchtonal() algorithm in SingingStudio for better performance, particularly at high fundamental frequencies of the singing voice; this specific item of work is to be carried out mainly by Tiago Campos and Ricardo Sousa.

2.3 Task 3

Name: Singing to musical score transcription and music composition

Coordination: FEUP/UCP

Duration: 24 months

Task description:

The objective of this task is to specialize the SingingStudio software environment developed in TASK2 in order to offer new interactive functionalities, namely (real-time) automatic transcription of singing to a music score and didactic music composition using the singing voice as the only input. The MIDI (Musical Instrument Digital Interface) protocol will be used to represent the symbolic music notation. Since MIDI represents music as a set of musical parameters, it is editable: after the transcription of singing to a music score, the user can modify or correct the automatically recognized melody line and each of its individual notes. The final score can then be played back using any synthetic instrument supported by the MIDI protocol. This is the basic didactic functionality that will be implemented in this task. It will also be enhanced to allow repeated singing voice exercises, each assigned to a different musical instrument. Combining these results leads to a practical music composition functionality taking the singing voice as the only input. This task will bring together engineers (FEUP), musicians and interactive system designers (UCP), and will be developed in close collaboration with Casa da Música in Porto.
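A first step towards such a transcription is mapping an f0 contour to MIDI note numbers with the standard equal-temperament relation n = 69 + 12 log2(f / 440), where A4 = 440 Hz is note 69. The Python sketch below rounds each voiced frame to the nearest note, merges runs of equal notes and discards notes shorter than a minimum duration. The function names, the 0.1 s default minimum duration and the convention that unvoiced frames carry f0 <= 0 are assumptions for illustration; a practical transcriber would also need onset detection and pitch smoothing, since here a single outlier frame splits a note in two.

```python
import math


def hz_to_midi(f_hz):
    """Equal-temperament MIDI note number (A4 = 440 Hz = 69), possibly fractional."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def transcribe(f0_hz, hop_s, min_dur_s=0.1):
    """Turn a frame-wise f0 contour (f0 <= 0 marks unvoiced frames) into a list of
    (midi_note, onset_s, duration_s) tuples, dropping notes shorter than min_dur_s."""
    notes = []
    current, start = None, 0
    # A trailing unvoiced sentinel frame closes the last note.
    for i, f in enumerate(list(f0_hz) + [0.0]):
        note = round(hz_to_midi(f)) if f > 0 else None
        if note != current:
            if current is not None and (i - start) * hop_s >= min_dur_s:
                notes.append((current, start * hop_s, (i - start) * hop_s))
            current, start = note, i
    return notes
```

The resulting note list maps directly onto MIDI note-on/note-off pairs, which keeps it editable in the sense described above.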

Activity report:

Although the activities of this task are scheduled for the second year of the project, some work has already taken place. An MSc student from FEUP (Miguel Garcia) has chosen the topic of this task for his MSc dissertation (ECE at FEUP), entitled "Transcrição de Canto para Pauta Musical" (Singing to musical score transcription). In the context of his work, Miguel Garcia has already produced a monograph (identified below under "outcomes") and is currently working in close cooperation with the whole project team at FEUP to add the new functionality (singing to music score transcription) to the SingingStudio platform. In addition, since this activity is to be actively coordinated by the partner institution UCP, and since a 6-month grant (Licentiate degree) is included in this task, UCP has already given wide publicity to the grant announcement, including on the EraCareers website. At a meeting on November 25th 2011, the Jury ranked all the applications received and selected a candidate, who will start research and development activities in this task on January 1st 2012. The work under this grant will be articulated with the work currently being developed by Miguel Garcia so as to add to SingingStudio a practical music composition functionality taking the singing voice as the only input. From the point of view of UCP, this work will focus on interactive system design and implementation, and will be developed in close collaboration with both the engineering team at FEUP and Casa da Música in Porto, whose representatives have already expressed strong interest in such a scenario for workshops, in line with the educational mission of Casa da Música. These workshops will be organized for members of the general public interested in learning the basics of singing, music notation and music composition.

Main researchers involved:

Álvaro Barbosa (UCP), PhD

MSc Student (UCP), to be recruited and to start activities in January 2012

Miguel Garcia, MSc degree (Electrical and Computer Engineering - FEUP) expected by March 2012

Tiago Campos (FEUP), MSc

Ricardo Sousa (FEUP), PhD

Aníbal Ferreira (FEUP), PhD

Outcomes:

Next steps:

Implementation of a metronome in SingingStudio. Implementation of the singing to music score transcription functionality in SingingStudio and planning of basic editing capabilities. Preliminary contacts with representatives of the educational mission of Casa da Música in order to design the interactive "look and feel" functionalities of singing voice to music score transcription and composition.
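A metronome also gives the transcription a rhythmic grid. The hypothetical Python helpers below generate click times for a given tempo and snap (midi, onset, duration) notes, such as those an f0-to-MIDI step would produce, onto a grid of beat subdivisions. The names and the sixteenth-note default (four grid steps per beat) are assumptions, not features of SingingStudio.

```python
def beat_times(bpm, n_beats, start_s=0.0):
    """Click times (s) of a metronome at the given tempo."""
    period_s = 60.0 / bpm
    return [start_s + k * period_s for k in range(n_beats)]


def quantize(notes, bpm, subdivision=4):
    """Snap (midi, onset_s, duration_s) notes to a grid of 1/subdivision of a beat.
    Returns (midi, onset_step, length_steps), with a minimum length of one step."""
    step_s = 60.0 / bpm / subdivision
    grid = []
    for midi, onset_s, dur_s in notes:
        on = round(onset_s / step_s)
        end = round((onset_s + dur_s) / step_s)
        grid.append((midi, on, max(1, end - on)))
    return grid
```

At 120 BPM each beat lasts 0.5 s, so with subdivision=4 a grid step is 0.125 s; a note sung from 0.51 s to 0.77 s snaps to onset step 4 with a length of 2 steps (an eighth note). Snapping the note end rather than the duration keeps consecutive notes from drifting apart.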

2.4 Task 4

Name: Correlation between objective acoustic features of the singing voice and voice disorders in singing

Coordination: FMUP/FEUP

Duration: 6 months

Task description:

The objective of this task is to identify which singing disorders are typical, which perceptual classification parameters best describe those disorders, and which acoustic features correlate well with them. This information is extremely important as input to TASK6, whose objective is to use the acoustic signal to detect, as early as possible (i.e., in a preventive perspective) and in a non-invasive way, risk factors (e.g., stress, tension) that could give rise to voice disorders. Databases of both healthy singing voices and singing voices exhibiting pathologies such as dysphonia or laryngeal lesions are instrumental to this task. As these can be expected to be difficult to gather or locate, contacts will be established with other research groups or organizations working in similar areas (e.g., members of COST Action 2103 "Advanced Voice Function Assessment", or members of the European Laryngological Research Group - http://www.elsoc.org/). In this task, acoustic features generally accepted by the scientific community as meaningful will first be tested, and correlations will be established using available databases. Then, new features such as harmonic irregularity/extension and the closing/open coefficient of the glottal pulse (a research topic to be addressed in TASK5) will be investigated. The correlation of acoustic data with electroglottograph, laryngoscopic and stroboscopic information will also be important (as it will be in TASK5) in order to draw conclusions on the functional/biomechanical profiles characterizing normal and abnormal voicing. ORL doctors from FMUP and engineers from FEUP will collaborate in this task. PhD students who already have significant experience in the acoustic-perceptual evaluation of voices will also be involved.

Activity report:

Due to operational and communication problems related to the research team of the Faculty of Medicine of the University of Porto (FMUP), as timely reported to FCT, there has been no significant progress on this task on the part of FMUP. As a consequence, and contrary to what was initially planned and decided during the first project meeting in November 2010, no researcher was recruited for this task during the first year of the project. In the meantime, following a special request to FCT by the Principal Investigator, and with the agreement of the Director of FMUP, a new researcher (Full Professor at FMUP) has been included in the research team, effective from November 28th 2011, and will coordinate all operational activities of the project related to FMUP. These include, in particular, the recruitment of a researcher holding an MSc degree and aiming to pursue a PhD Programme.

Relevant facts or remarks related to this task:

As already mentioned in the context of TASK1, the PhD dissertation of Susana Freitas, another researcher in the project, is strongly related to this task, as it involves the correspondence between acoustic (objective) features of normal/pathological voices and perceptual (subjective) quality parameters. Extensive testing has already been carried out, and the research results obtained are being used in the preparation of two journal papers and one PhD dissertation. These results are expected to provide valuable input to this task. A related monograph is listed in the outcomes of this task.

Main researchers involved:

TBD (FMUP), PhD

MSc Student (FMUP), to be recruited and to start activities early in 2012.

Susana Freitas, PhD degree (Biomedical Engineering - FEUP) expected during 2012

Ricardo Sousa (FEUP), PhD (2011)

Aníbal Ferreira (FEUP), PhD (1998)

Outcomes:

Monograph by Susana Freitas in October 2010.

Next steps:

Urgent recruitment by FMUP of a researcher holding an MSc degree and motivated to pursue a PhD Programme consistent with the objectives of the project and of this task.

2.5 Task 5

Name: Robust real-time glottal pulse estimation from running singing

Coordination: FMUP/FEUP

Duration: 13 months

Task description:

The objective of this task is to develop a computational procedure able to estimate the glottal pulse reliably, non-invasively and in real time from running singing. The glottal pulse is very important because it conveys highly relevant information about the idiosyncrasies of the speaker (i.e., the sound signature and identity of a speaker), the physiological structure of the glottis and the vibration pattern of the vocal folds. These aspects determine the quality of phonation, both in terms of artistic/aesthetic quality and in terms of healthy/non-healthy voice quality. Innovative results are expected in this task that can be extended to other application scenarios of considerable economic value. For example, the results of the research carried out in this task may pave the way for automatic remote assessment of voice quality when a patient calls the hospital or clinic; this scenario justifies filing an international patent application. Although the objective is to develop a non-invasive procedure, semi-invasive methods will be used to obtain data that are essential to complement the acoustic data in defining accurate models of the glottal pulse and of the vocal tract filter for different singing or spoken voice registers and health conditions. In particular, electroglottograph (EGG), laryngoscopic and stroboscopic information will be captured in addition to the acoustic signal. This will be possible thanks to the participation in this task of researchers from FMUP who are also ORL doctors, since Portuguese law allows only ORL doctors to perform these exams. Engineers from FEUP will also be involved in this task.

Activity report:

The work planned in this task depends strongly on the recruitment of a researcher holding an MSc degree in medicine with a specialization in (or pursuing studies in) otorhinolaryngology. As already mentioned for Task 4, due to operational and communication problems related to the research team of the Faculty of Medicine of the University of Porto (FMUP), as timely reported to FCT, there has so far been no significant progress on this task on the part of FMUP.

In spite of this, and given that computational models are a central part of the work planned in this task, a postgraduate student (Master in Biomedical Engineering at FEUP) has taken glottal inverse filtering as the topic of her MSc dissertation. In this context, a monograph has already been produced (July 2011), including an in-depth study of state-of-the-art techniques as well as preliminary simulation results assessing the performance of different algorithms. This monograph is included in the "outcomes" section of this task. As a result of this monograph, and of discussions on the major guidelines for a new frequency-domain approach to glottal inverse filtering to be developed in this task, a paper was presented at an international conference in Israel in June 2011.
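The new frequency-domain approach is not described in this report. For background only, the Python sketch below shows a conventional time-domain LPC baseline for glottal inverse filtering: an all-pole vocal tract model is estimated from a Hamming-windowed frame, the unwindowed signal is passed through the inverse filter A(z) to obtain an excitation residual, and a leaky integrator turns that residual into a crude glottal-flow-like waveform, assuming lip radiation acts roughly as a differentiator. The model order, window and integrator coefficient (0.99) are illustrative assumptions, and refinements such as iterative separation of glottal and vocal tract contributions are omitted.

```python
import math


def lpc(frame, order):
    """Autocorrelation-method LPC (Levinson-Durbin); returns [1, a1, ..., a_order]."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, order=10):
    """Fit an all-pole vocal tract model on a Hamming-windowed copy of x, then
    filter the unwindowed x through A(z) to obtain the excitation residual."""
    n = len(x)
    win = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))) for i, v in enumerate(x)]
    a = lpc(win, order)
    return [sum(a[k] * x[i - k] for k in range(order + 1) if i - k >= 0) for i in range(n)]


def leaky_integrate(x, rho=0.99):
    """First-order leaky integrator y[n] = x[n] + rho * y[n-1]; rho < 1 keeps
    slow drift from accumulating when the residual is integrated."""
    y, prev = [], 0.0
    for v in x:
        prev = v + rho * prev
        y.append(prev)
    return y
```

On a synthetic vowel made by exciting two resonances with an impulse train, the residual shows sharp peaks at the impulse instants, which is the behavior the glottal pulse estimation in this task builds on; real singing is far less ideal, which is one motivation for developing a more robust approach.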

Relevant facts or remarks related to this task:

With the generous help of a specialist in otorhinolaryngology (not a member of the project team), a special acoustic acquisition setup has been used to obtain data for modeling the (glottal) source excitation as well as the vocal tract filter. The setup involves two identical, synchronized miniature microphones: one attached to a video-laryngoscope and used to capture the voice signal near the larynx and vocal folds, and the other used to capture the voice signal outside the mouth. These data are currently being analyzed so that the models can be estimated. In addition, since this task is also concerned with reliable speaker identification, an MSc dissertation (Integrated Masters in Electrical and Computer Engineering) has been proposed on the topic "Robust Speaker Identification in less than two seconds", which has been undertaken by Diana Mendes. The experiments and simulations conducted have led to very interesting conclusions. For example, they have shown that, when very short speech segments are used for speaker recognition, voiced regions contribute strongly to performance, and that adding phase information related to the glottal pulse improves speaker recognition performance by about 7%. These results are very encouraging for the specific work on robust glottal inverse filtering. The MSc dissertation was defended and approved on September 28th 2011, and the corresponding PDF is also included in the outcomes of this task.

Main researchers involved:

TBD (FMUP), PhD

MSc Student (FMUP), to be recruited and to start activities early in 2012.

Sandra Dias, MSc degree (Biomedical Engineering - FEUP) expected during 2012

Diana Mendes, MSc degree (Electrical and Computer Engineering - FEUP) awarded in September 2011

Ricardo Sousa (FEUP), PhD (2011)

Aníbal Ferreira (FEUP), PhD (1998)

Outcomes:

Monograph by Sandra Dias in July 2011.

Sandra Dias, Ricardo Sousa and Aníbal Ferreira, "Glottal inverse filtering: a new road-map and first results", Signal Processing Conference, Tel-Aviv, Israel, June 2011.

MSc dissertation by Diana Mendes, defended and approved on September 28th 2011.

Next steps:

Urgent recruitment by FMUP of a researcher holding an MSc degree and motivated to pursue a PhD Programme consistent with the objectives of the project and of this task. Development, simulation, validation and performance evaluation of the frequency-domain glottal inverse filtering approach devised in this task, with tests conducted first on modal voice.

2.6 Task 6

Name: Real-time preventive assessment of the singing voice

Coordination: FMUP/FEUP

Duration: 24 months

Task description:

The objective of this task is to develop a software environment (an extension of the one developed in TASK2) allowing singers to monitor their singing voice in real time in order to detect risk factors (i.e., stress or voice overuse factors) that could develop into voice disorders. The results of TASK4 and TASK5 will be used to establish a safety margin between normal voice usage and incorrect or risky voice usage. As in TASK2, a significant challenge to be tackled in this task is not only to take full advantage of meaningful acoustic features, but also to extract them from running singing and not only from sustained vowels. This advance would pave the way for remote automatic assessment of voice quality, as already highlighted in the description of TASK5. This task will require extensive validation work and will involve researchers from FEUP (engineers) and FMUP (ORL doctors).

Activity report:

This task is planned to start just after the first year of the project and depends strongly on the recruitment of a researcher holding an MSc degree in medicine with a specialization in (or pursuing studies in) otorhinolaryngology. The recruitment of researchers during the next two months is expected to allow the activities of this task to start effectively.

Next steps:

Urgent recruitment by FMUP of a researcher for Tasks 4, 5 and 6.

2.7 Task 7

Name: Management

Coordination: FEUP/FMUP/UCP/KTH

Duration: 36 months

Task description:

This task is devoted to the overall management of the project.

Activity report:

A plenary meeting of the project took place on November 19th 2010. As mentioned before, the input document to this meeting (FCT104995_oct10.pdf) provides a general overview of the project. During the plenary meeting, all tasks of the project were discussed, namely in terms of articulation, realization objectives and immediate actions (especially the announcement of several research grants). The minutes of the meeting reflect all the major aspects discussed and include the comments made on the document, which improved its final version. During the first year of the project, numerous dedicated (i.e., local) meetings took place involving the PI, individual partner institutions and research teams. Their purpose was to discuss, monitor and plan the progress of the different research activities, and to discuss and implement administrative decisions regarding the project. Special mention must be made of critical administrative issues involving the partner institution Faculty of Medicine of the University of Porto (FMUP), which were highly time-consuming and caused critical delays in specific activities, as reported above. Fortunately, thanks to the diligence and commitment of the Board of Directors of FMUP, a new researcher (Full Professor) from FMUP has been included in the FMUP team of researchers (after approval by FCT), and the right conditions are now in place to give the necessary momentum to the tasks of the project assigned to FMUP.

Relevant facts or remarks related to this task:

Since Prof. Sten Ternström from KTH, a consultant to the project, also participated in the meeting, he kindly accepted the invitation to give a talk on the topic "Does the acoustic waveform mirror the singing voice?". The corresponding seminar announcement and a supporting journal paper (Ternström, 2005, "Does the acoustic waveform mirror the voice?") are available as accompanying documents.

Outcomes:

Input document to the first plenary meeting describing all project tasks and realization objectives, general timeline, institutions involved and research teams.

Meeting minutes resulting from the first plenary meeting of the project.

A web page describing the project, its activities, progress and realizations has been prepared and is available at http://gnomo.fe.up.pt/~voicestudies/artts/

Next steps:

Organization of the Second Plenary Meeting of the Project on January 16th 2012.