Progress

Report regarding the second year of activities of the project

(period from October 2011 to November 2012)

Date of this report: January 13th 2013

1.0 Progress during the second year of the project

1.1 Task 1

Name: Correlation between subjective quality parameters of the singing voice and objective acoustic features

Coordination: ESMAE-IPP/FEUP

Task description:

The objective of this task is to identify and characterize the most important quality and stylistic/expressive perceptual parameters in voice and singing, to investigate what objective acoustic features correlate well with those parameters, and to develop efficient algorithms that are able to estimate them reliably.

Summary of activities:

Vitor Almeida completed his ECE integrated master studies at FEUP on February 15th 2012. His MSc dissertation focused on formant estimation, and several innovative approaches were developed and tested. The most important results were then integrated in the SingingStudio environment, as reported in Task 2. Vitor Almeida was subsequently recruited on May 14th 2012 as a researcher in the project in order to collaborate in tasks 1, 2, 3, 5 and 6, namely:

  • (Task 1) to structure, in collaboration with ESMAE, a database of singing voice records according to a categorization of relevant singing voice stylistic or expressive perceptual parameters, to identify which acoustic features correlate well with those parameters and, whenever possible and useful, to provide visual feedback of those features in the SingingStudio environment,
  • (Task 2) to implement a formant estimation algorithm whose result (formant tracking in time and frequency) is displayed over a spectrogram and that should also be able to detect and highlight the singer's formant.

João Terleira, who is a singer (tenor) and who was also a researcher in the project from May 2011 till October 2011, completed his MSc dissertation (at ESMAE), entitled "Relation between perceptual parameters of the singing voice and objective acoustic phenomena", on November 29th 2012. In the context of his MSc research, he identified and structured a list, with definitions, of relevant quality and stylistic/expressive perceptual parameters in singing. He also recorded professional voices illustrating each one of the identified parameters, and investigated (in collaboration with Vitor Almeida and Ricardo Sousa) the correspondence between a selection of these parameters and relevant acoustic features (not much time was available to dedicate to this last phase).

Susana Freitas is a Speech Therapist and a PhD student in Biomedical Engineering at FEUP. She has also been a collaborator in the project since its start. She submitted her PhD dissertation, entitled "Human Voice Characterization using acoustic and perceptual evaluation", in December 2012. During her PhD research work, she analyzed a database containing 90 dysphonic voice records, with the help of ten voice professionals or specialists (perceptual evaluation) and four voice analysis software packages (acoustic evaluation). Several studies were carried out on intra- and inter-expert consistency, inter-software consistency, correspondence between perceptual and acoustic evaluation, and prediction models of perceptual quality based on acoustic quality, for each one of the tested software packages.

Three research papers reflecting the most important results are being prepared and will be submitted soon.

As an important outcome of this PhD research, a training software (Voice-PE, perceptual evaluation) has been designed and implemented with the instrumental help of Vitor Almeida. This software is a user-friendly educational tool allowing voice students and professionals to compare their perceptual assessment of a selection of dysphonic voices, against the assessment performed on the same voices by a selection of voice specialists/professionals. This software can also be used as a training tool as well as a calibrator of the "internal" perceptual references.

Outcomes:

- MSc dissertation by Vítor Almeida defended and approved in February 2012;

- MSc dissertation by João Terleira defended and approved in November 2012;

- PhD dissertation by Susana Freitas (submitted);

- A didactic software application on perceptual voice quality evaluation (Voice-PE);

- A structured database of (solo) singing records illustrating several relevant stylistic/expressive perceptual parameters.

Next steps:

- Submission by Susana Freitas of three journal papers addressing the most important results obtained in the context of her PhD research.

- Development of algorithms to estimate a selection of the stylistic/expressive perceptual parameters in singing identified in the context of João Terleira's MSc research, notably the "singer's formant". Implementation of those algorithms in the SingingStudio C/C++ platform.

1.2 Task 2

Name: New technology-assisted methodologies in singing teaching/learning

Coordination: FEUP/ESMAE-IPP

Task description:

The main objectives of this task are the design, implementation and validation of biofeedback technologies in singing and also computer-assisted methodologies in singing teaching / learning.

Summary of activities:

The SingingStudio platform has been improved in several aspects, receiving important contributions from several researchers (Ricardo Sousa, Tiago Campos, Vitor Almeida):

  1. Automatic detection and parametrization of vibrato (as a result of the MSc work developed by José Ventura, as reported in the previous annual activity report).

    A singing voice record is analyzed and vibrato occurrences are automatically detected and graphically highlighted directly over the F0 contour in the main melody window of SingingStudio. These highlighted regions are clickable: a click opens a new window displaying the main parameters of the vibrato (frequency, in Hertz, and depth or extent, in semitones) and a few other statistical analyses, including a sinusoidal purity parameter, which measures the similarity between the vibrato and a sinusoidal wave, as well as its degree of aperiodicity. This work gave rise to a conference paper that was presented at the "5th International Symposium on Communications, Control, and Signal Processing", Rome, Italy, in May 2012;

  2. Pitch contour and spectrogram.

    The F0 detection algorithm has been improved (e.g. to avoid sub-harmonic errors) so as to ensure a smoother melody line detection in real time, particularly when the F0 is high and in regions of the singing where the signal is very weak (or the SNR is very low). In addition, the spectrogram representation, which had already been included last year in the SingingStudio environment, has been improved in several ways. For example, it now includes a representation of the F0 contour, and the colors of the spectrogram, which appears in a new window, may be adjusted so as to allow a better visualization of the prominent spectral harmonics and spectral regions (i.e. formants). The type of time window used in the spectrogram is also user-selectable now, which allows cleaner spectrogram displays.

  3. Formant detection.

    A new formant estimation functionality has been added to the spectrogram representation. This functionality required intensive optimization combining different algorithms so as to deal with the significant diversity of the singing signal, namely when the F0 is very high. This work has been undertaken by Vitor Almeida and has benefited from his MSc dissertation work on the same topic, as mentioned above.
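
As an illustration of the vibrato parameters mentioned in item 1 (frequency in Hertz and extent in semitones), the following minimal sketch estimates both from a voiced segment of an F0 contour. It is not the algorithm used in SingingStudio, which this report does not describe; the zero-crossing approach, the function names and the frame rate parameter are all assumptions made for illustration only.

```python
import math


def hz_to_semitones(f_hz, ref_hz):
    """Distance in semitones between a frequency and a reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)


def vibrato_parameters(f0_hz, frame_rate_hz):
    """Estimate vibrato rate (Hz) and extent (semitones) of a voiced F0 segment.

    Hypothetical helper: f0_hz holds one F0 value (in Hz) per analysis frame,
    and frame_rate_hz is the number of frames per second.
    """
    # Use the geometric mean of F0 as the reference (centre) pitch.
    mean_log = sum(math.log2(f) for f in f0_hz) / len(f0_hz)
    ref_hz = 2.0 ** mean_log
    deviation = [hz_to_semitones(f, ref_hz) for f in f0_hz]

    # Each vibrato cycle crosses the centre pitch twice, so the rate is
    # half the number of sign changes per second.
    crossings = sum(
        1 for a, b in zip(deviation, deviation[1:]) if (a < 0.0) != (b < 0.0)
    )
    duration_s = (len(deviation) - 1) / frame_rate_hz
    rate_hz = crossings / (2.0 * duration_s)

    # Extent taken as half of the peak-to-peak deviation, in semitones.
    extent_st = (max(deviation) - min(deviation)) / 2.0
    return rate_hz, extent_st
```

On a clean synthetic contour oscillating at 5.5 Hz with an extent of 0.5 semitone, this sketch should return values close to those figures. Real recordings would need smoothing and a sinusoidal-fit stage (such as the purity measure mentioned above) to be robust.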

After getting permission from FCT, José Lopes (holding an MSc degree and the "father" of SingingStudio) was recruited in March 2012 as a consultant to the ARTTS project in order to help the development and innovation of the SingingStudio platform.

The SingingStudio software environment has been upgraded from Qwt5 to Qwt6 graphic libraries (allowing in particular a more efficient approach to the spectrogram display), and from VisualStudio2005 to VisualStudio2010 Microsoft compiling environment (allowing a faster code execution and cleaner code organization). This work has been carried out collaboratively by José Lopes, Tiago Campos and Vitor Almeida.

With the contribution of José Lopes, Vitor Almeida and Tiago Campos, an SVN repository has been set up to facilitate the asynchronous and distributed development (and version control) of the different software projects, namely SingingStudio and SingingBattle (reported in Task 3).

SingingStudio has been installed at UCP so that singing students can test the software and provide feedback regarding its usability, its usefulness, and new ways in which it can be improved to help the teaching/learning of singing. A similar installation is planned for ESMAE, which is dependent on hardware acquisition.

Ricardo Sousa left the ARTTS team on April 10th 2012 and Tiago Campos left the ARTTS team on November 6th 2012. After getting permission from FCT, taking into consideration the available budget in the project, two new research grants were advertised from December 10th 2012 till December 28th 2012. The purpose of the first is to migrate the SingingStudio software from the Windows platform (using Qt, Qwt, RT Audio and VisualStudio 2010 environments) to an iOS-based platform (e.g. iPad). The purpose of the second is to collaborate in tasks 2, 3, 4 and 6 of the project, namely those involving the development, adaptation and optimization of the SingingBattle software environment for Casa da Música. At present, the first grant has received candidates and the second has received none. For this reason, permission has already been requested from FCT to open a new submission period for the second research grant.

Outcomes:

- Improved version of the SingingStudio environment.

Next steps:

- Finalization of the Windows-based version of the SingingStudio environment (probably during the month of January 2013) and its installation at ESMAE and UCP, so that students and professors can test and assess SingingStudio and provide feedback and suggestions for improvement.

- Port the SingingStudio software to iOS-based platforms.

- PhD work to start by Prof. Rui Taveira on the development of computer-assisted pedagogical methodologies in singing teaching/learning.

1.3 Task 3

Name: Singing to musical score transcription and music composition

Coordination: FEUP/UCP

Task description:

The main objectives of this task are the design, implementation and optimization of technologies allowing the transcription of singing to musical score, including editing capabilities.

Summary of activities:

The first phase of algorithm development concerning the functionality of singing to music score transcription was carried out by Miguel Garcia in the context of his MSc dissertation, which he successfully defended on February 15th 2012. The work has since been continued by Tiago Campos, who has further developed and fine-tuned that functionality in the SingingStudio platform: after the melody line (or F0 contour) has been detected in SingingStudio, the musical notes are identified taking as a reference a chosen time scale, and are represented in a music score. A metronome has also been included as a new functionality in order to facilitate and add flexibility to the estimation, parameterization and association of musical notes. In addition, the detected music score can be exported as a MIDI, MusicXML or PDF file. With the help of João Terleira (singing teacher and professional), some heuristics had to be included in order to produce cleaner and more consistent results. This functionality, although not yet perfect (for example, it does not yet include editing capabilities), is already available in the SingingStudio environment.
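
The transcription step described above (F0 contour, then notes identified against a chosen time scale) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the SingingStudio implementation: the function names, the per-slot median rule, the 50% voicing threshold and the `grid_per_beat` parameter are all hypothetical, and A4 is assumed to be 440 Hz.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(f_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f_hz / a4_hz)


def midi_to_name(midi):
    """Name of an integer MIDI note, e.g. 69 -> 'A4'."""
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


def transcribe(f0_hz, frame_rate_hz, tempo_bpm, grid_per_beat=2):
    """Quantize an F0 contour to notes on a metronome-derived time grid.

    f0_hz holds one value per frame; None or 0 marks an unvoiced frame.
    Returns a list of (note name or 'rest', duration in beats).
    """
    frames_per_slot = 60.0 / tempo_bpm / grid_per_beat * frame_rate_hz
    n_slots = int(len(f0_hz) / frames_per_slot)

    slots = []
    for s in range(n_slots):
        start = int(round(s * frames_per_slot))
        end = int(round((s + 1) * frames_per_slot))
        voiced = sorted(hz_to_midi(f) for f in f0_hz[start:end] if f)
        if len(voiced) * 2 < (end - start):
            slots.append(None)  # mostly unvoiced: treat the slot as a rest
        else:
            # Median pitch of the slot, rounded to the nearest semitone.
            slots.append(int(round(voiced[len(voiced) // 2])))

    # Merge consecutive slots that carry the same pitch into one note.
    notes = []
    for pitch in slots:
        if notes and notes[-1][0] == pitch:
            notes[-1][1] += 1
        else:
            notes.append([pitch, 1])
    return [
        (midi_to_name(p) if p is not None else "rest", n / grid_per_beat)
        for p, n in notes
    ]
```

A known limitation of this sketch is that repeated notes of the same pitch are merged into a single longer note; separating them would require onset detection or heuristics of the kind mentioned above.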

A Portuguese magazine in the area of informatics (Exame Informática) has interviewed ARTTS researchers concerning the SingingStudio software and has produced a video, which is available on YouTube.

Emerging from SingingStudio, a new software environment, named SingingBattle, has been developed in collaboration with UCP (a partner in the project) and Casa da Música. As planned in the project, a licentiate student (André Cardoso), who was recruited as a researcher at UCP for six months (first semester of 2012), was in charge of designing the main functionalities to be supported by SingingBattle (in close collaboration with the ARTTS team at FEUP), considering the functionalities available in the SingingStudio software, the expected type of visitors to Casa da Música and the Educational Mission of Casa da Música. To this end, several "design-thinking" sessions were organized at UCP and four meetings took place at Casa da Música in 2012: on February 16th, April 5th, July 31st, and October 9th. The first two of these meetings were brainstorming sessions and the last two involved demonstrations of the SingingBattle prototype. As a result of this design and development effort, SingingBattle is at present in a fairly stable configuration which supports three main functionalities:

  • vocal range detection (tessitura) and singing voice classification (bass, baritone, tenor, alto, mezzo-soprano, and soprano),
  • simple melody-following exercises (up to four users, simultaneously) using as a reference (to sing along) the singing voice without musical accompaniment,
  • exercises of melody following (up to four users, simultaneously) by singing along popular songs, karaoke-style, with an appealing form of synchronization between music and lyrics shown on two lines for ease of tracking.

Each one of these functionalities can be used separately, but the first is a required "entry point" since it allows the natural vocal range of the user to be matched to the range of musical notes used in either of the last two SingingBattle functionalities (as above).
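
To make the "entry point" idea concrete, the sketch below classifies a detected vocal range into one of the six voice types listed above and computes a transposition that centres an exercise melody in the user's range. It is one plausible reading, not the SingingBattle implementation: the MIDI range table holds commonly quoted approximate ranges chosen here for illustration, and the function names and nearest-range rule are assumptions.

```python
# Illustrative, approximate ranges as MIDI note numbers (low, high).
# These values are assumptions for this sketch, not taken from SingingBattle.
VOICE_RANGES = {
    "bass": (40, 64),           # E2-E4
    "baritone": (45, 69),       # A2-A4
    "tenor": (48, 72),          # C3-C5
    "alto": (53, 77),           # F3-F5
    "mezzo-soprano": (57, 81),  # A3-A5
    "soprano": (60, 84),        # C4-C6
}


def classify_voice(low_midi, high_midi):
    """Return the voice type whose reference range is closest to the user's."""
    def distance(item):
        ref_low, ref_high = item[1]
        return abs(ref_low - low_midi) + abs(ref_high - high_midi)

    return min(VOICE_RANGES.items(), key=distance)[0]


def transposition_for(melody_midi, low_midi, high_midi):
    """Semitone shift that centres a melody within the user's vocal range."""
    melody_centre = (min(melody_midi) + max(melody_midi)) / 2.0
    user_centre = (low_midi + high_midi) / 2.0
    return int(round(user_centre - melody_centre))


if __name__ == "__main__":
    # Hypothetical user whose detected range spans B2 (47) to B4 (71).
    voice = classify_voice(47, 71)
    shift = transposition_for([60, 62, 64, 65, 67, 72], 47, 71)
    print(voice, shift)
```

Centring on the midpoint is the simplest possible choice; a practical system would probably also check that the transposed melody's extremes stay inside the user's comfortable range.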

Two aspects remain to be addressed before SingingBattle can be installed at Casa da Música: hardware and navigation between the three functionalities. Concerning hardware, an investment is necessary to cover the costs of one computer, two large LCD touch screens and interfaces such as microphones and loudspeakers. Permission will be requested from FCT to address this. Concerning navigation, a design and development investment is needed in order to implement a graphical user interface which is informative, intuitive and friendly, allowing an autonomous interaction with SingingBattle. In particular, appealing buttons need to be designed that are consistent with and reminiscent of the Casa da Música image, and demos (videos) need to be prepared containing user instructions. Pro bono collaboration has been sought among students and professors of several design/multimedia courses and even design schools outside the university, unfortunately without much success.

Although not directly planned in the context of this task, a PhD research project (in Informatics Engineering) conducted by Nuno Fonseca under the supervision of the PI of the project has been articulated with this task, since it shares many similar concerns and objectives. The title of the dissertation is "Singing Voice Resynthesis using Concatenation-based Techniques". It involves analysis of the singing voice in order to extract meaningful features characterizing such aspects of the singing as pitch, energy dynamics, and phonetics/timbre. These features are then used to drive a singing synthesizer which is based on concatenation techniques and on a large database of singing voice records by several singers. In short, the main objective of the dissertation work is to allow a user to directly control a singing voice synthesizer using his/her own voice. The dissertation was successfully defended and approved on May 16th 2012.

Outcomes:

- New version of the SingingStudio environment with two new functionalities: metronome and singing to music score transcription,

- A new software prototype: SingingBattle,

- PhD dissertation by Nuno Fonseca, defended and approved in May 2012,

- MSc dissertation by Miguel Garcia, defended and approved in February 2012,

- A report by André Cardoso.

Next steps:

- to improve and fine-tune the SingingStudio functionality of singing to music score transcription,

- to overcome the final steps before SingingBattle can be installed at Casa da Música.

1.4 Task 4

Name: Correlation between objective acoustic features of the singing voice and voice disorders in singing

Coordination: FMUP/FEUP

Task description:

The main objectives of this task are the design, implementation and validation of analysis strategies for acoustic signals, complemented with laryngeal examination, that make it possible to establish correspondences for the noninvasive detection of voice disorders.

Summary of activities:

As foreseen in the project plan, a research grant associated with FMUP was advertised from April 27th 2012 till May 11th 2012. As a result, Inês Moura, a Speech Therapist, was recruited and started her activities in July 2012. The main goals of the research reflect those of this task, namely to motivate volunteer professional and amateur singers having symptoms of singing voice discomfort or disorders to perform several tests. These tests include a self-assessment, spoken and singing voice exercises for perceptual and acoustic quality evaluation and, when appropriate, laryngeal examination for complete physiological and functional assessment, notably of the vocal folds. So far, self-assessment and voice exercises have been concluded for several volunteer singers and the corresponding data is being pre-processed. Since ethical procedures and authorizations are currently underway, laryngeal examinations could not yet start; however, it is expected that they may begin by the end of January 2013.

Other objectives of the research include voice quality evaluation by a panel of voice specialists, acoustic voice quality evaluation, and identification of relations between self-assessment, voice quality assessment and laryngeal examination. The final goal is to identify pertinent acoustic features that may be used to provide early feedback on voice stress or mild perturbations, in a preventive perspective, avoiding the evolution to a situation of declared voice disorders.

Outcomes:

- data collection concerning self-assessment and voice records by volunteer singers declaring voice problems or difficulties,

- due diligences regarding authorization by the Ethics Committee of FMUP.

Next steps:

- evaluation of the recorded voices using perceptual and acoustic analysis,

- realization of laryngeal examination after authorization by the Ethics Committee is granted,

- relation between the different types of data for the same singer and for different singers.

1.5 Task 5

Name: Robust real-time glottal pulse estimation from running singing

Coordination: FMUP/FEUP

Task description:

The main objectives of this task are the robust and real-time estimation of the glottal pulse from running speech/singing, in order to extract information concerning the quality of phonation or abnormal functioning of the vocal folds.

Summary of activities:

As reported in the previous annual activity report, the topic of this task was taken up early in 2011 as the subject of an MSc dissertation (in Biomedical Engineering, second cycle) by a post-graduate student, Sandra Dias. The research led to the proposal of a new glottal pulse prototype and a robust frequency-domain approach for glottal source estimation that uses a phase-related feature based on the Normalized Relative Delays (NRDs) of the source harmonics. This approach was tested with several speech signals (synthetic and real), and the glottal pulse estimation results were compared with those obtained using other state-of-the-art methods, notably IAIF. Sandra successfully defended her MSc dissertation (entitled "Estimation of the Glottal Pulse from Speech or Singing Voice") on October 3rd 2012.

The results concerning the frequency-domain approach to glottal source estimation are preliminary and, in addition, are not amenable to real-time operation. This means the topic requires renewed research effort and, therefore, will be proposed as the theme for a new MSc or, preferably, PhD dissertation.

A part of the results obtained in the context of Sandra's dissertation has been used to prepare a paper that was submitted to the INTERSPEECH 2013 conference. Unfortunately the paper was not accepted and, as a consequence, it has been improved and submitted to the ICASSP 2013 signal processing conference.

As also reported in the previous annual activity report of the project, following the approval, on September 28th 2011, of the MSc dissertation (integrated Masters) defended by Diana Mendes, on the topic of "Robust Speaker Identification in less than two seconds" (which is related to this task due to the importance of phase information related to the glottal source), a paper has been prepared and submitted to the 46th Audio Engineering Society International Conference on Audio Forensics. The paper was accepted and was presented in June 2012 (in Denver, Colorado, USA).

Outcomes:

Next steps:

- Motivation of a new MSc or PhD student to continue this line of research.

1.6 Task 6

Name: Real-time preventive assessment of the singing voice

Coordination: FMUP/FEUP

Task description:

The main objectives of this task are studies of correspondence between objective acoustic characteristics and perceptible disturbances of the singing voice, and the design, realization and validation of technologies for evaluating the singing voice in real time so as to monitor vocal stress and excessive vocal effort and to prevent vocal disturbances.

Despite the fact that a motivated ORL specialist has already been identified that will collaborate in the project and in this task specifically, this task could not yet start because it depends critically on the outcomes of task 5 which is currently in progress and, therefore, no results are yet consolidated or usable.

1.7 Task 7

Name: Management

Coordination: FEUP/FMUP/UCP/KTH

Summary of activities:

A plenary meeting took place on January 16th 2011 and comprised two parts. The first part, open to the academic and research community, consisted of a seminar by Prof. Sten Ternström from the Department of Speech, Music and Hearing at KTH, on the topic of "Singing Voice: I see the sounds - but what do they mean?". The second part consisted of a number of short presentations giving an update on the status of the research and development work in each task of the project.

After this plenary meeting several local meetings took place joining researchers from FEUP, UCP and FMUP. The purpose of those meetings was to discuss, monitor and plan the progress of the different research activities as well as to discuss and activate administrative decisions regarding the project, namely concerning the recruitment of researchers.

As highlighted in the previous annual report, it should be noted that the modifications concerning the project team that took place during the second semester of 2011 (effective on November 28th 2011) at the partner institution Faculty of Medicine of the University of Porto (FMUP), notably involving Prof. Altamiro Pereira and Dr. Isabel Lema, had a tremendous and positive impact in terms of management, administrative matters and support. This explains, for example, why Inês Moura could be recruited and why activities regarding Task 4 could start early in 2012.

Following the last plenary project meeting in January 2012, Prof. Sten Ternström and Dr. Nuno Fonseca submitted very interesting personal perspectives on the scope, status, possible difficulties and suggestions for the evolution of the different tasks of the project.

Prof. Ternström highlighted the need that the «scope of [Task 1] be carefully defined and constrained, in collaboration with the singing teachers, to narrow the focus to something very specific that is manageable, yet still relevant and interesting», that Task 2 «[focuses] (a) on such modes of analysis that are especially relevant to singers, i.e. measures that relate as directly as possible to how the vocal sounds are produced physiologically; and (b) to perform frequent regular assessments of the instructional value of (new features of) the system» (the example of vibrato is given), that in the context of Task 4 difficulties will naturally arise since «It is a major step for a singer to concede that she or he has a voice problem, since it has career implications; also, singers train to overcome and mask the influence of voice problems», and that in the context of Task 6, one should be aware that «We can be confident only that the tolerance to overload is highly individual and that the risk would be highly dependent on personal vocal strategies for performing» (the product www.sonvox.com is an interesting case study). Concerning Task 3, the product scorecleaner.com should also be seen as an interesting case study.

Nuno Fonseca has highlighted the following points: the need to give the SingingStudio interface the flavor of a panel of real-time instruments, such as tuning (within ± 0.5 semitone) and vibrato frequency; the need to migrate the SingingStudio software, which is Windows-based, to iOS-based platforms (iPhone, iPad, iPod), since these facilitate worldwide outreach and distribution, and also because they are clearly preferred by most singing teachers and students due to their ease of use anywhere; the fact that the touch-screen functionality of SingingStudio under Windows is not natural and has little impact, so that investment should be directed to iOS-based environments (where the touch paradigms are quite different); and finally, the need to redesign the look-and-feel of SingingStudio to make it more professional and competitive, giving more freedom to the user. A good example from which we may take inspiration is the Adobe Premiere software.

Although most of the pointed out aspects have been somehow addressed in this report and in each particular task, they represent very valuable input to the discussions during the next plenary meeting concerning strategic decisions so that the outcomes of the project will leverage its impact beyond its conclusion by the end of 2013.

2.0 Other activities/facts

2.1 "International Conference on Computational Processing of the Portuguese Language" (PROPOR)

Aníbal Ferreira was invited to give a workshop at the "International Conference on Computational Processing of the Portuguese Language" (PROPOR), which took place in Coimbra, Portugal, on April 17th 2012. About 20 researchers attended the workshop.

Title: Analysis and visual feedback of the singing voice: is what you sing what you get?

Abstract: A voice signal is multidimensional in the sense that it conveys information concerning not only the linguistic contents of the voice message, but also other layers of information, namely the accent and speaking style, the identity of the speaker and his or her mood. The importance of these stylistic layers is dominant in the case of the singing voice since they determine melody, expressivity, and emotion. The visual feedback of objective features extracted from the singing voice and denoting relevant musical characteristics is very beneficial in both singing teaching and learning. In this talk/tutorial, the challenges involved in feature extraction from the singing voice, which are due mainly to the typical high F0, will be addressed. An emphasis will be placed on melody and vibrato extraction and parameterization, as well as on formant estimation where the Sinusoids+Noise signal decomposition proves to be useful. Illustrative examples will be presented and future developments will be discussed in order to reduce the gap between machine listening and human listening in the case of the singing voice.

2.2 BIN@FEUP (Business and Innovation Network)

The ARTTS Team has been invited to participate in the Technologies Showroom of the event BIN@FEUP (Business and Innovation Network, paginas.fe.up.pt/~binporto2012/), on October 25th 2012. The two software applications SingingStudio and SingingBattle were on display, and a flyer and a video were prepared describing the ARTTS objectives and realizations. The following abstract was used in the programme of the event.

Abstract: ARTTS - Assistive Real-Time Technologies in Singing | ARTTS aims at providing singing students, teachers and professionals with solutions helping them to optimize singing learning and training in a visual, objective and intuitive way, and to perform safely. Two interactive software platforms (SingingStudio and SingingBattle) will be demonstrated featuring innovative functionalities, namely real-time visual feedback of relevant quality parameters of the singing voice, visually-oriented formal and entertaining singing exercises, and automatic singing to music score transcription. ARTTS is about seeing, understanding and mastering your voice!

The flyer is available at flyer-artts-web.pdf