Engineering Faculty, University of Porto
Doctoral Programme in Informatics Engineering (ProDEI)

Luís António Diniz Fernandes de Morais Sarmento

Contact: las 'at' fe.up.pt

Admission Date: September, 2005
Status: In progress

Advisor: Prof. Eugénio Oliveira (DEEC/FEUP)
Steering Committee: Prof. Mário J. Silva (FCUL), Prof. Rui Camacho F. Silva (FEUP), Prof. Eugénio Oliveira (FEUP)

A semantic analyser for Portuguese for Automatic Question-Answering Systems

The purpose of this thesis is to develop a semantic analyser that can identify and classify, in Portuguese text, the elements and semantic relations needed to support question-answering (QA) systems.

However, developing such an analyser is complex and requires several language resources that are not yet available for Portuguese. Manually coding all the rules and developing the needed language resources does not scale if we are aiming at wide-scope question-answering systems. Therefore, our approach to building such a semantic analyser is based on unsupervised or lightly supervised machine learning methods. Such methods start from a set of seed examples (text instances and their classification) that illustrate the elements and structures the analyser should be able to identify. These examples are used to obtain other similar examples from the web or from large corpora using a bootstrapping strategy. The learning system iterates several times to build an expanded set of examples, from which it infers both the classification rules and the related lexicon. These rules can then be applied directly to analyse new text.
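The bootstrapping loop described above can be illustrated with a minimal sketch. It is not the method developed in the thesis: the function name `bootstrap`, the use of the two preceding tokens as the "rule", the `min_support` threshold, and the toy corpus are all assumptions made only for illustration.

```python
from collections import Counter


def bootstrap(seeds, corpus, iterations=3, min_support=1):
    """Grow a lexicon and a set of context rules from a few seed examples.

    seeds:  dict mapping a term to its class (hypothetical format).
    corpus: list of whitespace-tokenised sentences.
    A "rule" here is simply the pair of tokens that precedes a known term.
    """
    lexicon = dict(seeds)
    rules = {}
    sentences = [sentence.split() for sentence in corpus]

    for _ in range(iterations):
        # Step 1: count the contexts in which already-known terms occur.
        support = Counter()
        for tokens in sentences:
            for i in range(2, len(tokens)):
                label = lexicon.get(tokens[i])
                if label is not None:
                    support[((tokens[i - 2], tokens[i - 1]), label)] += 1

        # Step 2: keep contexts with enough support that point to one class.
        labels_per_context = {}
        for (context, label), count in support.items():
            if count >= min_support:
                labels_per_context.setdefault(context, set()).add(label)
        rules = {
            context: next(iter(labels))
            for context, labels in labels_per_context.items()
            if len(labels) == 1
        }

        # Step 3: apply the rules to the corpus to harvest new examples.
        added = False
        for tokens in sentences:
            for i in range(2, len(tokens)):
                context = (tokens[i - 2], tokens[i - 1])
                if context in rules and tokens[i] not in lexicon:
                    lexicon[tokens[i]] = rules[context]
                    added = True
        if not added:
            break  # nothing new was learned, so further iterations are useless

    return lexicon, rules


# Toy corpus and seeds, invented for this example.
corpus = [
    "Pessoa nasceu em Lisboa",
    "Torga viveu em Coimbra",
    "Saramago nasceu em Azinhaga",
    "Eça viveu em Paris",
    "Eça morreu em Paris",
    "Saramago morreu em Lanzarote",
]
seeds = {"Lisboa": "LOCATION", "Coimbra": "LOCATION"}
lexicon, rules = bootstrap(seeds, corpus)
```

In this toy run the context "morreu em" is only learned in the second iteration, after "Paris" has entered the lexicon through "viveu em"; this is the sense in which the example set is expanded iteratively. A realistic system would need far more robust patterns and some filtering of noisy examples.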

As we expand the set of questions we are trying to answer, new requirements are imposed on the semantic analysis. The learning procedure can be started again with new examples to learn the corresponding rules that will allow the analyser to deal with the new cases. Using this approach we will be able to scale up the development of rules and lexical resources needed for a wide-scope QA system.

Publications

Hunting Answers with RAPOSA (FOX). Luís Sarmento; Working Notes of the Cross-Language Evaluation Forum Workshop (CLEF 2006) Alicante, Spain, 20-22 September, 2006

BACO - A large database of text and co-occurrences. Luís Sarmento; In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006) Genova, Italy, 22-28 May, 2006

Corpógrafo V3: From Terminological Aid to Semi-automatic Knowledge Engine. Luís Sarmento, Belinda Maia, Diana Santos, Ana Pinto & Luís Cabral; In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006) Genova, Italy, 22-28 May, 2006

Component Evaluation in a Question Answering System. Luís Costa & Luís Sarmento; In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006) Genova, Italy, 22-28 May, 2006

SIEMES - a Named-Entity Recognizer for Portuguese Relying on Similarity Rules. Luís Sarmento; In Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada (PROPOR'2006) Itatiaia, RJ, 13-17 May, 2006

REPENTINO - A collaborative wide-scope gazetteer for Entity Recognition in Portuguese. Luís Sarmento, Ana Sofia Pinto & Luís Cabral; In Proceedings of Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada (PROPOR'2006) Itatiaia, RJ, 13-17 May, 2006

Last updated: 2011-09-12