IXA group


Koldo Mitxelena award for PhD theses to Arantxa Otegi


Our colleague Arantxa Otegi won last Janaury the III. Koldo MItxelena Award for PhD Theses organized by Euskaltzaindia (the Academy of Basque Language) and the University of the Basque Country.


Congratulations to her supervisors (Xabier Arregi and Eneko Agirre).

The title of this thesis is ‘Expansion for information retrieval: contribution of word sense disambiguation and semantic relatedness’.

The whole text is available here. This is the abstract:

Information retrieval (IR) aims at searching documents which satisfy the information need of an user. In that way, an IR system informs the user about relevant documents, that is those documents that contain the information they need as formulated in the query. Well-known search engines like Google and Yahoo are prime examples of IR systems.
A perfect IR system should retrieve only, and all, the relevant documents, rejecting the non-relevant ones. However, perfect retrieval systems do not exist. One of the main problems is the so-called vocabulary mismatch problem between query and documents: some documents might be relevant to the query even if the specific terms used differ substantially, or some documents might not be relevant to the query even they have some terms in common. The former is because several words or phrases can be used to express the same idea or item (synonymy). The latter is caused by ambiguity, where one word can have more than one interpretation depending on the context. Owing to these facts, if an IR system relies only on terms occurring in both the query and the document when it comes to deciding whether a document is relevant, it might be diffcult to fnd some of the interesting documents, and also to reject non-relevant documents. It seems fair to think that there will be more chances of successful retrieval if the meaning of the text is also taken into account.
Even though the vocabulary mismatch problem has been widely discussed in the literature from the early days of IR it remains unsolved, and most search engines just ignore it. This PhD dissertation explores whether natural language processing (NLP) can be used to alleviate this problem.
In a nutshell, we expand queries and documents making use of two NLP techniques, word sense disambiguation and semantic relatedness. For each of the mentioned techniques we propose an expansion strategy, in which we obtain synonyms and other related words for the words in the query and documents. We also present, for each case, a method to combine the expansions and original words effectively in an IR system. Furthermore, as the expansion technique we propose is useful for translating queries and documents, we show how a cross lingual information retrieval system could be improved using such an expansion technique.

Our extensive experiments on three datasets show that the expansion methods explored in this dissertation help overcome the mismatch problem, consequently improving the effectiveness of an IR system.

1 comment to Koldo Mitxelena award for PhD theses to Arantxa Otegi

Leave a Reply




You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>