IXA group


PhD Thesis: Computational Model for Semantic Textual Similarity (A. Gonzalez, 2017/07/07)

Title: Computational Model for Semantic Textual Similarity
Author: Aitor Gonzalez-Agirre
Supervisors: German Rigau i Claramunt  / Eneko Agirre Bengoa (Ixa Group)
Date: July 7, 2017, Friday
Time: 11:00
Where:  Faculty of Informatics, Ada Lovelace Room (UPV/EHU)


The goal is to advance on computational models of meaning and their evaluation. We define two tasks: Semantic Textual Similarity (STS) and Typed Similarity.

STS aims to measure the degree of semantic equivalence between two sentences. We have collected pairs of sentences to construct datasets for STS: 15,436 pairs in total, by far the largest collection of data for STS. We have designed, constructed and evaluated a new approach to combine knowledge-based and corpus-based methods using a cube.

Typed Similarity tries to identify the type of relation that holds between a pair of similar items in a digital library. Providing a reason why items are similar has applications in recommendation, personalization, and search. A range of similarity types was identified in this collection, and a set of 1,500 pairs of items from the collection was annotated using crowdsourcing.

We present systems that resolve the Typed Similarity task.

HAP/LAP master thesis (Noelia Migueles, 2017-06-27)

This afternoon, June 27th, Noelia Migueles will defend her master's thesis.

Date: June 27th, 15:00
Place: Ada Lovelace room

Title: A Study Towards Spanish Abstract Meaning Representation

Author: Noelia Migueles-Abraira

Supervisors: Arantza Diaz de Ilarraza and Rodri Agerri

Talk: Computational explorations of creative language (C. Strapparava, 2017-07-07)

Speaker: Carlo Strapparava, FBK-irst (Fondazione Bruno Kessler – Istituto per la ricerca scientifica e Tecnologica)
Date: July 7, 2017
Time: 09:30
Place: Faculty of Informatics (UPV/EHU), Manuel de Lardizabal 1, 20018 Donostia
Title: Computational explorations of creative language


Dealing with creative language, and in particular with affective, persuasive and even humorous language, has often been considered outside the scope of computational linguistics. Nonetheless, current NLP techniques make it possible to begin exploring it. We briefly review some computational work on these typically creative genres.

Short bio:

Carlo Strapparava is a senior researcher at FBK-irst (Fondazione Bruno Kessler – Istituto per la ricerca scientifica e Tecnologica) in the Human Language Technologies Unit.
His research activity covers artificial intelligence, natural language processing, intelligent interfaces, human-computer interaction, cognitive science, knowledge-based systems, user models, adaptive hypermedia, lexical knowledge bases, word-sense disambiguation, affective computing and computational humour. He is the author of over 200 papers, published in scientific journals, book chapters and in conference proceedings. He has the Italian scientific habilitation for full professor in informatics and engineering.
He regularly serves on the program committees of the major NLP conferences (ACL, EMNLP, etc.). He was an executive board member of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics (2007-2010), and a member of the Senseval (Evaluation Exercises for the Semantic Analysis of Text) organisation committee (2005-2010).
In June 2011, he received a Google Research Award on Natural Language Processing, specifically on the computational treatment of creative language.

PhD Thesis: Automatic Scansion of Poetry (M. Agirrezabal, 2017/06/19)

Title: Automatic Scansion of Poetry
Author: Manex Agirrezabal Zabaleta
Supervisors: Dr. Iñaki Alegria Loinaz and Dr. Mans Hulden
Date: June 19, 2017, Monday
Time: 12:00
Where:  Faculty of Informatics, Ada Lovelace Room (UPV/EHU)

Research questions:

  • What do we need to know when analyzing a poem, and how can we capture it?
  • Does language-specific linguistic knowledge contribute when analyzing poetry?
  • Is it possible to analyze a poem with any language-specific information?
    Is such analysis something that can be learnt?



Neural Machine Translation. Open workshop with Kyunghyun Cho (2017-05-29)

Neural Machine Translation
Open workshop with Kyunghyun Cho
Donostia, 2017-05-29

The third generation of machine translation systems is currently under active development. After initially dominating the field, rule-based machine translation (RBMT) systems have been gradually replaced by data-driven approaches in the last two decades, with statistical machine translation (SMT) systems prevailing as the main paradigm. In the last two years, deep learning approaches have significantly impacted the field, with the rise of neural machine translation (NMT) as the new state-of-the-art in automated translation. This event presents advanced results in the field, in particular for machine translation of Basque.

The MODELA project was created to advance research and development in deep-learning approaches to machine translation and to address the many challenges of Basque machine translation. The project is financed by the Basque Government and is being carried out by the following entities: Ametzagaiña, Elhuyar, ISEA, UPV/EHU (IXA group) and Vicomtech-IK4.


The main speaker will be Kyunghyun Cho (Center for Data Science, New York University), an eminent researcher in the area and the most cited author on NMT, a field in which he has received a Google prize. He is also a brilliant speaker.

Date: May 29, 2017, 11:00
Place: Faculty of Informatics (UPV/EHU), Manuel de Lardizabal 1, 20018 Donostia

  • 11.00-11.15: Introduction and presentation of the project
  • 11.15-12.30: Neural Machine Translation (Kyunghyun Cho)
  • 12.30-13.15: First results in the Modela project

Sponsor: Modela project and University of the Basque Country

The next day, Tuesday May 30, at 15:00, he will be with the students of our Master on Language Technology.

PhD position in Innsbruck with Michael Ustaszewski

After finishing our Erasmus Mundus LCT Master in 2016, Michael Ustaszewski is now a postdoc assistant at the University of Innsbruck, and Unit Manager (liaison with the Department of Translation Studies) at the Innsbruck Translation Centre. His group is working on Corpus-Based Translation and asked us to publish this Call for PhD Position Candidates:

The Department of Translation Studies at the University of Innsbruck invites applications for a PhD position in the framework of the two-year research project “TransBank: A Meta-Corpus for Translation Research” funded by the Austrian Academy of Sciences.

The goal of the project is to build a large, open and expandable bank of translated texts and their original texts. Its main innovative feature is the ability to exploit a rich set of metadata labels characterising each text and text pair for the compilation and download of sub-corpora, tailored to the requirements of specific translation-related research questions.

The PhD student will be involved in all stages of the corpus building process, thus having the opportunity to gather translation data relevant to his/her specific research interest. The student will work autonomously on the development of the metadata labelset and on collecting translation data, on the basis of which he or she will conduct quantitative and/or qualitative analyses for his/her thesis. Work will be carried out in close collaboration with the project’s two principal investigators and two MA students.

The successful candidate should meet the following requirements:

  • Master’s degree in Translation Studies, Corpus Linguistics, Computational Linguistics or a related field
  • proven familiarity with translation theory
  • strong interest in data-driven research methodologies and linguistic annotation
  • excellent teamwork skills
  • proficiency in English on a level suitable for written and spoken scientific communication
  • solid programming skills in a scripting language (e.g. Python) will be an asset, as will knowledge of German or any other language(s)

The two-year position with a weekly working time of 20 hours (50%) commences in September 2017 and offers an annual stipend of € 19,117 plus allowances for conference attendance. The position involves enrolment in the PhD programme in Linguistics and Media Studies at the University of Innsbruck.

Applications should include:

  1. A cover letter (1 page maximum) that relates the candidate’s experience and interest in the TransBank project
  2. A two-page thesis proposal describing the research question and methodology underlying the candidate’s envisaged analyses using TransBank data
  3. A CV listing any publications
  4. Copies of relevant diplomas and certificates
  5. A recommendation letter by the candidate’s MA thesis supervisor or a university professor
  6. A copy of the MA thesis or the latest draft

To apply, please submit the documents in two PDF files (one containing documents 1 to 5, one containing document 6) by 10 April 2017 via the upload form at http://transbank.info/jobs

Shortlisted applicants will be interviewed in person or via Skype towards the end of April.

Further information:

Details on the research project can be found on the project website http://www.transbank.info
For enquiries about the position and the application process, please contact mail[at]transbank.info
Information about the Department of Translation Studies at the University of Innsbruck: http://translation.uibk.ac.at
For information on the PhD programme in Linguistics and Media Studies at the University of Innsbruck and the enrolment process, please refer to

Mikel Artetxe wins award at Hackathon on Language Technologies organized by Red.es

Yesterday in Barcelona, Mikel Artetxe was awarded the second prize in the First Hackathon on Language Technologies, organized by Red.es in collaboration with the Spanish Plan to promote Language Technology managed by the Spanish Government’s SESIAD agency.

This hackathon was organized in the context of “4 Years From Now” (4YFN), the business platform created by Mobile World Capital Barcelona to promote technological startups. Several IXA members participated as organizers (German Rigau, Iñaki Alegria and Rodrigo Agerri).

Eight projects participated in the final session yesterday in Barcelona. Mikel developed a free alternative that allows the automatic creation of bilingual dictionaries offering examples with real uses of words (an application similar to Linguee).

German Rigau keynote speaker in the JRC Conference TEXT MINING IN POLICY MAKING

IXA Group member German Rigau participated as keynote speaker last Monday in the JRC Conference “TEXT MINING IN POLICY MAKING”, organised by the European Commission in Brussels to present the new JRC competence centre on text mining. The event included a showcase of various success stories of JRC applied text mining solutions. German Rigau addressed challenges related to textual data.

“This conference was an opportunity for policy makers from EU institutions to understand better the benefits of text mining in policy making processes, and pave the way forward for a better use of these solutions in policy making.

Information needed by policy makers is increasingly embedded in large amounts of textual data available on the Internet, e.g. traditional or social media, or in large public or proprietary document sets.

Text mining, the automatic extraction of information from text, offers policy makers timely access to important information which would otherwise be inaccessible. Indeed, the sheer volume of data makes it nearly impossible to extract the available information manually.”

Our papers in Japan (COLING 2016)

These are our six papers at COLING 2016, taking place in Osaka, Japan, on Dec 11, 2016.

HAP/LAP master theses (2016-09-27)

Master HAP/LAP
EMLCT master
Master thesis defences

Date: September 27th
Place: Ada Lovelace room


Universal Dependencies for Buryat.
Author: Elena Badmaeva
Supervisors: Koldo Gojenola, Gosse Bouma

LexSynSimpleText, a lexical and syntactic simplifier: first steps.
Author: Maria Eguimendia
Supervisors: Arantza Diaz de Ilarraza and Gosse Bouma

Data Sparsity in Highly Inflected Languages: The Case of Morphosyntactic Tagging in Polish.
Author: Michael Ustaszewski
Supervisors: Rodrigo Agerri and German Rigau

Multilingual Central Repository version 3.0: improving a very large lexical knowledge base.
Author: Daniel Parera Perez
Supervisor: German Rigau Claramunt