IXA group


Be a friend of the Minority SafePack!

We call upon the EU to adopt a set of legal acts to improve the protection of persons belonging to national and linguistic minorities and strengthen cultural and linguistic diversity in the Union. It shall include policy actions in the areas of regional and minority languages, education and culture, regional policy, participation, equality, audiovisual and other media content, and also regional (state) support.

A European citizens’ initiative is an invitation to the European Commission to propose legislation on matters where the EU has competence to legislate. A citizens’ initiative has to be backed by at least one million EU citizens, coming from at least 7 out of the 28 member states. A minimum number of signatories is required in each of those 7 member states.

The Minority SafePack initiative has gathered 849,888 signatures. About 150,000 more are needed within two weeks.

 You can sign here

In the European Union there are about 50 million people who belong to a national minority or a minority language community.

Science journal: 'Ixa opens a new research avenue: Machine Translation without a dictionary?'

Science reported this week on the work recently published by our colleagues Mikel Artetxe, Eneko Agirre and Gorka Labaka: Artificial intelligence goes bilingual—without a dictionary.
On October 30, our three colleagues, in collaboration with Kyunghyun Cho, published a preprint entitled "Unsupervised Neural Machine Translation".
One day later, G. Lample published a paper with similar content, entitled "Unsupervised Machine Translation Using Monolingual Corpora Only". Both papers are under review for ICLR 2018.
Here are some excerpts from the article by Matthew Hutson, a freelance writer who covers technology for Science:

[…] two new papers show that neural networks can learn to translate with no parallel texts—a surprising advance that could make documents in many languages more accessible.

[…] “Imagine that you give one person lots of Chinese books and lots of Arabic books—none of them overlapping—and the person has to learn to translate Chinese to Arabic. That seems impossible, right?” says the first author of one study, Mikel Artetxe, a computer scientist at the University of the Basque Country (UPV) in San Sebastián, Spain. “But we show that a computer can do that.”

[…] “This is in infancy,” Artetxe’s co-author Eneko Agirre cautions. “We just opened a new research avenue, so we don’t know where it’s heading.”

[…] Artetxe says the fact that his method and Lample’s—uploaded to arXiv within a day of each other—are so similar is surprising. “But at the same time, it’s great. It means the approach is really in the right direction.”

Congratulations Mikel, Eneko, Gorka and Kyunghyun!

Course: Deep Learning for Natural Language Processing (4.5 ECTS, February)

Are the meanings of these two words related? (Eneko’s Google Award 2015)

Course: Deep Learning for Natural Language Processing

    The course is open to anyone; see the details and prerequisites below.
    Deep learning neural network models have been successfully applied to natural language processing and are now radically changing how we interact with machines (Siri, Amazon Alexa, Google Home, Skype Translator, Google Translate, or the Google search engine). Instead of relying on hand-engineered features, as other machine learning approaches do, these models infer continuous representations of words and sentences. The course will introduce the main deep learning models used in natural language processing and give attendees a hands-on understanding of them through implementations in TensorFlow.
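As a rough illustration of the difference between hand-engineered symbolic features and learned continuous representations, the sketch below compares one-hot word vectors with dense word vectors. The three-dimensional vectors and the names `EMBEDDINGS`, `one_hot`, `cosine` and `sentence_vector` are hypothetical; in a real model the vectors would be learned from text (for example by an embedding layer in TensorFlow), and averaging word vectors is only the simplest way to build a sentence representation, not necessarily one covered in the course.

```python
import math

# Hypothetical 3-dimensional word vectors; real models learn hundreds of
# dimensions from large corpora instead of having them written by hand.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.35],
    "car": [0.1, 0.9, 0.0],
}
VOCAB = sorted(EMBEDDINGS)  # ["car", "cat", "dog"]


def one_hot(word):
    # Symbolic representation: a single 1.0 at the word's vocabulary index.
    return [1.0 if w == word else 0.0 for w in VOCAB]


def cosine(u, v):
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sentence_vector(words):
    # Simplest continuous sentence representation: the mean of its word vectors.
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]
```

With these toy values, `cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"])` is about 0.99 and the cat/car similarity is about 0.21, whereas any two distinct one-hot vectors have a cosine of 0, so they carry no notion of relatedness at all.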


Topics:
  • Introduction to machine learning and NLP with TensorFlow
  • Deep learning
  • Word embeddings
  • Language modeling and recurrent neural networks
  • Convolutional neural networks
  • Attention mechanisms

Instructors: Eneko Agirre & Oier Lopez de Lacalle

Practical details

Part of the Language Analysis and Processing master program
Schedule: Twelve days, February 5-8, 19-22, 26-28 and March 1 (2018)
Time: 17:30 – 20:00
Where: Lab 0.1, Computer science faculty, San Sebastian
Teaching language: English
Capacity: 20 students (selected according to CV)
Price: 180€
4.5 ECTS credits


Pre-registration and contact: send an e-mail with your CV to amaia.lorenzo@ehu.eus and e.agirre@ehu.eus
Pre-registration period: from now until December 24
Prerequisites: basic programming experience, a university-level course in computer science, and experience in Python.
Basic math skills (algebra or pre-calculus) are also required.

Presentation: Research groups in the Faculty of Informatics (2017-10-10, 10:00-11:10)

Tomorrow morning the research groups in the Faculty of Informatics will present their work to the students.

Date: Tuesday, October 10
Time: 10:05-11:10
Where: Ada-Lovelace room
Audience: 3rd- and 4th-year students
Subject: Presentation of research subjects and groups in the Faculty.
IXA Group’s collaboration with students: job opportunities for undergraduate students, scholarships…

HAP/LAP master theses (2017-09-26)

Master HAP/LAP  —  EMLCT master
Master thesis defences


Izenburua / Title: Automatic Generation of Named Entity Taggers Leveraging Parallel Corpora
Egilea / Author: Yi-Ling Chung (EMLCT)
Tutoreak / Supervisors: Rodrigo Agerri and German Rigau


Izenburua / Title: Dialect normalisation with deep learning-based automatic speech recognition
Egilea / Author: Mahsa Vafaie (EMLCT)
Tutoreak / Supervisors: Inma Hernaez and Josef Van Genabith

Izenburua / Title: Mapping of Electronic Health Records in Spanish to the Unified Medical Language System Metathesaurus
Egilea / Author: Naiara Perez (HAP/LAP)
Tutoreak / Supervisors: Montse Cuadros and German Rigau

Best paper award in SEPLN2017

Last week, in Murcia, our colleagues Begoña Altuna, María Jesús Aranzabe and Arantza Diaz de Ilarraza received the best paper award at the 33rd International Conference of the Spanish Society for Natural Language Processing (SEPLN 2017).


The paper is available here: EusHeidelTime: Time Expression Extraction and Normalisation for Basque

Temporal information helps to organise the information in texts, as it places actions and states in time. It is therefore very important to identify the time points and intervals in a text, as well as the times they refer to. We developed EusHeidelTime for Basque time expression extraction and normalisation. To build it, we analysed time expressions in Basque, created the rules and resources for the tool, and built corpora for development and testing. Finally, we ran an experiment to evaluate EusHeidelTime’s performance. We achieved satisfactory results and demonstrated that the tool can be adapted to morphologically rich languages.
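HeidelTime-style taggers rely on hand-written rules that match a time expression and map it to a normalised value. The hypothetical rule below is not taken from EusHeidelTime's actual resources; it only matches one Basque date pattern, such as "2017ko urriaren 10ean" ("on October 10, 2017"), and normalises it to a TIMEX3-style `YYYY-MM-DD` value. The month table, the regular expression `DATE_RULE` and the function `extract_dates` are assumptions for illustration. The loose suffix slot after the day number hints at why a morphologically rich language needs extra care: case endings attach directly to the number.

```python
import re

# Genitive forms of the Basque month names (urtarrila = January ... abendua = December).
MONTHS = {
    "urtarrilaren": 1, "otsailaren": 2, "martxoaren": 3, "apirilaren": 4,
    "maiatzaren": 5, "ekainaren": 6, "uztailaren": 7, "abuztuaren": 8,
    "irailaren": 9, "urriaren": 10, "azaroaren": 11, "abenduaren": 12,
}

# Toy rule: "<year>ko <month-genitive> <day><optional case suffix>",
# e.g. "2017ko urriaren 10ean". The suffix group absorbs endings like -a, -an, -ean.
DATE_RULE = re.compile(
    r"\b(\d{4})ko\s+(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:[a-z]+)?\b",
    re.IGNORECASE,
)


def extract_dates(text):
    """Return (surface string, normalised YYYY-MM-DD value) pairs for matched dates."""
    results = []
    for m in DATE_RULE.finditer(text):
        year = int(m.group(1))
        month = MONTHS[m.group(2).lower()]
        day = int(m.group(3))
        results.append((m.group(0), f"{year:04d}-{month:02d}-{day:02d}"))
    return results
```

For example, `extract_dates("2017ko ekainaren 27an")` returns `[("2017ko ekainaren 27an", "2017-06-27")]`. A full tagger would also need rules for relative expressions, durations and underspecified dates.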

PhD Thesis: Computational Model for Semantic Textual Similarity (A. Gonzalez, 2017/07/07)

Title: Computational Model for Semantic Textual Similarity
Author: Aitor Gonzalez-Agirre
Supervisors: German Rigau i Claramunt  / Eneko Agirre Bengoa (Ixa Group)
Date: July 7, 2017, Friday
Time: 11:00
Where:  Faculty of Informatics, Ada Lovelace Room (UPV/EHU)


The goal of the thesis is to advance computational models of meaning and their evaluation. We define two tasks: Semantic Textual Similarity (STS) and Typed Similarity.

STS aims to measure the degree of semantic equivalence between two sentences. We collected a total of 15,436 sentence pairs to build STS datasets, by far the largest collection of STS data. We also designed, built and evaluated a new approach that combines knowledge-based and corpus-based methods using a cube.
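An STS system outputs a graded similarity score for each sentence pair; the SemEval STS datasets use a scale from 0 (unrelated) to 5 (equivalent). The sketch below is a simple lexical-overlap baseline, the cosine similarity of bag-of-words count vectors rescaled to 0-5. It is not the cube-based method of the thesis, and the names `tokens` and `sts_baseline` are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    # Lowercase and keep word-character tokens only (punctuation is dropped).
    return re.findall(r"\w+", sentence.lower())


def sts_baseline(s1, s2):
    """Cosine similarity of bag-of-words count vectors, rescaled to the 0-5 STS range."""
    c1, c2 = Counter(tokens(s1)), Counter(tokens(s2))
    dot = sum(count * c2[word] for word, count in c1.items())
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    if norm == 0:
        return 0.0  # at least one sentence has no tokens
    return 5.0 * dot / norm
```

For instance, "A man is playing a guitar." and "A man plays the guitar." score about 3.16: the pair shares "a", "man" and "guitar", but the baseline cannot tell that "playing" and "plays" are related, which is the kind of gap that knowledge-based and corpus-based methods aim to close.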

Typed Similarity tries to identify the type of relation that holds between a pair of similar items in a digital library. Providing a reason why items are similar has applications in recommendation, personalization and search. We identified a range of similarity types in this collection and annotated 1,500 pairs of items from it using crowdsourcing.

We also present systems that address the Typed Similarity task.

HAP/LAP master thesis (Noelia Migueles, 2017-06-27)

This afternoon, June 27, Noelia Migueles will defend her master's thesis.

Date: June 27, 15:00
Place: Ada Lovelace room

Izenburua / Title: A Study Towards Spanish Abstract Meaning Representation

Egilea / Author: Noelia Migueles-Abraira

Tutoreak / Supervisors: Arantza Diaz de Ilarraza and Rodri Agerri

Talk: Computational explorations of creative language (C. Strapparava, 2017-07-07)

Speaker: Carlo Strapparava, FBK-irst (Fondazione Bruno Kessler – Istituto per la ricerca scientifica e Tecnologica)
Date: July 7, 2017
Time: 09:30
Place: Faculty of Informatics (UPV/EHU), Manuel de Lardizabal 1, 20018 Donostia
Title: Computational explorations of creative language


Dealing with creative language, and in particular with affective, persuasive and even humorous language, has often been considered outside the scope of computational linguistics. Nonetheless, current NLP techniques can be exploited to begin exploring it. We will briefly review some computational work on these typical creative genres.

Short bio:

Carlo Strapparava is a senior researcher at FBK-irst (Fondazione Bruno Kessler – Istituto per la ricerca scientifica e Tecnologica) in the Human Language Technologies Unit.
His research activity covers artificial intelligence, natural language processing, intelligent interfaces, human-computer interaction, cognitive science, knowledge-based systems, user models, adaptive hypermedia, lexical knowledge bases, word-sense disambiguation, affective computing and computational humour. He is the author of over 200 papers, published in scientific journals, book chapters and conference proceedings. He holds the Italian scientific habilitation for full professor in informatics and engineering.
He regularly serves on the program committees of the major NLP conferences (ACL, EMNLP, etc.). He was an executive board member of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics (2007-2010), and a member of the Senseval (Evaluation Exercises for the Semantic Analysis of Text) organisation committee (2005-2010).
In June 2011, he received a Google Research Award on Natural Language Processing, specifically for the computational treatment of creative language.

PhD Thesis: Automatic Scansion of Poetry (M. Agirrezabal, 2017/06/19)

Title: Automatic Scansion of Poetry
Author: Manex Agirrezabal Zabaleta
Supervisors: Dr. Iñaki Alegria Loinaz and Dr. Mans Hulden
Date: June 19, 2017, Monday
Time: 12:00
Where:  Faculty of Informatics, Ada Lovelace Room (UPV/EHU)

Research questions:

  • What do we need to know when analyzing a poem, and how can we capture it?
  • Does language-specific linguistic knowledge contribute when analyzing poetry?
  • Is it possible to analyze a poem without any language-specific information?
    Is such analysis something that can be learnt?
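The second question can be made concrete with a naive dictionary-based scanner: it looks up each word's stress pattern in a language-specific lexicon and concatenates the results. The lexicon and the functions `scan` and `is_iambic_pentameter` below are hypothetical and cover a single line, Shakespeare's "Shall I compare thee to a summer's day?" (Sonnet 18). This is not the method of the thesis. Note that the stress of monosyllables such as "I" or "to" depends on metrical context and had to be fixed by hand here, which is exactly where a fixed lexicon breaks down and where learning the analysis from data becomes attractive.

```python
import re

# Hypothetical hand-built stress lexicon: "1" = stressed syllable, "0" = unstressed.
# Monosyllables are assigned the stress they carry in this one line only.
STRESS = {
    "shall": "0", "i": "1", "compare": "01", "thee": "0",
    "to": "1", "a": "0", "summer's": "10", "day": "1",
}


def scan(line):
    """Return the line's stress pattern, or None if any word is missing from the lexicon."""
    words = re.findall(r"[a-z']+", line.lower())
    if any(w not in STRESS for w in words):
        return None
    return "".join(STRESS[w] for w in words)


def is_iambic_pentameter(pattern):
    # Five iambs: unstressed-stressed, repeated five times.
    return pattern == "01" * 5
```

Here `scan("Shall I compare thee to a summer's day?")` yields `"0101010101"`, i.e. five iambs, while any word outside the lexicon makes the scanner give up.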