IXA group


Mikel Artetxe awarded in Hackathon on Language Technologies organized by Red.es

Yesterday in Barcelona, Mikel Artetxe was awarded the second prize in the First Hackathon on Language Technologies, organized by Red.es in collaboration with the Spanish Plan to promote Language Technology, which is managed by the Spanish Government’s SESIAD agency.

This hackathon was organized in the context of “4 Years From Now” (4YFN), the business platform created by Mobile World Capital Barcelona to promote technological startups. Several IXA members participated as organizers (German Rigau, Iñaki Alegria and Rodrigo Agerri).

Eight projects participated in the final session yesterday in Barcelona. Mikel developed a free alternative that automatically creates bilingual dictionaries and offers examples of real uses of words (an application similar to Linguee).

German Rigau keynote speaker in the JRC Conference TEXT MINING IN POLICY MAKING

IXA Group member German Rigau participated as keynote speaker last Monday in the JRC Conference “TEXT MINING IN POLICY MAKING”, organised by the European Commission in Brussels to present the new JRC competence centre on text mining. The event included a showcase of various success stories of JRC applied text mining solutions. German Rigau addressed challenges related to textual data.

“This conference was an opportunity for policy makers from EU institutions to understand better the benefits of text mining in policy making processes, and pave the way forward for a better use of these solutions in policy making.

Information needed by policy makers is increasingly embedded in large amounts of textual data available on the Internet, e.g. traditional or social media, or in large public or proprietary document sets.

Text mining, the automatic extraction of information from text, offers policy makers timely access to important information which would otherwise be inaccessible. Indeed, the sheer volume of data makes it nearly impossible to extract the available information manually.”

Our papers in Japan (COLING 2016)

These are our six papers at COLING 2016, taking place in Osaka, Japan, on Dec 11 2016.

HAP/LAP master theses (2016-09-27)

Master HAP/LAP
EMLCT master
Master thesis defences

Date: September 27th
Place: Ada Lovelace room


Universal Dependencies for Buryat.
Author: Elena Badmaeva
Supervisors: Koldo Gojenola and Gosse Bouma

LexSynSimpleText, a lexical and syntactic simplifier: first steps.
Author: Maria Eguimendia
Supervisors: Arantza Diaz de Ilarraza and Gosse Bouma

Data Sparsity in Highly Inflected Languages: The Case of Morphosyntactic Tagging in Polish.
Author: Michael Ustaszewski
Supervisors: Rodrigo Agerri and German Rigau

Multilingual Central Repository version 3.0: improving a very large lexical knowledge base.
Author: Daniel Parera Perez
Supervisor: German Rigau Claramunt

Book: Microparameters in the Grammar of Basque

Edited by Beatriz Fernández (UPV/EHU) and Jon Ortiz de Urbina (Deusto University), this book is an endeavor to present and analyze some standard topics in the grammar of Basque from a micro-comparative perspective. From case and agreement to word order and the left periphery, and including an incursion into determiners, the book combines fine-grained theoretical analyses with empirically detailed descriptions. Working from a micro-parametric perspective, the contributions to the volume address in depth some of the exuberant variation attested in the different dialects and subdialects of Basque. At the same time, although the contributions focus mainly on Basque data, cross-linguistic evidence is also presented and discussed.
After all, the goal pursued in this book is to attempt to explain variation in Basque as a particular instantiation of variation in human language at large. The volume presents and analyzes a wide range of empirical phenomena, many typologically marked among European languages, and will therefore be a welcome resource to linguists looking for detailed description and/or theoretical discussion.

Nora Aranberri: Machine Translation for Translators (Innsbruck, 2016-07-20)

Our colleague Nora Aranberri was the lecturer of the workshop “Machine Translation for Translators: Taking Advantage of the New Technology” at SummerTrans 2016.

The International Translation Summer School SummerTrans was founded in Innsbruck in 2004. From 11 to 20 July 2016 the University of Innsbruck hosted the 7th International Translation Summer School “SummerTrans VII: Quality and Competence in Translation”. Addressing trainee translators, professional translators and translation researchers alike, its varied programme featured cutting-edge courses and workshops aiming to advance participants’ theoretical knowledge of and practical skills in translation and interpreting, including state-of-the-art translation technology and human-machine interaction in translation.
SummerTrans VII welcomed more than 60 participants from 16 countries, spanning from Tunisia over half of Europe to India and China.
Michael Ustaszewski, one of our students in the Erasmus Mundus LCT master (2014-2016), is now a lecturer at the University of Innsbruck and one of the organizers of SummerTrans 2016 🙂
Michael told us that the workshop participants are now familiar with state-of-the-art translation technology and human-machine interaction in translation.


Nice results in Codefestdss2016 projects

This is a list of the aims of the projects in the CODEFEST 2016 summer school and the results achieved by each of them. Further information can be found on the Codefest_dss2016 website.


Quiz Bowl: Multilingual question-answering for trivia games with Wikipedia


The QUIZ Bowl team was the winner in our codefest competition. Congratulations!

Aims: The question-answering trivia quiz project is in progress. To start the first game prototype, the team is using some of the questions translated into Basque on Monday. This prototype matches the Basque Wikipedia articles with the questions or hints from the quiz, so that the answer to the hint pops out as an article.
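As a rough illustration of the matching idea described above, here is a minimal sketch that scores each article against a hint by TF-IDF-weighted word overlap and returns the best-scoring title. It is not the team's code: the function names, the scoring scheme and the three toy "articles" are all assumptions made for the example, and a real prototype would load Basque Wikipedia text instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(articles):
    """Precompute per-article token counts and smoothed inverse document frequencies."""
    counts = {title: Counter(tokenize(body)) for title, body in articles.items()}
    doc_freq = Counter()
    for tokens in counts.values():
        doc_freq.update(tokens.keys())
    n_docs = len(articles)
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    return counts, idf


def answer(hint, counts, idf):
    """Return the title of the article whose text best overlaps with the hint."""
    hint_tokens = tokenize(hint)

    def score(title):
        doc = counts[title]
        length = sum(doc.values()) or 1
        # Sum of normalized term frequency times IDF for every hint token.
        return sum(doc[tok] / length * idf.get(tok, 0.0) for tok in hint_tokens)

    return max(counts, key=score)


# Hypothetical toy "articles", standing in for Wikipedia text.
ARTICLES = {
    "Donostia": "Donostia is a coastal city in Gipuzkoa known for La Concha beach",
    "Bilbo": "Bilbo is the largest city in Bizkaia and hosts the Guggenheim museum",
    "Gasteiz": "Gasteiz is the capital city of Araba and seat of the Basque parliament",
}

counts, idf = build_index(ARTICLES)
print(answer("Which city hosts the Guggenheim museum?", counts, idf))
```

Words shared by every article (such as "city") get a low weight, so the rarer words in the hint decide which article is returned as the answer.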

Results: We had the chance to play a quiz based on Wikipedia trivia: Human vs. Computer. This time the humans were the winners, but only by a very small margin.

The code is available here: github.com/dss2016eu/codefest/tree/master/quizbowl
References to all the code generated will also be posted there!

Create a morphological analyzer for your minority language

Aims: In order to develop the morphological analyzer for the Hungarian language, Ixa group members Iñaki Alegria and Montse Maritxalar have gathered to offer their help in programming tasks. After creating a list of the lexical roots of Hungarian, they have made a selection based on verbs and adjectives, among other criteria. Afterwards, they want to encode that selection in lexc format.

Results: They have explained several projects they have been developing over these days, all of them related to machine translation devices: for Hungarian, Buryat (a variety of Mongolian), Rif Berber (a language spoken mostly in Morocco) and Uyghur (a Turkic language spoken in Western China), among others.

NLP for Literature Analysis and Creation

Aims: Members of the group have chosen the name Story buffet for their tools for the analysis and creation of literary texts. The team is made up of linguists, programmers and other experts who consider themselves to be “hybrids” of the two.

On the second day, we had a break so that people from Ixa group (the ones in charge of this project) could explain their work to us. Manex Agirrezabal is an expert on metrical analysis in poetry; therefore, along with his knowledge in programming/coding, he thinks this is a great chance to semantically alter short stories. Originally, Itziar Gonzalez-Dios’ field of study was linguistics, but she has joined the world of programming in the last few years; she is interested in the analysis of the complexity and synthesis of texts.

Results: They have shown their webpage (Story buffet) for literature creation and analysis in quite a humorous way.


Aims: The team has continued developing the Behagunea project, making use of their different abilities. Victor (programmer) has visualized the results of the Ixa-pipes, and he is working on designing an attractive interface. Also, Dani (IT expert) is trying to translate Ixa-pipes resources into Catalan. Sabrina (linguist), with the help of Iñaki (programmer), is starting an app based on tweets to study what countries think about each other. Finally, due to some problems, Kassandra has decided to put aside one of the projects: the one that aims to include social media in the website DSS2016EU Iritzien Behagunea (Opinion Observatory). Instead, she has chosen to examine the tweets about the DonostiCup football competition.

Results: They have accomplished their goals. Apart from adding new languages (Catalan, Italian) to the Behagunea project, they have managed to merge social media and geolocation.

Enriching ZureTTS platform with new languages

Aims: Several aspects of the ZureTTS project have been addressed. On the one hand, the members of Aholab have focused on developing the platform to include the dialect from Iparralde (the northern side of the Basque Country), and they have started both writing the questions for the voice donors and designing the new interface. Concerning the app for Android, they have spent the day identifying errors and preparing everything required to install the new platform. Finally, the “Ireland team” has translated the webpage interface into Gaelic and contacted some Irish experts within their university to get hold of a good, reliable database.

Results: At the end of the week, apart from adding the Lapurtera (Basque dialect) version to the web, they have made huge progress on Gaelic, thanks especially to the help of the Irish people.

SRL and Dockers

Aims: Members of the SRL project have been structuring a database to add and handle information later on. As Suhail Sarwan says, developments in SRL mean a direct benefit in the field of semantics, particularly if we want to promote and improve the e-learning model. Aided by Rodrigo Agerri, among others, they have worked on the SRL, and Eleanor Dutton intends to develop a tool for linguistic analysis and to apply it to Moroccan Arabic.

Results: They showed us a tool they have developed that uses sequence tagging methods to identify the participants of the events described by the predicates within a sentence.
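To make the sequence tagging formulation concrete, the sketch below shows one common way to turn per-token tags into labelled participant spans. It assumes BIO-style tags with PropBank-like labels (A0, V, A1); the tag set, the function name and the example sentence are illustrative assumptions, not details of the team's tool, and the tagger that would predict the tags is not shown.

```python
def decode_bio(tokens, tags):
    """Group per-token BIO tags into (label, text) spans.

    "B-X" opens a span with label X, "I-X" continues the open span X,
    and anything else (e.g. "O" or a stray "I-" tag) closes the open span.
    """
    spans = []
    current_label, current_tokens = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_label, current_tokens = None, []
    flush()
    return spans


# Hypothetical tagger output for one sentence and one predicate.
tokens = ["The", "team", "developed", "a", "tagging", "tool", "."]
tags = ["B-A0", "I-A0", "B-V", "B-A1", "I-A1", "I-A1", "O"]
print(decode_bio(tokens, tags))
```

Framing the task this way means any token-level classifier can be used for the tagging step, with the span decoding kept separate from the model.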

Machine Translation for minority languages

Aims: Each member of the group is focusing on the pair of languages in which he/she is fluent. Using the program called Apertium, for example, they have started working on a translator for the French-Occitan language pair, so that they can later develop a linguistic analyzer for Occitan. They have also been working on a Tetum-Portuguese translator (the two official languages spoken on the island of Timor) with the same program. Others have started preparing lexical transfers (they will try to do the same with dependency transfers) for the English-Spanish pair using Matxin. This same program also allows the creation of an English-Welsh translator, as well as a translator for English-Basque (one such translator already exists, but some errors must be identified and corrected). The latter will be applied in the field of medicine.

Results: They have explained several projects they have been developing over these days, all of them related to machine translation devices: for Hungarian, Buryat (a variety of Mongolian), Rif Berber (a language spoken mostly in Morocco) and Uyghur (a Turkic language spoken in Western China), among others.

Erasmus Mundus LCT master. Annual Meeting 2016 in Donostia (June 09 - 10)

Seminar: Big Data and NLP at Trivago (Min Fang, 2016-06-08)

Talk: Big Data and NLP at Trivago
Speaker: Min Fang
2013 – 2015: Master Erasmus Mundus Language and Communication Technologies, summa cum laude
2015-… : (Trivago, hotel metasearch)
When: Wed, 8 June, 10pm – 11pm
Where: Room 3.2
I’m interested in getting insights from data by applying natural language processing, machine learning and statistical analyses. Ideally, those insights can then be turned into useful applications or facilitate higher level decisions.

Together with our software engineers I take care of our NLP capabilities: We work on improving and maintaining a highly flexible and scalable pipeline that is geared towards aspect-based sentiment analysis (and more in the future). Extracting knowledge from a large number of natural language texts allows us to understand our domain better and enhance the experience for our users.

Our technology stack includes:
– Python and Java
– R for analysis
– AWS for infrastructure

Ixa Group is one of the 15 institutional members of EAMT

Ixa Group has been an institutional member of the European Association for Machine Translation (EAMT) since 2012. The EAMT is the organization that serves the growing community of people interested in MT and translation tools, including users, developers, and researchers of this increasingly viable technology. We have now published a new page about IXA Group on EAMT’s website.

The EAMT is one of three regional associations of the International Association for Machine Translation (IAMT). Its sister organizations are the Association for Machine Translation in the Americas (AMTA) and the Asia-Pacific Association for Machine Translation (AAMT).

Among other activities, the EAMT organizes the bi-annual MT Summit and the annual EAMT conferences, maintains the MT-List mailing list, and compiles listings of companies and products (the Compendium of Translation Software), which are distributed free or at nominal cost to its members.

The current 15 corporate and institutional members are the following: