Nora Aranberri is a researcher at the IXA natural language processing group and a lecturer at the Faculty of Education of Bilbao at the University of the Basque Country. She specialises in machine translation (MT). Her research focuses on integrating linguistic knowledge into MT systems and their evaluation, with special attention to how the systems are used by both professional translators and regular users. The language pairs she works with mainly, though not exclusively, involve Basque, which gives her the opportunity to explore the implications of MT for low-resource minority languages. She has also led hands-on post-editing workshops for trainee and professional translators, and she collaborates with the Association of Translators, Correctors and Interpreters of Basque Language.
Parallel corpora are vital to the development and evaluation of many natural language processing applications. In many cases, however, compiling suitable parallel resources is an enormous challenge. In this talk, we will focus on machine translation (MT) and consider a number of situations at different stages of the development and implementation cycle where parallel corpora play a key role. We will first concentrate on the development stage, and specifically consider the features of the data required to build the systems. We will look at ways in which researchers have tried to generate parallel corpora, discussing examples of both targeted manual generation and automatic generation, including the implications of back-translation. Secondly, we will examine the requirements for parallel corpora at the implementation stage: both the corpora used to help users take full advantage of MT and those compiled to draw conclusions about how professional translators and regular users use MT. Throughout the talk, we will present specific examples involving Basque, which will allow us to highlight the implications of working with a low-resource minority language.