Dr. Nora Aranberri (UPV/EHU)

Nora Aranberri is a researcher at the IXA natural language processing group and a lecturer at the Faculty of Education of Bilbao at the University of the Basque Country. She specialises in machine translation (MT), where her research focuses on integrating linguistic knowledge into MT systems and their evaluation, with special attention to aspects of their use by both professional translators and regular users. Although not exclusively, the language pairs she mainly works with involve Basque, giving her the opportunity to explore the implications MT can have for low-resource minority languages. She has also led hands-on workshops on post-editing with trainee and professional translators and collaborates with the Association of Translators, Correctors and Interpreters of Basque Language.

Parallel corpora in machine translation: opportunities, challenges, and... Basque

Parallel corpora are vital to the development and evaluation of many natural language processing applications. In many cases, however, compiling suitable parallel resources poses an enormous challenge. In this talk, we will focus on machine translation (MT) and consider a number of situations at different stages of the development and implementation cycle where parallel corpora play a key role. We will first concentrate on the development stage, and specifically consider the features of the data required to build the systems. We will look into ways in which researchers have tried to generate the parallel corpora, discussing examples of targeted manual generation and automatic generation, including the implications of back-translation. Secondly, we will examine the requirements of the parallel corpora used in the implementation stage to help users take full advantage of MT and also corpora compiled to draw conclusions on MT use by professional translators and regular users. Throughout the talk we will present specific examples where Basque is involved, allowing us to highlight the implications of working with a low-resource minority language.

Dr. Silvia Bernardini (University of Bologna)

Silvia Bernardini (Laurea (Bologna), MPhil (Cantab), PhD (MDX)) is Professor of English Language and Translation and Head of the Department of Interpreting and Translation of the University of Bologna, Italy. She has taught a variety of courses (technical and scientific translation, translation methods and technology, English (corpus) linguistics), and has coordinated the Master’s in Specialised Translation. She has published widely in the fields of translation technology, translator education, English as a Lingua Franca, documentation and corpus linguistics, and has given invited talks on these topics at conferences all over the world, from China to Brazil.

Parallel corpora 2.0: collaborative multiparallel designs and the future of corpus-based translation studies

Dating back to the early 1990s, the birth of corpus-based translation studies (CBTS) is associated with a move away from equivalence-oriented methods toward descriptive comparisons of translated and non-translated texts in the target language. Monolingual comparable corpora have consequently gained in popularity, spurred by the ease with which existing methods could be applied to this novel research framework, as well as by theoretical considerations stemming from both translation studies (Toury 1995) and (neo-Firthian) corpus linguistics (Sinclair 1991). Parallel corpora have remained somewhat at the margins, despite calls for the inclusion of parallel components in monolingual corpus studies, and although their importance in the neighbouring fields of contrastive linguistics and applied translation (translation teaching, machine translation) has not faltered. Recent years have, however, witnessed a burgeoning of new parallel corpus projects that position themselves within the theoretical framework of CBTS. In this talk I will describe two such projects, EPTIC (the European Parliament Interpreting and Translation Corpus, Bernardini et al. 2016) and MUST (the MUltilingual Student Translation corpus, Granger and Lefer 2018). I will illustrate the challenges and the potential of such collaborative multiparallel designs, and reflect on the ways in which they could revamp the discipline and provide the means to bridge the gap between product and process in corpus-based translation research.

References

Bernardini, S., Ferraresi, A., and Miličević, M. (2016). "From EPIC to EPTIC — Exploring simplification in interpreting and translation from an intermodal perspective". Target 28, 61-86.

Granger, S. and Lefer, M.-A. (2018). "MUST: A collaborative corpus collection initiative for translation teaching and research". In Granger, S., Lefer, M.-A. & Aguiar de Souza Penha Marion, L. (eds) Book of Abstracts. Using Corpora in Contrastive and Translation Studies Conference (5th edition). CECL Papers 1. Louvain-la-Neuve: Centre for English Corpus Linguistics, Université catholique de Louvain. 72-73.

Sinclair, J. McH. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Toury, G. (1995). Descriptive translation studies and beyond. Amsterdam: John Benjamins.

Dr. Xavier Gómez Guinovart (Universidade de Vigo)

Xavier Gómez Guinovart is an assistant professor at the University of Vigo, where he teaches Computational Linguistics. He is the leader of the research group Tecnoloxías e Aplicacións da Lingua Galega (TALG, in its Galician acronym), in charge of running seminars on Computational Linguistics (http://sli.uvigo.gal). His research interests include linguistic applications of computing, the development of multilingual lexical resources and ontologies, and the construction and exploitation of corpora, both parallel and specialized. Dr Gómez Guinovart has led numerous projects on technologies applied to the Galician language. He is an active member of research networks, and has organised and assessed scientific and academic activities and journals. He is the editor of the journal Linguamática (http://linguamatica.com), devoted to the computational processing of the languages of the Iberian Peninsula.

Semantic networks in the construction and exploitation of parallel corpora

In this talk, I will explain the research on parallel corpora recently conducted in the Seminars of Computational Linguistics at the University of Vigo. I will focus on the use of lexico-semantic information, as provided by WordNet, in the construction and exploitation of the CLUVI and SensoGal corpora respectively. This combination of resources is possible in both directions, namely, from the parallel corpus to WordNet and from WordNet to the parallel corpus.

On the one hand, a variety of equivalent-extraction techniques can be applied to the parallel corpora to widen the lexical coverage of the wordnets of the languages under alignment. Parallel corpora can also supply contexts of use for the concepts compiled in WordNet, provided that the corpus has previously been processed with a suitable semantic framework.

On the other hand, WordNet may be used in the alignment of parallel corpora at the lexical level as well as in their lexico-semantic annotation. For example, the graph of semantic relations in WordNet is used for constructing semantic taggers able to lexically disambiguate parallel corpora. Another resource used for this purpose has been the English-language corpus SemCor, semantically annotated by the team who developed the English WordNet at Princeton.

I will attempt to provide a wide overview of the many facets of the research in progress, so that the audience can appreciate the benefits of lexico-semantic annotation in the construction and exploitation of parallel corpora.


References

Gómez Guinovart, X. & Solla Portela, M.A. (2020). Construction of a WordNet-based multilingual lexical ontology for Galician. In M. J. Domínguez Vázquez, M. Mirazo Balsa & C. Valcárcel Riveiro (Eds.), Studies on Multilingual Lexicography, pp. 179-196. De Gruyter, Berlin and Boston. DOI: https://doi.org/10.1515/9783110607659

Gómez Guinovart, X. (2019). Enriching parallel corpora with multimedia and lexical semantics: From the CLUVI Corpus to WordNet and SemCor. In I. Doval & M. Teresa Sánchez Nieto (Eds.), Parallel Corpora for Contrastive and Translation Studies: New resources and applications, pp. 141-158. John Benjamins, Amsterdam. DOI: https://doi.org/10.1075/scl.90.09gom

Simões, A. & Gómez Guinovart, X. (2018). Extending the Galician wordnet using a multilingual Bible through lexical alignment and semantic annotation. In P. Rangel Henriques, J. P. Leal, A. Menezes Leitão & X. Gómez Guinovart (Eds.), 7th Symposium on Languages, Applications and Technologies (SLATE 2018), pp. 14:1-14:13. Schloss Dagstuhl/Leibniz-Zentrum fuer Informatik, Dagstuhl. DOI: https://doi.org/10.4230/OASIcs.SLATE.2018.14

Gómez Guinovart, X. & Solla Portela, M.A. (2018). Building the Galician wordnet: Methods and applications. Language Resources and Evaluation, 52(1), 317-339. DOI: https://doi.org/10.1007/s10579-017-9408-5

Dr. Signe Oksefjell Ebeling (University of Oslo)

Signe Oksefjell Ebeling is Professor of English language at the University of Oslo, Norway. Her research focuses on corpus-based contrastive analysis of topics such as verb semantics, phraseology and idiomaticity. Her publications include several papers on these contrastive topics as well as the monograph (with J. Ebeling) Patterns in Contrast (2013). She has co-edited several volumes on contrastive analysis and was editor (with H. Hasselgård) of the international journal for contrastive linguistics Languages in Contrast (2014-2019). She has been a member of several corpus teams, including the English-Norwegian Parallel Corpus, its extension the English-Norwegian Parallel Corpus+, and the Oslo Multilingual Corpus. She was a member of the project team on the Computational Processing of Portuguese (now Linguateca). She is currently engaged in the compilation of two comparable corpora: the English-Norwegian Match Report Corpus and the International Comparable Corpus.

https://www.hf.uio.no/ilos/english/people/aca/signeo/index.html

Bidirectional parallel corpora: Challenges and possibilities

In this talk I will start by outlining some of the main challenges relating to the use of bidirectional parallel corpora for contrastive research, offering some insights from my own experience of compiling and using parallel corpora of this kind. These challenges notwithstanding, I will then move on to describe the potential of bidirectional parallel corpora and give a snapshot of some of the possibilities they offer. More specifically, I will give examples of different kinds of contrastive studies that have benefitted from the bidirectional corpus design devised by Stig Johansson (Johansson & Hofland 1994). The selection of studies discussed, mainly from my own research, will range from lexical and lexico-grammatical studies of predefined items and patterns in two languages to more exploratory studies of n-grams.