PhD thesis defense: Baliabide urriko hizkuntzetarako hizkuntza-eredu neuronalak

Author: Gorka Urbizu Garmendia

Thesis: Baliabide urriko hizkuntzetarako hizkuntza-eredu neuronalak

Supervisors: Aitor Soroa / Saralegi Urizar

Date: October 20, 2025
Time: 11:00
Venue: Ada Lovelace room (Faculty of Informatics)

Abstract:

"This PhD thesis focuses on neural language models for low-resource languages, with a particular emphasis on Basque. It addresses three main research questions: how to evaluate language models for Basque extrinsically, how data scarcity affects model performance, and how Basque’s distinctive linguistic features influence pretraining. To answer these questions, new evaluation benchmarks have been developed, including BasqueGLUE for general language understanding in Basque and BL2MP, a dataset for evaluating Basque grammatical competence. Additionally, the study examines the scaling laws observed in BERT models in low-resource settings and their capacity to learn grammar in morphologically rich and syntactically flexible languages, such as Basque. The feasibility of using automatically translated synthetic data for pretraining is also explored. The findings and resources from this work provide a strong foundation for future research on language models for Basque and other low-resource languages."