Building Language Resources

General details of the subject

Face-to-face degree course

Description and contextualization of the subject

The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora annotated at different linguistic levels) and of designing them appropriately, so that systems can learn from them automatically, the resources can be enriched with new information, and they can be evaluated quantitatively and qualitatively. Multilingual and cross-lingual issues are also emphasized. The course takes a practical, hands-on approach to the contents.

Teaching staff

Name | Institution | Category | Doctor | Teaching profile | Area
ALDEZABAL ROTETA, IZASKUN | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Basque
GONZALEZ DIOS, ITZIAR | University of the Basque Country | Profesorado Adjunto (Ayudante Doctor/A) | Doctor | Bilingual | Basque


Competencies

Name | Weight
Getting to know the existing tools for processing different languages (morphological, syntactic and semantic analyzers, and parsers). | 20.0 %
Getting to know the general language resources for different languages. | 20.0 %
Understanding and using machine learning strategies in natural language processing (NLP). | 20.0 %
Using and adapting knowledge-based tools in NLP. | 20.0 %
Using and adapting existing NLP tools (morphological, syntactic and semantic analyzers, and parsers) for different languages. | 20.0 %

Study types

Type | Face-to-face hours | Non face-to-face hours | Total hours
Applied computer-based groups | 20 | 30 | 50

Training activities

Name | Hours | Percentage of classroom teaching
Computer work practice, laboratory, site visits, field trips, external visits | 50.0 | 40 %
Lectures | 25.0 | 40 %

Assessment systems

Name | Minimum weighting | Maximum weighting
Drawing up reports and presentations | 50.0 % | 50.0 %
Practical tasks | 50.0 % | 50.0 %

Learning outcomes of the subject

- Knowledge and management of reference resources in the computational field, especially for English but also for other languages.

- Knowledge and management of corpus evaluation systems.

- Assimilation of the necessary concepts in the field of computational lexico-semantics: corpus linguistics, linguistic unit, lemma/stem/morpheme, semantic classes, senses and variants, hierarchy and conceptual equivalence (hyperonymy/hyponymy, synonymy), semantic relationships, semantic disambiguation, and evaluation methods.

- Linguistic criteria when designing and building linguistic resources.

Ordinary call: guidelines and withdrawal

Ordinary call:

- Class assignments (50%): exercises, notes, case analysis...

- Project (50%): research project on a topic discussed in class

Extraordinary call: guidelines and withdrawal

Extraordinary call:

- Final exam (100%) with theoretical and practical tests


Syllabus

1) Introduction to language resources

2) Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...

2a) Linguistic issues: conceptual gaps, cultural concepts...

2b) Applications

3) Syntactic-semantic databases and related corpora: EDBL, EPEC, VerbNet/PropBank, NOMLEX/NomBank, FrameNet...

3a) Linguistic issues: entries, lexical units, morphological units, semantic roles, semantic classes, argument structure, lexical entries...

4) Annotation

5) Corpus evaluation: inter-coder agreement, R basics


Compulsory materials

Class materials available on eGela.

Basic bibliography

Robert Truswell (ed.). The Oxford Handbook of Event Structure. Oxford University Press. 2019.

Daniel Jurafsky, James H. Martin. Speech and Language Processing (2nd Edition). Pearson. 2008.

In-depth bibliography

Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press. 1993.