Building Language Resources

General details of the subject

Face-to-face degree course

Description and contextualization of the subject

The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora annotated at different linguistic levels) and of designing them appropriately, so that it is possible to learn from them automatically, feed new information back into them, and evaluate them quantitatively and qualitatively. Multilingual and cross-lingual issues are also emphasized. The course takes a practical approach to the content.

Teaching staff

Name | Institution | Category | Doctor | Teaching profile | Area
ALDEZABAL ROTETA, IZASKUN | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Basque
BARRENA MADINABEITIA, ANDER | University of the Basque Country | Profesorado Adjunto (Ayudante Doctor/A) | Doctor | Bilingual | Computer Languages and
GONZALEZ DIOS, ITZIAR | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Basque
LARRAÑAGA OLAGARAY, MIGUEL | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Computer Languages and
LOPEZ DE LACALLE LECUONA, OIER | University of the Basque Country | Profesorado Adjunto (Ayudante Doctor/A) | Doctor | Bilingual | Computer Languages and


Getting to know the existing tools for processing different languages (morphological, syntactic and semantic tools, and parsers). | 20.0 %
Getting to know the general language resources for different languages. | 20.0 %
Understanding and using machine learning strategies in natural language processing (NLP). | 20.0 %
Using and adapting knowledge-based tools in NLP. | 20.0 %
Using and adapting existing NLP tools (morphological, syntactic and semantic tools, and parsers) for different languages. | 20.0 %

Study types

Type | Face-to-face hours | Non face-to-face hours | Total hours
Applied laboratory-based groups | 20 | 30 | 50

Training activities

Name | Hours | Percentage of classroom teaching
Computer work practice, laboratory, site visits, field trips, external visits | 50.0 | 40 %
Lectures | 25.0 | 40 %

Assessment systems

Name | Minimum weighting | Maximum weighting
Drawing up reports and presentations | 50.0 % | 50.0 %
Practical tasks | 50.0 % | 50.0 %

Learning outcomes of the subject

Awareness and usage of reference resources in computational semantics, especially in English, but also in other languages.

Awareness and usage of corpus evaluation methods.

Assimilation of essential concepts in computational semantics: corpus linguistics, semantic classes, senses, variants, hierarchical and conceptual equivalences (hyperonymy/hyponymy, synonymy), semantic relations, semantic disambiguation and evaluation methods.

Application of linguistic criteria in the design and construction of linguistic resources.

Ordinary call: guidelines and withdrawal

Ordinary call:

- Class assignments (50%): exercises, notes, case analysis...

- Project (50%): research project on a topic discussed in class

Extraordinary call: guidelines and withdrawal

Extraordinary call:

- Final exam (100%) with theoretical and practical tests


1. Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...

a. Linguistic issues: conceptual gaps, cultural concepts...

b. Applications: summarization and text simplification

2. Syntactic-semantic databases and related corpora: VerbNet / PropBank, NOMLEX / NomBank, FrameNet

a. Language issues: semantic roles, semantic classes, argument structure, lexical entries...

3. Annotation:

a. Word Similarity (WS), Semantic Textual Similarity (STS)

i. Linguistic issues: antonymy, similarity

b. Sentiment analysis

i. Linguistic issues: sentiments, polarity

4. Corpora evaluation: Intercoder Agreement, R basics


Compulsory materials

Class materials available on eGela.

Basic bibliography

Robert Truswell (ed.). The Oxford Handbook of Event Structure. Oxford University Press. 2019.

Daniel Jurafsky, James H. Martin. Speech and Language Processing (2nd Edition). Pearson. 2008.

In-depth bibliography

Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press. 1993.