Subject

Building Language Resources

General details of the subject

Mode
Face-to-face degree course
Language
English

Description and contextualization of the subject

The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora annotated at different linguistic levels) and of designing them appropriately, so that systems can learn from them automatically, the resources can be enriched with newly acquired information, and they can be evaluated both quantitatively and qualitatively. Multilingual and cross-linguistic issues are also emphasized. The course follows a practical, hands-on approach.

Teaching staff

Name | Institution | Category | Doctor | Teaching profile | Area | E-mail
ALDEZABAL ROTETA, IZASKUN | University of the Basque Country | Associate Professor | Yes | Bilingual | Basque Philology | izaskun.aldezabal@ehu.eus
GONZALEZ DIOS, ITZIAR | University of the Basque Country | Associate Professor | Yes | Bilingual | Basque Philology | itziar.gonzalezd@ehu.eus

Competencies

Name | Weight
Knowledge of the existing tools for processing different languages (morphological, syntactic and semantic analyzers). | 20.0 %
Knowledge of the existing large-scale linguistic resources for different languages. | 20.0 %
Ability to understand machine-learning strategies in human language processing. | 20.0 %
Ability to use knowledge-based strategies and tools for human language processing. | 20.0 %
Ability to use and adapt the existing tools for processing different languages (morphological, syntactic and semantic analyzers...). | 20.0 %

Study types

Type | Face-to-face hours | Non-face-to-face hours | Total hours
Lecture-based | 10 | 15 | 25
Applied computer-based groups | 20 | 30 | 50

Learning outcomes of the subject

- Knowledge and use of the reference resources in the computational field, especially for English but also for other languages.

- Knowledge and use of corpus evaluation systems.

- Assimilation of the concepts needed in computational lexical semantics: corpus linguistics, linguistic unit, lemma/stem/morpheme, semantic classes, senses and variants, hierarchy and conceptual equivalence (hypernymy/hyponymy, synonymy), semantic relations, semantic disambiguation and evaluation methods.

- Application of linguistic criteria when designing and building linguistic resources.
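Some of the lexical-semantic concepts in the outcomes above (senses and variants grouped into synsets, hypernymy/hyponymy hierarchies, synonymy) can be pictured with a tiny in-memory model. This is a minimal Python sketch under stated assumptions: the synset identifiers, words and links are hypothetical toy data, and the `Synset` class and helper functions are invented for illustration. They do not reproduce the data model of WordNet, the MCR or any other real resource.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A concept: a set of synonymous variants plus links to more general concepts."""
    synset_id: str
    variants: list[str]
    hypernyms: list[str] = field(default_factory=list)  # ids of direct hypernyms


# Hypothetical toy hierarchy (not taken from any real wordnet): dog -> canine -> animal
SYNSETS = {
    "animal.n.01": Synset("animal.n.01", ["animal", "beast"]),
    "canine.n.01": Synset("canine.n.01", ["canine", "canid"], ["animal.n.01"]),
    "dog.n.01": Synset("dog.n.01", ["dog", "domestic_dog"], ["canine.n.01"]),
    "cat.n.01": Synset("cat.n.01", ["cat", "true_cat"], ["animal.n.01"]),
}


def synonyms(synset_id: str) -> list[str]:
    """Synonymy: all variants that lexicalize the same concept."""
    return list(SYNSETS[synset_id].variants)


def hypernym_closure(synset_id: str) -> list[str]:
    """Follow hypernym links upwards (breadth-first) and return every ancestor."""
    ancestors: list[str] = []
    queue = list(SYNSETS[synset_id].hypernyms)
    while queue:
        current = queue.pop(0)
        if current not in ancestors:
            ancestors.append(current)
            queue.extend(SYNSETS[current].hypernyms)
    return ancestors


def hyponyms(synset_id: str) -> list[str]:
    """Hyponymy is the inverse relation: synsets that list synset_id as a direct hypernym."""
    return [s.synset_id for s in SYNSETS.values() if synset_id in s.hypernyms]


def shared_hypernyms(a: str, b: str) -> list[str]:
    """Ancestors common to two concepts, nearest to a first."""
    ancestors_b = hypernym_closure(b)
    return [s for s in hypernym_closure(a) if s in ancestors_b]
```

With this toy data, `hypernym_closure("dog.n.01")` walks up to `canine.n.01` and then `animal.n.01`, and `shared_hypernyms("dog.n.01", "cat.n.01")` finds `animal.n.01` as the concept the two nouns have in common. Storing only direct hypernym links and deriving hyponymy on demand keeps the two relations consistent by construction.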

Ordinary call: assessment guidelines and withdrawal

- Classroom exercises (50 %): exercises, notes, case studies...

- Project (50 %): a research project on one of the topics covered in the course.

Extraordinary call: assessment guidelines and withdrawal

- Exam (100 %): theoretical and practical questions.

Syllabus

1) Introduction to language resources

2) Knowledge-bases and related corpora: wordnets, MCR, SemCor, SUMO...

2a) Linguistic issues: conceptual gaps, cultural concepts...

2b) Applications

3) Syntactic-semantic databases and related corpora: EDBL, EPEC, VerbNet/PropBank, NomLex/NomBank, FrameNet...

3a) Linguistic issues: lexical entries, lexical units, morphological units, semantic roles, semantic classes, argument structure...

4) Annotation

5) Corpus evaluation: inter-coder agreement, R basics
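Inter-coder agreement is usually reported with a chance-corrected coefficient. As one illustration, the sketch below computes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items both coders labelled identically and p_e is the agreement expected by chance from each coder's label frequencies. The syllabus pairs this topic with R; this Python version only illustrates the arithmetic. The choice of Cohen's kappa (rather than, for example, Fleiss' kappa or Krippendorff's alpha) is an assumption, and the PropBank-style labels in the example are hypothetical data.

```python
from collections import Counter


def cohen_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # p_o: proportion of items on which both coders chose the same label
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # p_e: chance agreement, from each coder's marginal label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical semantic-role labels assigned by two annotators to ten arguments
    coder_a = ["A0", "A1", "A1", "A0", "A2", "A1", "A0", "A1", "A2", "A1"]
    coder_b = ["A0", "A1", "A0", "A0", "A2", "A1", "A0", "A1", "A1", "A1"]
    print(round(cohen_kappa(coder_a, coder_b), 3))  # prints 0.672
```

In the example the coders agree on 8 of 10 items (p_o = 0.8), chance agreement is p_e = (3*4 + 5*5 + 2*1) / 100 = 0.39, so kappa = 0.41 / 0.61, about 0.672: clearly better than chance, but well below perfect agreement (kappa = 1).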

Bibliography

Compulsory materials

Class materials are available on eGela.

Basic bibliography

Robert Truswell (ed.). The Oxford Handbook of Event Structure. Oxford University Press. 2019.

Daniel Jurafsky, James H. Martin. Speech and Language Processing (2nd Edition). Pearson. 2008.

In-depth bibliography

Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press. 1993.

Links

http://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl

https://verbs.colorado.edu/verb-index/

http://ixa2.si.ehu.es/e-rolda/index.php?lang=en

http://ixa2.si.ehu.es/stswiki/


