Subject
Building Language Resources
General details of the subject
- Mode
- Face-to-face degree course
- Language
- English
Description and contextualization of the subject
The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora tagged at different linguistic levels) and of designing them appropriately, so that it is possible to learn automatically from them, feed new information back into them, and evaluate them quantitatively and qualitatively. Multilingual and interlinguistic issues are also emphasized. The course follows a practical approach to learning the contents.
Teaching staff
Name | Institution | Category | Doctor | Teaching profile | Area | Email |
---|---|---|---|---|---|---|
ALDEZABAL ROTETA, IZASKUN | University of the Basque Country | Associate Professor (Profesorado Agregado) | Doctor | Bilingual | Basque Philology | izaskun.aldezabal@ehu.eus |
Competencies
Name | Weight |
---|---|
Getting to know the existing tools for processing different languages (morphological, syntactic and semantic tools, including parsers). | 20.0 % |
Getting to know the general language resources for different languages. | 20.0 % |
Understanding and using machine learning strategies in natural language processing (NLP). | 20.0 % |
Using and adapting knowledge-based tools in NLP. | 20.0 % |
Using and adapting existing NLP tools (morphological, syntactic and semantic tools, including parsers) for different languages. | 20.0 % |
Study types
Type | Face-to-face hours | Non face-to-face hours | Total hours |
---|---|---|---|
Lecture-based | 10 | 15 | 25 |
Applied computer-based groups | 20 | 30 | 50 |
Training activities
Name | Hours | Percentage of classroom teaching |
---|---|---|
Computer work practice, laboratory, site visits, field trips, external visits | 50.0 | 40 % |
Lectures | 25.0 | 40 % |
Assessment systems
Name | Minimum weighting | Maximum weighting |
---|---|---|
Drawing up reports and presentations | 50.0 % | 50.0 % |
Practical tasks | 50.0 % | 50.0 % |
Learning outcomes of the subject
- Knowledge and management of the reference resources in the computational field, especially for English but also for other languages.
- Knowledge and management of corpus evaluation systems.
- Assimilation of necessary concepts in the field of computational lexico-semantics: corpus linguistics, linguistic unit, lemma/stem/morpheme, semantic classes, senses and variants, hierarchy and conceptual equivalence (hyperonymy/hyponymy, synonymy), semantic relationships, semantic disambiguation, evaluation methods.
- Linguistic criteria when designing and building linguistic resources.
Ordinary call: orientations and renunciation
Ordinary call:
- Class assignments (50%): exercises, notes, case analysis...
- Project (50%): research project on a topic discussed in class
Extraordinary call: orientations and renunciation
Extraordinary call:
- Final exam (100%) with theoretical and practical tests
Syllabus
1) Introduction to language resources
2) Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...
2a) Linguistic issues: conceptual gaps, cultural concepts...
2b) Applications
3) Syntactic-semantic databases and related corpora: EDBL, EPEC, VerbNet/PropBank, NOMLEX/NomBank, FrameNet...
3a) Language issues: entries, lexical units, morphological units, semantic roles, semantic classes, argument structure, lexical entries...
4) Annotation
5) Corpus evaluation: intercoder agreement, R basics
Bibliography
Compulsory materials
Class materials available on eGela.
Basic bibliography
Robert Truswell (ed.). The Oxford Handbook of Event Structure. Oxford University Press. 2019.
Daniel Jurafsky, James H. Martin. Speech and Language Processing (2nd Edition). Pearson. 2008.
In-depth bibliography
Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press. 1993.
Links
http://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl
https://verbs.colorado.edu/verb-index/
http://ixa2.si.ehu.es/e-rolda/index.php?lang=en
http://ixa2.si.ehu.es/stswiki/