Building Language Resources
General details of the subject
- Face-to-face degree course
Description and contextualization of the subject
The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora tagged at different linguistic levels) and of designing them appropriately, so that it is possible to learn from them automatically, feed new information back into them, and evaluate them quantitatively and qualitatively. Multilingual and cross-lingual issues are also emphasized. The course follows a practical approach to learning the contents.
|ALDEZABAL ROTETA, IZASKUN||University of the Basque Country||Profesorado Agregado||Doctor||Bilingual||Basque Philology||email@example.com|
|GONZALEZ DIOS, ITZIAR||University of the Basque Country||Profesorado Adjunto (Ayudante Doctor/A)||Doctor||Bilingual||Basque Philology||firstname.lastname@example.org|
|Getting to know the existing tools for processing different languages (morphological, syntactic and semantic analyzers, and parsers).||20.0 %|
|Getting to know the general language resources for different languages.||20.0 %|
|Understanding and using machine learning strategies in natural language processing (NLP).||20.0 %|
|Using and adapting knowledge-based tools in NLP.||20.0 %|
|Using and adapting existing NLP tools (morphological, syntactic and semantic analyzers, and parsers) for different languages.||20.0 %|
|Type||Face-to-face hours||Non face-to-face hours||Total hours|
|Applied computer-based groups||20||30||50|
|Name||Hours||Percentage of classroom teaching|
|Computer work practice, laboratory, site visits, field trips, external visits||50.0||40 %|
|Name||Minimum weighting||Maximum weighting|
|Drawing up reports and presentations||50.0 %||50.0 %|
|Practical tasks||50.0 %||50.0 %|
Learning outcomes of the subject
- Knowledge and management of reference resources in the computational field, especially for English but also for other languages.
- Knowledge and management of corpus evaluation systems.
- Assimilation of the necessary concepts in the field of computational lexico-semantics: corpus linguistics, linguistic unit, lemma/stem/morpheme, semantic classes, senses and variants, hierarchy and conceptual equivalence (hyperonymy/hyponymy, synonymy), semantic relationships, semantic disambiguation, evaluation methods.
- Linguistic criteria for designing and building linguistic resources.
Ordinary call: guidelines and withdrawal
Ordinary call:
- Class assignments (50%): exercises, notes, case analysis...
- Project (50%): research project on a topic discussed in class
Extraordinary call: guidelines and withdrawal
Extraordinary call:
- Final exam (100%) with theoretical and practical tests
Syllabus
1) Introduction to language resources
2) Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...
2a) Linguistic issues: conceptual gaps, cultural concepts...
3) Syntactic-semantic databases and related corpora: EDBL, EPEC, VerbNet/PropBank, NOMLEX/NomBank, FrameNet...
3a) Linguistic issues: entries, lexical units, morphological units, semantic roles, semantic classes, argument structure, lexical entries...
5) Corpora evaluation: Intercoder Agreement, R basics
Compulsory materials
Class materials available on eGela.
Basic bibliography
Robert Truswell (ed.). The Oxford Handbook of Event Structure. Oxford University Press. 2019.
Daniel Jurafsky, James H. Martin. Speech and Language Processing (2nd Edition). Pearson. 2008.
In-depth bibliography
Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press. 1993.