Subject
Building Language Resources
General details of the subject
- Mode: Face-to-face degree course
- Language: English
Description and contextualization of the subject
The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora annotated at different linguistic levels) and of designing them appropriately. A well-designed resource makes it possible to learn from it automatically, to enrich it with new information, and to evaluate it quantitatively and qualitatively. The course places special emphasis on multilingual and cross-linguistic issues, and it takes a practical, hands-on approach to the contents.
Teaching staff
Name | Institution | Category | Doctor | Teaching profile | Area | Email |
---|---|---|---|---|---|---|
ALDEZABAL ROTETA, IZASKUN | University of the Basque Country | Associate Professor | Doctor | Bilingual | Basque Philology | izaskun.aldezabal@ehu.eus |
BARRENA MADINABEITIA, ANDER | University of the Basque Country | Assistant Professor | Doctor | Bilingual | Computer Languages and Systems | ander.barrena@ehu.eus |
GONZALEZ DIOS, ITZIAR | University of the Basque Country | Assistant Professor | Doctor | Bilingual | Basque Philology | itziar.gonzalezd@ehu.eus |
LARRAÑAGA OLAGARAY, MIGUEL | University of the Basque Country | Associate Professor | Doctor | Bilingual | Computer Languages and Systems | mikel.larranaga@ehu.eus |
LOPEZ DE LACALLE LECUONA, OIER | University of the Basque Country | Assistant Professor | Doctor | Bilingual | Computer Languages and Systems | oier.lopezdelacalle@ehu.eus |
Competencies
Name | Weight |
---|---|
Getting to know the existing tools for processing different languages (morphological, syntactic and semantic analyzers, and parsers). | 20.0 % |
Getting to know the general language resources for different languages. | 20.0 % |
Understanding and using machine learning strategies in natural language processing (NLP). | 20.0 % |
Using and adapting knowledge-based tools in NLP. | 20.0 % |
Using and adapting existing NLP tools (morphological, syntactic and semantic analyzers, and parsers) for different languages. | 20.0 % |
Study types
Type | Face-to-face hours | Non face-to-face hours | Total hours |
---|---|---|---|
Lecture-based | 10 | 15 | 25 |
Applied laboratory-based groups | 20 | 30 | 50 |
Training activities
Name | Hours | Percentage of classroom teaching |
---|---|---|
Computer work practice, laboratory, site visits, field trips, external visits | 50.0 | 40 % |
Lectures | 25.0 | 40 % |
Assessment systems
Name | Minimum weighting | Maximum weighting |
---|---|---|
Drawing up reports and presentations | 50.0 % | 50.0 % |
Practical tasks | 50.0 % | 50.0 % |
Learning outcomes of the subject
Awareness and use of reference resources in computational semantics, especially for English but also for other languages.
Awareness and use of corpus evaluation methods.
Assimilation of essential concepts in computational semantics: corpus linguistics, semantic classes, senses, variants, hierarchical and conceptual equivalences (hyperonymy/hyponymy, synonymy), semantic relations, semantic disambiguation and evaluation methods.
Application of linguistic criteria when designing and building linguistic resources.
Ordinary call: guidelines and withdrawal
Ordinary call:
- Class assignments (50%): exercises, notes, case analysis...
- Project (50%): research project on a topic discussed in class
Extraordinary call: guidelines and withdrawal
Extraordinary call:
- Final exam (100%) with theoretical and practical tests
Syllabus
1. Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...
a. Linguistic issues: conceptual gaps, cultural concepts...
b. Applications: summarization and text simplification
2. Syntactic-semantic databases and related corpora: VerbNet / PropBank, NomLex / NomBank, FrameNet
a. Linguistic issues: semantic roles, semantic classes, argument structure, lexical entries...
3. Annotation:
a. Word Similarity (WS), Semantic Textual Similarity (STS)
i. Linguistic issues: antonymy, similarity
b. Sentiment analysis
i. Linguistic issues: sentiments, polarity
4. Corpora evaluation: Intercoder Agreement, R basics
Bibliography
Compulsory materials
Class materials are available on eGela.
Basic bibliography
Robert Truswell (ed.). The Oxford Handbook of Event Structure. Oxford University Press. 2019.
Daniel Jurafsky, James H. Martin. Speech and Language Processing (2nd Edition). Pearson. 2008.
In-depth bibliography
Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press. 1993.
Links
http://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl
https://verbs.colorado.edu/verb-index/
http://ixa2.si.ehu.es/e-rolda/index.php?lang=en
http://ixa2.si.ehu.es/stswiki/