Subject
Corpus Linguistics
General details of the subject
- Mode: Face-to-face degree course
- Language: English
Description and contextualization of the subject
In this course we will study the use of text corpora in computational linguistics. We will begin with a brief introduction to linguistic corpora, including linguistic annotations and representation schemes. Next, we will work on extracting relevant information from corpora, such as collocations or keywords, using statistical and distributional techniques. Finally, we will learn the XML markup language. Throughout the course we will work with corpora in several languages (English, Spanish, Basque, etc.).
Teaching staff
Name | Institution | Category | Doctor | Teaching profile | Area | Email |
---|---|---|---|---|---|---|
ARBELAIZ GALLEGO, OLATZ | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Computer Architecture and Technology | olatz.arbelaitz@ehu.eus |
PEREZ RAMIREZ, ALICIA | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Computer Languages and Systems | alicia.perez@ehu.eus |
SOROA ECHAVE, AITOR | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Science of Computation and Artificial Intelligence | a.soroa@ehu.eus |
Competencies
Name | Weight |
---|---|
Ability to design and manage large linguistic resources (textual and speech corpora, multilingual corpora and lexical-semantic databases). | 40.0 % |
Ability to develop heuristics and to modify classic algorithms to adapt them for specific tasks. | 20.0 % |
Ability to design and manage systems built on standard XML-based annotation languages such as TEI or NAF. | 40.0 % |
Study types
Type | Face-to-face hours | Non face-to-face hours | Total hours |
---|---|---|---|
Lecture-based | 10 | 15 | 25 |
Applied laboratory-based groups | 20 | 30 | 50 |
Training activities
Name | Hours | Percentage of classroom teaching |
---|---|---|
Computer work practice, laboratory, site visits, field trips, external visits | 50.0 | 40 % |
Lectures | 25.0 | 40 % |
Assessment systems
Name | Minimum weighting | Maximum weighting |
---|---|---|
Attendance and participation | 20.0 % | 20.0 % |
Portfolio | 20.0 % | 20.0 % |
Practical tasks | 40.0 % | 40.0 % |
Presentations | 20.0 % | 20.0 % |
Learning outcomes of the subject
The aim of the course is to give students the skills to identify a range of language processing problems and frame them as data analysis tasks. Students will also learn the principles of corpus linguistics and linguistic annotation, including markup languages such as XML. By the end of the course, students will be able to extract relevant information from text corpora using statistical analysis.
Syllabus
1. Introduction to linguistic corpora.
2. Characteristics and types of corpora.
- Examples of corpora
3. Corpus annotation.
- Common tags and levels of analysis
4. Linguistic representation
- The XML language
- Representation standards (TEI, NAF, AWA)
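As an illustration of the annotation and XML topics listed above, the following minimal sketch reads a small XML-annotated sentence and counts its part-of-speech tags. The fragment is a simplified, TEI-style example written for this illustration: the `<w>` elements with `lemma` and `pos` attributes follow TEI conventions, but the namespace and header of a real TEI document are omitted, and the tag values and the `read_tokens` helper are assumptions, not course material.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified TEI-style fragment: each <w> element holds one
# token, with its lemma and part-of-speech tag stored as attributes.
SAMPLE = """<text>
  <s n="1">
    <w lemma="the" pos="DET">The</w>
    <w lemma="corpus" pos="NOUN">corpus</w>
    <w lemma="contain" pos="VERB">contains</w>
    <w lemma="annotated" pos="ADJ">annotated</w>
    <w lemma="text" pos="NOUN">texts</w>
  </s>
</text>"""


def read_tokens(xml_string):
    """Return a (form, lemma, pos) triple for every <w> element."""
    root = ET.fromstring(xml_string)
    return [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]


tokens = read_tokens(SAMPLE)

# Frequency of each part-of-speech tag in the annotated sample.
pos_counts = Counter(pos for _, _, pos in tokens)

for form, lemma, pos in tokens:
    print(f"{form}\t{lemma}\t{pos}")
print(pos_counts.most_common())
```

Keeping the annotation in attributes leaves the running text recoverable by concatenating the element contents, which is one reason inline XML formats of this kind are common for small corpora.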
Bibliography
Basic bibliography
Aarts, J. and Meijs, W. (eds.) (1986) Corpus Linguistics II. Amsterdam: Rodopi.
Aijmer, K. and Altenberg, B. (eds.) (1991) English Corpus Linguistics: Studies in Honour of Jan Svartvik. London: Longman.
Anthony, L. (2013) "A critical look at software tools in corpus linguistics", Linguistic Research, Volume 30, Issue 2, pp. 141-161.
Baker, P. (2010) Sociolinguistics and Corpus Linguistics. Edinburgh University Press, Edinburgh.
Garside, R., Leech, G. and McEnery, T. (1997) Corpus Annotation. Longman, Harlow.
Jurafsky, D. and Martin, J.H. (2000) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall.
Lawler, J. and Aristar Dry, H. (1998) Using Computers in Linguistics: A Practical Guide. Routledge.
Leech, G. and Fallon, R. (1992) "Computer Corpora - What Do They Tell Us About Culture". ICAME Journal, 29-50.
McEnery, T. and Hardie, A (2012) Corpus Linguistics: Method, Theory and Practice. Cambridge University Press, Cambridge.
Text Encoding and Interchange, TEI P5 (2016) Chicago and Oxford: Text Encoding Initiative.