Subject
Speech Technologies
General details of the subject
- Mode
- Face-to-face degree course
- Language
- English
Description and contextualization of the subject
The objective is to familiarize students with the fundamental applications of speech signal processing, such as speech synthesis and automatic speech recognition. Taking a practical approach, the course analyzes the main paradigms applied in both technologies, describes the procedures used to build these systems, including the creation of the necessary resources and speech databases, and gives students hands-on practice with real systems. Other applications of speech technologies, such as speaker and emotion recognition, voice conversion and diarization, are also briefly reviewed. To take this course, students must master the basic models of speech production, the concepts of time- and frequency-domain signal analysis, and the fundamentals of signal digitization.
Teaching staff
Name | Institution | Category | Doctor | Teaching profile | Area | E-mail |
---|---|---|---|---|---|---|
HERNAEZ RIOJA, INMACULADA CONCEPCION | University of the Basque Country | Full Professor | Doctor | Not bilingual | Signal Theory and Communications | inma.hernaez@ehu.eus |
NAVAS CORDON, EVA | University of the Basque Country | Associate Professor | Doctor | Not bilingual | Signal Theory and Communications | eva.navas@ehu.eus |
SARATXAGA COUCEIRO, IBON | University of the Basque Country | Associate Professor | Doctor | Bilingual | Telematics Engineering | ibon.saratxaga@ehu.eus |
Competencies
Name | Weight |
---|---|
Understand and interpret the main parameters used in the representation of the speech signal. | 25.0 % |
Know the fundamental strategies used in speech synthesis and recognition. | 25.0 % |
Understand the terminology used in the field of speech processing, so that the student is able to interpret a research paper from a scientific journal. | 25.0 % |
Use the basic software tools for processing the speech signal. | 25.0 % |
Study types
Type | Face-to-face hours | Non face-to-face hours | Total hours |
---|---|---|---|
Lecture-based | 15 | 22.5 | 37.5 |
Applied laboratory-based groups | 30 | 45 | 75 |
Training activities
Name | Hours | Percentage of classroom teaching |
---|---|---|
Computer work practice, laboratory, site visits, field trips, external visits | 75.0 | 40 % |
Lectures | 37.5 | 40 % |
Assessment systems
Name | Minimum weighting | Maximum weighting |
---|---|---|
Practical work report/summary | 20.0 % | 50.0 % |
Multiple-choice examination | 30.0 % | 60.0 % |
Presentations | 20.0 % | 40.0 % |
Learning outcomes of the subject
RA1 Show understanding of the problems related to the acoustic modeling of the speech signal.
RA2 Manage speech analysis and processing tools.
RA3 Show understanding of automatic speech and speaker recognition systems, speech synthesis systems, as well as of the techniques used to evaluate them.
RA4 Develop a basic speech recognition system.
RA5 Extract information from a scientific paper and present it orally to an interdisciplinary and international audience.
Ordinary call: guidelines and withdrawal
The evaluation is divided into three independent parts: lectures, lab practices, and the presentation of a report on one selected speech technology.
- Knowledge of the lecture content will be assessed by means of an individual written test based on short-answer and multiple-choice questions (40%).
- The lab practices will be evaluated through the reports and the work carried out in the lab (40%).
- The report on one speech technology will be graded taking into account both its development and its presentation (20%).
In the ordinary evaluation, the exam must be passed with at least 4 out of 10, and a minimum final grade of 5 out of 10, obtained by adding up the weighted grades of the three parts, is required to pass the subject. The three parts are independent: once a part is passed, the corresponding grade is kept for future calls.
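For illustration only, the short sketch below shows one way to interpret the ordinary-call grading rule described above, assuming all grades are expressed on a 0-10 scale; the function name and the example values are hypothetical, and the official computation is the one stated in the text.

```python
# Hypothetical sketch of the ordinary-call grading rule (0-10 scale assumed).
# Weights: written exam 40%, lab practices 40%, speech-technology report 20%.

def ordinary_call_grade(exam: float, labs: float, report: float) -> tuple[float, bool]:
    """Return (final grade, passed) for the ordinary call."""
    final = 0.4 * exam + 0.4 * labs + 0.2 * report
    # The written exam must reach at least 4/10 and the weighted total at least 5/10.
    passed = exam >= 4.0 and final >= 5.0
    return round(final, 2), passed

# Example (hypothetical values): exam 6.0, labs 7.5, report 8.0 -> (7.0, True)
print(ordinary_call_grade(6.0, 7.5, 8.0))
```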
Students unable to follow the continuous evaluation must justify their reasons with appropriate documentation sent to the subject lecturers within the first two weeks of the course, according to the procedure established by the current regulations. These students may demonstrate the achievement of the learning outcomes by means of a final evaluation consisting of a written exam (40%), a laboratory exam (45%) and the presentation of a report on a speech technology (15%).
Not sitting the final written exam is considered a withdrawal from the call.
Extraordinary call: guidelines and withdrawal
In the extraordinary call, two different tests will be given: an exam on the lecture content and another on the lab practices. Each test represents 50% of the final grade, and both must be passed with at least 5 out of 10. Students who have not submitted the written justification to opt out of the continuous evaluation must demonstrate that they have correctly completed the lab practices.
Topics
1. Speech modelling
2. Speech synthesis
3. Speech recognition
4. Other speech technologies
Bibliography
Compulsory materials
Documents provided via eGela, both for lectures and laboratory practices
Basic bibliography
X. Huang, A. Acero, H. Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001. (ISBN: 978-0130226167)
D. Jurafsky, J. H. Martin. Speech and Language Processing (2nd edition). Prentice Hall, 2008. (ISBN: 978-0131873216)
P. Taylor. Text-to-Speech Synthesis. Cambridge University Press, 2009. (ISBN: 978-0521899277)
L. Rabiner, B. H. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993. (ISBN: 978-0130151575)
D. Yu, L. Deng. Automatic Speech Recognition: A Deep Learning Approach. Springer, 2015. (ISBN: 978-1447157786)
W. C. Chu. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Wiley-Interscience, 2003. (ISBN: 978-0471373124)
Journals
Computer Speech and Language
Speech Communication
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Multimedia
Links
CMU speech resources page http://www.speech.cs.cmu.edu/
European Language Resources Association (ELRA) http://www.elra.info/en/catalogues/
Linguistic Data Consortium (LDC) https://www.ldc.upenn.edu/language-resources
Smithsonian Speech Synthesis History Project (SSSHP) http://www.mindspring.com/~ssshp/ssshp_cd/ss_home.htm
Emotional speech synthesis http://emosamples.syntheticspeech.de/