Subject

Speech Technologies

General details of the subject

Mode
Face-to-face degree course
Language
English

Description and contextualization of the subject

The goal of the course is for students to become familiar with the fundamental applications of speech signal processing, such as speech synthesis and automatic speech recognition. Taking a practical approach, the course analyzes the main paradigms that have been applied in both technologies, describes the procedures for developing these systems and presents the necessary resources and voice databases. Students will practice with real systems. Other applications related to voice processing, such as speaker or emotion recognition, voice conversion and diarization, among others, will also be briefly reviewed.



To take this subject, the student must master the basic models of speech production, the concepts of temporal and frequency analysis of signals and the fundamentals of signal digitization.

Teaching staff

Name | Institution | Category | Doctor | Teaching profile | Area | E-mail
NAVAS CORDON, EVA | University of the Basque Country | Profesorado Agregado | Doctor | Not bilingual | Theory of Signals and Communications | eva.navas@ehu.eus
SARATXAGA COUCEIRO, IBON | University of the Basque Country | Profesorado Agregado | Doctor | Bilingual | Telematics Engineering | ibon.saratxaga@ehu.eus

Competencies

Name | Weight
Understand and interpret the main parameters used to represent the speech signal. | 25.0 %
Know the fundamental strategies used in speech synthesis and speech recognition systems. | 25.0 %
Understand the terminology used in the field of speech signal processing well enough to interpret a research paper published in a journal. | 25.0 %
Use the basic software tools for processing the speech signal. | 25.0 %

Study types

Type | Face-to-face hours | Non face-to-face hours | Total hours
Lecture-based | 15 | 22.5 | 37.5
Applied computer-based groups | 30 | 45 | 75

Assessment systems

Name | Minimum weighting | Maximum weighting
Internship Report/Summary | 20.0 % | 50.0 %
Multiple-choice examination | 30.0 % | 60.0 %
Presentations | 20.0 % | 40.0 %

Learning outcomes of the subject

• RA1 Demonstrate understanding of the problems related to the acoustic modeling of the voice signal.

• RA2 Manage tools for the analysis and processing of the voice signal.

• RA3 Demonstrate understanding of automatic speech and speaker recognition systems and voice synthesis, as well as the techniques used to evaluate them.

• RA4 Develop a basic speech recognition system.

• RA5 Extract information from a scientific article and present it orally to an interdisciplinary and international audience.

Ordinary call: guidelines and withdrawal

The evaluation is divided into three independent parts: lectures, lab practices and the presentation of a piece of work on one selected speech technology.

- Knowledge of the lecture content is assessed through an individual written test with short-answer and multiple-choice questions (40%).

- The lab practices are assessed through the reports and the work carried out in the lab (40%).

- The work on one speech technology receives a grade that takes into account both its development and its presentation (20%).



In the regular evaluation, the exam must be passed with at least a 3 out of 10, and passing the subject requires a minimum final grade of 5 out of 10 once the grades of the three parts are added up. The three parts are independent: once one part is passed, its grade is kept for the next extraordinary call.
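As an illustration of the arithmetic above, the following minimal sketch combines the three part grades with the stated weights (40%, 40%, 20%) and checks the two stated conditions. The function name, the rounding to two decimals and the input validation are assumptions made for this example, not part of the official regulation.

```python
# Weights stated for the ordinary call: written test, lab practices, work.
EXAM_WEIGHT = 0.4
LAB_WEIGHT = 0.4
WORK_WEIGHT = 0.2

MIN_EXAM_GRADE = 3.0   # minimum required in the written exam (out of 10)
MIN_FINAL_GRADE = 5.0  # minimum final grade to pass the subject (out of 10)


def ordinary_call_grade(exam: float, lab: float, work: float) -> tuple[float, bool]:
    """Return (final grade, passed) for the ordinary call on a 0-10 scale.

    Hypothetical helper: rounding to two decimals is an assumption.
    """
    for grade in (exam, lab, work):
        if not 0.0 <= grade <= 10.0:
            raise ValueError("grades must be between 0 and 10")
    final = round(EXAM_WEIGHT * exam + LAB_WEIGHT * lab + WORK_WEIGHT * work, 2)
    passed = exam >= MIN_EXAM_GRADE and final >= MIN_FINAL_GRADE
    return final, passed


print(ordinary_call_grade(6, 7, 8))     # (6.8, True)
print(ordinary_call_grade(2.5, 10, 10)) # (7.0, False): exam below the minimum
```

The second call shows why the exam minimum matters: a weighted sum above 5 is not enough when the written exam grade is below 3.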



Students unable to follow the combined evaluation must justify their reasons with proper documentation sent to the subject lecturers within the first two weeks of the course, following the procedure established by the current regulations. These students can demonstrate achievement of the learning outcomes through a final evaluation consisting of a written exam (40%), a laboratory exam (45%) and the presentation of a piece of work related to a speech technology (15%).



A student who does not sit the final written exam is considered to have withdrawn from the call.

Extraordinary call: guidelines and withdrawal

In the extraordinary call, there will be two separate tests: an exam on the lectures and another on the lab practices. Each test accounts for 50% of the final grade, and the exam must be passed with at least a 3 out of 10.
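The extraordinary-call arithmetic can be sketched in the same way. This example assumes that the 3 out of 10 minimum applies to the exam on the lectures and that the pass mark remains a final grade of 5 out of 10; neither point is spelled out in the text above. The function name and the rounding are also assumptions.

```python
def extraordinary_call_grade(theory_exam: float, lab_exam: float) -> tuple[float, bool]:
    """Return (final grade, passed) for the extraordinary call on a 0-10 scale.

    Hypothetical helper. Assumptions: the 3/10 minimum applies to the
    theory exam, and a final grade of 5/10 is needed to pass.
    """
    for grade in (theory_exam, lab_exam):
        if not 0.0 <= grade <= 10.0:
            raise ValueError("grades must be between 0 and 10")
    # Each test accounts for 50% of the final grade.
    final = round(0.5 * theory_exam + 0.5 * lab_exam, 2)
    passed = theory_exam >= 3.0 and final >= 5.0
    return final, passed


print(extraordinary_call_grade(4, 8))   # (6.0, True)
print(extraordinary_call_grade(2, 10))  # (6.0, False): theory exam below the minimum
```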

Syllabus

1. Speech modelling

2. Speech synthesis

3. Speech recognition

4. Other speech technologies

Bibliography

Compulsory materials

Documents provided via eGela for both the lectures and the laboratory practices

Basic bibliography

- J. G. Proakis, D. G. Manolakis. Digital Signal Processing: Principles, Algorithms and Applications (4th edition). Pearson Prentice Hall, 2007. (ISBN: 978-0131873742)

- L. R. Rabiner, R. W. Schafer. Digital Processing of Speech Signals. Prentice-Hall, 1978. (ISBN: 978-0132136037)

- X. Huang, A. Acero, H. Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001. (ISBN: 978-0130226167)

- A. V. Oppenheim, R. W. Schafer. Discrete-Time Signal Processing (3rd edition). Pearson Prentice Hall, 2009. (ISBN: 978-0131988422)

In-depth bibliography

- P. Taylor. Text-to-Speech Synthesis. Cambridge University Press, 2009. (ISBN: 978-0521899277)

- L. Rabiner, B. H. Juang. Fundamentals of Speech Recognition. CRC Press, 1993. (ISBN: 978-0130151575)

- D. Yu, L. Deng. Automatic Speech Recognition: A Deep Learning Approach. Springer, 2015. (ISBN: 978-1447157786)

- W. C. Chu. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Wiley-Interscience, 2003. (ISBN: 978-0471373124)

Journals

Computer Speech and Language

Speech Communication

IEEE Transactions on Audio, Speech & Language Processing

IEEE Transactions on Systems, Man and Cybernetics-Part B

IEEE Transactions on Multimedia

Journal of the Acoustical Society of America

Links

- Speech Technologies

http://www.speech.cs.cmu.edu/
news:comp.speech
http://festvox.org/

- Review of Speech Synthesis Technology

http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/contents.html

- Speech Technology Hyperlinks Page

http://www.speech.cs.cmu.edu/comp.speech/Section5/speechlinks.html

- TTS system demos

http://www.acapela-group.com/text-to-speech-interactive-demo.html
http://www.loquendo.com/en/demo-center/tts-demo/
http://enterprisecontent.nuance.com/vocalizer5-network-demo/index.html
http://aholab.ehu.es/tts/tts_en.html


