Subject

Speech Technologies

General details of the subject

Mode: Face-to-face degree course
Language: English

Description and contextualization of the subject

The goal of the course is for students to become familiar with the fundamental applications of speech signal processing, such as speech synthesis and automatic speech recognition. Taking a practical approach, the course analyzes the main paradigms applied in both technologies, describes the procedures for developing these systems, and presents the necessary resources and voice databases. Students will practice with real systems. Other voice processing applications, such as speaker recognition, emotion recognition, voice conversion, and diarization, are also briefly reviewed.

To take this subject, the student must master the basic models of speech production, the concepts of temporal and frequency analysis of signals and the fundamentals of signal digitization.

Teaching staff

Name	Institution	Category	Doctor	Teaching profile	Area	E-mail
NAVAS CORDON, EVA	University of the Basque Country	Profesorado Agregado	Doctor	Not bilingual	Theory of Signals and Communications	eva.navas@ehu.eus
SARATXAGA COUCEIRO, IBON	University of the Basque Country	Profesorado Agregado	Doctor	Bilingual	Telematics Engineering	ibon.saratxaga@ehu.eus

Competencies

Name	Weight
Understand and interpret the main parameters used to represent the speech signal.	25.0 %
Know the fundamental strategies used in speech synthesis and speech recognition systems.	25.0 %
Understand the terminology used in the field of speech signal processing well enough to interpret a research paper published in a journal.	25.0 %
Use the basic software tools for processing and handling the speech signal.	25.0 %

Study types

Type	Face-to-face hours	Non face-to-face hours	Total hours
Lecture-based	15	22.5	37.5
Applied computer-based groups	30	45	75

Assessment systems

Name	Minimum weighting	Maximum weighting
Internship Report/Summary	20.0 %	50.0 %
Multiple-choice examination	30.0 %	60.0 %
Presentations	20.0 %	40.0 %

Learning outcomes of the subject

• RA1 Demonstrate understanding of the problems related to the acoustic modeling of the voice signal.

• RA2 Use tools for the analysis and processing of the voice signal.

• RA3 Demonstrate understanding of automatic speech and speaker recognition systems and voice synthesis, as well as the techniques used to evaluate them.

• RA4 Develop a basic speech recognition system.

• RA5 Extract information from a scientific article and present it orally to an interdisciplinary and international audience.

Ordinary call: guidelines and withdrawal

Students in the ordinary call are assessed with the following tools:

- Written exam

At the end of the course there will be a written exam, in which students must demonstrate their mastery of the concepts explained in the lectures and described in the notes provided. The exam may include a multiple-choice part and an open-answer part.

- Lab practices

The practices will be carried out partially during the face-to-face classes. Students must submit a report on the practices carried out, following the instructions provided. These compulsory reports must be delivered on the dates indicated during the course.

- Oral presentation

At the beginning of the course, topics will be proposed for students to work on. At the end of the course, students will present their work orally. Both the presentation and the written document describing the work will be evaluated.

To waive the ordinary call, the student must notify the teaching staff of the subject two weeks before the end of the course.

Extraordinary call: guidelines and withdrawal

In the extraordinary call, students are assessed on the same competencies as in the ordinary call, using the following assessment tools:

- Written exam covering the concepts taught in the lectures. The exam may include a multiple-choice part and an open-answer part.

- Laboratory practices: submitting the reports for the practices proposed during the course is compulsory.

- Oral presentation: students give an oral presentation on a topic of their choice and submit a document describing the work carried out.

Syllabus

1. Speech modelling

2. Speech synthesis

3. Speech recognition

4. Other speech technologies

Bibliography

Compulsory materials

The following materials, needed to follow the subject, are available to students on eGela (http://egela.ehu.eus/):

- Slides with the theoretical content of the subject, topics T1-T4.

- Handouts for the lab practices, one file per practice, each with a theoretical introduction and a description of the work to be done in the lab.

- Signals, MATLAB programs, Linux scripts, and other tools needed to carry out the practices.

Basic bibliography

- J. G. Proakis, D. G. Manolakis. Digital Signal Processing: Principles, Algorithms, and Applications (4th edition). Pearson Prentice Hall, 2007. (ISBN: 978-0131873742)

- L. R. Rabiner, R. W. Schafer. Digital Processing of Speech Signals. Prentice-Hall, 1978. (ISBN: 978-0132136037)

- X. Huang, A. Acero, H. Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001. (ISBN: 978-0130226167)

- A. V. Oppenheim, R. W. Schafer. Discrete-Time Signal Processing (3rd edition). Pearson Prentice Hall, 2009. (ISBN: 978-0131988422)

In-depth bibliography

- P. Taylor. Text-to-Speech Synthesis. Cambridge University Press, 2009. (ISBN: 978-0521899277)

- L. Rabiner, B. H. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993. (ISBN: 978-0130151575)

- D. Yu, L. Deng. Automatic Speech Recognition: A Deep Learning Approach. Springer, 2015. (ISBN: 978-1447157786)

- W. C. Chu. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Wiley-Interscience, 2003. (ISBN: 978-0471373124)

Journals

Computer Speech and Language

Speech Communication

IEEE Transactions on Audio, Speech & Language Processing

IEEE Transactions on Systems, Man and Cybernetics-Part B

IEEE Transactions on Multimedia

Journal of the Acoustical Society of America

Links

- Speech Technologies

http://www.speech.cs.cmu.edu/

news:comp.speech

http://festvox.org/

- Review of Speech Synthesis Technology

http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/contents.html

- Speech Technology Hyperlinks Page

http://www.speech.cs.cmu.edu/comp.speech/Section5/speechlinks.html

- TTS system demos

http://www.acapela-group.com/text-to-speech-interactive-demo.html

http://www.loquendo.com/en/demo-center/tts-demo/

http://enterprisecontent.nuance.com/vocalizer5-network-demo/index.html

http://aholab.ehu.es/tts/tts_en.html

Master in Language Analysis and Processing