Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish

Xavier Gonzalvo, Joan-Claudi Socoró, Ignasi Iriondo, Carlos Monzo, Elisa Martínez Marroquín

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

Hidden Markov model-based text-to-speech (HMM-TTS) synthesis is one of the techniques for generating speech from trained statistical models, in which the spectrum and prosody of basic speech units are modelled jointly. This paper presents the advances in our Spanish HMM-TTS system and reports a perceptual test comparing it with an extended PSOLA-based concatenative (E-PSOLA) system. The improvements concern the phonetic information and contextual factors, which were adapted to Castilian Spanish, and speech generation, which now uses a mixed excitation (ME) technique. The results show that listeners prefer the new HMM-TTS system to the previous one, and that it obtains a better mean opinion score (MOS) than a real E-PSOLA system in terms of acceptability, intelligibility and stability.
Original language: English
Title of host publication: SSW6
Subtitle of host publication: ISCA Speech
Pages: 362-367
Number of pages: 6
Publication status: Published - 2007
Externally published: Yes

Publication series

Name: SSW6-2007. Proceedings of the Sixth ISCA Tutorial and Research Workshop in Speech Synthesis

Fingerprint

Speech synthesis
Linguistics
Hidden Markov models
Speech analysis

Cite this

Gonzalvo, X., Socoró, J-C., Iriondo, I., Monzo, C., & Martínez Marroquín, E. (2007). Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish. In SSW6: ISCA Speech (pp. 362-367). (SSW6-2007. Proceedings of the Sixth ISCA Tutorial and Research Workshop in Speech Synthesis).
@inproceedings{1efb5e4224ca4bce8cfb1adefaac96da,
title = "Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish",
keywords = "CV citaci{\'o}",
author = "Xavier Gonzalvo and Joan-Claudi Socor{\'o} and Ignasi Iriondo and Carlos Monzo and {Mart{\'i}nez Marroqu{\'i}n}, Elisa",
year = "2007",
language = "English",
series = "SSW6-2007. Proceedings of the Sixth ISCA Tutorial and Research Workshop in Speech Synthesis",
pages = "362--367",
booktitle = "SSW6",

}

TY - GEN

T1 - Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish

AU - Gonzalvo, Xavier

AU - Socoró, Joan-Claudi

AU - Iriondo, Ignasi

AU - Monzo, Carlos

AU - Martínez Marroquín, Elisa

PY - 2007

Y1 - 2007

KW - CV citació

UR - https://www.mendeley.com/catalogue/linguistic-mixed-excitation-improvements-hmmbased-speech-synthesis-castilian-spanish/

UR - https://www.isca-speech.org/iscaweb/index.php/archive/online-archive

UR - https://www.isca-speech.org/archive_open/ssw6/ssw6_362.html

M3 - Conference contribution

T3 - SSW6-2007. Proceedings of the Sixth ISCA Tutorial and Research Workshop in Speech Synthesis

SP - 362

EP - 367

BT - SSW6

ER -
