Prosody plays an important role in the human speaker recognition process; impersonators aiming to resemble someone else therefore typically exploit prosodic elements. Since voice imitation is a potential threat to security systems that rely on automatic speaker recognition, and since prosodic features have been incorporated into state-of-the-art recognition systems in recent years, the question arises as to what extent a mimicker can approach the prosodic characteristics of a target speaker. To this end, two experiments are conducted on twelve individual prosodic features to determine how a prosodic speaker identification system performs against professionally imitated voices. The results show that the identification error rate increases for all features except F0 range when the impersonators' modified voices are used instead of their natural voices. Moreover, prosody appears easier to imitate over a whole sentence than over a specific word.
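To make the evaluation measure concrete, the following is a minimal sketch of a closed-set identification error rate for one scalar prosodic feature. It is not the authors' system. The feature definition (F0 range as maximum minus minimum over voiced frames), the nearest-mean decision rule, and the helper names `f0_range` and `identification_error_rate` are all hypothetical assumptions made for illustration.

```python
from statistics import mean


def f0_range(f0_track):
    """Hypothetical F0 range: max minus min over voiced frames (F0 > 0 Hz)."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    return max(voiced) - min(voiced)


def identification_error_rate(train, test):
    """Closed-set identification on one scalar feature (assumed nearest-mean rule).

    train: dict mapping speaker -> list of feature values from enrolment data
    test:  list of (true_speaker, feature_value) trials
    Returns the fraction of trials assigned to the wrong enrolled speaker.
    """
    # One model per enrolled speaker: the mean of its enrolment feature values.
    models = {spk: mean(vals) for spk, vals in train.items()}
    errors = 0
    for true_spk, value in test:
        # Pick the enrolled speaker whose model is closest to the test value.
        predicted = min(models, key=lambda spk: abs(models[spk] - value))
        if predicted != true_spk:
            errors += 1
    return errors / len(test)


# Usage with made-up numbers (not data from the study):
train = {"A": [100.0, 110.0], "B": [60.0, 70.0]}
test = [("A", 102.0), ("B", 66.0), ("B", 95.0)]
print(identification_error_rate(train, test))  # third trial is misidentified as "A"
```

Under this sketch, an imitation that shifts a feature value toward another enrolled speaker's model produces more misidentified trials, which shows up as a higher error rate for that feature.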
|Title of host publication||Proceedings of Interspeech 2008|
|Subtitle of host publication||incorporating SST 2008, 22-26 September 2008, Brisbane, Australia|
|Editors||Fletcher, Goecke, Burnham, Wagner|
|Place of Publication||Australia|
|Publisher||International Speech Communication Association|
|Number of pages||6|
|Publication status||Published - 2008|
|Event||Interspeech 2008 - Brisbane, Australia (22 Sep 2008 → 26 Sep 2008)|
|Period||22/09/08 → 26/09/08|
Farrus, M., Wagner, M., Anguita, J., & Hernando, J. (2008). Robustness of Prosodic Features to Voice Imitation. In Fletcher, Goecke, Burnham, & Wagner (Eds.), Proceedings of Interspeech 2008: incorporating SST 2008, 22-26 September 2008, Brisbane, Australia (pp. 1-6). Australia: International Speech Communication Association.