The automatic detection of a person's identity from their voice is part of modern telecommunication services. This generally requires transmitting the speech over the telephone network to remote servers that perform the recognition task. The transmission may introduce severe distortions that degrade system performance, and it therefore represents one of the major challenges that speech technologies currently face. Similarly, human listeners must cope with the difficulty of reliably identifying talkers from speech transmitted over communication channels, particularly if the utterance heard is short. The present research evaluates human and automatic speaker recognition performance under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.
Michael Wagner (Supervisor), Sebastian Moeller (Supervisor) & Roland Goecke (Supervisor)