The application of the linear prediction model for speech waveform analysis to context-independent automatic speaker recognition is explored, primarily in terms of the parametric sensitivity of the model. Feature vectors to characterize speakers are formed from linear prediction speech parameters, computed as inverse filter coefficients, reflection coefficients or cepstral coefficients, and also from power spectrum parameters obtained via Fast Fourier Transform coefficients. The comparative performance of these four parameter sets is investigated in speaker recognition experiments. The stability of the linear prediction parameters is tested over a range of model orders from p=6 to p=30. Two independent speech databases are used to substantiate the experimental results. The quality of the automatic recognition technique is assessed in a novel experiment based on a direct performance comparison with the human skill of aural recognition. Correlation is sought between the performance of the aural and automatic recognition methods for each of the four parameter sets. Although the recognition accuracy of the automatic system is superior to that of the direct aural technique, the error distributions are highly variable. The performance of the automatic system is shown to be empirically based and unlike the intuitive human process. An extended preamble to the description of the experiments reviews the current art of automatic speaker recognition, with a critical consideration of the performance of linear prediction techniques. As supported by our experimental results, it is concluded that success in the laboratory rests upon a rather fragile foundation. Application to problems beyond the controlled laboratory environment is seen, therefore, to be still more precarious.
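The abstract does not say how the three linear prediction parameter sets were computed. As an illustration only, the sketch below derives them from a frame's autocorrelation using the standard Levinson-Durbin recursion. The function names, the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p, and the sign of the reflection coefficients are assumptions of this example, not details taken from the thesis.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve the order-p linear prediction normal equations.

    Returns (a, k, err):
      a   -- inverse filter coefficients, a[0] == 1 (assumed convention)
      k   -- reflection coefficients k[0..p-1] (sign convention assumed)
      err -- final prediction error energy
    """
    a = [1.0] + [0.0] * p
    k = []
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        km = -acc / err
        k.append(km)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + km * prev[m - j]
        a[m] = km
        err *= 1.0 - km * km
    return a, k, err


def lpc_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z).

    Uses the recursion c[n] = -a[n] - (1/n) * sum_{j=1}^{n-1} j*c[j]*a[n-j],
    with a[n] taken as 0 for n greater than the model order.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(j * c[j] * a[n - j] for j in range(1, n) if n - j <= p)
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]


if __name__ == "__main__":
    # Hypothetical autocorrelation of a first-order process, model order p = 2.
    r = [1.0, 0.5, 0.25]
    a, k, err = levinson_durbin(r, 2)
    print(a, k, err, lpc_cepstrum(a, 3))
```

In an experiment of the kind described, any one of `a[1:]`, `k`, or the cepstral list could serve as a speaker's feature vector for a frame, with the model order p varied over the range under test.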
Date of Award: 1982