Abstract
Building on algorithms developed in earlier work (Hawkins et al., 1994a, 1994b; Hawkins, 1997; Hawkins et al., 2002), this study develops a new technique for improving the accuracy of formant estimates produced by an analysis-by-synthesis formant tracker (DPTRAK; Clermont, 1992). DPTRAK is evaluated by comparing its formant estimates against those obtained manually by the first author from inspection of the spectrogram of each vowel produced by each speaker. Applied to 13 male speakers uttering the 11 monophthongs and eight diphthongs of Australian English, DPTRAK produced results that varied in accuracy across speakers: the percentage of speech frames tracked accurately ranged from 99% for the best speaker to 58% for the worst. We develop the SpeechSifter algorithm to sift through the speech frames tracked by DPTRAK (or any other formant tracker) and select only those frames that are likely to have been tracked accurately. This unsupervised algorithm first selects the ideal speaker on which to train a Replicator Neural Net (RNN; Hawkins et al., 2002). The trained RNN is then used to retain those speech frames on which the formant tracker is highly likely to have made accurate formant estimates and to discard the rest. We demonstrate the value of this approach in two steps. First, we show that we can accurately predict which speaker will provide the ideal training speaker for the RNN. Next, we apply the trained RNN to a speaker and show that it is possible to achieve a 90% accuracy rate whilst retaining 75% of the speaker’s original speech frames. This is an improvement on the DPTRAK algorithm alone, which achieves an accuracy rate of only 81% for this speaker.
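The screening step described in the abstract trades frame retention against accuracy. The following is a minimal sketch of that trade-off, not the authors' implementation: it assumes each frame carries a single numeric score (here called `errors`, standing in for something like an RNN reconstruction error, which the abstract does not specify) and that manual labels (`is_accurate`) are available for evaluation. The function names and all data values are hypothetical.

```python
def screen_frames(errors, retain_fraction):
    """Return sorted indices of the lowest-scoring frames.

    errors: one score per frame (assumed: lower means more likely accurate).
    retain_fraction: share of frames to keep, in (0, 1].
    """
    if not errors:
        raise ValueError("no frames to screen")
    if not 0.0 < retain_fraction <= 1.0:
        raise ValueError("retain_fraction must be in (0, 1]")
    n_keep = max(1, int(round(retain_fraction * len(errors))))
    # Rank frames from lowest to highest score and keep the first n_keep.
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    return sorted(order[:n_keep])


def accuracy(is_accurate, kept):
    """Fraction of the kept frames that were labelled as accurately tracked."""
    return sum(is_accurate[i] for i in kept) / len(kept)


# Hypothetical scores and manual labels for eight frames (illustrative only).
errors = [0.02, 0.40, 0.05, 0.03, 0.90, 0.04, 0.06, 0.01]
is_accurate = [True, False, True, True, False, True, True, True]

kept = screen_frames(errors, 0.75)
baseline = accuracy(is_accurate, range(len(errors)))
screened = accuracy(is_accurate, kept)
print(f"kept {len(kept)}/{len(errors)} frames; "
      f"accuracy {baseline:.2f} before, {screened:.2f} after screening")
```

In this toy data the two poorly tracked frames also have the highest scores, so screening removes exactly those; with real data the separation would be imperfect, which is why the abstract reports accuracy and retention together.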
Original language | English |
---|---|
Title of host publication | Proceedings of the 11th Australasian International Conference on Speech Science and Technology |
Editors | Paul Warren, Catherine Watson |
Place of Publication | Auckland NZ |
Publisher | Australian Speech Science and Technology |
Pages | 216-221 |
Number of pages | 6 |
ISBN (Print) | 0958194629 |
Publication status | Published - 2006 |
Event | 11th Australasian International Conference on Speech Science, Auckland, New Zealand (6 Dec 2006 → 8 Dec 2006) |
Conference
Conference | 11th Australasian International Conference on Speech Science |
---|---|
Country/Territory | New Zealand |
City | Auckland |
Period | 6/12/06 → 8/12/06 |