Formant Estimation Using an Auto-Associative Neural Network

Simon Hawkins, Haiying Li

    Research output: Conference contribution (chapter in book / conference proceeding), peer-reviewed


    Building on algorithms developed in earlier work (Hawkins et al., 1994a, 1994b; Hawkins, 1997; Hawkins et al., 2002), this study develops a new technique for improving the accuracy of formant estimates produced by an analysis-by-synthesis formant tracker (DPTRAK; Clermont, 1992). DPTRAK is evaluated by comparing its formant estimates against those obtained manually by the first author from inspection of the spectrogram of each vowel produced by each speaker. Applied to 13 male speakers uttering the 11 monophthongs and eight diphthongs of Australian English, DPTRAK produced results that varied in accuracy across speakers: the percentage of speech frames tracked accurately ranged from 99% for the best speaker to 58% for the worst. We develop the SpeechSifter algorithm to sift through the speech frames tracked by DPTRAK (or any other formant tracker) and select only those frames that are likely to be accurately tracked. This unsupervised algorithm first selects the ideal speaker on which to train a Replicator Neural Net (RNN; Hawkins et al., 2002). The trained RNN is then used to retain those speech frames on which the formant tracker is highly likely to have made accurate formant estimates and to discard the rest. We demonstrate the value of this approach. First, we show that we can accurately predict which speaker will provide the ideal training speaker for the RNN. Next, we apply the trained RNN to a speaker and show that it is possible to achieve a 90% accuracy rate whilst retaining 75% of the speaker's original speech frames. This is an improvement on the DPTRAK algorithm, which achieves an accuracy rate of only 81% for this speaker.
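    The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the replicator-net (auto-associative) idea it describes: train a small autoencoder on formant frames from one well-tracked speaker, then keep only frames whose reconstruction error falls below a threshold. The synthetic data, network size, and 95th-percentile threshold are illustrative assumptions, not the paper's actual configuration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for formant frames: each frame is (F1, F2, F3) in Hz.
    # "Good" frames cluster near plausible vowel formants; "bad" frames are
    # scattered outliers, standing in for tracker errors.
    good = rng.normal([500.0, 1500.0, 2500.0], [80.0, 150.0, 200.0], size=(200, 3))
    bad = rng.uniform(0.0, 4000.0, size=(40, 3))
    frames = np.vstack([good, bad])

    # Normalise using statistics of the training (well-tracked) frames.
    mu, sigma = good.mean(0), good.std(0)
    def norm(x):
        return (x - mu) / sigma

    # Replicator (auto-associative) net: 3 -> 2 -> 3, tanh hidden layer,
    # trained to reproduce its own input.
    W1 = rng.normal(0.0, 0.1, (3, 2)); b1 = np.zeros(2)
    W2 = rng.normal(0.0, 0.1, (2, 3)); b2 = np.zeros(3)

    X = norm(good)
    lr = 0.05
    for _ in range(2000):
        H = np.tanh(X @ W1 + b1)              # encode
        Y = H @ W2 + b2                       # decode (reconstruct input)
        err = Y - X
        # Backpropagation of mean squared reconstruction error.
        gW2 = H.T @ err / len(X); gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    def recon_error(x):
        """Per-frame mean squared reconstruction error."""
        h = np.tanh(norm(x) @ W1 + b1)
        y = h @ W2 + b2
        return ((y - norm(x)) ** 2).mean(axis=-1)

    # Sift: keep only frames reconstructed well, with the threshold taken
    # from the training distribution (95th percentile, an assumed choice).
    thresh = np.percentile(recon_error(good), 95)
    kept = frames[recon_error(frames) < thresh]
    print(f"kept {len(kept)} of {len(frames)} frames")
    ```

    Frames the net has learned to replicate reconstruct with low error, while anomalous frames do not, which is what lets an unsupervised screen discard likely tracking failures without hand-labelled data.
    
    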
    Original language: English
    Title of host publication: Proceedings of the 11th Australasian International Conference on Speech Science and Technology
    Editors: Paul Warren, Catherine Watson
    Place of publication: Auckland, NZ
    Publisher: Australian Speech Science and Technology
    Number of pages: 6
    ISBN (Print): 0958194629
    Publication status: Published - 2006
    Event: 11th Australasian International Conference on Speech Science - Auckland, New Zealand
    Duration: 6 Dec 2006 - 8 Dec 2006


    Conference: 11th Australasian International Conference on Speech Science
    Country/Territory: New Zealand

