Formant Estimation using an Auto Associative neural Network

Simon Hawkins, Haiying Li

    Research output: Conference proceeding or Chapter in Book › Conference contribution

    Abstract

    Building on algorithms developed in earlier work (Hawkins et al., 1994a, 1994b; Hawkins, 1997; Hawkins et al., 2002), this study develops a new technique for improving the accuracy of formant estimates produced by an analysis-by-synthesis formant tracker (DPTRAK, Clermont, 1992). DPTRAK is evaluated by comparing its formant estimates against those obtained manually by the first author when he inspected the spectrogram of each vowel produced by each speaker. Applied to 13 male speakers uttering the 11 monophthongs and eight diphthongs of Australian English, DPTRAK produced results that varied in accuracy across speakers. The percentage of speech frames tracked accurately ranged from 99% for the best speaker down to 58% for the worst speaker. We develop the SpeechSifter algorithm to sift through the speech frames tracked by the DPTRAK formant tracker (or any other formant tracker) and select only those frames that are likely to be accurately tracked. This unsupervised algorithm first selects the ideal speaker on which to train a Replicator Neural Net (Hawkins et al., 2002). The trained Replicator Neural Net is then used to screen those speech frames on which the formant tracker is highly likely to have made accurate formant estimates and to discard the rest. We demonstrate the value of this approach. First, we demonstrate that we can accurately predict which speaker will provide the ideal training speaker for the RNN. Next, we apply the trained RNN to a speaker and show that it is possible to achieve a 90% accuracy rate whilst retaining 75% of the speaker's original speech frames. This is an improvement on the DPTRAK algorithm, which achieves an accuracy rate of only 81% for this speaker.
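    The screening idea in the abstract — train a replicator (auto-associative) network on one well-tracked speaker, then keep only the frames it reconstructs well — can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: a linear autoencoder (PCA via SVD) stands in for the Replicator Neural Net, the synthetic (F1, F2, F3) data and the 90th-percentile threshold are assumptions, and real use would train on actual tracker output.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in data: (F1, F2, F3) estimates in Hz for the "training" speaker.
    train = rng.normal([500.0, 1500.0, 2500.0], [80.0, 150.0, 200.0], size=(200, 3))

    # Fit a rank-2 linear autoencoder on the training speaker's frames
    # (a simple proxy for the replicator net: it learns to reproduce its input).
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:2]  # tied encoder/decoder weights

    def reconstruction_error(frames):
        """Per-frame squared reconstruction error under the linear autoencoder."""
        centred = frames - mean
        recon = centred @ components.T @ components
        return np.sum((centred - recon) ** 2, axis=1)

    # Screen a test speaker's frames: keep those whose error falls below a
    # threshold calibrated on the training speaker (here, its 90th percentile).
    threshold = np.percentile(reconstruction_error(train), 90)
    test = rng.normal([520.0, 1450.0, 2400.0], [90.0, 160.0, 210.0], size=(100, 3))
    keep = reconstruction_error(test) < threshold
    print(f"retained {keep.mean():.0%} of frames")
    ```

    Frames with high reconstruction error do not resemble the well-tracked training distribution and are discarded, mirroring the sift-and-select step that SpeechSifter applies to the tracker's output.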
    Original language: English
    Title of host publication: Proceedings of the 11th Australasian International Conference on Speech Science and Technology
    Editors: Paul Warren, Catherine Watson
    Place of Publication: Auckland NZ
    Publisher: Australian Speech Science and Technology
    Pages: 216-221
    Number of pages: 6
    ISBN (Print): 0958194629
    Publication status: Published - 2006
    Event: 11th Australasian International Conference on Speech Science - Auckland, New Zealand
    Duration: 6 Dec 2006 - 8 Dec 2006

    Conference

    Conference: 11th Australasian International Conference on Speech Science
    Country: New Zealand
    City: Auckland
    Period: 6/12/06 - 8/12/06


    Cite this

    Hawkins, S., & Li, H. (2006). Formant Estimation using an Auto Associative neural Network. In P. Warren, & C. Watson (Eds.), Proceedings of the 11th Australasian International Conference on Speech Science and Technology (pp. 216-221). Auckland NZ: Australian Speech Science and Technology.


