Skip to main navigation Skip to search Skip to main content

Combining acoustic analysis and phonotactic analysis to improve automatic speech recognition

  • Susan Nulsen

    Student thesis: Doctoral Thesis

    Abstract

    This thesis addresses the problem of automatic speech recognition, specifically, how to transform an acoustic waveform into a string of words or phonemes. A preliminary chapter gives linguistic information potentially useful in automatic speech recognition. This is followed by a description of the Wave Analysis Laboratory (WAL), a rule-based system which detects features in speech and was designed as the acoustic front end of a speech recognition system. Temporal reasoning as used in WAL rules is examined. The use of WAL in recognizing one particular class of speech sounds, the nasal consonants, is described in detail.
    The remainder of the thesis looks at the statistical analysis of samples of spontaneous speech. An orthographic transcription of a large sample of spontaneous speech is automatically translated into phonemes. Tables of the frequencies of word initial and word final phoneme clusters are constructed to illustrate some of the phonotactic constraints of the language. Statistical data is used to assign phonemes to phonotactic classes. These classes are unlike the acoustic classes, although there is a general distinction between the vowels, the consonants and the word boundary.
    A way of measuring the phonetic balance of a sample of speech is described. This can be used as a means of ranking potential test samples in terms of how well they represent the language.
    A phoneme n-gram model is used to measure the entropy of the language. The broad acoustic encoding output from WAL is used with this language model to reconstruct a small test sample.
    “Branching” a simpler alternative to perplexity is introduced and found to give similar results to perplexity. Finally, the drop in branching is calculated as knowledge of various sets of acoustic classes is considered.
    In the work described in this thesis the main contributions made to automatic speech recognition and the study of speech are in the development of the Wave Analysis Laboratory and in the analysis of speech from a phonotactic point of view. The phoneme cluster frequencies provide new information on spoken language, as do the phonotactic classes. The measures of phonetic balance and branching provide additional tools for use in the development of speech recognition systems.
    Date of Award1998
    Original languageEnglish
    SupervisorMary O'Kane (Supervisor)

    Cite this

    '