Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction

Hanna SUOMINEN, Gabriela Ferraro

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

In Australian healthcare, failures in information flow cause over one-tenth of preventable adverse events and are tangible in clinical handover. Regardless of a good verbal handover, anything from two-thirds to all of this information is lost after 3–5 shifts if notes are taken by hand or not taken. Speech to text (STT) and information extraction (IE) have been proposed for taking the notes and filling in a handover form, with extrapolated evaluations from related studies promising over 90 per cent correctness for both STT and IE. However, this cascading evokes a fruitful methodological challenge: the severe implications that errors may have in clinical decision-making call for superiority in STT; the correctness percentage measured in a peaceful laboratory is decreased to 77 by noise in clinical practice; and the STT errors multiply when cascaded with IE. We provide an analysis of STT errors and discuss the feasibility of phonetic similarity for their correction in this paper. Our data consists of one hundred simulated handover records in Australian English with STT recognising 73 per cent of the 7,277 words (1 h 8 min 5 s) correctly. In text relevant to the form, 836 unique error types are present. The most common errors include inserting and, in, are, arm, is, a, the, or am (5 ≤ n ≤ 94), deleting is (n = 17), and substituting and, obs are, 2, and he with in, also, to, and she (7 ≤ n ≤ 11), respectively. Eighteen per cent of word substitutions sound exactly the same as the correct word and 26 per cent have a similarity percentage above 75. This encourages using phonetic similarity to improve STT.
Original language: English
Title of host publication: Proceedings of Australasian Language Technology Association Workshop
Editors: Sarvnaz Karimi, Karin Verspoor
Publisher: Association for Computational Linguistics
Pages: 34-42
Number of pages: 9
Publication status: Published - 2013

Publication series

Name: Australasian Language Technology Association Workshop
Publisher: Australasian Language Technology Association
Volume: 11
ISSN (Electronic): 1834-7037

Fingerprint

Phonetics
Information Storage and Retrieval
Noise
Patient Handoff
Arm
Hand
Delivery of Health Care

Cite this

SUOMINEN, H., & Ferraro, G. (2013). Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction. In S. Karimi, & K. Verspoor (Eds.), Proceedings of Australasian Language Technology Association Workshop (pp. 34-42). [UL13-1006] (Australasian Language Technology Association Workshop; Vol. 11). Association for Computational Linguistics.
SUOMINEN, Hanna ; Ferraro, Gabriela. / Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction. Proceedings of Australasian Language Technology Association Workshop. editor / Sarvnaz Karimi ; Karin Verspoor. Association for Computational Linguistics, 2013. pp. 34-42 (Australasian Language Technology Association Workshop).
@inproceedings{da0452d7dbe746cc88181a0e7ae14e14,
title = "Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction",
abstract = "In Australian healthcare, failures in information flow cause over one-tenth of preventable adverse events and are tangible in clinical handover. Regardless of a good verbal handover, anything from two-thirds to all of this information is lost after 3–5 shifts if notes are taken by hand or not taken. Speech to text (STT) and information extraction (IE) have been proposed for taking the notes and filling in a handover form, with extrapolated evaluations from related studies promising over 90 per cent correctness for both STT and IE. However, this cascading evokes a fruitful methodological challenge: the severe implications that errors may have in clinical decision-making call for superiority in STT; the correctness percentage measured in a peaceful laboratory is decreased to 77 by noise in clinical practice; and the STT errors multiply when cascaded with IE. We provide an analysis of STT errors and discuss the feasibility of phonetic similarity for their correction in this paper. Our data consists of one hundred simulated handover records in Australian English with STT recognising 73 per cent of the 7,277 words (1 h 8 min 5 s) correctly. In text relevant to the form, 836 unique error types are present. The most common errors include inserting and, in, are, arm, is, a, the, or am (5 ≤ n ≤ 94), deleting is (n = 17), and substituting and, obs are, 2, and he with in, also, to, and she (7 ≤ n ≤ 11), respectively. Eighteen per cent of word substitutions sound exactly the same as the correct word and 26 per cent have a similarity percentage above 75. This encourages using phonetic similarity to improve STT.",
keywords = "Speech-to-text",
author = "Hanna SUOMINEN and Gabriela Ferraro",
year = "2013",
language = "English",
series = "Australasian Language Technology Association Workshop",
publisher = "Association for Computational Linguistics",
pages = "34--42",
editor = "Sarvnaz Karimi and Karin Verspoor",
booktitle = "Proceedings of Australasian Language Technology Association Workshop",

}

SUOMINEN, H & Ferraro, G 2013, Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction. in S Karimi & K Verspoor (eds), Proceedings of Australasian Language Technology Association Workshop., UL13-1006, Australasian Language Technology Association Workshop, vol. 11, Association for Computational Linguistics, pp. 34-42.

Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction. / SUOMINEN, Hanna; Ferraro, Gabriela.

Proceedings of Australasian Language Technology Association Workshop. ed. / Sarvnaz Karimi; Karin Verspoor. Association for Computational Linguistics, 2013. p. 34-42 UL13-1006 (Australasian Language Technology Association Workshop; Vol. 11).

TY  - GEN
T1  - Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction
AU  - SUOMINEN, Hanna
AU  - Ferraro, Gabriela
PY  - 2013
Y1  - 2013
AB  - In Australian healthcare, failures in information flow cause over one-tenth of preventable adverse events and are tangible in clinical handover. Regardless of a good verbal handover, anything from two-thirds to all of this information is lost after 3–5 shifts if notes are taken by hand or not taken. Speech to text (STT) and information extraction (IE) have been proposed for taking the notes and filling in a handover form, with extrapolated evaluations from related studies promising over 90 per cent correctness for both STT and IE. However, this cascading evokes a fruitful methodological challenge: the severe implications that errors may have in clinical decision-making call for superiority in STT; the correctness percentage measured in a peaceful laboratory is decreased to 77 by noise in clinical practice; and the STT errors multiply when cascaded with IE. We provide an analysis of STT errors and discuss the feasibility of phonetic similarity for their correction in this paper. Our data consists of one hundred simulated handover records in Australian English with STT recognising 73 per cent of the 7,277 words (1 h 8 min 5 s) correctly. In text relevant to the form, 836 unique error types are present. The most common errors include inserting and, in, are, arm, is, a, the, or am (5 ≤ n ≤ 94), deleting is (n = 17), and substituting and, obs are, 2, and he with in, also, to, and she (7 ≤ n ≤ 11), respectively. Eighteen per cent of word substitutions sound exactly the same as the correct word and 26 per cent have a similarity percentage above 75. This encourages using phonetic similarity to improve STT.
KW  - Speech-to-text
M3  - Conference contribution
T3  - Australasian Language Technology Association Workshop
SP  - 34
EP  - 42
BT  - Proceedings of Australasian Language Technology Association Workshop
A2  - Karimi, Sarvnaz
A2  - Verspoor, Karin
PB  - Association for Computational Linguistics
ER  -

SUOMINEN H, Ferraro G. Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction. In Karimi S, Verspoor K, editors, Proceedings of Australasian Language Technology Association Workshop. Association for Computational Linguistics. 2013. p. 34-42. UL13-1006. (Australasian Language Technology Association Workshop).