Benchmarking clinical speech recognition and information extraction: new data, methods, and evaluations.

Hanna Suominen, Liyuan Zhou, Leif Hanlen, Gabriela Ferraro

Research output: Contribution to journal › Article


Abstract

BACKGROUND: Over a tenth of preventable adverse events in health care are caused by failures in information flow. These failures are tangible in clinical handover: even after a good verbal handover, two-thirds to all of this information is lost after 3-5 shifts when notes are taken by hand or not at all. Speech recognition and information extraction offer a way to fill out a handover form for clinical proofing and sign-off.

OBJECTIVE: The objective of the study was to provide a recorded spoken handover, annotated verbatim transcriptions, and evaluations to support research in spoken and written natural language processing for filling out a clinical handover form. The dataset is based on synthetic patient profiles, which avoids ethical and legal restrictions while keeping it useful for research on speech-to-text conversion and information extraction in realistic clinical scenarios. We also introduce a Web app to demonstrate the system design and workflow.

METHODS: We experimented with Dragon Medical 11.0 for speech recognition and CRF++ for information extraction. To compute features for information extraction, we also applied CoreNLP, MetaMap, and Ontoserver. Our evaluation used cross-validation to measure processing correctness.
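
CRF++ is trained on token-per-line files in which each row holds a token, one or more feature columns, and the gold label in the last column, with a blank line ending each sequence. A minimal sketch of producing that format is below; the feature set and the category labels (`PATIENT`, `BED`, `O`) are illustrative placeholders, not the paper's actual template or category names.

```python
def to_crfpp_rows(tokens, labels):
    """Render one handover utterance as CRF++ training rows.

    Each row: token, lowercased form, a coarse shape feature, and the
    gold label in the last column. A blank line ends the sequence.
    """
    rows = []
    for tok, lab in zip(tokens, labels):
        shape = "NUM" if tok.replace(".", "").isdigit() else (
            "CAP" if tok[0].isupper() else "LOW")
        rows.append(f"{tok}\t{tok.lower()}\t{shape}\t{lab}")
    return "\n".join(rows) + "\n\n"  # blank line separates sequences

# Hypothetical fragment of a handover sentence with form-category labels
tokens = ["Mrs", "Smith", "in", "bed", "7"]
labels = ["PATIENT", "PATIENT", "O", "BED", "BED"]
print(to_crfpp_rows(tokens, labels))
```

A matching CRF++ feature template would then reference these columns by position, for example `U00:%x[0,0]` for the current token and `U01:%x[-1,0]` for the previous one.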

RESULTS: The data provided were a simulation of nursing handover, built from simulated patient records and handover scripts, spoken by an Australian registered nurse, and recorded using a mobile device. Speech recognition correctly recognized 5276 of the 7277 words in our 100 test documents. For information extraction, we considered 50 mutually exclusive categories and achieved an F1 (ie, the harmonic mean of precision and recall) of 0.86 for the category of irrelevant text and a macro-averaged F1 of 0.70 over the remaining 35 nonempty categories of the form in our 101 test documents.
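
The reported figures combine word-level accuracy for speech recognition with per-category F1 for information extraction. The metrics themselves are straightforward to reproduce from the reported counts; this is a sketch of the definitions, not the paper's evaluation code.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_category_f1):
    """Unweighted mean of per-category F1 scores."""
    return sum(per_category_f1) / len(per_category_f1)

# Word-level speech recognition accuracy from the reported counts
word_accuracy = 5276 / 7277
print(round(word_accuracy, 3))   # → 0.725
print(round(f1(0.8, 0.6), 3))    # → 0.686
```

Macro-averaging weights every category equally, so rare form fields count as much as frequent ones, which is why it is reported separately from the dominant irrelevant-text category.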

CONCLUSIONS: The significance of this study hinges on opening our data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language processing. The data are used in the CLEFeHealth 2015 evaluation laboratory for a shared task on speech recognition.

Original language: English
Pages (from-to): 1-23
Number of pages: 23
Journal: JMIR medical informatics
Volume: 3
Issue number: 2
DOIs: 10.2196/medinform.4321
Publication status: Published - 27 Apr 2015

Cite this

Suominen, Hanna; Zhou, Liyuan; Hanlen, Leif; Ferraro, Gabriela. Benchmarking clinical speech recognition and information extraction: new data, methods, and evaluations. In: JMIR medical informatics. 2015; Vol. 3, No. 2, pp. 1-23.
@article{33d1f78f64744506aecfc91182e003a7,
title = "Benchmarking clinical speech recognition and information extraction: new data, methods, and evaluations.",
abstract = "BACKGROUND: Over a tenth of preventable adverse events in health care are caused by failures in information flow. These failures are tangible in clinical handover; regardless of good verbal handover, from two-thirds to all of this information is lost after 3-5 shifts if notes are taken by hand, or not at all. Speech recognition and information extraction provide a way to fill out a handover form for clinical proofing and sign-off. OBJECTIVE: The objective of the study was to provide a recorded spoken handover, annotated verbatim transcriptions, and evaluations to support research in spoken and written natural language processing for filling out a clinical handover form. This dataset is based on synthetic patient profiles, thereby avoiding ethical and legal restrictions, while maintaining efficacy for research in speech-to-text conversion and information extraction, based on realistic clinical scenarios. We also introduce a Web app to demonstrate the system design and workflow. METHODS: We experiment with Dragon Medical 11.0 for speech recognition and CRF++ for information extraction. To compute features for information extraction, we also apply CoreNLP, MetaMap, and Ontoserver. Our evaluation uses cross-validation techniques to measure processing correctness. RESULTS: The data provided were a simulation of nursing handover, as recorded using a mobile device, built from simulated patient records and handover scripts, spoken by an Australian registered nurse. Speech recognition recognized 5276 of 7277 words in our 100 test documents correctly. We considered 50 mutually exclusive categories in information extraction and achieved the F1 (ie, the harmonic mean of Precision and Recall) of 0.86 in the category for irrelevant text and the macro-averaged F1 of 0.70 over the remaining 35 nonempty categories of the form in our 101 test documents. CONCLUSIONS: The significance of this study hinges on opening our data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language-processing. The data are used in the CLEFeHealth 2015 evaluation laboratory for a shared task on speech recognition.",
keywords = "computer systems evaluation, data collection, information extraction, nursing records, patient handoff, records as topic, speech recognition software",
author = "Hanna Suominen and Liyuan Zhou and Leif Hanlen and Gabriela Ferraro",
year = "2015",
month = "4",
day = "27",
doi = "10.2196/medinform.4321",
language = "English",
volume = "3",
pages = "1--23",
journal = "JMIR medical informatics",
issn = "2291-9694",
number = "2",

}
