Evaluation data and benchmarks for cascaded speech recognition and entity extraction

Liyuan Zhou, Hanna Suominen, Leif Hanlen

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

During clinical handover, clinicians exchange information about the patients and the state of clinical management. To improve care safety and quality, both handover and its documentation have been standardized. Speech recognition and entity extraction provide a way to help health service providers follow these standards by implementing the handover process as a structured form, whose headings guide the handover narrative, and the documentation process as proofing and sign-off of the automatically filled-out form. In this paper, we evaluate such systems. The form covers the sections Handover nurse, Patient introduction, My shift, Medication, Appointments, and Future care, divided into 49 mutually exclusive headings to be filled out with speech-recognized and extracted entities. Our system correctly recognizes 10,244 out of 14,095 spoken words; despite 6,692 erroneous words, its error percentage is significantly smaller than that of the systems submitted to the CLEF eHealth Evaluation Lab 2015. In the extraction of the 35 entities with training data (i.e., 14 headings were not present in the 101 expert-annotated training documents with 8,487 words in total), the system correctly extracts 2,375 out of 3,793 words in 50 test documents after calibration on 3,937 words in 50 validation documents. This translates to over 90% F1 in extracting the patient's age, current bed, current room, and given name, and over 70% F1 for the patient's admission reason/diagnosis and last name. F1 for filtering out irrelevant information is 78%. We have made the data for 201 handover cases publicly available, together with processing results and code, and proposed the extraction task for CLEF eHealth 2016.
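The headline figures in the abstract can be cross-checked with a little arithmetic. The Python sketch below is illustrative only and is not the authors' released code. It assumes that the 6,692 erroneous words are the usual word error rate numerator (substitutions + deletions + insertions) over the 14,095 reference words, which the abstract does not state explicitly. The helper names `percent` and `f1_from_counts` are hypothetical, and because per-entity counts are not given in the abstract, `f1_from_counts` only illustrates the F1 formula.

```python
"""Back-of-the-envelope check of the counts quoted in the abstract.

ASSUMPTION (not stated in the abstract): the 6,692 "erroneous words" are the
word-error-rate numerator S + D + I over the 14,095 reference words.
"""

# Counts quoted in the abstract
REF_WORDS = 14_095          # spoken (reference) words
CORRECT_WORDS = 10_244      # correctly recognized words
ERRORS = 6_692              # erroneous words (assumed S + D + I)
TEST_WORDS = 3_793          # words in the 50 test documents
CORRECT_EXTRACTED = 2_375   # correctly extracted test words

# Consistency checks on other counts stated in the abstract
assert 35 + 14 == 49          # headings with / without training data
assert 101 + 50 + 50 == 201   # training + validation + test documents


def percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


word_correct_pct = percent(CORRECT_WORDS, REF_WORDS)
wer_pct = percent(ERRORS, REF_WORDS)
# With N = S + D + C reference words, S + D = N - C; under the WER
# assumption the remaining errors are insertions.
subs_plus_dels = REF_WORDS - CORRECT_WORDS
insertions = ERRORS - subs_plus_dels
extraction_acc_pct = percent(CORRECT_EXTRACTED, TEST_WORDS)

print(word_correct_pct, wer_pct, insertions, extraction_acc_pct)
```

Under these assumptions the script prints `72.7 47.5 2841 62.6`: about 72.7% of spoken words are recognized correctly, the word error rate is about 47.5% (the error count can exceed the 3,851 missed reference words because insertions also count as errors), and about 62.6% of test-set words are extracted correctly. For hypothetical counts, `f1_from_counts(8, 2, 2)` returns approximately 0.8.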
Original language: English
Title of host publication: SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015
Editors: Guillaume Gravier, Martha Larson, Gareth Jones, Roeland Ordelman
Publisher: Association for Computing Machinery (ACM)
Pages: 15-18
Number of pages: 4
ISBN (Electronic): 9781450337496
ISBN (Print): 9781450337496
DOIs: https://doi.org/10.1145/2802558.2814649
Publication status: Published - 30 Oct 2015
Event: ACM Multimedia 2015: The third Edition Workshop on Speech, Language & Audio in Multimedia - Brisbane Exhibition & Convention Centre, Brisbane, Australia
Duration: 26 Oct 2015 to 30 Oct 2015
http://www.acmmm.org/2015/proceedings/

Publication series

Name: SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015

Conference

Conference: ACM Multimedia 2015
Country: Australia
City: Brisbane
Period: 26/10/15 to 30/10/15

Cite this

Zhou, L., Suominen, H., & Hanlen, L. (2015). Evaluation data and benchmarks for cascaded speech recognition and entity extraction. In G. Gravier, M. Larson, G. Jones, & R. Ordelman (Eds.), SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015 (pp. 15-18). (SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015). Association for Computing Machinery (ACM). https://doi.org/10.1145/2802558.2814649
Zhou, Liyuan ; Suominen, Hanna ; Hanlen, Leif. / Evaluation data and benchmarks for cascaded speech recognition and entity extraction. SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015. editor / Guillaume Gravier ; Martha Larson ; Gareth Jones ; Roeland Ordelman. Association for Computing Machinery (ACM), 2015. pp. 15-18 (SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015).
@inproceedings{3b9844ebe6c8411b9ffb201cf0b4bdbf,
title = "Evaluation data and benchmarks for cascaded speech recognition and entity extraction",
keywords = "Entity extraction, Evaluation, Speech recognition",
author = "Liyuan Zhou and Hanna Suominen and Leif Hanlen",
year = "2015",
month = "10",
day = "30",
doi = "10.1145/2802558.2814649",
language = "English",
isbn = "9781450337496",
series = "SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015",
publisher = "Association for Computing Machinery (ACM)",
pages = "15--18",
editor = "Guillaume Gravier and Martha Larson and Gareth Jones and Roeland Ordelman",
booktitle = "SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015",
address = "United States",

}

Zhou, L, Suominen, H & Hanlen, L 2015, Evaluation data and benchmarks for cascaded speech recognition and entity extraction. in G Gravier, M Larson, G Jones & R Ordelman (eds), SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015. SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015, Association for Computing Machinery (ACM), pp. 15-18, ACM Multimedia 2015, Brisbane, Australia, 26/10/15. https://doi.org/10.1145/2802558.2814649

Evaluation data and benchmarks for cascaded speech recognition and entity extraction. / Zhou, Liyuan; Suominen, Hanna; Hanlen, Leif.

SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015. ed. / Guillaume Gravier; Martha Larson; Gareth Jones; Roeland Ordelman. Association for Computing Machinery (ACM), 2015. p. 15-18 (SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015).

TY  - GEN
T1  - Evaluation data and benchmarks for cascaded speech recognition and entity extraction
AU  - Zhou, Liyuan
AU  - Suominen, Hanna
AU  - Hanlen, Leif
PY  - 2015/10/30
Y1  - 2015/10/30
KW  - Entity extraction
KW  - Evaluation
KW  - Speech recognition
UR  - http://www.scopus.com/inward/record.url?scp=84964330879&partnerID=8YFLogxK
U2  - 10.1145/2802558.2814649
DO  - 10.1145/2802558.2814649
M3  - Conference contribution
SN  - 9781450337496
T3  - SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015
SP  - 15
EP  - 18
BT  - SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015
A2  - Gravier, Guillaume
A2  - Larson, Martha
A2  - Jones, Gareth
A2  - Ordelman, Roeland
PB  - Association for Computing Machinery (ACM)
ER  - 

Zhou L, Suominen H, Hanlen L. Evaluation data and benchmarks for cascaded speech recognition and entity extraction. In Gravier G, Larson M, Jones G, Ordelman R, editors, SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015. Association for Computing Machinery (ACM). 2015. p. 15-18. (SLAM 2015 - Proceedings of the 2015 Workshop on Speech, Language and Audio in Multimedia, co-located with ACM MM 2015). https://doi.org/10.1145/2802558.2814649