Abstract
Best practice for clinical handover and its documentation recommends standardized, structured, and synchronous processes with patient involvement. Cascaded speech recognition (SR) and information extraction could support compliance with these recommendations and free clinicians' time from writing documents for patient interaction and education. However, high requirements for processing correctness pose methodological challenges. First, multiple people speak clinical jargon in the presence of background noise, with limited possibilities for SR personalization. Second, errors multiply through the cascade, and hence SR correctness needs to be carefully evaluated against the requirements. This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015. The task released 100 synthetic handover documents for training and another 100 documents for testing, in both verbal and written formats. It attracted 48 team registrations, 21 email confirmations, and four method submissions by two teams. The submissions were compared against a leading commercial SR engine and a simple majority baseline. Although this engine performed significantly better than any submission (a test error percentage of 38.5 vs. 52.8 for the best submission; Wilcoxon signed-rank test statistic of 302.5, p < 10^-12), the releases of data, tools, and evaluations contribute to the body of knowledge on the task difficulty and method suitability.
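The comparison described in the abstract rests on document-level error percentages and a paired Wilcoxon signed-rank test. The following is a minimal Python sketch of that kind of comparison, assuming each system's transcript is scored against a reference with a plain word-level edit distance; the data, function, and variable names are illustrative and are not the lab's actual evaluation scripts or data.

```python
# Minimal sketch (not the lab's evaluation scripts): score two SR systems per
# document and compare the paired scores with a Wilcoxon signed-rank test.
from scipy.stats import wilcoxon


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level error percentage: Levenshtein distance between the token
    sequences, divided by the reference length, times 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)


# Toy data with three documents; the shared task's test set had 100 documents.
references = [
    "patient in bed four complains of chest pain",
    "continue intravenous antibiotics overnight",
    "handover to the morning shift at seven",
]
engine_outputs = [
    "patient in bed for complains of chest pain",
    "continue intravenous antibiotics over night",
    "hand over to the morning shift at seven",
]
submission_outputs = [
    "patient in bed complains of pain",
    "continue antibiotics overnight",
    "hand over to morning shift at eleven",
]

engine_errors = [word_error_rate(r, h) for r, h in zip(references, engine_outputs)]
submission_errors = [word_error_rate(r, h) for r, h in zip(references, submission_outputs)]

# Paired, non-parametric comparison of the document-level error rates.
statistic, p_value = wilcoxon(engine_errors, submission_errors)
print(f"mean error: engine {sum(engine_errors) / len(engine_errors):.1f}%, "
      f"submission {sum(submission_errors) / len(submission_errors):.1f}%")
print(f"Wilcoxon signed-rank statistic = {statistic:.1f}, p = {p_value:.3g}")
```

On the real test set, each list would hold the 100 per-document scores of the two systems being compared.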
Original language | English |
---|---|
Title of host publication | the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015 |
Subtitle of host publication | 16th Conference and Labs of the Evaluation Forum, CLEF 2015 |
Editors | Linda Cappellato, Nicola Ferro, Gareth J.F. Jones, Eric San Juan |
Place of Publication | Toulouse, France |
Publisher | CEUR Workshop Proceedings |
Pages | 1-18 |
Number of pages | 18 |
Volume | 1391 |
Publication status | Published - 8 Sep 2015 |
Event | 6th International Conference on Labs of the Evaluation Forum, CLEF 2015, Toulouse, France. Duration: 8 Sep 2015 → 11 Sep 2015. http://clef2015.clef-initiative.eu/publications.php
Publication series
Name | CLEF2015 Working Notes |
---|---|
Publisher | CEUR Workshop Proceedings
Volume | 1391 |
ISSN (Print) | 1613-0073 |
Conference
Conference | 6th International Conference on Labs of the Evaluation Forum, CLEF 2015 |
---|---|
Abbreviated title | CLEF 2015 |
Country | France |
City | Toulouse |
Period | 8/09/15 → 11/09/15 |
Other | CLEF 2015 is the sixth CLEF conference, continuing the popular CLEF campaigns which have run since 2000 and contributed to the systematic evaluation of information access systems, primarily through experimentation on shared tasks. Building on the format first introduced in 2010, CLEF 2015 consists of an independent peer-reviewed conference on a broad range of issues in the fields of multilingual and multimodal information access evaluation, and a set of labs and workshops designed to test different aspects of mono- and cross-language information retrieval systems. Together, the conference and the lab series will maintain and expand upon the CLEF tradition of community-based evaluation and discussion of evaluation issues.
Internet address | http://clef2015.clef-initiative.eu/publications.php
Cite this
Suominen, H., Hanlen, L., Goeuriot, L., Kelly, L., & Jones, G. J. F. (2015). Task 1a of the CLEF eHealth Evaluation Lab 2015. In L. Cappellato, N. Ferro, G. J. F. Jones, & E. San Juan (Eds.), the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015: 16th Conference and Labs of the Evaluation Forum, CLEF 2015 (Vol. 1391, pp. 1-18). (CLEF2015 Working Notes; Vol. 1391). Toulouse, France: CEUR Workshop Proceedings.
TY - GEN
T1 - Task 1a of the CLEF eHealth Evaluation Lab 2015
AU - Suominen, Hanna
AU - Hanlen, Leif
AU - Goeuriot, Lorraine
AU - Kelly, Liadh
AU - Jones, Gareth J.F.
PY - 2015/9/8
Y1 - 2015/9/8
N2 - Best practice for clinical handover and its documentation recommends standardized, structured, and synchronous processes with patient involvement. Cascaded speech recognition (SR) and information extraction could support compliance with these recommendations and free clinicians' time from writing documents for patient interaction and education. However, high requirements for processing correctness pose methodological challenges. First, multiple people speak clinical jargon in the presence of background noise, with limited possibilities for SR personalization. Second, errors multiply through the cascade, and hence SR correctness needs to be carefully evaluated against the requirements. This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015. The task released 100 synthetic handover documents for training and another 100 documents for testing, in both verbal and written formats. It attracted 48 team registrations, 21 email confirmations, and four method submissions by two teams. The submissions were compared against a leading commercial SR engine and a simple majority baseline. Although this engine performed significantly better than any submission (a test error percentage of 38.5 vs. 52.8 for the best submission; Wilcoxon signed-rank test statistic of 302.5, p < 10^-12), the releases of data, tools, and evaluations contribute to the body of knowledge on the task difficulty and method suitability.
AB - Best practice for clinical handover and its documentation recommends standardized, structured, and synchronous processes with patient involvement. Cascaded speech recognition (SR) and information extraction could support compliance with these recommendations and free clinicians' time from writing documents for patient interaction and education. However, high requirements for processing correctness pose methodological challenges. First, multiple people speak clinical jargon in the presence of background noise, with limited possibilities for SR personalization. Second, errors multiply through the cascade, and hence SR correctness needs to be carefully evaluated against the requirements. This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015. The task released 100 synthetic handover documents for training and another 100 documents for testing, in both verbal and written formats. It attracted 48 team registrations, 21 email confirmations, and four method submissions by two teams. The submissions were compared against a leading commercial SR engine and a simple majority baseline. Although this engine performed significantly better than any submission (a test error percentage of 38.5 vs. 52.8 for the best submission; Wilcoxon signed-rank test statistic of 302.5, p < 10^-12), the releases of data, tools, and evaluations contribute to the body of knowledge on the task difficulty and method suitability.
KW - Computer systems evaluation
KW - Data collection
KW - Information extraction
KW - Medical informatics
KW - Nursing records
KW - Patient Hand-over
KW - Patient handoff
KW - Records as topic
KW - Software design
KW - Speech recognition
KW - Test-set generation
UR - http://www.scopus.com/inward/record.url?scp=84982805922&partnerID=8YFLogxK
M3 - Conference contribution
VL - 1391
T3 - CLEF2015 Working Notes
SP - 1
EP - 18
BT - the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015
A2 - Cappellato, Linda
A2 - Ferro, Nicola
A2 - Jones, Gareth J.F.
A2 - San Juan, Eric
PB - CEUR Workshop Proceedings
CY - Toulouse, France
ER -