TY - JOUR
T1 - Normalizing acronyms and abbreviations to aid patient understanding of clinical texts
T2 - ShARe/CLEF eHealth Challenge 2013, Task 2
AU - Mowery, Danielle L.
AU - South, Brett R.
AU - Christensen, Lee
AU - Leng, Jianwei
AU - Peltonen, Laura Maria
AU - Salanterä, Sanna
AU - Suominen, Hanna
AU - Martinez, David
AU - Velupillai, Sumithra
AU - Elhadad, Noémie
AU - Savova, Guergana
AU - Pradhan, Sameer
AU - Chapman, Wendy W.
N1 - Funding Information:
We extend our gratitude to our funding sources, natural language processing experts, and annotators for their invaluable contributions. We thank Ken Pierce for working with us to complete Task 2 data requests from Physionet.org and Qing Zeng for making the Consumer Health Vocabulary available to the community. We appreciate the useful feedback and suggestions from our anonymous reviewers. This work was partially funded by NICTA, which was supported by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Center of Excellence Program; the CLEF Initiative; the European Science Foundation (ESF) project ELIAS; the Khresmoi project, funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 257528; the ShARe project, funded by the US National Institutes of Health (R01GM090187); the US Department of Veterans Affairs (VA) Consortium for Healthcare Informatics Research (CHIR); the US Office of the National Coordinator of Healthcare Technology, Strategic Health IT Advanced Research Projects (SHARP) 90TR0002; the Vårdal Foundation (Sweden); the Academy of Finland (140323); and the National Library of Medicine (5T15LM007059).
Publisher Copyright:
© 2016 Mowery et al.
PY - 2016/7/1
Y1 - 2016/7/1
N2 - The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referred to as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a simpler lay term. Methods: In this study, we evaluate 1) the accuracy of participating systems in normalizing short forms compared to a majority sense baseline approach, 2) the performance of participating systems for short forms with variable majority sense distributions, and 3) the accuracy of participating systems in normalizing concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. Results: The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second best performance. For short forms with two or more senses, the accuracy of participating systems ranged from 52 to 78 % with low ambiguity (majority sense greater than 80 %), from 23 to 57 % with moderate ambiguity (majority sense between 50 and 80 %), and from 2 to 45 % with high ambiguity (majority sense less than 50 %). With respect to the ShARe test set, 69 % of short form annotations contained concept unique identifiers in common with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy.
Conclusion: Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
KW - Abbreviations
KW - Acronyms
KW - Consumer health information
KW - Natural language processing
KW - Unified Medical Language System
UR - http://www.scopus.com/inward/record.url?scp=84976876680&partnerID=8YFLogxK
U2 - 10.1186/s13326-016-0084-y
DO - 10.1186/s13326-016-0084-y
M3 - Article
AN - SCOPUS:84976876680
SN - 2041-1480
VL - 7
SP - 1
EP - 13
JO - Journal of Biomedical Semantics
JF - Journal of Biomedical Semantics
IS - 1
M1 - 43
ER -