Normalizing acronyms and abbreviations to aid patient understanding of clinical texts

ShARe/CLEF eHealth Challenge 2013, Task 2

Danielle L. Mowery, Brett R. South, Lee Christensen, Jianwei Leng, Laura Maria Peltonen, Sanna Salanterä, Hanna Suominen, David Martinez, Sumithra Velupillai, Noémie Elhadad, Guergana Savova, Sameer Pradhan, Wendy W. Chapman

    Research output: Contribution to journal › Article

    6 Citations (Scopus)
    1 Download (Pure)

    Abstract

    Background: The ShARe/CLEF eHealth challenge lab aims to stimulate the development of natural language processing and information retrieval technologies that help patients understand their clinical reports. In clinical text, acronyms and abbreviations, also referred to as short forms, can be difficult for patients to understand. For one of the three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking annotated short forms to web sources with lay descriptions or by substituting short forms with simpler lay terms.

    Methods: In this study, we 1) evaluate the accuracy of participating systems in normalizing short forms against a majority sense baseline approach, 2) assess the performance of participants' systems on short forms with varying majority sense distributions, and 3) report the accuracy of participating systems on normalized concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms.

    Results: The best systems submitted by the five participating teams achieved accuracies ranging from 43 to 72 %. The majority sense baseline achieved the second best performance. For short forms with two or more senses, participating systems ranged from 52 to 78 % accuracy at low ambiguity (majority sense greater than 80 %), from 23 to 57 % accuracy at moderate ambiguity (majority sense between 50 and 80 %), and from 2 to 45 % accuracy at high ambiguity (majority sense less than 50 %). In the ShARe test set, 69 % of short form annotations shared concept unique identifiers with the Consumer Health Vocabulary; on these 2594 possible annotations, participating systems ranged from 50 to 75 % accuracy.

    Conclusion: Short form normalization remains a challenging problem, with current systems performing at moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with the concept unique identifiers it is missing from the ShARe test set, further supporting patient understanding of unfamiliar medical terms.
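The majority sense baseline referenced in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: for each short form, the baseline always predicts the sense (here, a UMLS concept unique identifier) to which that short form was most frequently normalized in training data. The short forms, CUIs, and counts below are invented toy data, not drawn from the ShARe corpus.

```python
from collections import Counter, defaultdict

def train_majority_baseline(annotations):
    """Map each short form to its most frequent sense (e.g., a UMLS CUI)."""
    counts = defaultdict(Counter)
    for short_form, sense in annotations:
        counts[short_form][sense] += 1
    # most_common(1) returns [(sense, count)] for the top-ranked sense
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}

def accuracy(model, test):
    """Fraction of test annotations whose predicted sense matches the gold sense."""
    correct = sum(1 for sf, sense in test if model.get(sf) == sense)
    return correct / len(test)

# Invented example: "RA" is ambiguous (e.g., rheumatoid arthritis vs. right atrium),
# while "PT" here has a single training sense. CUIs are illustrative placeholders.
train = [("RA", "C0003873"), ("RA", "C0003873"), ("RA", "C0225844"),
         ("PT", "C0033095")]
baseline = train_majority_baseline(train)

test = [("RA", "C0003873"), ("RA", "C0225844"), ("PT", "C0033095")]
print(accuracy(baseline, test))  # 2 of 3 test instances correct
```

Because the baseline can never recover a minority sense, its accuracy drops as ambiguity rises — consistent with the abstract's observation that system performance degrades when the majority sense falls below 50 %.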

    Original language: English
    Article number: 43
    Pages (from-to): 1-13
    Number of pages: 13
    Journal: Journal of Biomedical Semantics
    ISSN: 2041-1480
    Publisher: BioMed Central
    Volume: 7
    Issue number: 1
    DOI: 10.1186/s13326-016-0084-y
    Keywords: Abbreviations, Acronyms, Consumer health information, Natural language processing, Unified Medical Language System
    Publication status: Published - 1 Jul 2016


    Cite this

    Mowery, D. L., South, B. R., Christensen, L., Leng, J., Peltonen, L. M., Salanterä, S., ... Chapman, W. W. (2016). Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. Journal of Biomedical Semantics, 7(1), 1-13. [43]. https://doi.org/10.1186/s13326-016-0084-y