PHIs (Protected Health Information) identification from free text clinical records based on machine learning

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

To preserve patient confidentiality, PHIs (Protected Health Information) must be identified in free-text clinical records, and such sensitive information must either be removed or replaced. Identification of PHIs is normally performed manually on large sets of structured EHR databases, which is time-consuming, prohibitively expensive and error-prone. Hence, methods for automatic or semi-automatic identification of personal health information are of significant scientific and commercial interest. In this paper, we propose an innovative computational framework based on novel text mining and machine learning algorithms for automatic identification of PHIs from massive, unstructured free-text clinical records, discharge summaries and other care documents. Experimental evaluation of the proposed framework on several publicly available challenge datasets from the Informatics for Integrating Biology & the Bedside (i2b2) shared tasks has shown promising outcomes.

Original language: English
Title of host publication: Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017)
Place of Publication: Hawaii
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 9
Volume: 2018-January
ISBN (Electronic): 9781538627259
ISBN (Print): 9781538627273
DOIs: https://doi.org/10.1109/SSCI.2017.8285286
Publication status: Published - 27 Nov 2017
Event: 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Honolulu, United States
Duration: 27 Nov 2017 to 1 Dec 2017

Conference

Conference: 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017
Country: United States
City: Honolulu
Period: 27/11/17 to 1/12/17

Fingerprint

Learning systems
Identification (control systems)
Machine Learning
Health
Learning algorithms
Text Mining
Confidentiality
Experimental Evaluation
Large Set
Biology
Learning Algorithm
Text
Framework

Cite this

Rajput, K., Chetty, G., & Davey, R. (2017). PHIs (Protected Health Information) identification from free text clinical records based on machine learning. In Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017) (Vol. 2018-January). Hawaii: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/SSCI.2017.8285286
Rajput, Kunal ; Chetty, Girija ; Davey, Rachel. / PHIs (Protected Health Information) identification from free text clinical records based on machine learning. Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017). Vol. 2018-January Hawaii : IEEE, Institute of Electrical and Electronics Engineers, 2017.
@inproceedings{5d62e4b577924731938975bc43f625c5,
title = "PHIs (Protected Health Information) identification from free text clinical records based on machine learning",
abstract = "To preserve patient confidentiality, PHIs (Protected Health Information) must be identified in free-text clinical records, and such sensitive information must either be removed or replaced. Identification of PHIs is normally performed manually on large sets of structured EHR databases, which is time-consuming, prohibitively expensive and error-prone. Hence, methods for automatic or semi-automatic identification of personal health information are of significant scientific and commercial interest. In this paper, we propose an innovative computational framework based on novel text mining and machine learning algorithms for automatic identification of PHIs from massive, unstructured free-text clinical records, discharge summaries and other care documents. Experimental evaluation of the proposed framework on several publicly available challenge datasets from the Informatics for Integrating Biology \& the Bedside (i2b2) shared tasks has shown promising outcomes.",
keywords = "Clinical, De-identified Health Records, Machine Learning, NLP text features, Protected Health Information (PHI)",
author = "Kunal Rajput and Girija Chetty and Rachel Davey",
year = "2017",
month = "11",
day = "27",
doi = "10.1109/SSCI.2017.8285286",
language = "English",
isbn = "9781538627273",
volume = "2018-January",
booktitle = "Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017)",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
address = "United States",

}

Rajput, K, Chetty, G & Davey, R 2017, PHIs (Protected Health Information) identification from free text clinical records based on machine learning. in Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017). vol. 2018-January, IEEE, Institute of Electrical and Electronics Engineers, Hawaii, 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, United States, 27/11/17. https://doi.org/10.1109/SSCI.2017.8285286

PHIs (Protected Health Information) identification from free text clinical records based on machine learning. / Rajput, Kunal; Chetty, Girija; Davey, Rachel.

Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017). Vol. 2018-January Hawaii : IEEE, Institute of Electrical and Electronics Engineers, 2017.

TY - GEN

T1 - PHIs (Protected Health Information) identification from free text clinical records based on machine learning

AU - Rajput, Kunal

AU - Chetty, Girija

AU - Davey, Rachel

PY - 2017/11/27

Y1 - 2017/11/27

AB - To preserve patient confidentiality, PHIs (Protected Health Information) must be identified in free-text clinical records, and such sensitive information must either be removed or replaced. Identification of PHIs is normally performed manually on large sets of structured EHR databases, which is time-consuming, prohibitively expensive and error-prone. Hence, methods for automatic or semi-automatic identification of personal health information are of significant scientific and commercial interest. In this paper, we propose an innovative computational framework based on novel text mining and machine learning algorithms for automatic identification of PHIs from massive, unstructured free-text clinical records, discharge summaries and other care documents. Experimental evaluation of the proposed framework on several publicly available challenge datasets from the Informatics for Integrating Biology & the Bedside (i2b2) shared tasks has shown promising outcomes.

KW - Clinical

KW - De-identified Health Records

KW - Machine Learning

KW - NLP text features

KW - Protected Health Information (PHI)

UR - http://www.scopus.com/inward/record.url?scp=85046080108&partnerID=8YFLogxK

U2 - 10.1109/SSCI.2017.8285286

DO - 10.1109/SSCI.2017.8285286

M3 - Conference contribution

SN - 9781538627273

VL - 2018-January

BT - Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017)

PB - IEEE, Institute of Electrical and Electronics Engineers

CY - Hawaii

ER -

Rajput K, Chetty G, Davey R. PHIs (Protected Health Information) identification from free text clinical records based on machine learning. In Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017). Vol. 2018-January. Hawaii: IEEE, Institute of Electrical and Electronics Engineers. 2017 https://doi.org/10.1109/SSCI.2017.8285286