TY - GEN
T1 - PHIs (Protected Health Information) identification from free text clinical records based on machine learning
AU - Rajput, Kunal
AU - Chetty, Girija
AU - Davey, Rachel
PY - 2017/11/27
Y1 - 2017/11/27
N2 - To preserve patient confidentiality, PHIs (Protected Health Information) must be identified in free-text clinical records, and such sensitive information must be either removed or replaced. Identification of PHIs is normally performed manually on large sets of structured EHR databases, which is time-consuming, prohibitively expensive and error-prone. Hence, methods for automatic or semi-automatic identification of personal health information are of significant scientific and commercial interest. In this paper, we propose a computational framework based on text mining and machine learning algorithms for automatic identification of PHIs in massive, unstructured free-text clinical records, discharge summaries and other care documents. Experimental evaluation of the proposed framework on several publicly available challenge datasets from the Informatics for Integrating Biology & the Bedside (i2b2) shared tasks has shown promising outcomes.
AB - To preserve patient confidentiality, PHIs (Protected Health Information) must be identified in free-text clinical records, and such sensitive information must be either removed or replaced. Identification of PHIs is normally performed manually on large sets of structured EHR databases, which is time-consuming, prohibitively expensive and error-prone. Hence, methods for automatic or semi-automatic identification of personal health information are of significant scientific and commercial interest. In this paper, we propose a computational framework based on text mining and machine learning algorithms for automatic identification of PHIs in massive, unstructured free-text clinical records, discharge summaries and other care documents. Experimental evaluation of the proposed framework on several publicly available challenge datasets from the Informatics for Integrating Biology & the Bedside (i2b2) shared tasks has shown promising outcomes.
KW - Clinical
KW - De-identified Health Records
KW - Machine Learning
KW - NLP text features
KW - Protected Health Information (PHI)
UR - http://www.scopus.com/inward/record.url?scp=85046080108&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/19879ea4-6cd7-3d04-a48a-9add6e780108/
UR - https://www.ele.uri.edu/ieee-ssci2017/
U2 - 10.1109/SSCI.2017.8285286
DO - 10.1109/SSCI.2017.8285286
M3 - Conference contribution
AN - SCOPUS:85046080108
SN - 9781538627273
VL - 2018-January
T3 - 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
SP - 1
EP - 9
BT - Proceedings 2017 IEEE Symposium Series on Computational Intelligence (SSCI 2017)
A2 - Bonissone, Piero
A2 - Fogel, David
PB - IEEE, Institute of Electrical and Electronics Engineers
CY - Hawaii
T2 - IEEE Symposium Series on Computational Intelligence 2017
Y2 - 27 November 2017 through 1 December 2017
ER -