Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling

Method and clinical application

Filip Ginter, Hanna Suominen, Sampo Pyysalo, Tapio I. Salakoski

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Motivation: Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods: We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. Results: The method is evaluated on intensive care nursing narratives and motivated by information needs in this domain. The method is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.
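The abstract names the two components but does not say how they interact. The following is a minimal, hypothetical Python/NumPy sketch of one way to combine them, not the authors' implementation. Each topic is defined only by a few seed words, so no annotation is needed. Sentences and topic definitions are projected into an LSA space obtained by truncated SVD of the term-by-document matrix. Cosine similarity in that space serves as the emission score of an HMM whose hidden states are the topics. Viterbi decoding with a "sticky" transition matrix then assigns one label per sentence, and contiguous runs of a label form segments. All function names, the seed-word topic definitions, the scaled-cosine emission score, and the parameters `k`, `p_stay`, and `tau` are illustrative assumptions. The record does not describe the paper's actual emission or transition models.

```python
import numpy as np

def bag_of_words(text, index):
    """Count vector over the vocabulary; out-of-vocabulary words are ignored."""
    v = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            v[index[w]] += 1.0
    return v

def lsa_basis(docs, vocab, k):
    """Return the k leading left singular vectors of the term-by-document matrix (truncated SVD)."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.column_stack([bag_of_words(d, index) for d in docs])
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return index, U[:, :k]

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0.0 or nb == 0.0 else float(a @ b / (na * nb))

def sticky_log_transitions(n, p_stay):
    """Log transition matrix favouring staying in the current topic (requires n >= 2)."""
    A = np.full((n, n), (1.0 - p_stay) / (n - 1))
    np.fill_diagonal(A, p_stay)
    return np.log(A)

def viterbi(log_emit, log_trans, log_start):
    """Most probable state sequence for a T x N matrix of log emission scores."""
    T, N = log_emit.shape
    delta = log_start + log_emit[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: previous state i -> state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def segment_and_label(sentences, topics, background=(), k=2, p_stay=0.8, tau=0.1):
    """Hypothetical: label each sentence with a topic defined only by seed words."""
    docs = list(background) + list(sentences)
    vocab = sorted({w for d in docs for w in d.lower().split()}
                   | {w.lower() for ws in topics.values() for w in ws})
    index, Uk = lsa_basis(docs, vocab, k)
    names = list(topics)
    # Project topic seed words and sentences onto the k-dimensional LSA basis.
    topic_vecs = [bag_of_words(" ".join(topics[n]), index) @ Uk for n in names]
    sims = np.array([[cosine(bag_of_words(s, index) @ Uk, v) for v in topic_vecs]
                     for s in sentences])
    log_emit = sims / tau  # assumption: scaled cosine used as an unnormalised log emission score
    log_start = np.full(len(names), -np.log(len(names)))
    path = viterbi(log_emit, sticky_log_transitions(len(names), p_stay), log_start)
    return [names[i] for i in path]

notes = ["pain analgesic", "pain analgesic", "oxygen ventilator", "oxygen ventilator"]
topics = {"pain": ["pain", "analgesic"], "breathing": ["oxygen", "ventilator"]}
print(segment_and_label(notes, topics))
# -> ['pain', 'pain', 'breathing', 'breathing']
```

The sticky transition matrix is what lets an HMM-style decoder prefer coherent segments while still permitting a switch after a single sentence when the similarity evidence is strong. This is one plausible reason such a combination can handle short segments. In practice the LSA space would be built from a large unannotated background corpus passed as `background`, not from the sentences being labeled.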

Original language: English
Pages (from-to): e1-e6
Number of pages: 6
Journal: International Journal of Medical Informatics
Volume: 78
Issue number: 12
DOIs: 10.1016/j.ijmedinf.2009.02.003
Publication status: Published - Dec 2009
Externally published: Yes


Cite this

@article{addab0d68aaf468f908bf312b456c8e8,
title = "Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application",
abstract = "Motivation: Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods: We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. Results: The method is evaluated on intensive care nursing narratives and motivated by information needs in this domain. The method is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.",
keywords = "Computerized patient records, Hidden Markov models, Information retrieval, Latent semantic analysis, Nursing, Topic classification, Topic segmentation",
author = "Filip Ginter and Hanna Suominen and Sampo Pyysalo and Salakoski, {Tapio I.}",
year = "2009",
month = "12",
doi = "10.1016/j.ijmedinf.2009.02.003",
language = "English",
volume = "78",
pages = "e1--e6",
journal = "International Journal of Medical Informatics",
issn = "1386-5056",
publisher = "Elsevier Ireland Ltd",
number = "12",

}

Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application. / Ginter, Filip; Suominen, Hanna; Pyysalo, Sampo; Salakoski, Tapio I.

In: International Journal of Medical Informatics, Vol. 78, No. 12, 12.2009, p. e1-e6.


TY - JOUR

T1 - Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling

T2 - Method and clinical application

AU - Ginter, Filip

AU - Suominen, Hanna

AU - Pyysalo, Sampo

AU - Salakoski, Tapio I.

PY - 2009/12

Y1 - 2009/12

N2 - Motivation: Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods: We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. Results: The method is evaluated on intensive care nursing narratives and motivated by information needs in this domain. The method is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.

AB - Motivation: Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods: We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. Results: The method is evaluated on intensive care nursing narratives and motivated by information needs in this domain. The method is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.

KW - Computerized patient records

KW - Hidden Markov models

KW - Information retrieval

KW - Latent semantic analysis

KW - Nursing

KW - Topic classification

KW - Topic segmentation

UR - http://www.scopus.com/inward/record.url?scp=71849108574&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2009.02.003

DO - 10.1016/j.ijmedinf.2009.02.003

M3 - Article

VL - 78

SP - e1

EP - e6

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

SN - 1386-5056

IS - 12

ER -