Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application

Filip Ginter, Hanna Suominen, Sampo Pyysalo, Tapio I. Salakoski

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and are of limited use for texts with short segments. We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic indexing which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. The method is evaluated in the application domain of intensive care nursing narratives. It is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.
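The abstract does not spell out the model, so the following is only a minimal Python sketch, assuming NumPy, of one way hidden Markov model decoding can be combined with a latent semantic space for unsupervised topic labeling. Each topic is defined by a hypothetical seed-word list, sentences and topics are compared by cosine similarity in an LSA space obtained by truncated SVD, and Viterbi decoding over a "sticky" transition matrix turns per-sentence scores into contiguous segments. The function names, the seed words, the softmax emission scores, and the parameters `k`, `p_stay` and `temperature` are all illustrative assumptions, not the authors' method.

```python
import numpy as np

# Hypothetical sketch (assumed design, NOT the published method):
# LSA term space + cosine-based emission scores + Viterbi over topic states.


def lsa_term_vectors(docs, k):
    """Project terms into a k-dimensional LSA space via truncated SVD.

    Uses raw term-by-document counts; LSA setups usually apply a weighting
    scheme (e.g. log-entropy) before the SVD.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            counts[index[w], j] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return index, u[:, :k] * s[:k]  # rows: term vectors scaled by singular values


def embed(words, index, term_vecs):
    """Sum the LSA vectors of a word sequence; unknown words are ignored."""
    vec = np.zeros(term_vecs.shape[1])
    for w in words:
        if w in index:
            vec += term_vecs[index[w]]
    return vec


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


def viterbi(log_emit, log_trans, log_start):
    """Return the most likely state sequence for T x S log-emission scores."""
    T, S = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: previous state i -> state j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def label_sentences(sentences, topics, k=2, p_stay=0.8, temperature=0.1):
    """Assign one topic per sentence; runs of equal labels form segments.

    `topics` maps a topic name to a hypothetical seed-word list, so topics
    can be redefined without annotating any text.
    """
    if len(topics) < 2:
        raise ValueError("need at least two topics")
    index, term_vecs = lsa_term_vectors(sentences, k)
    names = list(topics)
    topic_vecs = [embed(topics[n], index, term_vecs) for n in names]
    sims = np.array([[cosine(embed(s, index, term_vecs), tv) for tv in topic_vecs]
                     for s in sentences])
    # Assumed emission model: log-softmax over topics of scaled cosine similarities.
    logits = sims / temperature
    m = logits.max(axis=1, keepdims=True)
    log_emit = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    # Sticky transitions: stay with probability p_stay, otherwise switch uniformly.
    S = len(names)
    trans = np.full((S, S), (1.0 - p_stay) / (S - 1))
    np.fill_diagonal(trans, p_stay)
    path = viterbi(log_emit, np.log(trans), np.full(S, -np.log(S)))
    return [names[i] for i in path]


if __name__ == "__main__":
    notes = [["pain", "morphine", "given"], ["pain", "relieved", "morphine"],
             ["blood", "pressure", "stable"], ["blood", "pressure", "low"]]
    topics = {"pain": ["pain", "morphine"], "circulation": ["blood", "pressure"]}
    print(label_sentences(notes, topics))
    # ['pain', 'pain', 'circulation', 'circulation']
```

To keep the example self-contained, the LSA space is built from the toy sentences themselves; a realistic setup would train it on a larger unannotated corpus. Raising `p_stay` merges short runs into longer segments, while lowering it lets the labels follow the per-sentence similarities more closely.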

Original language: English
Title of host publication: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings
Place of publication: Turku, Finland
Pages: 37-44
Number of pages: 8
Publication status: Published - 1 Sep 2008
Externally published: Yes
Event: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Turku, Finland
Duration: 1 Sep 2008 to 3 Sep 2008

Conference

Conference: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008
Country: Finland
City: Turku
Period: 1/09/08 to 3/09/08

Cite this

Ginter, F., Suominen, H., Pyysalo, S., & Salakoski, T. I. (2008). Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application. In 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings (pp. 37-44). Turku, Finland.
Ginter, Filip; Suominen, Hanna; Pyysalo, Sampo; Salakoski, Tapio I. / Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application. 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings. Turku, Finland, 2008. pp. 37-44
@inproceedings{5ec68c5afcf548a8a0931799727871ea,
title = "Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application",
abstract = "Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. We introduce an unsupervised method based on a combination of Hidden Markov Models and latent semantic indexing which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. The method is evaluated in an application domain of intensive care nursing narratives. It is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.",
keywords = "Hidden Markov Model, latent semantic analysis, topic segmentation",
author = "Filip Ginter and Hanna Suominen and Sampo Pyysalo and Salakoski, {Tapio I.}",
year = "2008",
month = "9",
day = "1",
language = "English",
pages = "37--44",
booktitle = "3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings",

}

Ginter, F, Suominen, H, Pyysalo, S & Salakoski, TI 2008, Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application. in 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings. Turku, Finland, pp. 37-44, 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008, Turku, Finland, 1/09/08.

Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application. / Ginter, Filip; Suominen, Hanna; Pyysalo, Sampo; Salakoski, Tapio I.

3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings. Turku, Finland, 2008. p. 37-44.

TY - GEN

T1 - Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application

AU - Ginter, Filip

AU - Suominen, Hanna

AU - Pyysalo, Sampo

AU - Salakoski, Tapio I.

PY - 2008/9/1

Y1 - 2008/9/1

N2 - Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. We introduce an unsupervised method based on a combination of Hidden Markov Models and latent semantic indexing which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. The method is evaluated in an application domain of intensive care nursing narratives. It is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.

AB - Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. We introduce an unsupervised method based on a combination of Hidden Markov Models and latent semantic indexing which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. The method is evaluated in an application domain of intensive care nursing narratives. It is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.

KW - Hidden Markov Model

KW - latent semantic analysis

KW - topic segmentation

UR - http://www.scopus.com/inward/record.url?scp=71849089213&partnerID=8YFLogxK

M3 - Conference contribution

SP - 37

EP - 44

BT - 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings

CY - Turku, Finland

ER -

Ginter F, Suominen H, Pyysalo S, Salakoski TI. Combining hidden Markov models and latent semantic analysis for topic segmentation and labeling: Method and clinical application. In 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings. Turku, Finland. 2008. p. 37-44