Abstract
Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. We introduce an unsupervised method based on a combination of Hidden Markov Models and latent semantic indexing which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. The method is evaluated in an application domain of intensive care nursing narratives. It is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.
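The abstract names the two ingredients (latent semantic indexing and a Hidden Markov Model) but not how they are combined. The sketch below is one plausible arrangement, not the authors' implementation. It makes several assumptions: topics are defined by short keyword lists, each text unit is scored against each topic by cosine similarity in an LSI space, those scores are used as unnormalized log-emission scores, and a "sticky" transition matrix discourages topic switches before Viterbi decoding. All function names, the `stay` and `sharpness` parameters, and the toy data are hypothetical.

```python
import numpy as np

def lsi_fit(docs, vocab, k):
    """Fit a k-dimensional LSI space; return a text -> latent-vector projector."""
    index = {w: i for i, w in enumerate(vocab)}

    def bow(text):
        # Raw term counts over the fixed vocabulary (no weighting; an assumption).
        v = np.zeros(len(vocab))
        for w in text.lower().split():
            if w in index:
                v[index[w]] += 1.0
        return v

    A = np.stack([bow(d) for d in docs], axis=1)       # terms x documents
    U, _, _ = np.linalg.svd(A, full_matrices=False)    # truncated SVD = LSI
    Uk = U[:, :k]
    return lambda text: Uk.T @ bow(text)

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))

def viterbi(log_emit, log_trans, log_init):
    """Most likely state sequence for a (T x K) matrix of log-emission scores."""
    T, K = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans              # cand[i, j]: state i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def segment(units, topics, project, stay=0.8, sharpness=5.0):
    """Label each text unit with a topic name (hypothetical combination scheme)."""
    names = list(topics)
    K = len(names)
    topic_vecs = [project(" ".join(topics[n])) for n in names]
    sims = np.array([[cosine(project(u), tv) for tv in topic_vecs] for u in units])
    log_emit = sharpness * sims                        # assumed scoring, not a true likelihood
    trans = np.full((K, K), (1.0 - stay) / max(K - 1, 1))
    np.fill_diagonal(trans, stay)                      # favor staying in the same topic
    path = viterbi(log_emit, np.log(trans), np.log(np.full(K, 1.0 / K)))
    return [names[i] for i in path]

# Toy, invented data: unannotated text to build the LSI space, then keyword-defined topics.
docs = ["pain morphine pain", "morphine pain relief",
        "breathing oxygen ventilator", "oxygen breathing saturation"]
vocab = ["pain", "morphine", "relief", "breathing", "oxygen", "ventilator", "saturation"]
topics = {"pain": ["pain"], "respiration": ["breathing"]}
units = ["pain morphine", "relief", "oxygen", "ventilator saturation"]

project = lsi_fit(docs, vocab, k=2)
labels = segment(units, topics, project)
print(list(zip(units, labels)))
```

Note that the unit "relief" shares no word with the keyword list for the "pain" topic; any similarity it receives comes from co-occurrence captured by the LSI space, which is what lets topics be specified without annotated data in a scheme like this. The transition matrix is what allows segments as short as a single unit while still smoothing isolated noisy scores.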
Original language | English |
---|---|
Title of host publication | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Proceedings |
Place of Publication | Turku, Finland |
Pages | 37-44 |
Number of pages | 8 |
Publication status | Published - 1 Sept 2008 |
Externally published | Yes |
Event | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Turku, Finland; Duration: 1 Sept 2008 → 3 Sept 2008 |
Conference
Conference | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 |
---|---|
Country/Territory | Finland |
City | Turku |
Period | 1/09/08 → 3/09/08 |