Grammatical Dependency-Based Relations for Term Weighting in Text Classification

Dat Huynh, Dat Tran, Wanli Ma, Dharmendra Sharma

Research output: A Conference proceeding or a Chapter in Book; Conference contribution; peer-review

1 Citation (Scopus)

Abstract

Term frequency and term co-occurrence are currently used to estimate term weightings in a document. However, these methods do not use relations based on grammatical dependency among terms to measure the dependency between word features. In this paper, we propose a new approach that employs grammatical relations to estimate the weightings of terms in a text document, and we show how to apply the resulting term weighting scheme to text classification. A graph model is used to encode the extracted relations. A graph centrality algorithm is then applied to calculate scores that represent the significance of the terms in the document context. Experiments performed on many corpora with an SVM classifier show that the proposed term weighting approach outperforms those based on term frequency and term co-occurrence.
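The pipeline outlined in the abstract (grammatical relations, then a term graph, then centrality scores used as term weights) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the centrality algorithm, so PageRank is used as a stand-in; the `relations` list is hand-written instead of coming from a dependency parser; and the function names `build_graph` and `pagerank` are hypothetical.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected term graph from (head, dependent) pairs."""
    graph = defaultdict(set)
    for head, dependent in relations:
        if head != dependent:  # skip self-loops
            graph[head].add(dependent)
            graph[dependent].add(head)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every node has at least one neighbour.

    PageRank is a stand-in here: the abstract only says "a graph
    centrality algorithm" without naming one.
    """
    n = len(graph)
    scores = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores


# Hypothetical, hand-written dependency pairs for the sentence
# "The classifier assigns weights to terms." (not parser output).
relations = [
    ("assigns", "classifier"),
    ("assigns", "weights"),
    ("assigns", "terms"),
    ("classifier", "the"),
]

# Centrality scores act as term weights for this one document.
weights = pagerank(build_graph(relations))
for term, score in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In a classification setting, weights of this kind would replace raw term frequencies in each document's feature vector before training the SVM; how the paper normalises or combines them is not stated in the abstract.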
Original language: English
Title of host publication: Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011
Subtitle of host publication: Advances in Knowledge Discovery and Data Mining
Editors: Joshua Zhexue Huang, Longbing Cao, Jaideep Srivastava
Place of Publication: Berlin
Publisher: Springer
Pages: 476-487
Number of pages: 12
Volume: 6634
ISBN (Electronic): 9783642208416
ISBN (Print): 9783642208409
DOIs
Publication status: Published - 2011
Event: 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining: PAKDD 2011 - Shenzhen, China
Duration: 24 May 2011 - 27 May 2011
http://pakdd2011.pakdd.org/

Publication series

Name: LNCS (Lecture Notes in Computer Science)
Publisher: Springer
Volume: 6634

Conference

Conference: 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Abbreviated title: PAKDD 2011
Country/Territory: China
City: Shenzhen
Period: 24/05/11 - 27/05/11
Other: The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is a leading international conference in the areas of data mining and knowledge discovery (KDD). It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, visualization, and decision-making systems.
The conference calls for research papers reporting original investigation results and industrial papers reporting real data mining applications and system development experience. The conference will confer a Best Paper Award on the best full paper and Best Student Paper awards on the best student submissions. The proceedings of the conference will be published by Springer as a volume of the LNAI series. PAKDD2011 will be held in Shenzhen, one of the most attractive cities in China.
