Grammatical Dependency-Based Relations for Term Weighting in Text Classification

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

1 Citation (Scopus)

Abstract

Term frequency and term co-occurrence are currently used to estimate term weightings in a document. However, these methods do not use grammatical dependency relations among terms to measure the dependency between word features. In this paper, we propose a new approach that employs grammatical relations to estimate the weightings of terms in a text document, and we show how to apply the resulting term weighting scheme to text classification. A graph model is used to encode the extracted relations. A graph centrality algorithm is then applied to calculate scores that represent the significance of the terms in the document context. Experiments performed on many corpora with an SVM classifier show that the proposed term weighting approach outperforms those based on term frequency and term co-occurrence.
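The pipeline described in the abstract (encode relations as a graph, then score terms by centrality) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the abstract does not name the centrality algorithm, so PageRank is used here as one common choice; the dependency pairs are written by hand rather than produced by a parser; and the function names build_graph and pagerank are hypothetical.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected term graph from (head, dependent) pairs.

    Assumption: each grammatical relation contributes one unweighted edge.
    """
    graph = defaultdict(set)
    for head, dependent in relations:
        if head != dependent:  # ignore self-loops
            graph[head].add(dependent)
            graph[dependent].add(head)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score terms with plain PageRank (an assumed stand-in for the
    unspecified centrality algorithm)."""
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    # Hand-written (head, dependent) pairs for illustration only.
    relations = [
        ("outperforms", "approach"),
        ("outperforms", "frequency"),
        ("approach", "weighting"),
        ("weighting", "term"),
        ("frequency", "term"),
    ]
    weights = pagerank(build_graph(relations))
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.4f}")
```

The resulting scores could then replace raw term frequencies as feature values for a classifier such as an SVM; how the paper combines them with other weighting factors is not stated in the abstract.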
Original language: English
Title of host publication: Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011
Subtitle of host publication: Advances in Knowledge Discovery and Data Mining
Editors: Joshua Zhexue Huang, Longbing Cao, Jaideep Srivastava
Place of Publication: Berlin
Publisher: Springer
Pages: 476-487
Number of pages: 12
Volume: 6634
ISBN (Electronic): 9783642208416
ISBN (Print): 9783642208409
DOIs: https://doi.org/10.1007/978-3-642-20841-6_39
Publication status: Published - 2011
Event: 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining: PAKDD 2011 - Shenzhen, China
Duration: 24 May 2011 to 27 May 2011
http://pakdd2011.pakdd.org/

Publication series

Name: LNCS (Lecture Notes in Computer Science)
Publisher: Springer
Volume: 6634

Conference

Conference: 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Abbreviated title: PAKDD 2011
Country: China
City: Shenzhen
Period: 24/05/11 to 27/05/11
Other: The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is a leading international conference in the areas of data mining and knowledge discovery (KDD). It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, visualization, and decision-making systems.
The conference calls for research papers reporting original investigation results and industrial papers reporting real data mining applications and system development experience. The conference will confer a Best Paper Award on the best full paper, and Best Student Paper awards from among the student submissions. The proceedings of the conference will be published by Springer as a volume of the LNAI series. PAKDD 2011 will be held in Shenzhen, one of the most attractive cities in China.

Fingerprint

Classifiers
Experiments

Cite this

Hyunh, D., Tran, D., Ma, W., & Sharma, D. (2011). Grammatical Dependency-Based Relations for Term Weighting in Text Classification. In J. Z. Huang, L. Cao, & J. Srivastava (Eds.), Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011: Advances in Knowledge Discovery and Data Mining (Vol. 6634, pp. 476-487). (LNCS (Lecture Notes in Computer Science); Vol. 6634). Berlin: Springer. https://doi.org/10.1007/978-3-642-20841-6_39
Hyunh, Dat; Tran, Dat; Ma, Wanli; Sharma, Dharmendra. / Grammatical Dependency-Based Relations for Term Weighting in Text Classification. Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011: Advances in Knowledge Discovery and Data Mining. editor / Joshua Zhexue Huang; Longbing Cao; Jaideep Srivastava. Vol. 6634. Berlin: Springer, 2011. pp. 476-487 (LNCS (Lecture Notes in Computer Science)).
@inproceedings{990adbc3390440a7af00d52b90e45acf,
title = "Grammatical Dependency-Based Relations for Term Weighting in Text Classification",
abstract = "Term frequency and term co-occurrence are currently used to estimate term weightings in a document. However, these methods do not use grammatical dependency relations among terms to measure the dependency between word features. In this paper, we propose a new approach that employs grammatical relations to estimate the weightings of terms in a text document, and we show how to apply the resulting term weighting scheme to text classification. A graph model is used to encode the extracted relations. A graph centrality algorithm is then applied to calculate scores that represent the significance of the terms in the document context. Experiments performed on many corpora with an SVM classifier show that the proposed term weighting approach outperforms those based on term frequency and term co-occurrence.",
keywords = "Machine Learning, Text Classification",
author = "Dat Hyunh and Dat Tran and Wanli Ma and Dharmendra Sharma",
year = "2011",
doi = "10.1007/978-3-642-20841-6_39",
language = "English",
isbn = "9783642208409",
volume = "6634",
series = "LNCS (Lecture Notes in Computer Science)",
publisher = "Springer",
pages = "476--487",
editor = "Huang, {Joshua Zhexue} and Longbing Cao and Jaideep Srivastava",
booktitle = "Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011",
address = "Berlin",
}

Hyunh, D, Tran, D, Ma, W & Sharma, D 2011, Grammatical Dependency-Based Relations for Term Weighting in Text Classification. in JZ Huang, L Cao & J Srivastava (eds), Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011: Advances in Knowledge Discovery and Data Mining. vol. 6634, LNCS (Lecture Notes in Computer Science), vol. 6634, Springer, Berlin, pp. 476-487, 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Shenzhen, China, 24/05/11. https://doi.org/10.1007/978-3-642-20841-6_39

TY - GEN
T1 - Grammatical Dependency-Based Relations for Term Weighting in Text Classification
AU - Hyunh, Dat
AU - Tran, Dat
AU - Ma, Wanli
AU - Sharma, Dharmendra
PY - 2011
Y1 - 2011
AB - Term frequency and term co-occurrence are currently used to estimate term weightings in a document. However, these methods do not use grammatical dependency relations among terms to measure the dependency between word features. In this paper, we propose a new approach that employs grammatical relations to estimate the weightings of terms in a text document, and we show how to apply the resulting term weighting scheme to text classification. A graph model is used to encode the extracted relations. A graph centrality algorithm is then applied to calculate scores that represent the significance of the terms in the document context. Experiments performed on many corpora with an SVM classifier show that the proposed term weighting approach outperforms those based on term frequency and term co-occurrence.
KW - Machine Learning
KW - Text Classification
U2 - 10.1007/978-3-642-20841-6_39
DO - 10.1007/978-3-642-20841-6_39
M3 - Conference contribution
SN - 9783642208409
VL - 6634
T3 - LNCS (Lecture Notes in Computer Science)
SP - 476
EP - 487
BT - Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011
A2 - Huang, Joshua Zhexue
A2 - Cao, Longbing
A2 - Srivastava, Jaideep
PB - Springer
CY - Berlin
ER -

Hyunh D, Tran D, Ma W, Sharma D. Grammatical Dependency-Based Relations for Term Weighting in Text Classification. In Huang JZ, Cao L, Srivastava J, editors, Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD 2011: Advances in Knowledge Discovery and Data Mining. Vol. 6634. Berlin: Springer. 2011. p. 476-487. (LNCS (Lecture Notes in Computer Science)). https://doi.org/10.1007/978-3-642-20841-6_39