Spam Recognition using Linear Regression and Radial Basis Function Neural Network

Tich Phuoc Tran, Min Li, Dat Tran, Dam Duong Ton

    Research output: A Conference proceeding or a Chapter in Book › Chapter

    Abstract

    Spamming is the abuse of electronic messaging systems to send unsolicited bulk messages. It is becoming a serious problem for organizations and individual email users because of the growing popularity and low cost of email. Unlike other web threats such as hacking and Internet worms, which directly damage information assets, spam harms computer networks indirectly, with effects ranging from network problems such as increased server load, decreased network performance, and viruses to personnel issues such as lost employee time, phishing scams, and offensive content. Although a large amount of research has been conducted to prevent spamming from undermining the usability of email, existing filtering methods still suffer from extensive computation (given the large volume of emails received) and unreliable predictive capability (due to the highly dynamic nature of email). In this chapter, we discuss the challenging problems of Spam Recognition and then propose an anti-spam filtering framework in which appropriate dimension reduction schemes and powerful classification models are employed. In particular, Principal Component Analysis transforms the data to a lower-dimensional space, which is subsequently used to train an Artificial Neural Network based classifier. A cost-sensitive empirical analysis with a publicly available email corpus, namely Ling-Spam, suggests that our spam recognition framework outperforms other state-of-the-art learning methods in terms of spam detection capability. In the case of extremely high misclassification cost, other methods' performance deteriorates significantly as the cost factor increases, whereas our model maintains stable accuracy with low computation cost.
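
    The pipeline outlined in the abstract (dimension reduction with Principal Component Analysis, then a neural network classifier, then a cost-sensitive evaluation) can be illustrated with a minimal sketch. This is not the chapter's implementation. It assumes a radial basis function network whose output weights are fitted by linear least squares (suggested by the chapter title), centers drawn at random from the training data, a weighted accuracy in which each legitimate message counts `cost` times, and synthetic data standing in for Ling-Spam. All function names, sizes, and parameter values are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def rbf_features(Z, centers, width):
    """Gaussian hidden-layer activations plus a constant bias column."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([H, np.ones((len(Z), 1))])

def fit_rbf_network(Z, y, n_centers, width, rng):
    """Assumption: random training points as centers, least-squares output weights."""
    centers = Z[rng.choice(len(Z), size=n_centers, replace=False)]
    w, *_ = np.linalg.lstsq(rbf_features(Z, centers, width), y, rcond=None)
    return centers, w

def predict(Z, centers, w, width, threshold=0.5):
    """Label a message as spam (1) when the network output reaches the threshold."""
    return (rbf_features(Z, centers, width) @ w >= threshold).astype(int)

def weighted_accuracy(y_true, y_pred, cost):
    """Accuracy in which each legitimate message (label 0) counts `cost` times."""
    weights = np.where(y_true == 0, cost, 1.0)
    return float((weights * (y_true == y_pred)).sum() / weights.sum())

# Synthetic stand-in for an email corpus: 200 messages, 50 features.
rng = np.random.default_rng(0)
n, d = 200, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :5] += 3.0  # the spam class is shifted on the first five features

idx = rng.permutation(n)
train, test = idx[:150], idx[150:]

# Fit PCA on the training split only, then project both splits.
mean, W = pca_fit(X[train], k=5)
Z_train = (X[train] - mean) @ W
Z_test = (X[test] - mean) @ W

centers, w = fit_rbf_network(Z_train, y[train].astype(float),
                             n_centers=20, width=3.0, rng=rng)
y_pred = predict(Z_test, centers, w, width=3.0)

for cost in (1, 9, 999):
    print(cost, weighted_accuracy(y[test], y_pred, cost))
```

    Raising `cost` makes a blocked legitimate message weigh more heavily in the score, which is one way to probe how a filter behaves as the misclassification cost grows.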
    Original language: English
    Title of host publication: Pattern Recognition
    Editors: Peng-Yeng Yin
    Place of Publication: India
    Publisher: In-Tech
    Pages: 513-532
    Number of pages: 20
    Edition: 1
    ISBN (Print): 9789533070148
    DOIs: https://doi.org/10.5772/7529
    Publication status: Published - 2009

    Fingerprint

    Electronic mail
    Linear regression
    Neural networks
    Spamming
    Costs
    Computer networks
    Personnel
    Network performance
    Viruses
    Principal component analysis
    Classifiers
    Servers
    Internet

    Cite this

    Tran, T. P., Li, M., Tran, D., & Ton, D. D. (2009). Spam Recognition using Linear Regression and Radial Basis Function Neural Network. In P-Y. Yin (Ed.), Pattern Recognition (1 ed., pp. 513-532). India: In-Tech. https://doi.org/10.5772/7529
    Tran, Tich Phuoc ; Li, Min ; Tran, Dat ; Ton, Dam Duong. / Spam Recognition using Linear Regression and Radial Basis Function Neural Network. Pattern Recognition. editor / Peng-Yeng Yin. 1. ed. India : In-Tech, 2009. pp. 513-532
    @inbook{072b974136684af0b4ab2aa29575cee9,
    title = "Spam Recognition using Linear Regression and Radial Basis Function Neural Network",
    author = "Tran, {Tich Phuoc} and Min Li and Dat Tran and Ton, {Dam Duong}",
    year = "2009",
    doi = "10.5772/7529",
    language = "English",
    isbn = "9789533070148",
    pages = "513--532",
    editor = "Peng-Yeng Yin",
    booktitle = "Pattern Recognition",
    publisher = "In-Tech",
    edition = "1",

    }

    Tran, TP, Li, M, Tran, D & Ton, DD 2009, Spam Recognition using Linear Regression and Radial Basis Function Neural Network. in P-Y Yin (ed.), Pattern Recognition. 1 edn, In-Tech, India, pp. 513-532. https://doi.org/10.5772/7529

    Spam Recognition using Linear Regression and Radial Basis Function Neural Network. / Tran, Tich Phuoc; Li, Min; Tran, Dat; Ton, Dam Duong.

    Pattern Recognition. ed. / Peng-Yeng Yin. 1. ed. India : In-Tech, 2009. p. 513-532.

    TY - CHAP

    T1 - Spam Recognition using Linear Regression and Radial Basis Function Neural Network

    AU - Tran, Tich Phuoc

    AU - Li, Min

    AU - Tran, Dat

    AU - Ton, Dam Duong

    PY - 2009

    Y1 - 2009

    U2 - 10.5772/7529

    DO - 10.5772/7529

    M3 - Chapter

    SN - 9789533070148

    SP - 513

    EP - 532

    BT - Pattern Recognition

    A2 - Yin, Peng-Yeng

    PB - In-Tech

    CY - India

    ER -

    Tran TP, Li M, Tran D, Ton DD. Spam Recognition using Linear Regression and Radial Basis Function Neural Network. In Yin P-Y, editor, Pattern Recognition. 1 ed. India: In-Tech. 2009. p. 513-532 https://doi.org/10.5772/7529