VQ-Based Written Language Identification

Dat Tran, T Pham

    Research output: A Conference proceeding or a Chapter in Book > Conference contribution

    4 Citations (Scopus)

    Abstract

    Humans can recognize different types of written languages by their grammars and vocabularies. However, computers see everything as numbers. We present a computational algorithm for machine classification of written languages using the method of vector quantization. For a language document, each word is converted to a sequence of numbers that forms a vector of numerical values according to its characters. This collection of vectors is then represented by a codebook that contains a number of template vectors for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results of classifying a set of five closely Roman-typed scripts show the promising application of the proposed method.
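
    The pipeline the abstract describes (words to numeric vectors, one codebook of template vectors per language, classification against the codebooks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the character encoding (Unicode code points, zero-padded to a hypothetical fixed length `WORD_LEN`), the plain k-means codebook training, and the minimum-average-distortion decision rule are all assumptions, since the abstract does not specify them. All function names are hypothetical.

```python
import random

WORD_LEN = 8  # assumed fixed vector length; shorter words are zero-padded


def word_to_vector(word, length=WORD_LEN):
    """Map a word to a fixed-length vector of character codes (assumed encoding)."""
    codes = [float(ord(c)) for c in word.lower()[:length]]
    return codes + [0.0] * (length - len(codes))


def text_to_vectors(text):
    """Split text on whitespace, strip common punctuation, keep alphabetic words."""
    words = [w.strip(".,;:!?\"'()") for w in text.split()]
    return [word_to_vector(w) for w in words if w.isalpha()]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size=4, iters=20, seed=0):
    """Build a codebook with plain k-means (an assumption; the paper may differ)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, min(size, len(vectors)))]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if no vector was assigned to it
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(text, codebooks):
    """Return the language whose codebook gives the lowest average distortion."""
    vectors = text_to_vectors(text)
    if not vectors:
        raise ValueError("text contains no alphabetic words")
    return min(codebooks, key=lambda lang: distortion(vectors, codebooks[lang]))


if __name__ == "__main__":
    # Toy training data, far too small for real use; for illustration only.
    training = {
        "english": "the cat sat on the mat and the dog ran",
        "german": "der hund lief und die katze sass auf der matte",
    }
    codebooks = {
        lang: train_codebook(text_to_vectors(text), size=4)
        for lang, text in training.items()
    }
    print(identify("the dog and the cat", codebooks))
```

    In this sketch each language is summarized only by its codebook, so classifying a new document requires comparing its word vectors against a handful of template vectors per language rather than against the full training text.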
    Original language: English
    Title of host publication: Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications
    Editors: K Abed-Meraim, I Bloch
    Place of Publication: France
    Publisher: IEEE
    Pages: 513-516
    Number of pages: 4
    ISBN (Print): 0-7803-7947-0
    DOIs: https://doi.org/10.1109/ISSPA.2003.1224752
    Publication status: Published - 2003
    Event: 7th International Symposium on Signal Processing and Its Applications, France
    Duration: 1 Jul 2003 to 4 Jul 2003

    Conference

    Conference: 7th International Symposium on Signal Processing and Its Applications
    Country: France
    Period: 1/07/03 to 4/07/03

  • Cite this

    Tran, D., & Pham, T. (2003). VQ-Based Written Language Identification. In K. Abed-Meraim, & I. Bloch (Eds.), Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications (pp. 513-516). IEEE. https://doi.org/10.1109/ISSPA.2003.1224752