VQ-Based Written Language Identification

Dat Tran, T Pham

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution › peer-review

    6 Citations (Scopus)
    18 Downloads (Pure)

    Abstract

    Humans can recognize different types of written languages by their grammars and vocabularies. However, computers see everything as numbers. We present a computational algorithm for machine classification of written languages using the method of vector quantization. For a language document, each word is converted to a sequence of numbers that forms a vector of numerical values according to its characters. This collection of vectors is then represented by a codebook that contains a number of template vectors for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results of classifying a set of five closely related roman-typed scripts show the promising application of the proposed method.
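
The pipeline the abstract describes (words to numeric vectors, one codebook per language, classification against the codebooks) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the character encoding, the vector length, the codebook size, or the training algorithm, so the fixed length `DIM`, the use of `ord()` codes with zero-padding, plain k-means training, and the minimum-average-distortion decision rule are all assumptions, and every function name here is hypothetical.

```python
import random

DIM = 8  # assumed fixed vector length; longer words are truncated, shorter ones zero-padded


def word_to_vector(word, dim=DIM):
    """Map a word to a fixed-length vector of character codes (assumed encoding)."""
    codes = [float(ord(c)) for c in word.lower()[:dim]]
    return codes + [0.0] * (dim - len(codes))


def text_to_vectors(text):
    """Convert a document to one vector per purely alphabetic word."""
    return [word_to_vector(w) for w in text.split() if w.isalpha()]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size=4, iters=20, seed=0):
    """Train a codebook with plain k-means (an assumed choice of VQ trainer)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, min(size, len(vectors)))]
    for _ in range(iters):
        # Assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            cells[nearest].append(v)
        # Move each codeword to the centroid of its cell; keep it if the cell is empty.
        for k, cell in enumerate(cells):
            if cell:
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(text, codebooks):
    """Return the language whose codebook gives the lowest average distortion."""
    vectors = text_to_vectors(text)
    if not vectors:
        raise ValueError("text contains no alphabetic words")
    return min(codebooks, key=lambda lang: avg_distortion(vectors, codebooks[lang]))
```

To use the sketch, one would train a codebook per language with `train_codebook(text_to_vectors(sample))`, collect the results in a dictionary keyed by language name, and pass that dictionary to `identify` together with the text to classify. Real use would need far larger training samples and codebooks than a toy example.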
    Original language: English
    Title of host publication: Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications
    Editors: K Abed-Meraim, I Bloch
    Place of Publication: France
    Publisher: IEEE
    Pages: 513-516
    Number of pages: 4
    ISBN (Print): 0-7803-7947-0
    Publication status: Published - 2003
    Event: 7th International Symposium on Signal Processing and Its Applications, France
    Duration: 1 Jul 2003 to 4 Jul 2003

