VQ-Based Written Language Identification

Dat Tran, T Pham

    Research output: A Conference proceeding or a Chapter in Book / Conference contribution

    4 Citations (Scopus)

    Abstract

    Humans can recognize different written languages by their grammars and vocabularies, whereas computers see everything as numbers. We present a computational algorithm for machine classification of written languages using vector quantization (VQ). For a document in a given language, each word is converted, character by character, into a sequence of numbers, forming a vector of numerical values. This collection of vectors is then represented by a codebook containing a number of template vectors used for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results on classifying a set of five closely related languages written in Roman script demonstrate the promise of the proposed method.
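    The abstract describes the pipeline only at a high level: words become numeric vectors, each language is summarized by a codebook of template vectors, and a document is classified against those codebooks. The block below is a minimal sketch of that general VQ pattern, not the authors' implementation. Every specific is an assumption: characters are mapped to their Unicode code points, words are truncated or zero-padded to a hypothetical fixed length `DIM = 8`, codebooks are trained with plain k-means (Lloyd iterations) rather than any particular procedure from the paper, and a document is assigned to the language whose codebook yields the lowest average quantization distortion. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical fixed vector length: longer words are truncated, shorter ones zero-padded.
DIM = 8


def word_to_vector(word: str, dim: int = DIM) -> Vector:
    """Map each character to a number (assumed encoding: Unicode code point, lowercased)."""
    codes = [float(ord(ch)) for ch in word.lower()[:dim]]
    return codes + [0.0] * (dim - len(codes))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors: List[Vector], size: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Design a codebook with plain k-means (an assumed choice; the paper may use another VQ design)."""
    if not vectors:
        raise ValueError("need at least one training vector")
    rng = random.Random(seed)
    distinct = sorted({tuple(v) for v in vectors})
    # Initialise codewords from distinct training vectors.
    codebook = [list(v) for v in rng.sample(distinct, min(size, len(distinct)))]
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in codebook]
        # Assign every training vector to its nearest codeword.
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # Move each codeword to the centroid of its cell.
        for i, members in enumerate(cells):
            if members:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(vectors: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(text: str, codebooks: Dict[str, List[Vector]]) -> str:
    """Return the language whose codebook quantizes the text with the least distortion."""
    vectors = [word_to_vector(w) for w in text.split()]
    if not vectors:
        raise ValueError("text contains no words")
    return min(codebooks, key=lambda lang: avg_distortion(vectors, codebooks[lang]))


if __name__ == "__main__":
    # Toy training texts, for illustration only (not the paper's data).
    training = {
        "en": "the cat sat on the mat",
        "fr": "le chat est sur le tapis",
    }
    books = {
        lang: train_codebook([word_to_vector(w) for w in txt.split()], size=5)
        for lang, txt in training.items()
    }
    print(identify("the mat", books))  # -> en
    print(identify("le chat", books))  # -> fr
```

    Training one codebook per language and scoring by distortion is the usual way VQ is applied to classification. Because this sketch uses raw code points with zero padding, its distances are strongly influenced by word length; a practical system would probably need a more careful character mapping and a far larger training corpus.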
    Original language: English
    Title of host publication: Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications
    Editors: K Abed-Meraim, I Bloch
    Place of Publication: France
    Publisher: IEEE
    Pages: 513-516
    Number of pages: 4
    ISBN (Print): 0-7803-7947-0
    DOI: https://doi.org/10.1109/ISSPA.2003.1224752
    Publication status: Published - 2003
    Event: 7th International Symposium on Signal Processing and Its Applications, France
    Duration: 1 Jul 2003 to 4 Jul 2003

    Conference

    Conference: 7th International Symposium on Signal Processing and Its Applications
    Country: France
    Period: 1/07/03 to 4/07/03

    Fingerprint

    Vector quantization
    Learning systems

    Cite this

    Tran, D., & Pham, T. (2003). VQ-Based Written Language Identification. In K. Abed-Meraim, & I. Bloch (Eds.), Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications (pp. 513-516). France: IEEE. https://doi.org/10.1109/ISSPA.2003.1224752
    Tran, Dat ; Pham, T. / VQ-Based Written Language Identification. Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications. editor / K Abed-Meraim ; I Bloch. France : IEEE, 2003. pp. 513-516
    @inproceedings{9b639261eaad4befb7b2b68c452448ca,
    title = "VQ-Based Written Language Identification",
    abstract = "Humans can recognize different types of written languages by their grammars and vocabularies. However, computers see everything as numbers. We present a computational algorithm for machine classification of written languages using the method of vector quantization. For a language document, each word is converted to a sequence of numbers and forms as a vector of numerical values according to its characters. This collection of vectors is then represented by a codebook that contains a number of template vectors for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results of classifying a set of five closely roman-typed scripts show the promising application of the proposed method",
    author = "Dat Tran and T Pham",
    year = "2003",
    doi = "10.1109/ISSPA.2003.1224752",
    language = "English",
    isbn = "0-7803-7947-0",
    pages = "513--516",
    editor = "K Abed-Meraim and I Bloch",
    booktitle = "Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications",
    publisher = "IEEE",

    }

    Tran, D & Pham, T 2003, VQ-Based Written Language Identification. in K Abed-Meraim & I Bloch (eds), Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications. IEEE, France, pp. 513-516, 7th International Symposium on Signal Processing and Its Applications, France, 1/07/03. https://doi.org/10.1109/ISSPA.2003.1224752

    VQ-Based Written Language Identification. / Tran, Dat; Pham, T.

    Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications. ed. / K Abed-Meraim; I Bloch. France : IEEE, 2003. p. 513-516.


    TY - GEN

    T1 - VQ-Based Written Language Identification

    AU - Tran, Dat

    AU - Pham, T

    PY - 2003

    Y1 - 2003

    N2 - Humans can recognize different types of written languages by their grammars and vocabularies. However, computers see everything as numbers. We present a computational algorithm for machine classification of written languages using the method of vector quantization. For a language document, each word is converted to a sequence of numbers and forms as a vector of numerical values according to its characters. This collection of vectors is then represented by a codebook that contains a number of template vectors for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results of classifying a set of five closely roman-typed scripts show the promising application of the proposed method

    AB - Humans can recognize different types of written languages by their grammars and vocabularies. However, computers see everything as numbers. We present a computational algorithm for machine classification of written languages using the method of vector quantization. For a language document, each word is converted to a sequence of numbers and forms as a vector of numerical values according to its characters. This collection of vectors is then represented by a codebook that contains a number of template vectors for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results of classifying a set of five closely roman-typed scripts show the promising application of the proposed method

    U2 - 10.1109/ISSPA.2003.1224752

    DO - 10.1109/ISSPA.2003.1224752

    M3 - Conference contribution

    SN - 0-7803-7947-0

    SP - 513

    EP - 516

    BT - Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications

    A2 - Abed-Meraim, K

    A2 - Bloch, I

    PB - IEEE

    CY - France

    ER -

    Tran D, Pham T. VQ-Based Written Language Identification. In Abed-Meraim K, Bloch I, editors, Proceedings of 2003 Seventh International Symposium on Signal Processing and Its Applications. France: IEEE. 2003. p. 513-516 https://doi.org/10.1109/ISSPA.2003.1224752