A computational linguistic approach for the identification of translator stylometry using Arabic-English text

Heba El-Fiqi, Eleni Petraki, Hussein Aly Abbass

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution


    Abstract

    Translator stylometry is a small but growing area of research in computational linguistics. Although research on the wider field of authorship attribution using computational linguistics techniques has proliferated, the translator stylometry problem is more challenging and the literature on it remains sparse. Some authors have even claimed that this problem has no solution, a claim we challenge in this paper. We present an innovative set of translator stylometric features that can be used as signatures to detect and identify translators. The features are based on the concept of network motifs: small local subgraphs that have been used successfully to characterize global network dynamics. The text is transformed into a network in which words become nodes and their adjacencies within a sentence are represented as links. Motifs of size 3 are then extracted from this network, and their distribution is used as a signature for the corresponding translator.

    We then investigate the impact of sample size, normalization method, and dataset imbalance on classification accuracy. We also evaluate several classifiers, including the Fuzzy Lattice Reasoning (FLR) classifier; FLR achieved the best performance, with classification accuracy reaching the 70% mark.
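
The motif-based signature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenisation, a directed link from each word to the word that follows it in the same sentence, and a relative-frequency normalisation of motif counts. All function names (`build_word_network`, `canonical_code`, `motif_distribution`) are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def build_word_network(sentences):
    """Return directed edges (word -> next word) within each sentence.

    Assumption: lower-cased whitespace tokens stand in for words.
    """
    edges = set()
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            if a != b:  # skip self-loops caused by immediately repeated words
                edges.add((a, b))
    return edges


def canonical_code(triple, edges):
    """Label the subgraph induced by three nodes, independent of node order.

    The label is the largest 6-bit adjacency tuple over all orderings of the
    three nodes, so isomorphic subgraphs receive the same label.
    """
    best = None
    for p in permutations(triple):
        bits = tuple(
            (p[i], p[j]) in edges
            for i in range(3)
            for j in range(3)
            if i != j
        )
        if best is None or bits > best:
            best = bits
    return best


def motif_distribution(edges):
    """Relative frequency of each connected size-3 motif class in the network."""
    # Undirected neighbourhoods, used only to enumerate connected triples.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Every connected triple has a node adjacent to the other two.
    triples = set()
    for centre, adjacent in neighbours.items():
        for a, b in combinations(sorted(adjacent), 2):
            triples.add(frozenset((centre, a, b)))

    counts = Counter(canonical_code(tuple(sorted(t)), edges) for t in triples)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


if __name__ == "__main__":
    sample = ["the cat sat", "the dog sat"]
    signature = motif_distribution(build_word_network(sample))
    for code, share in sorted(signature.items()):
        print(code, share)
```

The resulting dictionary maps each motif class to its share of all connected three-node subgraphs. A vector of these shares, computed per text sample, is the kind of feature a classifier could consume. The enumeration is written for clarity rather than speed, so large networks would call for a dedicated motif-counting tool.
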
    Original language: English
    Title of host publication: 2011 IEEE International Conference on Fuzzy Systems
    Editors: Shyi Ming Chen
    Place of Publication: Taipei, Taiwan
    Publisher: IEEE
    Pages: 2039-2045
    Number of pages: 7
    Volume: 1
    ISBN (Print): 9781424473151
    DOIs: https://doi.org/10.1109/FUZZY.2011.6007535
    Publication status: Published - 2011
    Event: IEEE International Conference on Fuzzy Systems - Taipei, Taiwan, Province of China
    Duration: 1 Jan 2011 to 30 Jun 2011

    Conference

    Conference: IEEE International Conference on Fuzzy Systems
    Abbreviated title: FUZZ-IEEE
    Country: Taiwan, Province of China
    City: Taipei
    Period: 1/01/11 to 30/06/11

    Fingerprint

    Computational linguistics
    Classifiers

    Cite this

    El-Fiqi, H., Petraki, E., & Abbass, H. A. (2011). A computational linguistic approach for the identification of translator stylometry using Arabic-English text. In S. M. Chen (Ed.), 2011 IEEE International Conference on Fuzzy Systems (Vol. 1, pp. 2039-2045). Taipei, Taiwan: IEEE. https://doi.org/10.1109/FUZZY.2011.6007535
    El-Fiqi, Heba ; Petraki, Eleni ; Abbass, Hussein Aly. / A computational linguistic approach for the identification of translator stylometry using Arabic-English text. 2011 IEEE International Conference on Fuzzy Systems. editor / Shyi Ming Chen. Vol. 1 Taipei, Taiwan : IEEE, 2011. pp. 2039-2045
    @inproceedings{28d1636688dc4b1ba685ef226d2363fa,
    title = "A computational linguistic approach for the identification of translator stylometry using Arabic-English text",
    author = "Heba El-Fiqi and Eleni Petraki and Abbass, {Hussein Aly}",
    year = "2011",
    doi = "10.1109/FUZZY.2011.6007535",
    language = "English",
    isbn = "9781424473151",
    volume = "1",
    pages = "2039--2045",
    editor = "Chen, {Shyi Ming}",
    booktitle = "2011 IEEE International Conference on Fuzzy Systems",
    publisher = "IEEE",

    }

    El-Fiqi, H, Petraki, E & Abbass, HA 2011, A computational linguistic approach for the identification of translator stylometry using Arabic-English text. in SM Chen (ed.), 2011 IEEE International Conference on Fuzzy Systems. vol. 1, IEEE, Taipei, Taiwan, pp. 2039-2045, IEEE International Conference on Fuzzy Systems, Taipei, Taiwan, Province of China, 1/01/11. https://doi.org/10.1109/FUZZY.2011.6007535


    TY - GEN

    T1 - A computational linguistic approach for the identification of translator stylometry using Arabic-English text

    AU - El-Fiqi, Heba

    AU - Petraki, Eleni

    AU - Abbass, Hussein Aly

    PY - 2011

    Y1 - 2011

    U2 - 10.1109/FUZZY.2011.6007535

    DO - 10.1109/FUZZY.2011.6007535

    M3 - Conference contribution

    SN - 9781424473151

    VL - 1

    SP - 2039

    EP - 2045

    BT - 2011 IEEE International Conference on Fuzzy Systems

    A2 - Chen, Shyi Ming

    PB - IEEE

    CY - Taipei, Taiwan

    ER -

    El-Fiqi H, Petraki E, Abbass HA. A computational linguistic approach for the identification of translator stylometry using Arabic-English text. In Chen SM, editor, 2011 IEEE International Conference on Fuzzy Systems. Vol. 1. Taipei, Taiwan: IEEE. 2011. p. 2039-2045 https://doi.org/10.1109/FUZZY.2011.6007535