Building a computer system with the capacity to understand human language has become one of the great challenges of computing. One of the barriers to building such a system is constructing a semantic interpretation of human language. Prior approaches to this problem can be categorised into knowledge-based approaches and content-based approaches. Knowledge-based approaches utilise human-crafted knowledge repositories to construct semantic interpretations, whereas content-based approaches analyse the large amounts of unstructured text that are available. Although knowledge-based approaches deliver outstanding performance on semantic interpretation and semantic distance compared to content-based approaches, they are limited to particular knowledge domains. Content-based approaches, on the other hand, can process unstructured text across different domains and languages, yet two aspects deserve consideration. First, most prior content-based approaches use a single type of feature to construct word meaning representations, which raises the question of how multiple feature aspects can be combined to model meaning. Secondly, although content-based approaches have been evaluated with various feature sets, their performance on the task of measuring semantic distance still falls short of expectations.

The main focus of this research is to propose new content-based approaches to the semantic interpretation of human language. By performing semantic analysis on large amounts of unstructured text, the proposed methods introduce new sets of features for semantic interpretation. On the one hand, multiple aspects of features are proposed, which help to capture the meanings of words from different angles. On the other hand, feature transformations and combinations are proposed that not only reduce the dimensionality of the feature space but also encourage interaction between the different aspects of word features. The contributions of this research are as follows:

- Relational Feature Analysis for Semantic Interpretation: this approach constructs the representing features of a word from the relations extracted from the word's local contexts, as well as from hidden aspects built over those sets of relations. The effectiveness of the generated features is evaluated on the task of measuring semantic distance. Experimental results demonstrate the promising capacity of relation-based features for modelling word meaning compared with traditional context-based features on the same benchmarks.
- Conceptual Topic Analysis for Semantic Distance: this approach introduces a new way to construct a semantic profile of word meanings and to measure semantic distance, using topical clues from surrounding contexts to characterise the meanings of a word (an illustrative sketch of a topic-profile distance follows this abstract). In experiments on various standard benchmarks, the method demonstrates outstanding performance compared with related methods that use topical information.
- Multi-way Feature Analysis for Semantic Interpretation: this approach proposes a tensor-based technique for semantic interpretation that builds meaning representations of words directly from text and does not require pre-existing linguistic knowledge. Taking into account structural information such as word order and syntactic relations, the method applies tensor analysis to build representations of word meaning (a sketch of a word-by-context-by-relation tensor follows this abstract). This content-based model demonstrates significantly improved performance compared with a robust baseline model on a number of semantic distance measures.

The success of semantic interpretation at the word level contributes to building a reliable metric for semantic distance, which is involved in most tasks of natural language processing and understanding.
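As a rough illustration of the Conceptual Topic Analysis idea, the Python sketch below profiles a target word by averaging the topic distributions of the contexts in which it appears and takes semantic distance as one minus the cosine similarity of two profiles. The function names, the toy contexts and the topic distributions are assumptions made for illustration only; they are not taken from the thesis.

import numpy as np

def profile(word, contexts, context_topics):
    # Average the topic distributions of all contexts containing `word`.
    rows = [context_topics[i] for i, ctx in enumerate(contexts) if word in ctx]
    return np.mean(rows, axis=0) if rows else np.zeros(context_topics.shape[1])

def semantic_distance(p, q):
    # 1 - cosine similarity between two topic profiles.
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return 1.0 if denom == 0 else 1.0 - float(p @ q) / denom

# Toy data: four context windows (as token sets) and the topic distribution
# a topic model might assign to each of them (three topics).
contexts = [{"bank", "money", "loan"}, {"bank", "river", "water"},
            {"finance", "money", "credit"}, {"river", "stream", "water"}]
context_topics = np.array([[0.80, 0.10, 0.10],
                           [0.10, 0.80, 0.10],
                           [0.70, 0.10, 0.20],
                           [0.05, 0.90, 0.05]])

p_money = profile("money", contexts, context_topics)
p_river = profile("river", contexts, context_topics)
print(semantic_distance(p_money, p_river))  # large distance: mostly different topics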
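The multi-way (tensor-based) analysis can likewise be pictured, under assumptions of our own, as a word x context-word x relation count tensor whose word mode is unfolded and reduced to dense word vectors. The abstract does not specify the thesis's actual decomposition or weighting scheme, so the truncated SVD and the toy counts below are purely illustrative.

import numpy as np

words     = ["cat", "dog", "car"]
contexts  = ["pet", "drive", "feed"]
relations = ["subj", "obj"]  # e.g. syntactic relations

# counts[i, j, k] = how often word i co-occurs with context word j under relation k.
counts = np.zeros((len(words), len(contexts), len(relations)))
counts[0, 0, 0] = 5   # cat - pet  - subj
counts[0, 2, 1] = 3   # cat - feed - obj
counts[1, 0, 0] = 4   # dog - pet  - subj
counts[1, 2, 1] = 4   # dog - feed - obj
counts[2, 1, 1] = 6   # car - drive - obj

# Mode-1 unfolding: one row per word, columns are (context word, relation) pairs.
unfolded = counts.reshape(len(words), -1)

# A truncated SVD of the unfolding yields low-dimensional word representations
# that mix contextual and relational information.
U, S, Vt = np.linalg.svd(unfolded, full_matrices=False)
rank = 2
word_vectors = U[:, :rank] * S[:rank]

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vectors[0], word_vectors[1]))  # cat vs dog: high similarity
print(cosine(word_vectors[0], word_vectors[2]))  # cat vs car: low similarity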
Computation on meanings: content-based feature analysis for semantic interpretation
Huynh, D. T. (Author). 2015
Student thesis: Doctoral Thesis