In this paper, a novel feature selection technique based on mutual dependency modelling between genes is proposed for multiclass microarray gene expression classification. Several studies analysing gene expression data have shown that genes (whether or not they belong to the same gene group) are co-expressed via a variety of pathways. Further, a gene may participate in multiple pathways that may or may not be co-active for all samples. It is therefore biologically meaningful to simultaneously divide genes into functional groups and samples into co-active categories. This can be done by modelling gene profiles for multiclass microarray gene data sets with mutual dependency models, which capture complex gene interactions. Most current work on multiclass microarray gene expression is based on statistical models with little or no consideration of gene interactions. This has led to a lack of robustness and to overly optimistic estimates of accuracy and noise reduction. In this paper, we propose multivariate analysis techniques that model the mutual dependency between features and take complex interactions into account when extracting a subset of genes. The two techniques, cross-modal factor analysis (CFA) and canonical correlation analysis (CCA), show a significant reduction in dimensionality and class-prediction error, and an improvement in classification accuracy, for multiclass microarray gene expression datasets.
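As a rough illustration of how CCA can drive gene selection, the following minimal sketch computes a regularised CCA between an expression matrix and one-hot class indicators, then ranks genes by the magnitude of their canonical weights. The pairing of genes with class indicators, the ridge term `reg`, the scoring rule, and the function names `inv_sqrt` and `cca_gene_ranking` are all assumptions made for this example; the abstract does not specify the paper's actual procedure.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_gene_ranking(X, labels, reg=1e-2, top_k=10):
    """Rank genes by canonical weight magnitude (hypothetical scoring rule).

    X      : (n_samples, n_genes) expression matrix
    labels : (n_samples,) class labels
    reg    : ridge term keeping the covariance matrices invertible (assumed)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Centre the genes and the one-hot class indicators.
    Xc = X - X.mean(axis=0)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    Yc = Y - Y.mean(axis=0)

    n = Xc.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(Xc.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Yc.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    # Whiten both views; the SVD of the whitened cross-covariance gives
    # the canonical correlations (s) and directions (columns of U).
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, _ = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U  # canonical weights expressed in gene space

    # Centred one-hot indicators carry at most (n_classes - 1) directions.
    r = len(classes) - 1
    scores = np.abs(Wx[:, :r]) @ s[:r]
    selected = np.argsort(scores)[::-1][:top_k]
    return selected, s
```

Calling `cca_gene_ranking(X, labels, top_k=50)` would return the indices of the 50 highest-scoring genes together with the canonical correlations. A classifier trained on only those columns would then give one way to check whether the reduced gene subset preserves class separability.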
|Title of host publication||Pattern Recognition in Bioinformatics|
|Editors||Visakan Kadirkamanathan, Guido Sanguinetti, Josselin Noirel|
|Place of Publication||New York, USA|
|Number of pages||10|
|Publication status||Published - 2009|