Protein folding recognition is a complex problem in bioinformatics in which the different structures of proteins are extracted from large amounts of harvested data, including functional and genetic features of proteins. The generated data consist of thousands of feature vectors but comparatively few protein sequences. Computational tools are therefore needed to analyze this raw data and extract useful information for predicting the major biological functions of genes and proteins with respect to their structural behavior. In this chapter, we discuss the predictability of protein folds using a new hybrid approach to feature selection and classification of protein data that combines support vector machine (SVM) classifiers with quadratic discriminant analysis (QDA), a generative classifier, and with principal component analysis (PCA) to enhance performance and accuracy. In one of the applied methods, we reduce the data dimensionality using dimensionality reduction algorithms such as PCA. We compare our results with previous results reported in the literature and show that the use of an appropriate feature selection technique is promising and can yield a higher recognition rate than the competing methods proposed in previous studies. However, new approaches are still needed, as the problem is complex and the results are far from satisfactory. After this introductory section, the chapter is organized as follows. Section 17.1 discusses the problem of protein fold prediction, the protein database, and its extracted feature vectors. Section 17.2 describes feature selection and classification using SVM and fused hybrid classifiers, while Sect. 17.4 presents the experimental results. Section 17.5 discusses these results and presents conclusions and directions for future work.
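The abstract does not spell out how the PCA step and the SVM/QDA fusion are wired together, so the following is only a minimal sketch, assuming scikit-learn, of one plausible arrangement: standardize the features, reduce them with PCA, then fuse an SVM and a QDA classifier by averaging their class probabilities (soft voting). The fusion rule, the hyperparameters (`n_components=10`, `C=1.0`, `reg_param=0.1`), and the helper names `build_hybrid_model` and `evaluate` are illustrative assumptions, not the chapter's method. The synthetic data from `make_classification` are hypothetical stand-ins for real protein fold features.

```python
# Sketch (assumed design): scaling -> PCA -> soft-voting fusion of SVM and QDA.
# The data below are synthetic stand-ins, NOT the chapter's protein fold features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_hybrid_model(n_components: int = 10) -> Pipeline:
    """Hypothetical helper: standardize, reduce with PCA, fuse SVM and QDA."""
    fusion = VotingClassifier(
        estimators=[
            # RBF-kernel SVM; probability=True enables predict_proba for soft voting.
            ("svm", SVC(kernel="rbf", C=1.0, probability=True, random_state=0)),
            # QDA fits one Gaussian per class; reg_param shrinks each class
            # covariance so it stays invertible when samples per class are few.
            ("qda", QuadraticDiscriminantAnalysis(reg_param=0.1)),
        ],
        voting="soft",  # average the two classifiers' class probabilities
    )
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), fusion)


def evaluate(X: np.ndarray, y: np.ndarray, n_components: int = 10) -> float:
    """Hypothetical helper: mean 5-fold stratified cross-validated accuracy."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(build_hybrid_model(n_components), X, y, cv=cv)
    return float(scores.mean())


if __name__ == "__main__":
    # Hypothetical data shaped like the problem: many features, few samples.
    X, y = make_classification(
        n_samples=200, n_features=300, n_informative=15, n_redundant=15,
        n_classes=4, n_clusters_per_class=1, random_state=0,
    )
    print(f"hybrid SVM+QDA cross-validated accuracy: {evaluate(X, y):.3f}")
```

Placing PCA before QDA matters in the many-features, few-sequences setting described above. QDA estimates a full covariance matrix per class, and that estimate is singular whenever a class has fewer samples than features. Reducing dimensionality first, together with a small `reg_param`, keeps those estimates well conditioned, while the SVM contributes a discriminative decision boundary to the fused prediction.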
| Field | Value |
|---|---|
| Title of host publication | Springer Handbook of Bio-/Neuro-Informatics |
| Place of publication | Berlin, Germany |
| Number of pages | 9 |
| Publication status | Published, 2014 |