TY - JOUR
T1 - An analytical code quality methodology using Latent Dirichlet Allocation and Convolutional Neural Networks
AU - Sorour, Shaymaa E.
AU - Abdelkader, Hanan E.
AU - Sallam, Karam M.
AU - Chakrabortty, Ripon K.
AU - Ryan, Michael J.
AU - Abohany, Amr
N1 - Publisher Copyright:
© 2022 The Author(s)
PY - 2022/9
Y1 - 2022/9
N2 - Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms of subsequent releases, maintenance, and updates. It is particularly important for the development of safety critical systems. Existing studies on CQ have several shortcomings in that they are based on incomplete information about the source code, and tend to focus on only one feature, which is likely to determine the performance of the model. Moreover, these considerations often limit obtaining high accuracy because there is no strong relationship between the input data and the output data. Thus, it is necessary to design an effective and efficient SQ measurement system for measuring multiple quality factors. To that end, we propose a deep learning framework that employed a Latent Dirichlet Allocation (LDA) with Convolutional Neural Networks (CNN), called CNN-LDA, to classify input data into topics that are related to CQ features and to identify hidden patterns and correlations in programming data. Three SQ metrics (i.e., readability, security, and testability) and machine learning techniques (e.g., random forest (RF) and support vector machine (SVM)) are taken into account to validate the proposed model. The proposed CNN-LDA outperformed its peers across the vast majority of datasets examined. The average overall F-measure for readability, security, and testability are 94%,94% and 93%. The average overall accuracy for readability, security, and testability are 93%,93% and 92%. The superiority of LDA-CNN over the other classifiers was very clear based on a Wilcoxon's non-parametric statistical test (α=0.05).
AB - Recently, Code Quality (CQ) has become critical in a wide range of organizations and in many areas from academia to industry. CQ, in terms of readability, security, and testability, is a major goal throughout the software development process because it affects overall Software Quality (SQ) in terms of subsequent releases, maintenance, and updates. It is particularly important for the development of safety critical systems. Existing studies on CQ have several shortcomings in that they are based on incomplete information about the source code, and tend to focus on only one feature, which is likely to determine the performance of the model. Moreover, these considerations often limit obtaining high accuracy because there is no strong relationship between the input data and the output data. Thus, it is necessary to design an effective and efficient SQ measurement system for measuring multiple quality factors. To that end, we propose a deep learning framework that employed a Latent Dirichlet Allocation (LDA) with Convolutional Neural Networks (CNN), called CNN-LDA, to classify input data into topics that are related to CQ features and to identify hidden patterns and correlations in programming data. Three SQ metrics (i.e., readability, security, and testability) and machine learning techniques (e.g., random forest (RF) and support vector machine (SVM)) are taken into account to validate the proposed model. The proposed CNN-LDA outperformed its peers across the vast majority of datasets examined. The average overall F-measure for readability, security, and testability are 94%,94% and 93%. The average overall accuracy for readability, security, and testability are 93%,93% and 92%. The superiority of LDA-CNN over the other classifiers was very clear based on a Wilcoxon's non-parametric statistical test (α=0.05).
KW - Classification
KW - Code Quality (CQ)
KW - Convolutional Neural Networks (CNN)
KW - Deep Learning (DL)
KW - Latent Dirichlet Allocation (LDA)
UR - http://www.scopus.com/inward/record.url?scp=85124175866&partnerID=8YFLogxK
U2 - 10.1016/j.jksuci.2022.01.013
DO - 10.1016/j.jksuci.2022.01.013
M3 - Article
AN - SCOPUS:85124175866
SN - 1319-1578
VL - 34
SP - 5979
EP - 5997
JO - Journal of King Saud University - Computer and Information Sciences
JF - Journal of King Saud University - Computer and Information Sciences
IS - 8
ER -