Recursive feature elimination optimization using shapley additive explanations in software defect prediction with lightgbm classification

Hartati Hartati

Abstract

A software defect is a fault that prevents software from functioning properly, and it typically originates from mistakes made during the development process. Software defect prediction is performed to help ensure that software is free of defects, and machine learning classification is used to classify defects in software. To improve a classification model, it is necessary to select the best features from the dataset. Recursive Feature Elimination (RFE) is a feature selection method, and Shapley Additive Explanations (SHAP) is a method that can be used to optimize feature selection algorithms so that they produce better results. In this research, LightGBM, a popular boosting algorithm, is used as the classifier to predict software defects, and RFE-SHAP is used for feature selection to identify the best subset of features. The results show that RFE-SHAP feature selection slightly outperforms RFE, with average AUC values of 0.864 and 0.858, respectively. RFE-SHAP also produces a more statistically significant result than RFE. For RFE feature selection, the t-test gives p-value = 0.039 < α = 0.05 and t_count = 3.011 > t_table = 2.776. For RFE-SHAP feature selection, the t-test gives p-value = 0.000 < α = 0.05 and t_count = 11.91 > t_table = 2.776.

Article Details

How to Cite
[1] H. Hartati, “Recursive feature elimination optimization using shapley additive explanations in software defect prediction with lightgbm classification”, INFOTEL, vol. 17, no. 1, pp. 1-16, Feb. 2025.
Section
Informatics