Abstract
Software defect prediction (SDP) is central to improving software quality and allocating testing resources efficiently. Traditional machine learning models often generalize poorly and underperform, especially on imbalanced or small datasets. To address these limitations, this study proposes a stacked ensemble learning model that combines Random Forest, Gradient Boosting, and AdaBoost as base learners with logistic regression as the meta-learner. A collection of 500 software modules was sampled from four benchmark repositories: CM1, PC1, JM1, and KC1. Preprocessing comprised stratified sampling, Min-Max normalization, SMOTE-based class balancing, and feature selection via Recursive Feature Elimination (RFE) and mutual information ranking. Models were trained with 10-fold cross-validation, and hyperparameters were tuned with Grid Search. The stacked ensemble outperformed every individual classifier on all measures, reaching the highest accuracy of 0.88 with statistically significant improvements in precision, recall, and F1-score (p < 0.05). Data balancing and feature selection also improved model stability and interpretability. In summary, the proposed framework offers a robust, scalable, and resource-efficient approach to predicting software defects. The method can be replicated in future studies on larger datasets and extended with deep learning-based meta-models to improve adaptability. Its integration of RFE and mutual-information feature ranking within an optimized stacking design, applied to NASA repositories for the first time, yields measurable improvements in generalization and robustness.
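The pipeline described in the abstract can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification`, not drawn from CM1, PC1, JM1, or KC1), the helper names `make_synthetic_modules` and `build_search` are hypothetical, and the feature counts, estimator counts, and grid values are placeholders. The abstract does not say how RFE and mutual information ranking are combined, so the sketch simply chains them. SMOTE is left out so that the example depends only on scikit-learn; a fuller version would likely add it from the imbalanced-learn package and apply it to the training folds only.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def make_synthetic_modules(n_samples=500, seed=0):
    """Synthetic, imbalanced stand-in for the 500 sampled modules (assumption)."""
    return make_classification(
        n_samples=n_samples,
        n_features=20,
        n_informative=8,
        weights=[0.85, 0.15],  # assumed class imbalance, not from the paper
        random_state=seed,
    )


def build_search(n_splits=10, n_estimators=50, seed=0):
    """Grid search over a scale -> select -> stack pipeline (hypothetical helper)."""
    # Base learners named in the abstract.
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=n_estimators, random_state=seed)),
        ("gb", GradientBoostingClassifier(n_estimators=n_estimators, random_state=seed)),
        ("ada", AdaBoostClassifier(n_estimators=n_estimators, random_state=seed)),
    ]
    # Logistic regression meta-learner trained on out-of-fold base predictions.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    pipeline = Pipeline([
        ("scale", MinMaxScaler()),
        # Mutual information ranking keeps the 15 highest-scoring features.
        ("mi", SelectKBest(partial(mutual_info_classif, random_state=seed), k=15)),
        # RFE then prunes further; the wrapped estimator is an assumption.
        ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)),
        ("stack", stack),
    ])
    # Placeholder grid; the paper's actual search space is not given.
    param_grid = {
        "rfe__n_features_to_select": [5, 10],
        "stack__final_estimator__C": [0.1, 1.0],
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
```

Calling `build_search().fit(X, y)` on the output of `make_synthetic_modules()` runs the 10-fold grid search; `best_params_` and `best_score_` then hold the selected settings and the mean cross-validated F1-score. Keeping scaling and feature selection inside the `Pipeline` means they are refit on each training fold, which avoids leaking information from the validation fold.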