Benchmarking TabNet, NODE, and FT-Transformer for Software Defect Prediction: An Empirical Comparison and Explainability Analysis
Abstract
Software defect prediction (SDP) is essential for improving software quality and reliability. Traditional machine learning methods, while effective, often fail to capture complex interactions among software metrics. Recently, specialized deep learning architectures designed for tabular data, including TabNet, Neural Oblivious Decision Ensembles (NODE), and FT-Transformer, have emerged, offering promising potential to improve both prediction accuracy and interpretability. This study comprehensively benchmarks the TabNet, NODE, and FT-Transformer models on the challenging NASA JM1 dataset from the PROMISE repository. We address severe class imbalance with NearMiss undersampling and tune hyperparameters for each model to ensure a fair comparison. Model performance was evaluated using standard metrics: precision, recall, F1-score, and accuracy. In addition, model interpretability was assessed using the SHAP and LIME methods. The FT-Transformer and NODE models demonstrated superior performance, each achieving 88% accuracy compared to TabNet's 86%. FT-Transformer showed exceptional precision (99%) for defect detection, underscoring its low false-positive rate. SHAP and LIME analyses revealed distinct attention patterns for each model, highlighting differences in feature importance and decision-making processes. Overall, FT-Transformer and NODE outperform TabNet in accuracy and in the balance between recall and precision, and the interpretability analysis yields actionable insights into feature importance, supporting better decision-making in practical SDP workflows.
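The NearMiss undersampling step mentioned in the abstract can be illustrated with a minimal sketch. The function name `nearmiss_1`, the synthetic data, and the neighbour count `k` below are illustrative assumptions, not details from the paper; in practice one would typically use the `NearMiss` sampler from the imbalanced-learn library. NearMiss-1 balances the classes by keeping only those majority-class samples whose mean distance to their k nearest minority-class samples is smallest.

```python
import numpy as np

def nearmiss_1(X, y, k=3, majority=0, minority=1):
    """Minimal NearMiss-1 undersampling sketch (illustrative, not the
    paper's implementation): keep the majority-class samples whose mean
    distance to their k nearest minority neighbours is smallest, until
    both classes have the same number of samples."""
    X_maj, X_min = X[y == majority], X[y == minority]
    # Pairwise Euclidean distances: each majority sample vs. each minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # Mean distance to the k closest minority points for each majority sample
    mean_k = np.sort(d, axis=1)[:, :k].mean(axis=1)
    # Retain the majority samples closest to the minority class
    keep = np.argsort(mean_k)[:len(X_min)]
    X_res = np.vstack([X_maj[keep], X_min])
    y_res = np.concatenate([np.full(len(keep), majority),
                            np.full(len(X_min), minority)])
    return X_res, y_res

# Synthetic stand-in for an imbalanced defect dataset (80 clean, 20 defective)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 80 + [1] * 20)
X_res, y_res = nearmiss_1(X, y)
print((y_res == 0).sum(), (y_res == 1).sum())  # balanced class counts
```

After resampling, both classes contribute equally to training, which prevents the model from trivially predicting the majority (non-defective) class on heavily skewed data such as JM1.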









