Benchmarking TabNet, NODE, and FT-Transformer for Software Defect Prediction: An Empirical Comparison and Explainability Analysis

dc.contributor.authorAsal, Burcak
dc.contributor.authorYalciner, Burcu
dc.date.accessioned2026-02-27T07:33:40Z
dc.date.available2026-02-27T07:33:40Z
dc.date.issued2026
dc.description.abstractSoftware defect prediction (SDP) is essential for improving software quality and reliability. Traditional machine learning methods, while effective, often fail to capture complex interactions among software metrics. Recently, specialized deep learning architectures designed for tabular data, including TabNet, Neural Oblivious Decision Ensembles (NODE), and FT-Transformer, have emerged, offering promising potential to enhance prediction accuracy and interpretability. This study comprehensively benchmarks the TabNet, NODE, and FT-Transformer models on the challenging NASA JM1 dataset from the PROMISE repository. We address severe class imbalance using NearMiss undersampling and perform hyperparameter optimization to ensure fair comparisons. Model performance was evaluated using standard metrics: precision, recall, F1-score, and accuracy. In addition, model interpretability was assessed using the SHAP and LIME methods. The FT-Transformer and NODE models demonstrated superior performance, achieving 88% accuracy compared with TabNet's 86%. FT-Transformer showed exceptional precision (99%) for defect detection, underscoring its low false-positive rate. SHAP and LIME analyses revealed distinct attention patterns for each model, highlighting differences in feature importance and decision-making processes. FT-Transformer and NODE outperform TabNet in accuracy and in the balance between recall and precision. The interpretability analysis provides actionable insights into feature importance, enabling better decision-making in practical SDP workflows.
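The abstract states that severe class imbalance was handled with NearMiss undersampling before training. The record does not give implementation details, so the following is only a minimal NearMiss-1 style sketch in NumPy (the helper name `nearmiss_undersample` and the toy data are assumptions, not from the paper): majority-class samples closest, on average, to their k nearest minority-class neighbours are retained until the classes are balanced.

```python
import numpy as np

def nearmiss_undersample(X, y, majority=0, minority=1, k=3):
    """NearMiss-1 style sketch: keep the majority-class samples whose
    mean distance to their k nearest minority-class neighbours is
    smallest, until both classes have the same size."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min, X_maj = X[y == minority], X[y == majority]
    # Pairwise Euclidean distances: each majority sample vs all minority samples.
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    k = min(k, X_min.shape[0])
    # Mean distance to the k closest minority neighbours of each majority sample.
    mean_knn = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(mean_knn)[: X_min.shape[0]]
    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.full(len(keep), majority),
                            np.full(X_min.shape[0], minority)])
    return X_bal, y_bal

# Toy imbalanced data: 20 non-defective vs 5 defective modules, 4 metrics each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(2, 1, (5, 4))])
y = np.array([0] * 20 + [1] * 5)
X_bal, y_bal = nearmiss_undersample(X, y)
print((y_bal == 0).sum(), (y_bal == 1).sum())  # → 5 5
```

In practice the `imbalanced-learn` library's `NearMiss` sampler would typically be used instead of a hand-rolled version; this sketch only illustrates the selection criterion.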
dc.identifier.doi10.1109/ACCESS.2026.3656247
dc.identifier.endpage11681
dc.identifier.issn2169-3536
dc.identifier.startpage11660
dc.identifier.urihttp://dx.doi.org/10.1109/ACCESS.2026.3656247
dc.identifier.urihttps://hdl.handle.net/20.500.14669/4662
dc.identifier.volume14
dc.identifier.wosWOS:001673759200007
dc.indekslendigikaynakWeb of Science
dc.language.isoen
dc.publisherIEEE-Inst Electrical Electronics Engineers Inc
dc.relation.ispartofIEEE Access
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Academic Staff
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_20260302
dc.subjectSoftware
dc.subjectMeasurement
dc.subjectPredictive models
dc.subjectCodes
dc.subjectFeature extraction
dc.subjectDeep learning
dc.subjectBenchmark testing
dc.subjectAccuracy
dc.subjectMachine learning
dc.subjectBiological system modeling
dc.subjectClass imbalance
dc.subjectdeep learning
dc.subjectexplainable AI
dc.subjectFT-Transformer
dc.subjectLIME
dc.subjectNODE
dc.subjectPROMISE dataset
dc.subjectSHAP
dc.subjectsoftware defect prediction
dc.subjectTabNet
dc.subjecttabular data
dc.titleBenchmarking TabNet, NODE, and FT-Transformer for Software Defect Prediction: An Empirical Comparison and Explainability Analysis
dc.typeArticle

Files