Benchmarking TabNet, NODE, and FT-Transformer for Software Defect Prediction: An Empirical Comparison and Explainability Analysis
Abstract
Software defect prediction (SDP) is essential for improving software quality and reliability. Traditional machine learning methods, while effective, often fail to capture complex interactions among software metrics. Recently, specialized deep learning architectures designed for tabular data, including TabNet, Neural Oblivious Decision Ensembles (NODE), and FT-Transformer, have emerged, offering promising potential to improve both prediction accuracy and interpretability. This study comprehensively benchmarks the TabNet, NODE, and FT-Transformer models on the challenging NASA JM1 dataset from the PROMISE repository. We address severe class imbalance with NearMiss undersampling and tune hyperparameters for each model to ensure a fair comparison. Model performance was evaluated using standard metrics: precision, recall, F1-score, and accuracy. In addition, model interpretability was assessed using the SHAP and LIME methods. The FT-Transformer and NODE models demonstrated superior performance, each achieving 88% accuracy compared to TabNet's 86%. FT-Transformer showed exceptional precision (99%) for defect detection, underscoring its low false-positive rate. SHAP and LIME analyses revealed distinct attention patterns for each model, highlighting differences in feature importance and decision-making processes. Overall, FT-Transformer and NODE outperform TabNet in accuracy and in the balance between recall and precision, and the interpretability analysis yields actionable insights into feature importance, supporting better decision-making in practical SDP workflows.
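The NearMiss undersampling step mentioned in the abstract can be illustrated with a minimal sketch. The function name `nearmiss_1`, the synthetic data, and the neighbour count `k` below are illustrative assumptions, not details from the paper; in practice one would typically use the `NearMiss` sampler from the imbalanced-learn library. NearMiss-1 balances the classes by keeping only those majority-class samples whose mean distance to their k nearest minority-class samples is smallest.

```python
import numpy as np

def nearmiss_1(X, y, k=3, majority=0, minority=1):
    """Minimal NearMiss-1 undersampling sketch (illustrative, not the
    paper's implementation): keep the majority-class samples whose mean
    distance to their k nearest minority neighbours is smallest, until
    both classes have the same number of samples."""
    X_maj, X_min = X[y == majority], X[y == minority]
    # Pairwise Euclidean distances: each majority sample vs. each minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # Mean distance to the k closest minority points for each majority sample
    mean_k = np.sort(d, axis=1)[:, :k].mean(axis=1)
    # Retain the majority samples closest to the minority class
    keep = np.argsort(mean_k)[:len(X_min)]
    X_res = np.vstack([X_maj[keep], X_min])
    y_res = np.concatenate([np.full(len(keep), majority),
                            np.full(len(X_min), minority)])
    return X_res, y_res

# Synthetic stand-in for an imbalanced defect dataset (80 clean, 20 defective)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 80 + [1] * 20)
X_res, y_res = nearmiss_1(X, y)
print((y_res == 0).sum(), (y_res == 1).sum())  # balanced class counts
```

After resampling, both classes contribute equally to training, which prevents the model from trivially predicting the majority (non-defective) class on heavily skewed data such as JM1.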









