A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data
| dc.contributor.author | Aci, Cigdem Inan | |
| dc.contributor.author | Mutlu, Gizen | |
| dc.contributor.author | Ozen, Murat | |
| dc.contributor.author | Sarac, Esra | |
| dc.contributor.author | Uzel, Vahide Nida Kilic | |
| dc.date.accessioned | 2026-02-27T07:33:00Z | |
| dc.date.available | 2026-02-27T07:33:00Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Predicting driver injury severity is critical for enhancing road safety, but it is complicated because fatal accidents inherently create class imbalance within datasets. This study conducts a comparative analysis of machine-learning (ML) and deep-learning (DL) models for multi-class driver injury severity prediction using a comprehensive dataset of 107,195 traffic accidents from the Adana, Mersin, and Antalya provinces in Turkey (2018-2023). To address the significant imbalance between fatal, injury, and non-injury classes, the hybrid SMOTE-ENN algorithm was employed for data balancing. Subsequently, feature selection techniques, including Relief-F, Extra Trees, and Recursive Feature Elimination (RFE), were utilized to identify the most influential predictors. Various ML models (K-Nearest Neighbors (KNN), XGBoost, Random Forest) and DL architectures (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN)) were developed and rigorously evaluated. The findings demonstrate that traditional ML models, particularly KNN (0.95 accuracy, 0.95 F1-macro) and XGBoost (0.92 accuracy, 0.92 F1-macro), significantly outperformed DL models. The SMOTE-ENN technique proved effective in managing class imbalance, and RFE identified a critical 25-feature subset including driver fault, speed limit, and road conditions. This research highlights the efficacy of well-preprocessed ML approaches for tabular crash data, offering valuable insights for developing robust predictive tools to improve traffic safety outcomes. | |
| dc.description.sponsorship | Scientific and Technological Research Council of Türkiye (TUBITAK) [123E601] | |
| dc.description.sponsorship | This work was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) within the 1001-Technological Research Projects Support Program (Grant No: 123E601). | |
| dc.identifier.doi | 10.3390/electronics14173377 | |
| dc.identifier.issn | 2079-9292 | |
| dc.identifier.issue | 17 | |
| dc.identifier.uri | http://dx.doi.org/10.3390/electronics14173377 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14669/4414 | |
| dc.identifier.volume | 14 | |
| dc.identifier.wos | WOS:001569642500001 | |
| dc.indekslendigikaynak | Web of Science | |
| dc.language.iso | en | |
| dc.publisher | MDPI | |
| dc.relation.ispartof | Electronics | |
| dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.snmz | KA_20260302 | |
| dc.subject | traffic accidents | |
| dc.subject | driver injury severity | |
| dc.subject | machine learning | |
| dc.subject | deep learning | |
| dc.title | A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data | |
| dc.type | Article |
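
The abstract reports both accuracy and F1-macro for a three-class problem (fatal, injury, non-injury) with strong class imbalance. The following minimal Python sketch illustrates why the two metrics can diverge on such data. It is not the authors' evaluation code: the label counts are invented toy values, and the helper names (`per_class_f1`, `macro_f1`, `accuracy`) are hypothetical.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the class is never predicted or present.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up label distribution (NOT from the study): 90 / 9 / 1 samples.
y_true = ["non-injury"] * 90 + ["injury"] * 9 + ["fatal"] * 1
# A degenerate classifier that always predicts the majority class.
y_pred = ["non-injury"] * 100

acc = accuracy(y_true, y_pred)
f1m = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro-F1={f1m:.3f}")
```

In this toy case the majority-class predictor scores high accuracy while its macro F1 stays low, because the injury and fatal classes each contribute an F1 of zero. This is one reason to report F1-macro alongside accuracy for imbalanced severity classes.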









