A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data

dc.contributor.authorAci, Cigdem Inan
dc.contributor.authorMutlu, Gizen
dc.contributor.authorOzen, Murat
dc.contributor.authorSarac, Esra
dc.contributor.authorUzel, Vahide Nida Kilic
dc.date.accessioned2026-02-27T07:33:00Z
dc.date.available2026-02-27T07:33:00Z
dc.date.issued2025
dc.description.abstractPredicting driver injury severity is critical for enhancing road safety, but it is complicated because fatal accidents inherently create class imbalance within datasets. This study conducts a comparative analysis of machine-learning (ML) and deep-learning (DL) models for multi-class driver injury severity prediction using a comprehensive dataset of 107,195 traffic accidents from the Adana, Mersin, and Antalya provinces in Turkey (2018-2023). To address the significant imbalance between fatal, injury, and non-injury classes, the hybrid SMOTE-ENN algorithm was employed for data balancing. Subsequently, feature selection techniques, including Relief-F, Extra Trees, and Recursive Feature Elimination (RFE), were utilized to identify the most influential predictors. Various ML models (K-Nearest Neighbors (KNN), XGBoost, Random Forest) and DL architectures (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN)) were developed and rigorously evaluated. The findings demonstrate that traditional ML models, particularly KNN (0.95 accuracy, 0.95 F1-macro) and XGBoost (0.92 accuracy, 0.92 F1-macro), significantly outperformed DL models. The SMOTE-ENN technique proved effective in managing class imbalance, and RFE identified a critical 25-feature subset including driver fault, speed limit, and road conditions. This research highlights the efficacy of well-preprocessed ML approaches for tabular crash data, offering valuable insights for developing robust predictive tools to improve traffic safety outcomes.
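The abstract reports both accuracy and F1-macro on imbalanced classes. The sketch below is a minimal illustration of why the two can diverge, not the authors' evaluation code. The helper names `accuracy` and `f1_macro` and the 90/8/2 toy label split are assumptions made up for this example; they are not taken from the study's dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (assumption): a heavily imbalanced three-class problem.
y_true = ["non-injury"] * 90 + ["injury"] * 8 + ["fatal"] * 2

# A trivial classifier that always predicts the majority class.
y_pred = ["non-injury"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1-macro = {f1_macro(y_true, y_pred):.3f}")
```

On this toy split the majority-class predictor scores high on accuracy but low on F1-macro, because the two minority classes each contribute an F1 of zero to the average. This is why reporting F1-macro alongside accuracy is informative for imbalanced severity classes.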
dc.description.sponsorshipScientific and Technological Research Council of Türkiye (TUBITAK) [123E601]
dc.description.sponsorshipThis work was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) within the 1001-Technological Research Projects Support Program (Grant No: 123E601).
dc.identifier.doi10.3390/electronics14173377
dc.identifier.issn2079-9292
dc.identifier.issue17
dc.identifier.urihttp://dx.doi.org/10.3390/electronics14173377
dc.identifier.urihttps://hdl.handle.net/20.500.14669/4414
dc.identifier.volume14
dc.identifier.wosWOS:001569642500001
dc.indekslendigikaynakWeb of Science
dc.language.isoen
dc.publisherMDPI
dc.relation.ispartofElectronics
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_20260302
dc.subjecttraffic accidents
dc.subjectdriver injury severity
dc.subjectmachine learning
dc.subjectdeep learning
dc.titleA Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data
dc.typeArticle
