A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data

Aci, Cigdem InanMutlu, GizenOzen, MuratSarac, EsraUzel, Vahide Nida Kilic2026-02-272026-02-2720252079-929210.3390/electronics14173377http://dx.doi.org/10.3390/electronics14173377https://hdl.handle.net/20.500.14669/4414Predicting driver injury severity is critical for enhancing road safety, but it is complicated because fatal accidents inherently create class imbalance within datasets. This study conducts a comparative analysis of machine-learning (ML) and deep-learning (DL) models for multi-class driver injury severity prediction using a comprehensive dataset of 107,195 traffic accidents from the Adana, Mersin, and Antalya provinces in Turkey (2018-2023). To address the significant imbalance between fatal, injury, and non-injury classes, the hybrid SMOTE-ENN algorithm was employed for data balancing. Subsequently, feature selection techniques, including Relief-F, Extra Trees, and Recursive Feature Elimination (RFE), were utilized to identify the most influential predictors. Various ML models (K-Nearest Neighbors (KNN), XGBoost, Random Forest) and DL architectures (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN)) were developed and rigorously evaluated. The findings demonstrate that traditional ML models, particularly KNN (0.95 accuracy, 0.95 F1-macro) and XGBoost (0.92 accuracy, 0.92 F1-macro), significantly outperformed DL models. The SMOTE-ENN technique proved effective in managing class imbalance, and RFE identified a critical 25-feature subset including driver fault, speed limit, and road conditions. This research highlights the efficacy of well-preprocessed ML approaches for tabular crash data, offering valuable insights for developing robust predictive tools to improve traffic safety outcomes.eninfo:eu-repo/semantics/openAccesstraffic accidentsdriver injury severitymachine learningdeep learningA Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash DataArticle1714WOS:001569642500001