A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data
Abstract
Predicting driver injury severity is critical for enhancing road safety, but it is complicated because fatal accidents are comparatively rare, which creates class imbalance within crash datasets. This study conducts a comparative analysis of machine-learning (ML) and deep-learning (DL) models for multi-class driver injury severity prediction using a dataset of 107,195 traffic accidents from the Adana, Mersin, and Antalya provinces in Turkey (2018-2023). To address the significant imbalance between the fatal, injury, and non-injury classes, the hybrid SMOTE-ENN algorithm (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbours) was employed for data balancing. Subsequently, three feature selection techniques, Relief-F, Extra Trees, and Recursive Feature Elimination (RFE), were used to identify the most influential predictors. Three ML models (K-Nearest Neighbors (KNN), XGBoost, and Random Forest) and three DL architectures (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN)) were developed and evaluated. The findings demonstrate that traditional ML models, particularly KNN (0.95 accuracy, 0.95 F1-macro) and XGBoost (0.92 accuracy, 0.92 F1-macro), significantly outperformed the DL models. The SMOTE-ENN technique proved effective in managing class imbalance, and RFE identified a critical 25-feature subset that includes driver fault, speed limit, and road conditions. This research highlights the efficacy of well-preprocessed ML approaches for tabular crash data, offering insights for developing robust predictive tools to improve traffic safety outcomes.
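The abstract reports both accuracy and F1-macro. As a minimal illustration of why the two are reported together on imbalanced data, the sketch below computes both metrics in plain Python for a classifier that always predicts the majority class. The class counts (90 non-injury, 8 injury, 2 fatal) are hypothetical toy values chosen for the example, not figures from the study, and the helper names `accuracy` and `f1_macro` are introduced here only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, heavily imbalanced toy labels (not data from the study).
y_true = ["non-injury"] * 90 + ["injury"] * 8 + ["fatal"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["non-injury"] * 100

acc = accuracy(y_true, y_pred)
f1m = f1_macro(y_true, y_pred)
print(f"accuracy = {acc:.2f}, F1-macro = {f1m:.2f}")
```

In this toy case accuracy looks strong while F1-macro is low, because the injury and fatal classes each contribute an F1 of zero to the average. Matching accuracy and F1-macro values, as reported for KNN and XGBoost above, therefore suggest that the minority classes are also being predicted well, though per-class results would be needed to confirm this.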









