Cappelli, G., Nardoianni, S., D'Apuzzo, M., Nicolosi, V. (2025). Analysis of contributing factors influencing pedestrian crash severity: a case study in Rome, Italy. In TRANSCODE 2025 (pp. 798-805). Amsterdam: Elsevier [10.1016/j.trpro.2025.10.102].
Analysis of contributing factors influencing pedestrian crash severity: a case study in Rome, Italy
Cappelli, Giuseppe;Nicolosi, Vittorio
2025-01-01
Abstract
Pedestrian crashes are a serious public health and economic issue, and analyzing the main contributing factors that lead to a fatal outcome can support a proactive approach to safety. According to the current literature, crash severity is predicted primarily with econometric and machine learning methods. Using pedestrian crash data from the city of Rome, Italy, this study trains and tests five models: Logistic Regression (LR), K-Nearest Neighbour (KNN), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) with a radial kernel. To check model stability and generalization, thirty (30) random samples are created, and prediction performance is evaluated by comparing the mean F1-score across them. Gini Variable Importance and SHAP analysis are then performed on the model with the best mean F1-score, which is the Random Forest (0.9816). Pedestrian Age, Gender, and Behaviour, Hour of the day, Season of the year, Vehicle Type, and Location emerge as the most important contributing factors.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
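The evaluation idea summarized in the abstract (repeat a random split thirty times, then compare models by their mean F1-score) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not state the split proportions, the sampling scheme, or the software used, so the 70/30 split, the synthetic one-feature data, and the names `f1_score`, `fit_threshold`, and `mean_f1_over_random_splits` are all assumptions made for illustration. A toy threshold classifier stands in for the five models of the study.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # convention: no true positives means F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(train):
    """Toy stand-in model: cut at the midpoint between the class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


def mean_f1_over_random_splits(records, fit, n_repeats=30,
                               test_fraction=0.3, seed=0):
    """Repeat a random train/test split and average the test F1-score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = records[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = fit(train)
        y_true = [label for _, label in test]
        y_pred = [predict(x) for x, _ in test]
        scores.append(f1_score(y_true, y_pred))
    return mean(scores), scores


# Synthetic (hypothetical) data: one numeric feature and a binary label.
data_rng = random.Random(42)
records = ([(data_rng.gauss(70, 10), 1) for _ in range(100)]
           + [(data_rng.gauss(40, 10), 0) for _ in range(100)])

avg_f1, all_f1 = mean_f1_over_random_splits(records, fit_threshold)
print(f"mean F1 over {len(all_f1)} random splits: {avg_f1:.4f}")
```

Calling `mean_f1_over_random_splits` once per candidate model with the same `seed` gives every model the same thirty splits, so the mean F1-scores are directly comparable. The spread of `all_f1` gives a rough indication of stability.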


