Cappelli, G., Nardoianni, S., D'Apuzzo, M., Nicolosi, V. (2026). Pedestrian crash severity prediction and contributory factors analysis by using machine learning methods. In Computational Science and Its Applications: ICCSA 2025 Workshops (pp. 3–14). Cham: Springer [10.1007/978-3-031-97657-5_1].
Pedestrian crash severity prediction and contributory factors analysis by using machine learning methods
Cappelli, Giuseppe; Nicolosi, Vittorio
2026-01-01
Abstract
Pedestrians are among the most vulnerable road users: about 270,000 pedestrians die in road crashes each year. This study therefore aims to identify the most influential contributory factors and the most promising models for predicting pedestrian crash severity. ISTAT data for the City of Rome (2013–2020) are used, and several machine learning methods are trained and tested after the data are balanced with oversampling techniques. In addition, the most influential contributory factors are analysed using the ROC curve method, Variable Importance Analysis (VIP), and a Support Vector Machine with a linear kernel. The findings suggest that the Random Forest gives the best prediction performance, followed by the Decision Tree and the k-nearest neighbour algorithm. Regarding contributory factors, the methods indicate that the hour at which the crash occurs, pedestrian gender, and pedestrian age seem to be the most critical factors in increasing the severity of a pedestrian crash. The study has two limitations: the first is the black-box nature of these models; the second concerns the direction of influence, that is, whether these variables affect the outcome positively or negatively.
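The balancing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling techniques were used, so the code below assumes plain random oversampling (duplicating minority-class records until the classes are equal in size). The function name `random_oversample`, the toy records, and the class labels are all hypothetical and are not taken from the ISTAT data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class; duplicates are drawn from here.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy records: (hour of day, pedestrian age). Labels are made up.
rows = [(8, 34), (13, 52), (18, 27), (22, 71), (23, 80), (9, 45)]
labels = ["injury", "injury", "injury", "fatal", "injury", "injury"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

A common precaution with this kind of balancing is to oversample only the training portion of the data, so that duplicated records cannot appear in both the training and the test sets.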


