Road traffic injuries are among the ten leading causes of death worldwide and they are the main cause of death among young adults aged 15-29 years . In addition to the tragic impact in human lives, road traffic injuries have a significant effect on the world’s economy. One study estimates that road traffic injuries will cost the world economy US$1.8 trillion in the period 2015-30 . The European Union estimates the impact represents 2% of their GDP . These costs include property damage costs, insurance administration costs, hospital costs, and productivity losses, among others .
Because of these enormous impacts, governments and the private sector are making big efforts to reduce these numbers. Today, as a result of these efforts, we have GPS-based systems that provide real-time information on traffic and weather conditions, with governments providing information such as accident hotspots (a.k.a., accident blackspots) and other useful statistics.
However, this information is available either post accident or it is static. For example, one location can have a relatively low concentration of accidents, but all occurring under a same set of rare circumstances. In this case, it would be useful to identify this location as a hotspot only when those circumstances take place, i.e., as a dynamic hotspot.
Knowing the conditions under which accidents happen and where they happen is very powerful information that can be used to take action to avoid them. For example, logistics companies can use this information to avoid specific routes, insurance companies can share this information with their clients, and cities can assign their traffic police to the dynamic hotspots.
In this blog post, we present a detailed analysis and a powerful predictive model that can help identify the factors affecting accident concentration and how these results can be further used to define dynamic hotspots. The analysis focuses on the city of Barcelona (Spain) using traffic accident data from 2019 and can be replicated in other cities and regions of the world. Barcelona was selected as it is a mid-large city and has available a rich Open Data catalog.
We identified different data sources that can influence accidents and worked with open and premium data. All premium datasets were obtained through CARTO’s Data Observatory.
The following map shows all the datasets we used for this model. The layer selector on the legends can be used to activate/deactivate one layer and analyze them. Note that data is only shown for an area of the city.
Open data comes from three sources:
Traffic accident data is organized in 5 different datasets and contains very detailed information such as geolocation, date and hour, number of injured people, age and gender of people involved, type and years of driving license, main cause, type of accident, and the type and color of vehicle.
We also identified other features that can influence accidents such as traffic density and points of interest (POI). All this data is provided by a range of CARTO data partners through CARTO’s Data Observatory.
We start our study by analyzing all the information available for the traffic accidents. We are interested in understanding accidents spatially, temporally, and based on factors such as age, type of vehicle, reason for trip, etc.
We first take a look at the daily time series (see figure below). This first analysis gives us an initial overview on the stationarity of accidents throughout the year. We can observe how:
We analyzed aggregated temporal patterns by day of the week, month, and time of the day (see figure below). We can see very different patterns that call for a dynamic and responsive approach to tackle accidents.
When looking at a more disaggregated time study (see figure below), we identified accidents during working days peaking during working hours whilst being more evenly distributed throughout the day (even at night hours) on weekends. Also, Thursdays from 2pm to 3pm and Fridays from 3pm to 4pm have an especially high number of accidents.
We have already identified that accidents don’t show a temporal stationary behavior and concentrate at specific hours depending on the day. Is this behavior also present spatially? As you might already know, it is. In fact, accidents concentrate in different parts of the city depending on the type of day and month of the year as can be seen in the two maps below. These results show the importance of defining dynamic hotspots.
Now we are interested in analyzing how factors such as age, type of vehicle, or trip purpose affect the number of accidents. In order to perform this analysis, we built the following dashboard using our Python package, CARTOframes. On the dashboard, accidents are labelled based on their type, and the widgets on the right allow us to filter them by cause, type of vehicle, type of day, among others, to discover new insights.
Some interesting insights we can easily identify are:
These insights further underline the importance of defining dynamic hotspots.
We were also interested in understanding the impact of weather and in particular, the impact of rain. Surprisingly enough, we found out that in Barcelona the difference in the number of accidents during rainy days vs non-rainy days is not statistically significant, most likely due to the city’s mild weather. This is, however, something to be considered in other cities as it might have a big impact on accidents, especially in cities with more extreme and variable weather conditions.
We have already seen how space, time, and other factors such as age and type of vehicle affect traffic accidents. In this section, we will apply modeling techniques to infer more advanced insights. First, we will perform a hotspot analysis using Local Moran’s I statistic, and then by training a predictive model to predict the annual number of accidents by area we can understand the effect of other factors such as human mobility and traffic signalling.
Before we start, we need to define our areas. Ideally, we should select a regular grid for the sake of data normalization. We selected Vodafone 250x250m cell grid as our base support and aggregated our data and performed our analyses with this support as reference. Note there are also standard hierarchical grids such as H3 or Quadkey grid that can make your analyses much easier, especially if you need to move between different aggregation levels.
Local Moran’s I statistic allows us to identify hot and cold areas in the city, which correspond to areas of statistically significant high and low concentration of accidents. Most importantly, it allows us to identify locations of high concentration of accidents in low concentration areas. Studying these locations can be even more important than hot areas and are considered spatial outliers.
The following map shows the hot (HH) and cold (LL) areas, together with the spatial outliers (HL and LH) with a significance level of 0.05. We identified six cells with a high concentration of accidents in low concentration areas. These are classified as HL (High-Low). Now, what is different in these cells with regards to their surrounding cells?
We first analyzed the urban characteristics of these cells. In most cases, they correspond to locations with large roundabouts or complex road systems as can be seen in the maps below.
We also looked at the characteristics of accidents in these areas compared to the rest of the city and found some interesting differences:
In order to study the impact of other factors such as human mobility and traffic signalling, we can train a predictive model to predict the annual number of accidents per cell. Note we are focusing now on the annual number of accidents to identify factors that affect accidents. This information will be very valuable to define dynamic hotspots as a further step.
The first step is to transform all the data we want to use to the Vodafone 250x250m cell grid we set as base geographic support.
Points and area related variables are easily transformed by aggregating. However, the working population is provided in a 100x100m cell grid which does not fit into the 250x250m cell grid. In order to transform this data, we used areal interpolation which can be automatically achieved using the data enrichment functionality of CARTOframes.
Note that other data preparation steps were performed and we have only focused on the purely spatial aspects.
Apart from the enriched data, we created new features based on the knowledge obtained throughout the analysis. These features are:
The main findings from this model are summarized on the SHAP summary plot below. We could highlight that accidents are highly influenced by:
It is especially interesting to analyze the effect that distance to the geographic city center has. We can see how for cells with a mid-high Road Type Index value, as you get closer to the city center, the number of accidents increases. However, for cells with a low Road Type Index value, the effect of distance to the geographic city center is significantly weaker.
Finally, regarding model performance, we obtained an R2 score of 0.71. The following map shows the ground truth compared to the model’s prediction. We can see that they look very similar when aggregating the results into intervals of accident intensity degrees.
Road traffic injuries are among the ten leading causes of death worldwide and have a significant impact on the world’s economy, affecting many public and private sectors. The study presented in this blog post shows that accidents are heavily influenced by space, time, and other factors such as age, type of vehicle, traffic, type of roads, and human mobility. These results call for further steps in which dynamic hotspots are identified so that this information can be used for better management of public and private resources.
The effective identification of dynamic hotspots can be used in a number of use cases including allowing:
 WHO, W. (2013). Global status report on road safety 2013: supporting a decade of action.
 Chen, S., Kuhn, M., Prettner, K., & Bloom, D. E. (2019). The global macroeconomic burden of road injuries: estimates and projections for 166 countries. The Lancet Planetary Health, 3(9), e390-e398.
 Mobility and transport - European Commission. (2016, Oct 19). Socio-economic costs and the value of prevention. Retrieved from here
 CENTERS FOR DISEASE CONTROL AND PREVENTION; DIVISION OF UNINTENTIONAL INJURY PREVENTION. (2020, Nov 6). Cost Data and Prevention Policies. Retrieved from here
 García-Altés, A., & Pérez, K. (2007). The economic cost of road traffic crashes in an urban setting. Injury prevention, 13(1), 65-68.
Want to get started?Sign up for a free account
Due to the current level of uncertainty surrounding the Covid-19 pandemic, the effects of the upcoming economic recession are nearly impossible to predict; however, history...Spatial Data
In recent years, the consumption and promotion of products which fall under the category of Organic / Natural / Local has increased dramatically. These are specific types o...Spatial Data
Please fill out the below form and we'll be in touch real soon.