Due to the current level of uncertainty surrounding the Covid-19 pandemic, the effects of the upcoming economic recession are nearly impossible to predict; however, history tells us that there are certain sociodemographic, socioeconomic, education, and employment factors that expose certain groups of people to more severe consequences of an economic crisis. This time, and unlike in 2008, the economic recession will arrive following a global pandemic. The non-Pharmaceutical interventions (NPIs) applied by governments around the world; such as school closures, remote work policies, and shelter-in-place (lockdown) strategies to limit intra- and inter-national mobility, have caused additional havoc to certain economic sectors such as the retail and tourism industries. This also translates into an additional risk factor to the individuals making a living from the economic support generated by business activities in those jeopardised sectors.
Based on a series of aggravating socio-demographic and socio-economic factors, this study attempts to identify areas in Spain that have a higher risk exposure to the upcoming economic recession. To help build the risk model a set of indicators have been used that relate both to the population and economic activity in these areas.
The reasoning behind our selection of specific aggravating indicators from the different datasets available is as follows:
It is important to note that all these datasets are provided on a 100x100m grid covering the whole country.
For reference below is a listing of all features from the selected datasets used for the analysis:
Unica360 - Sociodemographics
p_ed_15_24_h (combined with income)
Ratio between the number of males with ages between 15 and 24 living in the area and the index of average income over national average
p_ed_15_24_m (combined with income)
Ratio between the number of females with ages between 15 and 24 living in the area and the index of average income over national average
Number of males with ages between 65 and 79 living in the area
Number of males with age +80 living in the area
Number of females with ages between 65 and 79 living in the area
Number of females with age +80 living in the area
Population a priori economic immigrants
Index of average income over national average
Unica360 - Cadaster
Number of offices
Floor area of offices
Number of commercial properties
Average floor area of commercial properties
Number of industrial properties
Average floor area of industrial properties
Unica360 - Commercial Index
Culture index Includes: museums, libraries, book stores, theaters
Retail index. Includes retailers: all kinds of store fronts, such as food, fashion, pharmacies, banks, beauty salons... Does not include: sanitary services (doctors, physical therapists), places of worship, hospitality, public administration
Hostelry index. Includes: bars, restaurantes, cafés, takeaway, night clubs
Unica360 - Tourism
Total tourists expected per year (without considering COVID-19 impact)
Total foreign tourists expected per year (without considering COVID-19 impact)
Unica360 - Working Population
Number of employees
Number of employees in headquarters
Number of employees in branch offices
The risk index was calculated following the methodology outlined in the article “Deprivation index by enumeration district in Spain, 2011”.
The methodology builds an index using Principal Component Analysis (PCA), allowing us to reduce the dimensionality of the problem by capturing the majority of the variance in the resultant principal component.
Before applying PCA, we need to perform some pre-processing on the input data. First, we remove rows with insufficiently informed entries in their columns and then fill the missing values in the final subset of rows. To achieve this, we used the median value of the municipality for which the data in that row partains. Once there are no missing values within the input dataset, we standardize the data (a required step to run the PCA).
With the standardized data, we then check the Spearman correlation of each possible pair of columns, removing one if they show a Spearman correlation higher than 0.8. This equates to both columns providing the same information (and variance), meaning we can safely remove one of them. We then apply the PCA, keeping only the first component (which will be our risk index).
Finally, we check the Spearman correlation of each covariate against the first component of the PCA, removing those with a correlation less than 0.4. We perform this step in order to maximize the variance captured by the first component. Then, we recompute the PCA with the resultant covariates, providing our final index.
In the image below we can see the different features considered in the first run of the PCA (having removed correlated features), and the correlation of every feature with the privation index. This gives us a sense of the feature importance within our model. Those labelled in blue are the input features used for the second run of the PCA.
Next we compute a set of clusters for the index values so we can label each zone as having a low, low-medium, medium, medium-high, or high risk of being affected by the upcoming recession. To do this we apply the natural breaks (jenks) on the index values. This minimizes the variance between elements of the same cluster and maximizes the variance between elements of other clusters. In the image below we can see the values of the bins for the different risk categories resulting from the computation of the natural breaks.
Performing a high level analysis of the results across different areas of Madrid, we can see that areas with a higher risk (High or Medium-High) are those within the M30 orbital motorway and in the neighborhoods towards the south and south-west.
On the other hand, areas with a higher concentration of cells classified as Low risk or Low-Medium risk are located within the north and north-west in neighborhoods such as Aravaca and Chamartin.
By looking closer at the distribution of the features belonging to the “high” and “low” risk categories, and comparing their distributions with all cells within the city, we can get a better understanding on what characterizes the cells in each cluster. For example, the image below shows the comparison between the distributions of each feature in the “high risk” category and the overall distribution of the features in the whole city of Madrid.
We can clearly see that “high risk” cells have higher values of every feature in the final set of aggravating factors. This means that “high risk” cells tend to have a:
The same occurs if we look at the “low risk” cells where distribution tends to be positively skewed, indicating that values are lower for these cells compared to the rest of the areas.
Looking at a different city, Seville for example, we can see that the distribution of high risk values is spread in two well-separated zones: the city center and the neighborhood of Triana. These two zones have a higher density of bars and restaurants and a higher expected presence of foreign tourism under normal travel conditions.
Also, we discover that the vast majority of the cells with a low and low-medium risk index are located in areas surrounding the city center, where there are more residential and less touristy neighborhoods.
As outlined during the introduction the aim of this study is to help identify areas in Spain that have a higher risk exposure to the upcoming economic recession.
Given the current global uncertainty along with the frequent and rapid changes in NPIs and governmental economic response, it is extremely difficult to predict with accuracy how and where the recession will have the most impact. Therefore in order to make a more informed prediction we have leveraged commonalities from studies referencing other recent economic recessions.
This study also provides a good example of how to combine indicators from different data sources in order to build a derived index.
Want to see this in action?Request a live personalized demo
Over the past year consumer behavior has changed significantly and many believe permanently. Last year U.S. CPG sales rose by 10.3% to $933 billion as consumers stocked up ...Spatial Data
As we enter into the second year of a COVID-19 era, consumers are continuing to shift how they move through the physical world with the physical location of businesses and ...Spatial Data
Please fill out the below form and we'll be in touch real soon.