When it comes to retail site selection applications, a common task consists of combining data from different sources to compare the characteristics of target locations and hone in on the most appropriate areas for footprint expansion or consolidation.
In a previous blogpost, we explained how to build a similarity score with respect to an existing site (e.g. the location of your top performing store) for a set of target locations. This approach is an essential tool for Site Planners looking to open new stores, relocate existing points of sale or consolidate their current network.
Our Analytics Toolbox for BigQuery, a set of User Defined Functions and procedures for running advanced spatial analytics natively in BigQuery, has been further enhanced with a new twin areas procedure, adding to the existing retail-specific procedures we have developed including revenue prediction, commercial hotspots and white space analysis.
In this post, we tackle Twin Areas analysis in more depth. This approach consists of three main steps:
The similarity score for each target cell is computed with respect to the score of the average cell in the target areas. Under this scoring rule, a target cell with a larger score is more similar to the origin cell than a target cell with a lower score; moreover, this score is positive if (and only if) the target cell is more similar to the origin than the mean cell.
To demonstrate how to run the Twin Areas analysis in BigQuery, we selected as potential origin locations the position of the top 10 performing liquor stores in 2019 from the publicly available Iowa Liquor sales dataset.
We start by using the GRIDIFY_ENRICH procedure from the data module in CARTO’s Analytics Toolbox. This procedure is used to first gridify a set of geometries (point data, in this case) to a quadkey grid with zoom 15, and then to enrich each selected location with data from a subscription to one of the datasets available in the Data Observatory, including the total population (total_pop_3409f36f) and the number of households (households_d7d24db5) at the Census Block Group level from the ACS Sociodemographics dataset, as well as from a custom dataset, which contains the count of road links (count_qualified) per zip code. The result can be found in the table cartobq.docs.twin_areas_origin_enriched
.
CALL `carto-un`.carto.GRIDIFY_ENRICH(
-- Input query
'SELECT * FROM `cartobq.docs.twin_areas_iowa_liquor_sales_origin`',
-- Grid params: grid type and level
'quadkey', 15,
-- Data Observatory enrichment
[('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
'carto-data.ac_7xhfwyml',
-- Custom data enrichment
'''
SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
''',
[('count_qualified','count')],
0,"uniform",
-- Output table
'cartobq.docs.twin_areas_origin_enriched');
The map below shows both the locations of the selected stores (left) as well as the enriched grid for the population variable (right):
Similarly, we can use this procedure to gridify and enrich the target areas for which we will use the Census Tracts polygons in Texas in the main urban areas. The result can be found in the table cartobq.docs.twin_areas_target_enriched
.
CALL `carto-un`.carto.GRIDIFY_ENRICH(
-- Input query
'SELECT geom FROM `cartobq.docs.twin_areas_target`',
-- Grid params: grid type and level
'quadkey', 15,
-- Data Observatory enrichment
[('total_pop_3409f36f','sum'),('households_d7d24db5','sum')],
'carto-data.ac_7xhfwyml',
-- Custom data enrichment
'''
SELECT geom, count_qualified FROM `cartobq.docs.twin_areas_custom`
''',
[('count_qualified','count')],
0,"uniform",
-- Output table
'cartobq.docs.twin_areas_target_enriched');
Once we have gridified and enriched the origin and target areas, we can run the FIND_TWIN_AREAS procedure for a given origin location, here selected as the store with the highest revenue. The result of the analysis can be found in the table cartobq.docs.twin_areas
.
CALL `carto-un`.carto.FIND_TWIN_AREAS(
-- Input queries
'''SELECT * FROM `cartobq.docs.twin_areas_origin_enriched`
LIMIT 1''',
'''SELECT * FROM `cartobq.docs.twin_areas_target_enriched`''',
-- Twin areas model inputs
-- Grid type
'quadkey',
-- Percentage of explained variance retained by the Principal Component Analysis (PCA)
0.90,
-- Maximum number of twin areas
NULL,
-- Output prefix used to store the results: <output_prefix>_model_pca_components for the PCA model and <output_prefix>_<origin_cell_ID>_results for the results
'cartobq.docs.twin_areas');
As shown by the code above, the procedure requires the user to specify through a SQL query the origin and target cells, as well as to provide some inputs for the calculation of the similarity score, namely:
The results of this procedure for a selected origin location can be seen in this map:
This map shows the similarity skill score for all the target cells with a positive score in Texas: larger scores indicate areas more similar to the origin location.
Traditionally, discovering new areas for retail business expansion represented a difficult and lengthy process, which required on-site market analysis and local expertise. Using our Twin Areas tool, retailers and CPG brands can now easily discover the best locations to expand or optimize their network without any strong prior knowledge of the area, thus optimizing their site planning process. The approach takes full advantage of our comprehensive data catalog and the analytical capabilities of CARTO’s cloud native Location Intelligence platform.
These procedures are now available to CARTO users as part of the Analytics Toolbox for BigQuery. They can be run directly from the BigQuery console or from your SQL or Python Notebooks using the BigQuery client. Additionally, they will soon power the white space analysis engine integrated into CARTO’s Site Selection solution.
Want to get started?
Sign up for a free 14-day trial ![]() | This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401. |
You’ve probably seen hexagon grids on maps, and maybe even created some of your own. But have you ever stopped to think about why? This is CARTO’s definitive guide to hexag...
Spatial DataThe retail landscape of a city is a complex and intricate one. A store’s success or failure depends heavily on its location; whether it is in a busy area, has a strong loca...
Spatial DataIn New York City, a number of companies such as 1520, Fridge No More, JOKR and BUYK have started offering local grocery delivery in 15 minutes from dark stores. As these se...
Spatial DataPlease fill out the below form and we'll be in touch real soon.