As brick & mortar retailers, and the CPG categories they distribute, look to expand their footprint, network planners need access to insightful analysis and modeling techniques to drive better decisioning. After all, opening a new location is often the single largest cost retailers have to bear. In this post, we outline how spatial data and white space analysis can be used to surface the most viable new store sites.
Our Analytics Toolbox for BigQuery, a set of User Defined Functions for running advanced spatial analytics natively in BigQuery, has been further enhanced with a new set of retail-specific functions including revenue prediction and white space analysis.
White space analysis solves the ultimate question in retail expansion: where should I open my next store? The output of the analysis are the best potential locations, understood as those where the expected revenue is higher and where other business criteria are also met: e.g., the presence or absence of existing stores and/or competitors.
In order to complete end-to-end white space analysis, three steps are necessary, each corresponding to a specific procedure in the CARTO’s Analytics Toolbox for BigQuery: data preparation, model training, and locating whitespace areas, running the BUILD_REVENUE_MODEL_DATA, BUILD_REVENUE_MODEL, and FIND_WHITESPACE_AREAS procedures, respectively.
To showcase the usage of the white space analysis we will build on the same revenue prediction use-case of our previous blogpost. We are going to use a set of liquor store locations under the brand Hy-Vee and their associated annual revenue for 2019, extracted from the publicly available Iowa Liquor Sales datasets. All other liquor stores will be considered Hy-Vee’s competitors.
In this particular example we are using an H3 grid at resolution 7 and the same set of enrichment variables we used in the previous blog post: total population and number of POIs for different categories, in particular, retail, financial, food & drinks, leisure, transportation and tourism, extracted from CARTO’s Spatial Features dataset available via our Data Observatory.
This step has the objective of enriching the area of interest where the analysis will be performed using both the data provided by the retailer (existing store locations and their revenue, and competitor locations), and data from the Data Observatory subscription detailed above .
The following SQL operation is performed using the BUILD_REVENUE_MODEL_DATA procedure:
CALL `carto-un`.carto.BUILD_REVENUE_MODEL_DATA(
-- Stores: revenue, store, geom
'''
SELECT store_name as store,ST_GEOGFROMTEXT(any_value(store_location)) as geom, sum(sale_dollars) as revenue FROM
`bigquery-public-data.iowa_liquor_sales.sales`
where store_name like "%Hy-Vee%" and EXTRACT(YEAR FROM date)=2019
group by store_name
''',
-- Competitors: competitor, geom
'''
SELECT store_name as competitor,ST_GEOGFROMTEXT(any_value(store_location)) as geom FROM
`bigquery-public-data.iowa_liquor_sales.sales`
where store_name not like "%Hy-Vee%" and EXTRACT(YEAR FROM date)=2019
group by store_name
''',
-- Area of interest: geom
'''
SELECT state_geom AS geom
FROM `bigquery-public-data.geo_us_boundaries.states`
WHERE state_name='Iowa'
''',
-- Grid params: grid type and level
'h3', 7,
-- Decay params: kring size and decay function
1, '',
-- Data Observatory enrichment
[("population_f5b8d177", 'sum'), ("retail_21cb1815", 'sum'),("financial_9b8bb507",'sum'),("food_drink_bd2ef444",'sum'),("leisure_16da1bd6",'sum'),
("transportation_63a2617a",'sum'),("tourism_fc201498",'sum')],
'<my-dataobs-project>.<my-dataobs-dataset>',
-- Custom data enrichment
NULL, NULL,
-- Output destination prefix
'cartobq.docs.whitespace_analysis_iowa'
);
The result is a table (cartobq.docs.whitespace_analysis_iowa_model_data) including one row for every H3 grid cell at resolution 7; each containing their associated enrichment values, including the number of stores, competitors and the variables coming from CARTO’s Spatial Features dataset specified earlier.
This step builds the revenue prediction model using the data resulting from the data preparation step. The model is trained using BigQuery ML’s Boosted Tree Regressor, which the user can tweak using the options parameter of the BUILD_REVENUE_MODEL procedure.
The following SQL operation is performed:
CALL `carto-un`.carto.BUILD_REVENUE_MODEL(
-- Model data
'cartobq.docs.revenue_prediction_iowa_whitespace_analysis_data',
-- Options
'{"MAX_ITERATIONS": 20, "MAX_TREE_DEPTH":3}',
-- Output destination prefix
'cartobq.docs.whitespace_analysis_iowa
'
);
Once we have enriched our area of interest using the chosen spatial resolution and trained our revenue prediction model, the last step consists in finding the locations within our area of interest where a new store would perform best. For that purpose, we are going to use the FIND_WHITESPACE_AREAS procedure, which takes the following parameters as input:
For our particular analysis, a good choice for the generator locations is the set of transportation, retail and food & drink POIs, as those are the ones with the highest feature importance in our revenue prediction model (see table below). The location of these POIs has been extracted from the OpenStreetMap planet nodes dataset and those for Iowa have been stored in the table cartobq.docs.whitespace_analysis_iowa_generator_pois.
We can use this procedure to run a preliminary exploration to visualize all areas with a minimum amount of potential revenue, in this case $200,000. The result has been stored in the table whitespace_analysis_iowa_result_allcells_min2e5.
CALL `carto-un`.carto.FIND_WHITESPACE_AREAS(
'cartobq.docs.whitespace_analysis_iowa_revenue_model',
'cartobq.docs.whitespace_analysis_iowa_revenue_model_data',
'SELECT * FROM cartobq.docs.whitespace_analysis_iowa_generator_pois',
NULL, -- AOI query
2e5, -- Minimum revenue
NULL, -- Max results
TRUE, -- Include cells with own stores
TRUE -- Include cells with competitors
)
The following map displays the resulting whitespace areas, coloured by the amount of potential revenue, where darker colors represent higher values of revenue. The generator locations have also been included for reference in the visualization.
Then, we can rerun the procedure to find only the areas with the top 10 highest potential revenue where no previously existing stores or competitors are present.
CALL `carto-un`.carto.FIND_WHITESPACE_AREAS(
'cartobq.docs.whitespace_analysis_iowa_revenue_model',
'cartobq.docs.whitespace_analysis_iowa_revenue_model_data',
'SELECT * FROM cartobq.docs.whitespace_analysis_iowa_generator_pois',
NULL, -- AOI query
2e5, -- Minimum revenue
10, -- Max results
FALSE, -- Include cells with own stores
FALSE -- Include cells with competitors
)
The resulting white space areas are represented in red in the following map. Existing stores are displayed in blue, while competitors in yellow.
As we have seen in this post, white space analysis gives both retailers and CPG brands the flexibility to explore network expansion opportunities based on valuable spatial insights. This approach takes the guesswork out of site selection decisions, so businesses can hone in on those retail locations most likely to succeed.
These procedures are now available to CARTO users as part of the Analytics Toolbox for BigQuery. They can be run directly from the BigQuery console or from your SQL or Python Notebooks using the BigQuery client. Additionally, they will soon power the white space analysis engine integrated in CARTO’s Site Selection application.
Want to get started?
Sign up for a free 14-day trial ![]() | This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401. |
Wildfires can be detrimental to urban and rural areas, causing impacts in the form of injury, death and property loss. In fact, recent studies (here and here) have shown a ...
Spatial DataYou’ve probably seen hexagon grids on maps, and maybe even created some of your own. But have you ever stopped to think about why? This is CARTO’s definitive guide to hexag...
Spatial DataWhen it comes to retail site selection applications, a common task consists of combining data from different sources to compare the characteristics of target locations and ...
Spatial DataPlease fill out the below form and we'll be in touch real soon.