How does the city that never sleeps keep moving so fast? Coffee, of course! In this article, we investigate the spatial patterns of coffee shops across New York City and show you how to do the same.
Explore the map below to find out more about caffeination deprivation across New York City! Darker pink areas indicate where residents are suffering from greater caffeine deprivation, while lighter areas have better caffeine provision.
Caffeine deprivation is a (CARTO-coined) measure which assesses how hard it is for residents to get a coffee. It takes into account three spatial measures - population, distance to the closest coffee shop and how many residents have to share their local coffee shops. The insights from this type of location-based analysis can help coffee brands and retailers unlock their potential through more data-driven site selection.
According to our calculations, the five most caffeine deprived areas in New York City are as follows:
As well as all being located in the Bronx, a common trait amongst these areas is their proximity to major highways. The top three areas straddle the Grand Concourse, Cross Bronx Expressway and 3rd and Webster Avenues respectively. Whilst busy streets are not the most obvious location for enjoying a quiet cup of coffee, by avoiding these locations could retailers be missing out on the caffeine-craving commuter crowd?
At the opposite end of the scale, the five least caffeine-deprived locations are:
It probably won’t come as a surprise to anyone that Manhattan is the most well-provisioned when it comes to caffeine supplies, with three of the top five most caffeine-soaked areas located here. With 1,767 coffee shops on the island alone, residents of Manhattan rarely have to travel far for a coffee. There are spatial patterns within this, however, with areas midtown and downtown generally experiencing less caffeine deprivation than neighborhoods uptown of Central Park.
Another complexity to consider is that Manhattan typically experiences higher numbers of inbound tourists and commuters than other New York City boroughs, with 664,000 inbound commuters traveling from outside the NYC area (NYC Planning, 2017). It’s not just their neighbors they’re queuing up against to get their morning latte!
Our analysis shows there are a number of locations across New York City just crying out for extra caffeination! Areas with a large population but a lack of coffee shop provision include Arden Heights and Grasmere, both on Staten Island and large parts of southeast Brooklyn and Queens.
There are a large number of factors which go into the decision-making process of where to place a coffee shop. As well as population and existing services, other demographic and economic factors should be considered. This could include market profile, tourist and commuter numbers and footfall (keep reading to find out more specifics!).
Whether it’s expansion, consolidation or performance monitoring - location data is critical to avoid million-dollar mistakes in site selection. Leading retailers such as Ikea, Decathlon, Gap and Group Dynamite use our solutions to identify whitespace, understand catchments, locate customers and - ultimately - pinpoint the most successful locations. You can read more about this here.
Did you know that in the U.S. it can cost more than $1 million to open a McDonald’s, Taco Bell, Burger King, or Wendy’s restaurant?
Measuring the provision of services is a fantastic way to gain insights when selecting new locations for your business or service, whether that’s a coffee shop, an apparel store or student dorms. This type of analysis can tell you a lot about your market and competitors. For example, Experian market segmentation data can be used to assess the location and size of target markets, which could be combined with Mastercard Geographic Insights to understand spend frequency and volume. Walk, cycle and drive-time catchments could also be incorporated into the analysis to understand brand whitespace.
The utility of provision analysis data isn’t just limited to site selection; it could be used to inform housing developments, sustainability assessments and equity analysis.
At CARTO we’re big coffee fans which is why we focused on caffeine, but your solutions and analysis could look at… well, anything! Check out our recent ”Uncovering Site Selection Strategies using Point of Interest Data” blog post where we took a deep-dive into competitor location analysis for fast food restaurants. For both this and our caffeination analysis we used our Core Places dataset – an expansive dataset including everything from car dealerships to supermarkets to clinics – to identify POI locations.
In this section of the blog, we’ll walk you through how to recreate this analysis yourself using CARTO builder, so grab a coffee and let’s get mapping! If you don’t have access to CARTO, get yourself a copy using our 14-day free trial.
This tutorial is rated 2 / 5 and is suitable for novices, meaning a good understanding of basic GIS and SQL concepts and skills will be helpful.
All of the data used in this analysis can be accessed directly from our Spatial Data Catalog. For this specific map we’re using data for the United States, but the datasets exist for a large number of countries meaning this analysis could be recreated virtually anywhere in the world… well, maybe not Antarctica (technically you could make this map, but it may not be the most interesting…).
Take a moment to subscribe to all of these datasets before we get started.
When performing any analysis, it’s always a good idea to load and inspect the source data you’ll be using. Familiarizing yourself with the data will help you to understand the results of your analysis and spot any errors. This also means you have early versions of your SQL code to refer to in case of errors.
First of all, create a blank map, then click Add source from > Custom SQL Query… Let’s start off by loading our county data as we’ll be using it for all of our other queries. The below simple SQL query loads in the USA county boundaries (“sub_carto_geography_usa_county_2019”), filtered to only those which comprise New York City.
SELECT * FROM `carto-data.ac_7xhfwyml.sub_carto_geography_usa_county_2019` where do_label In ("New York", "Queens","Kings","Richmond","Bronx") and geoid LIKE "36%"
The code below first joins the geometry of the hexagonal grid (do_geo) to the all-important demographic data which relates to them (do_data). We’ll also apply the ST_Intersects() function to limit the hexagons to only those which intersect with New York City.
WITH NYC AS (SELECT * FROM `carto-data.ac_7xhfwyml.sub_carto_geography_usa_county_2019` where do_label In ("New York", "Queens","Kings","Richmond","Bronx") and geoid LIKE "36%")
/*Load the hexagons, joining the geometry to the associated data */
SELECT
do_geo.geom as geometry, do_data.population
FROM `carto-data.ac_7xhfwyml.sub_carto_derived_spatialfeatures_usa_h3res8_v1_yearly_v2` do_data, NYC
INNER JOIN `carto-data.ac_7xhfwyml.sub_carto_geography_usa_h3res8_v1` do_geo
ON do_geo.geoid = do_data.geoid
WHERE st_intersects(NYC.geom, do_geo.geom)
You could of course emulate this for any state, or any geographical area for that matter! All you would need to do is replace the “NYC” selection with your geographic area of choice and replace the “USA” section of the H3 dataset name with the 3-letter country code required.
Finally, let’s load in our coffee shops. The Core Places data we’re using for this is enormous, but luckily CARTO’s MO is big data. However, it’s also good practice to streamline SQL queries to make your processes faster and more efficient, which we can do with the following steps:
Many of the datasets in our analysis have the “geoid” field, so to avoid confusion let’s rename this “ID.” The final code for this is below. You can change the contents of your “where” filter to focus on any other topics within the Core Places dataset - there are a lot!
WITH
/*Load the states */
NYC as (SELECT * FROM `carto-data.ac_7xhfwyml.sub_carto_geography_usa_county_2019` where do_label In ("New York", "Queens","Kings","Richmond","Bronx") and geoid LIKE "36%"),
/*Load the cafes, joining the geometry to the associated data and filtering by relevant fields and geography*/
cafe as (SELECT do_geo.geoid as ID, do_data.region,do_data.city, do_data.sub_category, do_data.location_name, do_data.brands, do_data.latitude, do_data.longitude
FROM `carto-data.ac_7xhfwyml.sub_safegraph_pointsofinterest_coreplaces_usa_latlon_v3_daily_v2` do_data, NYC
INNER JOIN `carto-data.ac_7xhfwyml.sub_safegraph_geography_usa_latlon_v3` do_geo
ON do_geo.geoid = do_data.geoid
WHERE st_contains(nyc.geom,do_geo.geom) AND sub_category = "Snack and Nonalcoholic Beverage Bars" AND category_tags LIKE "%Coffee%"
GROUP BY do_geo.geoid, do_data.region, do_data.city, do_data.sub_category, do_data.location_name, do_data.brands, do_data.geoid, do_data.latitude, do_data.longitude)
SELECT ID, region, city, sub_category, location_name, brands, latitude, longitude, st_geogpoint(longitude, latitude) as geom FROM cafe
At this point I’m going to export my coffee shop query to a new table called cartobq.maps.coffee_ny to streamline my later query, but this is optional.
If you’ve followed this example you should now have a map with 3 layers; the New York state boundary, h3 hexagon grid and all coffee shops in the state - it might look a little something like this!
CARTOgraphic tip: when using a dark basemap, turn on additive blending modes and watch your features glow!
Spend a bit of time familiarizing yourself with your data and looking through the data tables to prepare for the analysis we’re about to do - it’ll make your life a lot easier in about five minutes’ time! Also take the time to rename your layers something meaningful - you don’t want to end up confused with layer names like “Layer 1” and “Copy of Layer 3.”
The next step of this analysis is to turn our hexagon grid into a heatmap showing areas of caffeine deprivation. To do this, we need to find out four things about each hexagon.
With this information we can build our caffeination deprivation index! For this we will normalize the results of points 3 and 4 on a scale of 0 to 1 from most to least deprived. We’ll add these results together, giving each hexagon a score of between 0 and 2 (2 being the most caffeine-deprived, being furthest from a coffee shop and/or having to share their coffee shop with the most other people).
Ready?
We’ll be utilizing the CARTO Analytics Toolbox for BigQuery for this analysis, which is a set of UDFs and Stored Procedures to unlock Spatial Analytics. There are two types of modules within the toolbox: core modules, that are and free to use for anyone with a BigQuery account, and advanced modules, only available with a CARTO account. You can find out more about unlocking the power of spatial data with our Analytics Toolbox.
Create a new custom SQL query. This will be quite a long query, as we’ll essentially be combining everything we’ve done before - but we’ll break it down step by step. This could be done much more efficiently in fewer subqueries, but as this is a tutorial it can be helpful to break things down into bite-size pieces of code rather than novels of SQL.
Just like we did previously, load in the New York state boundary, coffee shops and hexagons. We now have tables called “state,” “cafes” and “hexnyc” which we can quickly refer to for the remainder of our analysis.
WITH
/*Load the states */
NYC as (SELECT do_label, geom as geometry FROM `carto-data.ac_7xhfwyml.sub_carto_geography_usa_county_2019` where do_label In ("New York", "Queens","Kings","Richmond","Bronx") and geoid LIKE "36%"),
/*Load the cafes, joining the geometry to the associated data and filtering by amenity type */
Cafes as (SELECT geom FROM cartobq.maps.coffee_ny, nyc where st_contains(nyc.geometry, coffee_ny.geom)),
/*Load the hexagons, joining the geometry to the associated data */
hex as (SELECT
do_geo.geom as geom, do_data.population, do_data.geoid
FROM `carto-data.ac_7xhfwyml.sub_carto_derived_spatialfeatures_usa_h3res8_v1_yearly_v2` do_data
INNER JOIN `carto-data.ac_7xhfwyml.sub_carto_geography_usa_h3res8_v1` do_geo
ON do_geo.geoid = do_data.geoid),
/*Filter hexagons to New York City*/
hexnyc as (SELECT * FROM hex, nyc
where st_intersects(nyc.geometry, hex.geom)),
The next short snippet of code creates a field called “distance” which calculates how far each hexagon is from its closest coffee shop. Where a coffee shop falls inside the hexagon, it is given the distance 0.
/*Calculate distance to the closest coffee shop*/
hexdist as( select hexnyc.*, (select min(st_distance(hexnyc.geom,cafes.geom)) from cafes) AS distance from hexnyc),
In this next section of code, we calculate the field “joincount” as the number of cafes contained by each hexagon by performing a left join and again using st_contains(). Now we have the join count calculated, we can work out the number of residents per coffee shop. As we can’t divide zero by zero, we also need to override the values for when either the “population” or “joincount” equals zero.
/*Calculate number of coffee shops in each hex, and population per coffee shop*/
hexagg as (select hexdist.geoid, hexdist.population, hexdist.distance,
count(cafes.geom) as joincount, /*Count number of cafes inside hex*/
case when count(cafes.geom) = 0 then hexdist.population
when hexdist.population = 0 then 0
else hexdist.population/count(cafes.geom) end as countperpop /*Calculate population per cafe*/
from hexdist
LEFT JOIN cafes on st_contains(hexdist.geom, cafes.geom) /*Join cafes to hex when contained by them*/
group by hexdist.geoid, hexdist.population, hexdist.distance)
Now we have all of this data in one table, we can calculate the index! The query section below creates a field called “index” where each hexagon has a score between 0 and 2, with 2 being where areas experience more deprivation, being further from coffee shops and with higher populations. This code also excludes areas with extremely low populations (<1,000) from the index.
select hexagg.geoid, round(hexagg.population,0) as Population, round(hexagg.distance,0) as Distance,hexagg.joincount as JoinCount, round(hexagg.countperpop,3) as PopulationPerCoffeeShop,
case
when hexagg.population < 1000 then 0
else
round((1.00*(hexagg.distance-(SELECT min(hexagg.distance) from hexagg)))/ ((SELECT max(hexagg.distance) from hexagg)-(SELECT min(hexagg.distance) from hexagg))+
(1.00*(hexagg.countperpop-(SELECT min(hexagg.countperpop) from hexagg)))/ ((SELECT max(hexagg.countperpop) from hexagg)-(SELECT min(hexagg.countperpop) from hexagg)),4) end
as index
from hexagg
Now is also a good time to do a bit of data cleaning with your end user’s experience in mind. You can see in the section where we’re selecting our fields that the fields have all been rounded to a sensible number of decimal places, with more legible names given. We’ve included the full code for the index query at the bottom of this post in case you’d like to see how all the sections fit together.
And that… is it! If you’ve followed these steps, you should now have a hexagon grid layer which contains a field called “index” where higher numbers indicate more caffeine deprivation - congratulations!
The Caffeination Deprivation Index
Mapped against the raw data
Having some problems? You can find a full copy of the code used to create the index layer below. We also have a range of support available for our CARTO users which you can explore here, including our fantastic user community!
WITH
/*Load the states */
NYC as (SELECT do_label, geom as geometry FROM `carto-data.ac_7xhfwyml.sub_carto_geography_usa_county_2019` where do_label In ("New York", "Queens","Kings","Richmond","Bronx") and geoid LIKE "36%"),
/*Load the cafes, joining the geometry to the associated data and filtering by amenity type */
Cafes as (SELECT geom FROM cartobq.maps.coffee_ny, nyc where st_contains(nyc.geometry, coffee_ny.geom)),
/*Load the hexagons, joining the geometry to the associated data */
hex as (SELECT
do_geo.geom as geom, do_data.population, do_data.geoid
FROM `carto-data.ac_7xhfwyml.sub_carto_derived_spatialfeatures_usa_h3res8_v1_yearly_v2` do_data
INNER JOIN `carto-data.ac_7xhfwyml.sub_carto_geography_usa_h3res8_v1` do_geo
ON do_geo.geoid = do_data.geoid),
/*Filter hexagons to New York City*/
hexnyc as (SELECT * FROM hex, nyc
where st_intersects(nyc.geometry, hex.geom)),
/*Calculate distance to the closest coffee shop*/
hexdist as( select hexnyc.*, (select min(st_distance(hexnyc.geom,cafes.geom)) from cafes) AS distance from hexnyc),
/*Calculate number of coffee shops in each hex, and population per coffee shop*/
hexagg as (select hexdist.geoid, hexdist.population, hexdist.distance,
count(cafes.geom) as joincount, /*Count number of cafes inside hex*/
case when count(cafes.geom) = 0 then hexdist.population
when hexdist.population = 0 then 0
else hexdist.population/count(cafes.geom) end as countperpop /*Calculate population per cafe*/
from hexdist
LEFT JOIN cafes on st_contains(hexdist.geom, cafes.geom) /*Join cafes to hex when contained by them*/
group by hexdist.geoid, hexdist.population, hexdist.distance)
select hexagg.geoid, round(hexagg.population,0) as Population, round(hexagg.distance,0) as Distance,hexagg.joincount as JoinCount, round(hexagg.countperpop,3) as PopulationPerCoffeeShop,
case
when hexagg.population < 1000 then 0
else
round((1.00*(hexagg.distance-(SELECT min(hexagg.distance) from hexagg)))/ ((SELECT max(hexagg.distance) from hexagg)-(SELECT min(hexagg.distance) from hexagg))+
(1.00*(hexagg.countperpop-(SELECT min(hexagg.countperpop) from hexagg)))/ ((SELECT max(hexagg.countperpop) from hexagg)-(SELECT min(hexagg.countperpop) from hexagg)),4) end
as index
from hexagg
Did you make this map, or adapt this tutorial to do something different? We love seeing your mappy creations! Share it with us on Twitter or LinkedIn using the hashtag #cartocreations 🗺️.
We’re always looking for ways to improve how we share knowledge with you, and would love it if you took the time to send us your feedback using the form below.
Want to see this in action?
Request a live personalized demoWhile travel and commuting pick up speed once again across the world, investment in out-of-home advertising (OOH) is growing in tandem. With OOH ad spend projected to grow ...
Use CasesThroughout the course of history, the use of the term “equity” has ebbed and flowed in frequency. Recently, equity has become quite a buzzword. Why? Advancing equity across...
Use Cases“Imagine a gender equal world.”
Use CasesPlease fill out the below form and we'll be in touch real soon.