CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

This version includes breaking changes. Check the CHANGELOG for more information.

Quickstart

Introduction

Hi! Glad to see you made it to the Quickstart guide! This guide introduces how data scientists can use CARTOframes in spatial analysis workflows. Using fake Starbucks revenue data, it walks through some common steps a data scientist takes to answer the following question: which stores are performing better than others?

Before you get started, we encourage you to have CARTOframes installed so you can get a feel for the library by using it:

pip install --pre cartoframes

If you want to know other ways to install it, check out the installation guide first.

Spatial analysis scenario

Let’s say you are a data scientist working for Starbucks and you want to better understand why some stores in Brooklyn, New York, perform better than others.

To begin, let’s outline a workflow:

  • Get and explore your company’s data
  • Create areas of influence for your stores
  • Enrich your data with demographic data
  • And finally, share the results of your analysis with your team

Let’s get started!

Get and explore your company’s data

This is the dataset you will start your exploration with. It contains each Starbucks store’s name, address, and annual revenue. As a first exploratory step, read it into a Jupyter Notebook using pandas.

import pandas as pd

stores_df = pd.read_csv('../files/starbucks_brooklyn.csv')
stores_df.head()
   name                          address                                       revenue
0  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772
1  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418
2  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699
3  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676
4  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411

To be able to display your stores as points on a map, you first have to convert the address column into geometries. This process is called geocoding and CARTO provides a straightforward way to do it (you can learn more about it in the location data services guide).

To geocode, you have to set your CARTO credentials. If you don’t know your API key yet, check the authentication guide to learn how to get it. In case you want to see the result of the geocoding without being logged in, here is the geocoded dataset.

Note: If you don’t have an account yet, you can get a trial, or a free account if you are a student, by signing up here.

from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')
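
The call above reads your credentials from a local JSON file. As a minimal sketch, assuming the file holds a `username` and an `api_key` field (an assumption here; the authentication guide has the exact format your version expects), such a file could be created like this. The values are placeholders.

```python
import json
import os
import tempfile

# Placeholder values: replace with your own CARTO username and API key.
# The two field names are an assumption about the expected file format.
credentials = {
    "username": "your_carto_username",
    "api_key": "your_api_key",
}

# Written to a temporary directory so the sketch is safe to run anywhere;
# in a real project the file would sit next to your notebook as creds.json.
creds_path = os.path.join(tempfile.mkdtemp(), "creds.json")
with open(creds_path, "w") as f:
    json.dump(credentials, f, indent=2)

# Read it back to confirm the file is valid JSON with both expected keys.
with open(creds_path) as f:
    loaded = json.load(f)
print(sorted(loaded))
```

Keep a file like this out of version control, since it contains your API key.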

Now, we are ready to geocode the dataframe:

from cartoframes.data.services import Geocoding

stores_cdf, _ = Geocoding().geocode(stores_df, street='address')
stores_cdf.head()
   cartodb_id  the_geom                    name                          address                                       revenue      gc_status_rel  carto_geocode_hash
0  1           POINT (-73.95901 40.67109)  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772  0.91           9212e0e908d8c64d07c6a94827322397
1  2           POINT (-73.96122 40.57796)  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418  0.97           b1bbfe2893914a350193969a682dc1f5
2  3           POINT (-73.98976 40.61912)  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699  0.95           e47cf7b16d6c9b53c63e86a0418add1d
3  4           POINT (-74.02744 40.63152)  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676  0.95           2f21749c02f73116892eb3b6fd5d5738
4  5           POINT (-74.00098 40.59321)  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411  0.95           134c23973313802448365db6235783f9

Done! Now that the stores are geocoded, you will notice that a new column named the_geom has been added. This column stores the geographic location of each store and is used to plot each location on the map.
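
The gc_status_rel column in the output is also worth a look before moving on. Assuming it is a relevance score between 0 and 1 for each geocoded address (an interpretation of the column, not something this guide states), a quick check like the following sketch can flag addresses that deserve a manual review. The threshold is a hypothetical value, and the rows are copied from the output above.

```python
# (name, gc_status_rel) pairs copied from the geocoding output above.
geocoded = [
    ("Franklin Ave & Eastern Pkwy", 0.91),
    ("607 Brighton Beach Ave", 0.97),
    ("65th St & 18th Ave", 0.95),
    ("Bay Ridge Pkwy & 3rd Ave", 0.95),
    ("Caesar's Bay Shopping Center", 0.95),
]

# Hypothetical cut-off: tune it to how much uncertainty your analysis tolerates.
MIN_RELEVANCE = 0.93

# Collect the stores whose geocoding relevance falls below the cut-off.
needs_review = [name for name, rel in geocoded if rel < MIN_RELEVANCE]
print(needs_review)
```

On the dataframe itself, the same filter could be written as `stores_cdf[stores_cdf['gc_status_rel'] < MIN_RELEVANCE]`.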

You can quickly visualize your geocoded dataframe using the Map and Layer classes. Check out the visualization guide to learn more about the visualization capabilities inside of CARTOframes.

from cartoframes.viz import Map, Layer

Map(Layer(stores_cdf))

Great! We have a map!

Now you have a better sense of where the stores are. To continue with your exploration, you want to know where the stores with the highest revenue are. To do so, you can use the size_continuous_layer visualization layer:

from cartoframes.viz.helpers import size_continuous_layer

Map(size_continuous_layer(stores_cdf, 'revenue', 'Annual Revenue ($)'))

Good job! By using the size continuous visualization layer you can see right away where the stores with higher revenue are. By default, visualization layers also provide a popup with the mapped value and an appropriate legend.
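
Conceptually, a size continuous style maps each value to a marker size between a minimum and a maximum. The sketch below illustrates that idea with a plain linear ramp. The function name, the size range, and the linear scaling are all assumptions made for illustration; the library's actual scaling may differ.

```python
def linear_size(value, vmin, vmax, min_size=4.0, max_size=40.0):
    """Map value in [vmin, vmax] linearly onto [min_size, max_size]."""
    if vmax == vmin:
        # All values are equal: fall back to a mid-range size.
        return (min_size + max_size) / 2
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return min_size + t * (max_size - min_size)

# Revenues of the five stores shown earlier in this guide.
revenues = [1321040.772, 1268080.418, 1248133.699, 1185702.676, 1148427.411]
lo, hi = min(revenues), max(revenues)
sizes = [round(linear_size(r, lo, hi), 1) for r in revenues]
print(sizes)
```

The store with the highest revenue gets the largest marker and the one with the lowest revenue gets the smallest, which is what makes the pattern readable at a glance.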

Create your areas of influence

Similar to geocoding, there is a straightforward method for creating isochrones to define your areas of influence. An isochrone is a polygon that encloses the area reachable from a given point within a given travel time.

For our analysis, let’s create isochrones for each store that cover the area within a 15-minute walk.

To do this we will use the Isolines data service:

from cartoframes.data.services import Isolines

isochrones_cdf, _ = Isolines().isochrones(stores_cdf, [15*60], mode='walk')
isochrones_cdf.head()
   cartodb_id  source_id  data_range  lower_data_range  the_geom                                           range_label
0  1           1          900         0                 MULTIPOLYGON (((-73.95934 40.68011, -73.96074 ...  15 min.
1  2           2          900         0                 MULTIPOLYGON (((-73.96187 40.58632, -73.96289 ...  15 min.
2  3           3          900         0                 MULTIPOLYGON (((-73.99082 40.62694, -73.99170 ...  15 min.
3  4           4          900         0                 MULTIPOLYGON (((-74.02851 40.64062, -74.02882 ...  15 min.
4  5           5          900         0                 MULTIPOLYGON (((-74.00110 40.60186, -74.00249 ...  15 min.

Map([
    Layer(isochrones_cdf),
    Layer(stores_cdf)
])

There they are! To learn more about creating isochrones and isodistances, check out the location data services guide.
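
The `[15*60]` argument in the isochrones call is a list of travel times expressed in seconds, which is why the data_range column shows 900 for a 15-minute walk. If you want several ranges, a small helper keeps the minutes readable. This is a sketch; `minutes_to_seconds` is a hypothetical helper, not part of CARTOframes.

```python
def minutes_to_seconds(minutes):
    """Convert a list of travel times in minutes to seconds."""
    return [m * 60 for m in minutes]

# Hypothetical set of walking ranges: 5, 10 and 15 minutes.
ranges = minutes_to_seconds([5, 10, 15])
print(ranges)  # [300, 600, 900]
```

The resulting list could then be passed in place of `[15*60]`, for example `Isolines().isochrones(stores_cdf, ranges, mode='walk')`.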

Enrich your data with demographic data

Now that you have the area of influence calculated for each store, let’s augment the result with population information to help better understand a store’s average revenue per person.

First, let’s find the demographic variable we need. We will use the Catalog class, which can be filtered by country and category. In our case, we have to look for USA demographics datasets. Let’s check which geographies (spatial resolutions) are available.

from cartoframes.data.observatory import Catalog

Catalog().country('usa').category('demographics').geographies
[<Geography('ags_blockgroup_1c63771c')>,
<Geography('ags_q17_4739be4f')>,
<Geography('mbi_blockgroups_1ab060a')>,
<Geography('mbi_counties_141b61cd')>,
<Geography('mbi_county_subd_e8e6ea23')>,
<Geography('mbi_pc_5_digit_4b1682a6')>,
<Geography('usct_blockgroup_f45b6b49')>,
<Geography('usct_cbsa_6c8b51ef')>,
<Geography('usct_censustract_bc698c5a')>,
<Geography('usct_congression_b6336b2c')>,
<Geography('usct_county_ec40c962')>,
<Geography('usct_place_12d6699f')>,
<Geography('usct_puma_b859f0fa')>,
<Geography('usct_schooldistr_515af763')>,
<Geography('usct_schooldistr_da72a4cb')>,
<Geography('usct_schooldistr_287be4f7')>,
<Geography('usct_state_4c8090b5')>,
<Geography('usct_zcta5_75071016')>]

This time, let’s choose the block groups from AGS and check which datasets are available.

Catalog().country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
[<Dataset('ags_sociodemogr_e92b1637')>,
<Dataset('ags_consumerspe_fe5d060a')>,
<Dataset('ags_retailpoten_ddf56a1a')>,
<Dataset('ags_consumerpro_e8344e2e')>,
<Dataset('ags_businesscou_a8310a11')>,
<Dataset('ags_crimerisk_9ec89442')>]

Nice! The population variables are in the sociodemographics dataset. Let’s take a look at the available options and their descriptions.

from cartoframes.data.observatory import Dataset

dataset = Dataset.get('ags_sociodemogr_e92b1637')
variables_df = dataset.variables.to_dataframe()
variables_df[variables_df['description'].str.contains('population', case=False)]
agg_method column_name dataset_id db_type description id name slug starred summary_json variable_group_id
31 SUM AGECYGT85 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 85+ (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECYGT85 AGECYGT85_b9d8a94d False {'head': [1, 0, 0, 2, 2, 0, 0, 0, 0, 0], 'tail... None
32 SUM AGECYGT25 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population Age 25+ (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECYGT25 AGECYGT25_433741c7 False {'head': [6, 3, 0, 18, 41, 0, 0, 0, 0, 0], 'ta... None
33 SUM AGECYGT15 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population Age 15+ (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECYGT15 AGECYGT15_681a1204 False {'head': [6, 3, 0, 20, 959, 0, 0, 0, 0, 0], 't... None
34 SUM AGECY8084 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 80-84 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY8084 AGECY8084_b25d4aed False {'head': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'tail... None
35 SUM AGECY7579 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 75-79 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY7579 AGECY7579_15dcf822 False {'head': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], 'tail... None
36 SUM AGECY7074 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 70-74 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY7074 AGECY7074_6da64674 False {'head': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], 'tail... None
37 SUM AGECY6064 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 60-64 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY6064 AGECY6064_cc011050 False {'head': [1, 2, 0, 0, 0, 0, 0, 0, 0, 0], 'tail... None
38 SUM AGECY5559 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 55-59 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY5559 AGECY5559_8de3522b False {'head': [1, 0, 0, 2, 0, 0, 0, 0, 0, 0], 'tail... None
39 SUM AGECY5054 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 50-54 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY5054 AGECY5054_f599ec7d False {'head': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], 'tail... None
40 SUM AGECY4549 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 45-49 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY4549 AGECY4549_2c44040f False {'head': [1, 0, 0, 3, 3, 0, 0, 0, 0, 0], 'tail... None
41 SUM AGECY4044 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 40-44 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY4044 AGECY4044_543eba59 False {'head': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'tail... None
42 SUM AGECY3034 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 30-34 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY3034 AGECY3034_86a81427 False {'head': [0, 0, 0, 0, 5, 0, 0, 0, 0, 0], 'tail... None
43 SUM AGECY2529 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 25-29 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY2529 AGECY2529_5f75fc55 False {'head': [0, 0, 0, 0, 31, 0, 0, 0, 0, 0], 'tai... None
44 SUM AGECY1519 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 15-19 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY1519 AGECY1519_66ed0078 False {'head': [0, 0, 0, 1, 488, 0, 0, 0, 0, 0], 'ta... None
45 SUM AGECY0509 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 5-9 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY0509 AGECY0509_c74a565c False {'head': [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], 'tail... None
46 SUM AGECY0004 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 0-4 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY0004 AGECY0004_bf30e80a False {'head': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'tail... None
49 SUM AGECY2024 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 20-24 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY2024 AGECY2024_270f4203 False {'head': [0, 0, 0, 1, 430, 0, 0, 0, 0, 0], 'ta... None
50 SUM AGECY1014 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 10-14 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY1014 AGECY1014_1e97be2e False {'head': [0, 2, 0, 1, 0, 0, 0, 0, 0, 0], 'tail... None
51 SUM AGECY3539 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 35-39 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY3539 AGECY3539_fed2aa71 False {'head': [0, 1, 0, 1, 0, 0, 0, 0, 0, 0], 'tail... None
55 SUM POPPY carto-do.ags.demographics_sociodemographic_usa... FLOAT Population (2024A) carto-do.ags.demographics_sociodemographic_usa... POPPY POPPY_946f4ed6 False {'head': [0, 0, 8, 0, 0, 0, 4, 0, 2, 59], 'tai... None
66 SUM SEXCYMAL carto-do.ags.demographics_sociodemographic_usa... INTEGER Population male (2019A) carto-do.ags.demographics_sociodemographic_usa... SEXCYMAL SEXCYMAL_ca14d4b8 False {'head': [1, 2, 0, 13, 374, 0, 0, 0, 0, 0], 't... None
67 SUM SEXCYFEM carto-do.ags.demographics_sociodemographic_usa... INTEGER Population female (2019A) carto-do.ags.demographics_sociodemographic_usa... SEXCYFEM SEXCYFEM_d52acecb False {'head': [5, 3, 0, 9, 585, 0, 0, 0, 0, 0], 'ta... None
75 SUM POPCYGRPI carto-do.ags.demographics_sociodemographic_usa... INTEGER Institutional Group Quarters Population (2019A) carto-do.ags.demographics_sociodemographic_usa... POPCYGRPI POPCYGRPI_147af7a9 False {'head': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'tail... None
76 SUM POPCYGRP carto-do.ags.demographics_sociodemographic_usa... INTEGER Population in Group Quarters (2019A) carto-do.ags.demographics_sociodemographic_usa... POPCYGRP POPCYGRP_74c19673 False {'head': [0, 0, 0, 0, 959, 0, 0, 0, 0, 0], 'ta... None
77 SUM POPCY carto-do.ags.demographics_sociodemographic_usa... INTEGER Population (2019A) carto-do.ags.demographics_sociodemographic_usa... POPCY POPCY_f5800f44 False {'head': [6, 5, 0, 22, 959, 0, 0, 0, 0, 0], 't... None
92 SUM HISCYHISP carto-do.ags.demographics_sociodemographic_usa... INTEGER Population Hispanic (2019A) carto-do.ags.demographics_sociodemographic_usa... HISCYHISP HISCYHISP_f3b3a31e False {'head': [0, 0, 0, 0, 36, 0, 0, 0, 0, 0], 'tai... None
96 SUM LBFCYLBF carto-do.ags.demographics_sociodemographic_usa... INTEGER Population In Labor Force (2019A) carto-do.ags.demographics_sociodemographic_usa... LBFCYLBF LBFCYLBF_59ce7ab0 False {'head': [0, 2, 0, 10, 378, 0, 0, 0, 0, 0], 't... None
99 SUM LBFCYPOP16 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population Age 16+ (2019A) carto-do.ags.demographics_sociodemographic_usa... LBFCYPOP16 LBFCYPOP16_53fa921c False {'head': [6, 3, 0, 20, 959, 0, 0, 0, 0, 0], 't... None
107 SUM AGECY6569 carto-do.ags.demographics_sociodemographic_usa... INTEGER Population age 65-69 (2019A) carto-do.ags.demographics_sociodemographic_usa... AGECY6569 AGECY6569_b47bae06 False {'head': [2, 0, 0, 7, 0, 0, 0, 0, 0, 0], 'tail... None
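
The substring filter above returns every variable whose description mentions population, including age bands and the 2024 projection. To narrow the list down to the total for 2019, an exact match on the description is one option. Here is a plain-Python sketch of the idea, using a few rows copied from the table:

```python
# A few (slug, description) pairs copied from the table above.
variables = [
    ("AGECYGT85_b9d8a94d", "Population age 85+ (2019A)"),
    ("POPPY_946f4ed6", "Population (2024A)"),
    ("POPCY_f5800f44", "Population (2019A)"),
    ("SEXCYMAL_ca14d4b8", "Population male (2019A)"),
]

# An exact, case-insensitive match on the full description leaves only
# the total population for 2019.
target = "population (2019a)"
matches = [slug for slug, desc in variables if desc.lower() == target]
print(matches)
```

With pandas, the same filter could be written as `variables_df[variables_df['description'].str.lower() == 'population (2019a)']`.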

We can see the variable that contains the population for 2019 is the one with the slug POPCY_f5800f44. Now we are ready to enrich our areas of influence with that variable.

from cartoframes.data.observatory import Variable
from cartoframes.data.observatory import Enrichment

variable = Variable.get('POPCY_f5800f44')

isochrones_cdf = Enrichment().enrich_polygons(isochrones_cdf, [variable])
isochrones_cdf.head()
   cartodb_id  source_id  data_range  lower_data_range  the_geom                                           range_label  sum_POPCY
0  1           1          900         0                 MULTIPOLYGON (((-73.95934 40.68011, -73.96074 ...  15 min.      1221.455789
1  2           2          900         0                 MULTIPOLYGON (((-73.96187 40.58632, -73.96289 ...  15 min.      1534.248133
2  3           3          900         0                 MULTIPOLYGON (((-73.99082 40.62694, -73.99170 ...  15 min.      1311.667005
3  4           4          900         0                 MULTIPOLYGON (((-74.00110 40.60186, -74.00249 ...  15 min.      1179.999175
4  5           5          900         0                 MULTIPOLYGON (((-74.00110 40.60186, -74.00249 ...  15 min.      1334.454790
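
The fractional values in sum_POPCY suggest that block group populations are apportioned to each isochrone rather than counted whole. One common way to do that is areal weighting, where each block group contributes its population scaled by the share of its area that overlaps the polygon. The sketch below illustrates the idea with made-up numbers. It is not a description of what enrich_polygons does internally.

```python
# Hypothetical block groups overlapping one isochrone:
# (population, block group area, area of overlap with the isochrone).
# Areas can be in any unit as long as both columns use the same one.
block_groups = [
    (1200, 50, 50),  # fully inside the isochrone
    (800, 40, 10),   # a quarter inside
    (1500, 60, 0),   # no overlap
]

def area_weighted_sum(groups):
    """Sum each population scaled by the fraction of its area that overlaps."""
    total = 0.0
    for population, area, overlap in groups:
        total += population * (overlap / area)
    return total

estimate = area_weighted_sum(block_groups)
print(estimate)  # 1400.0
```

Under this scheme a block group that is only partly covered contributes only part of its population, which explains how non-integer totals can arise from integer counts.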

Great! Let’s see the result on a map:

from cartoframes.viz.helpers import color_continuous_layer

Map(color_continuous_layer(isochrones_cdf, 'sum_POPCY', 'Population'))

We can see that the area of influence of the store on the right is the one with the highest population. Let’s go a bit further and calculate and visualize the average revenue per person.

stores_cdf['rev_pop'] = stores_cdf['revenue']/isochrones_cdf['sum_POPCY']
Map(size_continuous_layer(stores_cdf, 'rev_pop', 'Revenue per person ($)'))

As we can see, there are clearly 3 stores that have lower revenue per person. This insight will help us to focus on them in further analyses.
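
The division above pairs stores and isochrones by row index, which works here because both dataframes are in the same order. If the order could differ, matching on an explicit key is safer. The sketch below assumes that source_id in the isochrones refers to the store's cartodb_id (the outputs above are consistent with that, but it is an assumption) and uses values copied from the first three rows, with the isochrones deliberately shuffled.

```python
# Values copied from the first rows of the outputs above.
stores = [
    {"cartodb_id": 1, "name": "Franklin Ave & Eastern Pkwy", "revenue": 1321040.772},
    {"cartodb_id": 2, "name": "607 Brighton Beach Ave", "revenue": 1268080.418},
    {"cartodb_id": 3, "name": "65th St & 18th Ave", "revenue": 1248133.699},
]
# Same isochrone rows, deliberately listed in a different order.
isochrones = [
    {"source_id": 3, "sum_POPCY": 1311.667005},
    {"source_id": 1, "sum_POPCY": 1221.455789},
    {"source_id": 2, "sum_POPCY": 1534.248133},
]

# Index populations by the store they belong to, so row order no longer matters.
population_by_store = {row["source_id"]: row["sum_POPCY"] for row in isochrones}

rev_pop = {
    store["name"]: store["revenue"] / population_by_store[store["cartodb_id"]]
    for store in stores
}
for name, value in rev_pop.items():
    print(f"{name}: {value:.2f}")
```

With pandas, the same join could be written as `stores_cdf.merge(isochrones_cdf[['source_id', 'sum_POPCY']], left_on='cartodb_id', right_on='source_id')` before computing the ratio.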

To learn more about discovering the data you want, check out the data discovery guide. To learn more about enriching your data, check out the data enrichment guide.

Publish and share your results

The final step in the workflow is to share this interactive map with your colleagues so they can explore the information on their own. Let’s do it!

First, let’s add widgets so people are able to see some graphs of the information and filter it. To do this, we only have to add widget=True to the visualization layers.

result_map = Map([
    color_continuous_layer(isochrones_cdf, 'sum_POPCY', 'Population', stroke_width=0, opacity=0.7),
    size_continuous_layer(stores_cdf, 'rev_pop', 'Revenue per person ($)', stroke_color='white', widget=True)
])
result_map

Cool! Now that you have a small dashboard to play with, let’s publish it on CARTO so you are able to share it with anyone. To do this, you just need to call the publish method from the Map class:

result_map.publish('startbucks_analysis')
{'id': '3c900d1f-d3ef-472f-9bc0-a2005a08df27',
 'url': 'https://cartoframes.carto.com/kuviz/3c900d1f-d3ef-472f-9bc0-a2005a08df27',
 'name': 'startbucks_analysis',
 'privacy': 'public'}

Conclusion

Congratulations! You have finished this guide and have a sense of how CARTOframes can speed up your workflow. To continue learning, you can read the specific guides, consult the reference for details on a class or method, or browse the examples.