CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

Basic cartoframes usage

cartoframes lets you use CARTO in a Python environment so that you can do all of your analysis and mapping in, for example, a Jupyter notebook. cartoframes allows you to use CARTO’s functionality for data analysis, storage, location services like routing and geocoding, and visualization.

It is recommended to download this notebook and run it on your own computer so you can more easily explore the functionality of cartoframes. For read-only viewing, it renders best on nbviewer: https://nbviewer.jupyter.org/github/CartoDB/cartoframes/blob/master/examples/Basic%20Usage.ipynb

To get started, let’s load the required packages and set credentials.

%matplotlib inline
import matplotlib.pyplot as plt
import cartoframes
from cartoframes import Credentials
import pandas as pd

USERNAME = 'eschbacher'  # <-- replace with your username
APIKEY = 'abcdefg'       # <-- your CARTO API key
creds = Credentials(username=USERNAME,
                    key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)

cc.read

CartoContext has several methods for interacting with CARTO in a Python environment. CartoContext.read pulls a dataset stored on CARTO into a pandas DataFrame. In the cells below, we first use read_brooklyn_poverty to load the example Brooklyn poverty data and write it to your account as brooklyn_poverty_example, then use cc.read to get that table back as a DataFrame.

from cartoframes.examples import read_brooklyn_poverty
cc.write(read_brooklyn_poverty(), 'brooklyn_poverty_example', overwrite=True)
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/brooklyn_poverty_example
# Get a CARTO table as a pandas DataFrame
df = cc.read('brooklyn_poverty_example')
df.head()
commuters_16_over_2011_2015 geoid pop_determined_poverty_status_2011_2015 poverty_count poverty_per_pop the_geom total_pop_2011_2015 total_population walked_to_work_2011_2015_per_pop
cartodb_id
1606 0.0 360470702031 0.0 NaN NaN 0106000020E61000000800000001030000000100000013... 0.0 0 NaN
2052 NaN 360479901000 NaN NaN NaN None NaN 0 NaN
111 0.0 360470666001 0.0 NaN NaN 0106000020E6100000030000000103000000010000006B... 0.0 0 1.553393e-12
116 NaN 360470702030 NaN NaN NaN None NaN 0 NaN
91 15928.0 360470080002 31367.0 225.0 0.17201 0106000020E61000000100000001030000000100000007... 39471.0 1309 1.505213e-02

Notice that:

  • the index of the DataFrame is the same as the index of the CARTO table (cartodb_id)
  • the_geom column stores the geometry. This can be decoded if we set the decode_geom=True flag in cc.read, which requires the library shapely.
  • there are several numeric columns
  • SQL null values are represented as numpy.nan
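Because SQL nulls arrive as numpy.nan, the usual pandas missing-data tools apply. A minimal sketch on a small hand-built frame whose values are copied from three of the rows shown above (it does not read from CARTO):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a few rows of brooklyn_poverty_example;
# np.nan plays the role of a SQL NULL returned by cc.read.
df = pd.DataFrame(
    {'poverty_per_pop': [np.nan, np.nan, 0.17201],
     'total_population': [0, 0, 1309]},
    index=pd.Index([1606, 116, 91], name='cartodb_id'),
)

# Count missing values in each column.
null_counts = df.isna().sum()

# Keep only the rows where poverty_per_pop is known.
known = df.dropna(subset=['poverty_per_pop'])

print(null_counts)
print(known)
```

The same calls work unchanged on the DataFrame returned by cc.read.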

Other things to notice:

df.dtypes
commuters_16_over_2011_2015                float64
geoid                                       object
pop_determined_poverty_status_2011_2015    float64
poverty_count                              float64
poverty_per_pop                            float64
the_geom                                    object
total_pop_2011_2015                        float64
total_population                             int64
walked_to_work_2011_2015_per_pop           float64
dtype: object

The dtype of each column is a mapping of the column type on CARTO. For example, numeric will map to float64, text will map to object (pandas string representation), timestamp will map to datetime64[ns], etc. The reverse happens if a DataFrame is sent to CARTO.

cc.map

Now that we can inspect the data, we can map it to see how the values change over the geography. We can use the cc.map method for this purpose.

cc.map takes a layers argument which specifies the data layers that are to be visualized. They can be imported from cartoframes as below.

There are different types of layers:

  • Layer for visualizing CARTO tables
  • QueryLayer for visualizing arbitrary queries from tables in user’s CARTO account
  • BaseMap for specifying the base map to be used

Each layer type has its own styling options. Layer and QueryLayer take the same styling arguments, while BaseMap can be set to light or dark and accepts options for label placement.

Maps can be interactive or static. Set this with the interactive argument (True or False). A static map is embedded in the notebook as either a matplotlib axis or an IPython.Image; either way, the image travels with the notebook. Interactive maps are embedded as maps you can zoom and pan.

from cartoframes import Layer, styling, BaseMap
l = Layer('brooklyn_poverty_example',
          color={'column': 'poverty_per_pop',
                 'scheme': styling.sunset(7)})
cc.map(layers=l,
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x10630a320>

Brooklyn poverty rates

Multiple variables together

table = 'brooklyn_poverty_example'
cols = [
    'pop_determined_poverty_status_2011_2015',
    'poverty_per_pop',
    'walked_to_work_2011_2015_per_pop',
    'total_pop_2011_2015'
]

fig, axs = plt.subplots(2, 2, figsize=(12, 12))

for idx, col in enumerate(cols):
    cc.map(layers=[BaseMap('dark'), Layer(table,
                        color={'column': col,
                               'scheme': styling.sunset(7, 'quantiles')})],
           ax=axs[idx // 2][idx % 2],
           zoom=11, lng=-73.9476, lat=40.6437,
           interactive=False,
           size=(432, 432))
    axs[idx // 2][idx % 2].set_title(col)
fig.tight_layout()
plt.show()

Brooklyn demographic data in small multiples
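The loop places each column on the 2×2 grid by turning its position in cols into a (row, column) pair with idx // 2 and idx % 2. Python's built-in divmod computes both at once; a small standalone sketch of that mapping:

```python
cols = [
    'pop_determined_poverty_status_2011_2015',
    'poverty_per_pop',
    'walked_to_work_2011_2015_per_pop',
    'total_pop_2011_2015',
]
n_cols = 2  # must match the second argument of plt.subplots(2, 2)

# divmod(idx, n_cols) == (idx // n_cols, idx % n_cols)
positions = {col: divmod(idx, n_cols) for idx, col in enumerate(cols)}

for col, (row, column) in positions.items():
    print(f'{col}: axs[{row}][{column}]')
```

Inside the plotting loop, row, column = divmod(idx, 2) followed by ax=axs[row][column] is an equivalent way to write the axis lookup.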

NYC Taxi Dataset

Let’s explore a typical cartoframes workflow using data on NYC taxis.

To get the data into CARTO, we can:

  1. Use read_taxi from cartoframes.examples to grab the data from the cartoframes example account as a pandas DataFrame
  2. Send it to your CARTO account using cc.write, specifying the lng/lat columns you want to use for visualization
  3. Set overwrite=True to replace an existing dataset if it exists
  4. Refresh our df with the CARTO-fied version using cc.read
# read NYC taxi data from the cartoframes example datasets
from cartoframes.examples import read_taxi
df = read_taxi()

# show first five rows to see what we've got
df.head()
dropoff_latitude dropoff_longitude extra fare_amount improvement_surcharge mta_tax passenger_count payment_type pickup_latitude pickup_longitude ratecodeid store_and_fwd_flag the_geom tip_amount tolls_amount total_amount tpep_dropoff_datetime tpep_pickup_datetime trip_distance vendorid
cartodb_id
1 40.706779 -74.012383 0.0 8.5 0.3 0.5 2 1 40.730461 -74.006706 1 False None 1.00 0.0 10.30 2016-05-01 15:00:36 2016-05-01 14:52:11 2.08 2
2 40.762779 -73.973824 0.0 13.5 0.3 0.5 1 1 40.744125 -73.924957 1 False None 2.00 0.0 16.30 2016-05-01 08:49:02 2016-05-01 08:34:08 3.00 1
3 40.740833 -73.998955 0.0 14.5 0.3 0.5 1 2 40.748501 -73.973488 1 False None 0.00 0.0 15.30 2016-05-04 10:07:09 2016-05-04 09:44:40 2.10 1
4 40.792370 -73.966362 0.5 15.0 0.3 0.5 1 2 40.743267 -73.999786 1 False None 0.00 0.0 16.30 2016-05-01 21:05:24 2016-05-01 20:50:11 4.41 2
5 40.784939 -73.956963 0.0 19.5 0.3 0.5 2 1 40.803360 -73.963631 1 False None 4.06 0.0 24.36 2016-05-02 07:53:53 2016-05-02 07:26:56 4.01 2
# send it to carto so we can map it
# specify the columns we want to have as a point (pickup location)
cc.write(df, 'taxi_50k',
         lnglat=('pickup_longitude', 'pickup_latitude'),
         overwrite=True)

# read the fresh carto-fied version
df = cc.read('taxi_50k')
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/taxi_50k
`the_geom` column is being populated from `('pickup_longitude', 'pickup_latitude')`. Check the status of the operation with:
    BatchJobStatus(CartoContext(), '787ce294-3fce-47d0-8390-5599c4fc35c8').status()
or try reading the table from CARTO in a couple of minutes.
Note: `CartoContext.map` will not work on this table until its geometries are created.
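Because the_geom is filled in by an asynchronous batch job, mapping the table immediately can fail. A minimal polling sketch, assuming a hypothetical get_status() callable that returns the job state as a string. The state names are an assumption based on CARTO's Batch SQL API job states, and you would need to adapt get_status to whatever BatchJobStatus(...).status() returns in your cartoframes version:

```python
import time

# Final job states (assumption: taken from CARTO's Batch SQL API job states).
TERMINAL_STATES = {'done', 'failed', 'canceled'}


def wait_for_job(get_status, timeout_s=120.0, poll_every_s=2.0, sleep=time.sleep):
    """Call the hypothetical get_status() until it reports a final state.

    Raises TimeoutError if the job is still pending/running after timeout_s.
    """
    waited = 0.0
    while True:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f'job still {state!r} after {waited:.0f}s')
        sleep(poll_every_s)
        waited += poll_every_s


# Usage with a fake status source standing in for the real job.
fake_states = iter(['pending', 'running', 'done'])
final_state = wait_for_job(lambda: next(fake_states), sleep=lambda s: None)
print(final_state)
```

Passing sleep as a parameter keeps the loop testable without actually waiting; only call cc.map once the returned state is 'done'.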

Take a look at the data on a map.

from cartoframes import Layer
cc.map(layers=Layer('taxi_50k'),
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x107b87be0>

NYC Taxi Data Raw

Oops, some rows have zero-valued longitude/latitude pairs, so those points end up on Null Island (0, 0). Let’s remove them.

# select only the rows which are not at (0,0)
df = df[(df['pickup_longitude'] != 0) | (df['pickup_latitude'] != 0)]
# send back up to CARTO
cc.write(df, 'taxi_50k', overwrite=True,
         lnglat=('pickup_longitude', 'pickup_latitude'))
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/taxi_50k
`the_geom` column is being populated from `('pickup_longitude', 'pickup_latitude')`. Check the status of the operation with:
    BatchJobStatus(CartoContext(), '292c4b72-965a-4b17-970c-8527b85eced4').status()
or try reading the table from CARTO in a couple of minutes.
Note: `CartoContext.map` will not work on this table until its geometries are created.





BatchJobStatus(job_id='292c4b72-965a-4b17-970c-8527b85eced4', last_status='pending', created_at='2018-06-26T13:47:06.959Z')

Instead of using pandas, we could have removed those rows in the database using SQL.

cc.query('''
DELETE FROM taxi_50k
WHERE pickup_longitude = 0 and pickup_latitude = 0
''')
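The pandas filter and the SQL DELETE act on complementary rows: by De Morgan's law, keeping rows where longitude != 0 OR latitude != 0 is the same as deleting rows where longitude = 0 AND latitude = 0. A self-contained sketch that checks this with Python's built-in sqlite3 standing in for CARTO's PostgreSQL database; the first coordinate pair comes from the taxi data above and the other two are made up:

```python
import sqlite3

# One pair from the taxi data, one Null Island pair, and one made-up pair
# with only a zero longitude (which both approaches keep).
rows = [(-74.006706, 40.730461), (0.0, 0.0), (0.0, 40.7)]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE taxi_50k (pickup_longitude REAL, pickup_latitude REAL)')
conn.executemany('INSERT INTO taxi_50k VALUES (?, ?)', rows)

# Same predicate as the cc.query call above.
conn.execute('DELETE FROM taxi_50k WHERE pickup_longitude = 0 and pickup_latitude = 0')
remaining = conn.execute(
    'SELECT pickup_longitude, pickup_latitude FROM taxi_50k ORDER BY rowid'
).fetchall()

# Same logic as the pandas mask: keep a row if either coordinate is non-zero.
kept_by_mask = [(lng, lat) for lng, lat in rows if lng != 0 or lat != 0]

print(remaining)
print(kept_by_mask)
```

Doing the cleanup in SQL avoids downloading and re-uploading the table, which matters more as tables grow.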
# Let's take a look at what's going on, styled by the fare amount
cc.map(layers=Layer('taxi_50k',
                    size=4,
                    color={'column': 'fare_amount',
                           'scheme': styling.sunset(7)}),
       interactive=True)

We can use the zoom=..., lng=..., lat=... information in the embedded interactive map to help us get static snapshots of the regions we’re interested in. For example, JFK airport is around zoom=12, lng=-73.7880, lat=40.6629. We can paste that information as arguments in cc.map to generate a static snapshot of the data there.
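To get a feel for what zoom=12 covers, the standard Web Mercator ground-resolution formula gives the approximate ground width of one map pixel. This sketch assumes 256-pixel tiles, the common convention for Web Mercator basemaps; whether a particular cartoframes static map uses that tile size is an assumption:

```python
import math

EQUATOR_M = 40075016.686  # WGS84 equatorial circumference in meters
TILE_PX = 256             # assumed tile size in pixels


def meters_per_pixel(lat_deg, zoom):
    """Approximate ground distance covered by one pixel at this latitude and zoom."""
    return EQUATOR_M * math.cos(math.radians(lat_deg)) / (TILE_PX * 2 ** zoom)


jfk = meters_per_pixel(40.6629, 12)
print(f'{jfk:.1f} m per pixel at the JFK view')
```

Each zoom level halves the pixel size, so the zoom=11 Brooklyn small multiples show roughly twice as much ground per pixel as this JFK view.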

# Let's take a look at what's going on at JFK airport, styled by the fare amount, and STATIC
cc.map(layers=Layer('taxi_50k',
                    size=6,
                    color={'column': 'fare_amount',
                           'scheme': styling.sunset(7)}),
       zoom=12, lng=-73.7880, lat=40.6629,
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x10e5024a8>

JFK pickups styled by fare amount

Data Observatory in cartoframes

The Data Observatory can be accessed through CARTOframes. This is a basic demonstration of how to pull down new measures to build a feature set for training a model.

%matplotlib inline
import cartoframes
from cartoframes import QueryLayer, Layer, styling
import pandas as pd

# Enter your username and API key below
USERNAME = ''  # <-- insert your username here
APIKEY = ''    # <-- insert your API key here
cc = cartoframes.CartoContext(base_url='https://{}.carto.com/'.format(USERNAME),
                              api_key=APIKEY)

Getting Mexico City Metro station coordinates

Use pandas to download an Excel spreadsheet into a dataframe.

# Metro stations from here:
#  https://github.com/josecarlosgonz/mexicoCityMetro/blob/master/coordsMetro.xlsx
df = pd.read_excel('https://github.com/josecarlosgonz/mexicoCityMetro/blob/master/coordsMetro.xlsx?raw=true')
df.head()
   Name                    latitude  longitude  Unnamed: 3  linea  estacion                afluencia  latitude.1  longitude.1
0  Pantitlán               19.4163   -99.0747   NaN         1      Pantitlán               4513549.0  19.4163     -99.0747
1  Zaragoza                19.4117   -99.0821   NaN         1      Zaragoza                5144223.0  19.4117     -99.0821
2  Gómez Farías            19.4165   -99.0904   NaN         1      Gómez Farías            3665025.0  19.4165     -99.0904
3  Boulevard Puerto Aéreo  19.4196   -99.0963   NaN         1      Boulevard Puerto Aéreo  3611591.0  19.4196     -99.0963
4  Balbuena                19.4231   -99.1021   NaN         1      Balbuena                1822229.0  19.4231     -99.1021

Send the DataFrame to CARTO. Column names are normalized on the way (for example, latitude.1 becomes latitude_1), so pass the normalized names to lnglat.

orig_table = 'coordsmetro_demo'
cc.write(df, orig_table, lnglat=('longitude_1', 'latitude_1'), overwrite=True)
The following columns were changed in the CARTO copy of this dataframe:
Name -> name
Unnamed: 3 -> unnamed_3
latitude.1 -> latitude_1
longitude.1 -> longitude_1
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/coordsmetro_demo
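The renames above follow a simple pattern: lowercase the name, then replace each run of characters that are not letters, digits, or underscores with a single underscore. A hypothetical helper that reproduces the four renames shown; cartoframes' own normalization may differ in edge cases, so treat this only as a way to predict simple names before calling cc.write:

```python
import re


def normalize_column(name):
    """Hypothetical approximation of CARTO-safe column names."""
    lowered = name.lower()
    # Collapse each run of disallowed characters (":", ".", spaces, ...) to "_".
    cleaned = re.sub(r'[^a-z0-9_]+', '_', lowered)
    return cleaned.strip('_')


for original in ['Name', 'Unnamed: 3', 'latitude.1', 'longitude.1', 'linea']:
    print(f'{original} -> {normalize_column(original)}')
```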

See the data by linea

Note: by default, the basemap labels are drawn beneath the data layer.

cc.map(layers=Layer(orig_table,
                    color={'column': 'linea',
                           'scheme': styling.bold(10)}))

See a static version of the map above

cc.map(layers=Layer(orig_table,
                    color={'column': 'linea',
                           'scheme': styling.bold(10)}),
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x10d065358>

Mexico City Metro Stations

Data Observatory measures in the Mexico City area

Let’s get education-related Data Observatory measures around the metro stops.

meta = cc.data_discovery(region=orig_table, keywords='education')
meta.head()
denom_aggregate denom_colname denom_description denom_geomref_colname denom_id denom_name denom_reltype denom_t_description denom_tablename denom_type ... numer_timespan numer_type score score_rank score_rownum suggested_name target_area target_geoms timespan_rank timespan_rownum
0 sum employed_primary_education None cvegeo mx.inegi_columns.ECO10 Employed population with primary education denominator None obs_50197262168407a1409111b87164348a0b01e9e4 Numeric ... 2010 Numeric 36.599093 1 1 female_employed_primary_education_rate_2010 None None 1 1
1 sum female_employed None cvegeo mx.inegi_columns.ECO5 Employed female denominator None obs_50197262168407a1409111b87164348a0b01e9e4 Numeric ... 2010 Numeric 36.599093 1 2 female_employed_primary_education_rate_2010 None None 1 2
2 sum employed_incomplete_secondary_education None cvegeo mx.inegi_columns.ECO13 Employed population with incomplete secondary ... denominator None obs_50197262168407a1409111b87164348a0b01e9e4 Numeric ... 2010 Numeric 36.599093 1 1 female_employed_incomplete_secondary_education... None None 1 1
3 sum female_employed None cvegeo mx.inegi_columns.ECO5 Employed female denominator None obs_50197262168407a1409111b87164348a0b01e9e4 Numeric ... 2010 Numeric 36.599093 1 2 female_employed_incomplete_secondary_education... None None 1 2
4 sum employed_incomplete_secondary_education None cvegeo mx.inegi_columns.ECO13 Employed population with incomplete secondary ... denominator None obs_50197262168407a1409111b87164348a0b01e9e4 Numeric ... 2010 Numeric 36.599093 1 1 male_employed_incomplete_secondary_education_r... None None 1 1

5 rows × 42 columns

# See how many measures are possible
meta.shape
(28, 42)
# Look at the geometry levels available
meta.groupby('geom_id')['geom_id'].count()
geom_id
mx.inegi.ageb          8
mx.inegi.municipio    20
Name: geom_id, dtype: int64

Narrow the results down to municipio-level measures only.

# select only the municipio level data
meta = meta[meta['geom_id'] == 'mx.inegi.municipio']

This takes it down to only 20 measures.

meta.shape
(20, 42)
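The groupby count and the boolean filter above can also be written with value_counts. A sketch on a toy stand-in for meta that has only a geom_id column and the same counts as the output above:

```python
import pandas as pd

# Toy stand-in for meta: 8 AGEB-level rows and 20 municipio-level rows.
meta = pd.DataFrame({'geom_id': ['mx.inegi.ageb'] * 8 + ['mx.inegi.municipio'] * 20})

# Same counts as meta.groupby('geom_id')['geom_id'].count(), sorted largest first.
counts = meta['geom_id'].value_counts()

# Same filter as above: keep only municipio-level measures.
municipio = meta[meta['geom_id'] == 'mx.inegi.municipio']

print(counts)
print(municipio.shape)
```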

Take a look at the measures we have

meta['numer_name'].values
array(['Employed female population with primary education',
       'Employed female population with primary education',
       'Employed female population with incomplete secondary education',
       'Employed female population with incomplete secondary education',
       'Employed male population with incomplete secondary education',
       'Employed male population with incomplete secondary education',
       'Employed female population who completed basic education',
       'Employed female population who completed basic education',
       'Employed male population who completed basic education',
       'Employed male population who completed basic education',
       'Female population 15 or more years old who did not complete basic education',
       'Female population 15 or more years old who did not complete basic education',
       'Male population 15 or more years old who completed basic education',
       'Male population 15 or more years old who completed basic education',
       'Employed male population with primary education',
       'Employed male population with primary education',
       'Male population 15 or more years old who did not complete basic education',
       'Male population 15 or more years old who did not complete basic education',
       'Female population 15 or more years old who completed basic education',
       'Female population 15 or more years old who completed basic education'], dtype=object)
# Augment the metro stations with the measures in rows labeled 0 through 4 of meta
data = cc.data(orig_table, meta.loc[0:4])
data.head()
afluencia estacion female_employed_incomplete_secondary_education_rate_2010 female_employed_primary_education_rate_2010 latitude latitude_1 linea longitude longitude_1 male_employed_incomplete_secondary_education_rate_2010 name the_geom unnamed_3
cartodb_id
1 4513549 Pantitlán 0.029624 0.135172 19.4163 19.4163 1 -99.0747 -99.0747 0.668229 Pantitlán 0101000020E6100000B84082E2C7C458C0265305A3926A... None
2 5144223 Zaragoza 0.029624 0.135172 19.4117 19.4117 1 -99.0821 -99.0821 0.668229 Zaragoza 0101000020E61000001AC05B2041C558C061C3D32B6569... None
3 3665025 Gómez Farías 0.029624 0.135172 19.4165 19.4165 1 -99.0904 -99.0904 0.668229 Gómez Farías 0101000020E6100000BDE3141DC9C558C0B4C876BE9F6A... None
4 3611591 Boulevard Puerto Aéreo 0.029624 0.135172 19.4196 19.4196 1 -99.0963 -99.0963 0.668229 Boulevard Puerto Aéreo 0101000020E6100000B5A679C729C658C0CF66D5E76A6B... None
5 1822229 Balbuena 0.029624 0.135172 19.4231 19.4231 1 -99.1021 -99.1021 0.668229 Balbuena 0101000020E6100000FB3A70CE88C658C007F01648506C... None
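meta.loc[0:4] selects rows by index label, and label slices include both endpoints; it is not a positional slice. After the municipio filter, meta keeps its original labels, which can have gaps. A small pandas sketch of the difference, using made-up labels:

```python
import pandas as pd

contiguous = pd.DataFrame({'x': range(6)}, index=[0, 1, 2, 3, 4, 5])
# Made-up labels with gaps, as can happen after a boolean filter.
filtered = pd.DataFrame({'x': range(5)}, index=[0, 2, 4, 6, 8])

print(len(contiguous.loc[0:4]))   # labels 0..4, both ends included
print(len(contiguous.iloc[0:4]))  # positions 0..3, end excluded
print(len(filtered.loc[0:4]))     # only labels 0, 2 and 4 fall in that range
```

If you mean "the first five rows" regardless of labels, meta.iloc[:5] or meta.head() says so explicitly.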
new_table = 'mexico_metro_augmented'
cc.write(data, new_table, overwrite=True)
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/mexico_metro_augmented
from cartoframes import Layer, BaseMap, styling
cc.map(layers=[BaseMap('dark'), Layer(new_table,
                    color={'column': 'female_employed_incomplete_secondary_education_rate_2010',
                           'scheme': styling.sunset(7)})])

Shapely, Geopandas, and CARTOframes

This notebook demonstrates how to use cartoframes with GeoPandas, a popular Python package for working with geospatial data in a local environment. GeoPandas is built on some of the same robust libraries as PostGIS, which underlies CARTO’s spatial analysis, so it complements CARTO extremely well.

Get started by creating a CartoContext that allows you to interact with CARTO in the notebook environment.

%matplotlib inline
from cartoframes import CartoContext, Credentials
import geopandas as gpd
import pandas as pd
# set CARTO credentials and save them locally
creds = Credentials(key='abcdefg',
                    username='cartoframes')
creds.save()
# with saved credentials, CartoContext() needs no arguments
cxn = CartoContext()

Reading and writing data from/to CARTO

To get started, let’s use the nat dataset, which contains county-level criminology data for the United States for 1960, 1970, 1980, and 1990.

# load data into your account
from cartoframes.examples import read_nat

# write `nat` to your CARTO account
cxn.write(read_nat(), 'nat')

# download it and decode geometries
df = cxn.read('nat', decode_geom=True)
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/nat
df.head()
blk60 blk70 blk80 blk90 cnty_fips cofips dnl60 dnl70 dnl80 dnl90 ... state_fips state_name stfips the_geom the_geom_webmercator ue60 ue70 ue80 ue90 geometry
cartodb_id
2367 17.515639 17.566758 13.96325 13.449123 449 449 3.692759 3.687802 3.951113 4.068565 ... 48 Texas 48 0106000020E6100000010000000103000000010000000F... 0106000020110F0000010000000103000000010000000F... 4.2 2.7 5.027695 5.863649 (POLYGON ((-94.81377410888672 32.9882507324218...
2148 0.000000 0.007520 0.01176 0.000000 113 113 2.634145 2.739525 2.983983 3.004824 ... 5 Arkansas 5 0106000020E6100000010000000103000000010000000F... 0106000020110F0000010000000103000000010000000F... 7.2 5.4 5.717978 5.524099 (POLYGON ((-93.92706298828125 34.3524627685546...
16 0.000000 0.000000 0.00000 0.443038 75 75 1.668175 1.463381 1.417714 1.284332 ... 38 North Dakota 38 0106000020E6100000010000000103000000010000000C... 0106000020110F0000010000000103000000010000000C... 4.1 6.0 3.935185 6.328182 (POLYGON ((-101.0608749389648 48.4602966308593...
258 0.000000 0.000000 0.00000 0.000000 49 49 1.484931 1.363188 1.198477 1.009217 ... 46 South Dakota 46 0106000020E61000000100000001030000000100000006... 0106000020110F00000100000001030000000100000006... 2.8 0.3 0.539291 1.962388 (POLYGON ((-98.72162628173828 44.8916778564453...
27 0.000000 0.000000 0.00000 0.000000 19 19 0.956364 0.759179 0.686538 0.463073 ... 30 Montana 30 0106000020E6100000010000000103000000010000000E... 0106000020110F0000010000000103000000010000000E... 3.7 0.4 2.112676 1.428571 (POLYGON ((-105.8138580322266 48.5703468322753...

5 rows × 72 columns

By default, CARTO uses hex-encoded well-known binary (WKB) serialization for geometries that come out of PostGIS.

df.head(2)[['fipsno', 'the_geom']]
fipsno the_geom
cartodb_id
364 27101.0 0106000020E61000000100000001030000000100000007...
1660 20019.0 0106000020E61000000100000001030000000100000006...
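Strictly, these hex strings are PostGIS extended WKB (EWKB), which can embed the SRID right after the geometry type. A standard-library sketch that reads only the header (byte order, geometry type, optional SRID) of hex prefixes taken from this notebook; it does not parse coordinates, which is the full decoding that decode_geom=True performs with shapely:

```python
import struct

# OGC WKB base geometry type codes.
WKB_TYPES = {1: 'Point', 2: 'LineString', 3: 'Polygon', 4: 'MultiPoint',
             5: 'MultiLineString', 6: 'MultiPolygon', 7: 'GeometryCollection'}
EWKB_SRID_FLAG = 0x20000000  # PostGIS flag: a 4-byte SRID follows the type


def describe_ewkb(hex_wkb):
    """Return (geometry type, SRID or None) from the header of a hex (E)WKB string."""
    header = bytes.fromhex(hex_wkb[:18])  # 1 byte order + 4 type + 4 SRID
    endian = '<' if header[0] == 1 else '>'  # 1 = little endian, 0 = big endian
    (type_word,) = struct.unpack(endian + 'I', header[1:5])
    # Drop the EWKB flag bits, then any ISO Z/M offset (1000, 2000, 3000).
    base_type = (type_word & 0x0FFFFFFF) % 1000
    srid = None
    if type_word & EWKB_SRID_FLAG:
        (srid,) = struct.unpack(endian + 'I', header[5:9])
    return WKB_TYPES.get(base_type, 'unknown'), srid


print(describe_ewkb('0106000020E61000000100000001030000000100000007'))  # the_geom
print(describe_ewkb('0106000020110F0000010000000103000000010000000F'))  # the_geom_webmercator
```

The the_geom prefix decodes to a MultiPolygon with SRID 4326 (longitude/latitude) and the the_geom_webmercator prefix to a MultiPolygon with SRID 3857 (Web Mercator). The the_geom values shown for the buffered GeoDataFrame further down start with 0103000000, which this helper reports as a Polygon with no SRID.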

Because we passed decode_geom=True to cxn.read, these strings were also deserialized into shapely objects in the geometry column, which work with GeoPandas.

df.head(2)[['fipsno', 'geometry']]
fipsno geometry
cartodb_id
2367 48449 (POLYGON ((-94.81377410888672 32.9882507324218...
2148 5113 (POLYGON ((-93.92706298828125 34.3524627685546...

This allows you to do GIS operations locally in GeoPandas. To turn a DataFrame with shapely geometries into a GeoPandas GeoDataFrame, pass it directly to the GeoDataFrame constructor:

gdf = gpd.GeoDataFrame(df)
gdf.plot('hr90', linewidth=0.1) # to prove we're in geopandas
<matplotlib.axes._subplots.AxesSubplot at 0x10f47a240>

Homicide rates in 1990, GeoPandas plot

Because cartoframes can serialize and deserialize shapely objects, you can publish to CARTO (and make CARTO maps) directly from (geo)pandas:

from cartoframes import Layer, styling
cxn.map(layers=Layer('nat', color={'column': 'hr90',
                                   'scheme': styling.sunset(7)}),
        interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x111789358>

Homicide rates in 1990, CARTOframes plot

You can also create interactive versions of the above map by setting interactive=True.

Note: If viewing this notebook on GitHub, the interactive map will not display. Check out this same notebook rendered on nbviewer instead.

from cartoframes import Layer, styling
cxn.map(layers=Layer('nat', color={'column': 'hr90',
                                   'scheme': styling.sunset(7)}),
        interactive=True)

GeoPandas DataFrames can be written to CARTO just like pandas DataFrames.

cxn.write(gdf,
          encode_geom=True,
          table_name='cartoframes_geopandas',
          overwrite=True)
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/cartoframes_geopandas

If you change the geometries locally, write the GeoDataFrame to CARTO again to send the changes there:

gdf['geometry'] = gdf.geometry.apply(lambda x: x.buffer(2))
df['geometry'] = df.geometry.apply(lambda x: x.buffer(2))
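The nat geometries are stored as longitude/latitude (SRID 4326), so buffer(2) grows each county by 2 degrees, not 2 kilometers or miles, and a degree of longitude shrinks toward the poles. A rough spherical-Earth sketch of what 2 degrees means at two latitudes that appear in this dataset (Texas near 33°N, North Dakota near 48.5°N):

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180  # about 111.2 km


def km_per_degree_lng(lat_deg):
    """East-west length of one degree of longitude at the given latitude."""
    return KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))


buffer_deg = 2
for name, lat in [('Texas county', 32.99), ('North Dakota county', 48.46)]:
    ns = buffer_deg * KM_PER_DEG_LAT
    ew = buffer_deg * km_per_degree_lng(lat)
    print(f'{name}: ~{ns:.0f} km north-south, ~{ew:.0f} km east-west')
```

To buffer by a fixed distance instead, one common option is to project the GeoDataFrame to a metric CRS first (for example with GeoPandas' to_crs) and buffer in meters.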
cxn.write(gdf, encode_geom=True,
          table_name='cartoframes_geopandas_buffered',
          overwrite=True)
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/cartoframes_geopandas_buffered
gdf.head()
blk60 blk70 blk80 blk90 cnty_fips cofips dnl60 dnl70 dnl80 dnl90 ... state_fips state_name stfips the_geom the_geom_webmercator ue60 ue70 ue80 ue90 geometry
cartodb_id
2367 17.515639 17.566758 13.96325 13.449123 449 449 3.692759 3.687802 3.951113 4.068565 ... 48 Texas 48 0103000000010000004c000000110ff66c84d157c0f09c... 0106000020110F0000010000000103000000010000000F... 4.2 2.7 5.027695 5.863649 POLYGON ((-95.27370761899171 31.04432885782165...
2148 0.000000 0.007520 0.01176 0.000000 113 113 2.634145 2.739525 2.983983 3.004824 ... 5 Arkansas 5 0103000000010000004c000000eb5432f7cb7c57c063bb... 0106000020110F0000010000000103000000010000000F... 7.2 5.4 5.717978 5.524099 POLYGON ((-93.94994907298285 36.6935937156584,...
16 0.000000 0.000000 0.00000 0.443038 75 75 1.668175 1.463381 1.417714 1.284332 ... 38 North Dakota 38 0103000000010000004c000000a1c34ac301e759c0860d... 0106000020110F0000010000000103000000010000000C... 4.1 6.0 3.935185 6.328182 POLYGON ((-103.6094825964087 47.5185228488654,...
258 0.000000 0.000000 0.00000 0.000000 49 49 1.484931 1.363188 1.198477 1.009217 ... 46 South Dakota 46 01030000000100000047000000726e9852332e58c02e30... 0106000020110F00000100000001030000000100000006... 2.8 0.3 0.539291 1.962388 POLYGON ((-96.72188248525507 44.92368954847949...
27 0.000000 0.000000 0.00000 0.000000 19 19 0.956364 0.759179 0.686538 0.463073 ... 30 Montana 30 0103000000010000004e000000d07d8644a7ff5ac00553... 0106000020110F0000010000000103000000010000000E... 3.7 0.4 2.112676 1.428571 POLYGON ((-107.9945842088121 49.70426177527956...

5 rows × 72 columns

from cartoframes import BaseMap, Layer
cxn.map(layers=[BaseMap('light'),
                Layer('cartoframes_geopandas_buffered',
                         color='gi69')],
        interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x1130fac88>

Buffered county polygons, using CARTOframes

Example Datasets

cartoframes’ example datasets let users learn cartoframes features 1) without needing to authenticate against their own account, and 2) with pre-packaged data to follow along in notebooks.

from cartoframes.examples import example_context
from cartoframes import Layer, QueryLayer

List available tables

[table.name for table in example_context.tables()]
['mcdonalds_nyc', 'nyc_census_tracts', 'brooklyn_poverty', 'taxi_50k', 'nat']

Brooklyn Poverty Data

from cartoframes.examples import read_brooklyn_poverty
df = read_brooklyn_poverty()
df.head()
commuters_16_over_2011_2015 geoid pop_determined_poverty_status_2011_2015 poverty_count poverty_per_pop the_geom total_pop_2011_2015 total_population walked_to_work_2011_2015_per_pop
cartodb_id
2052 NaN 360479901000 NaN NaN NaN None NaN 0 NaN
1606 0.0 360470702031 0.0 NaN NaN 0106000020E61000000800000001030000000100000013... 0.0 0 NaN
1572 NaN 360470666000 NaN NaN NaN None NaN 0 NaN
17 5058.0 360470534004 23191.0 377.0 0.406394 0106000020E6100000010000000103000000010000000B... 21451.0 928 0.018761
53 4230.0 360470593001 10804.0 117.0 0.098598 0106000020E6100000010000000103000000010000000E... 8116.0 1185 0.031915

Sample map

Note: The cell below contains a dynamic map that won’t render in static renderers like GitHub, but will render on nbviewer. If you’re viewing the map on a running notebook server, re-run the cell (after running the top cell first) to ensure it renders as an interactive map.

# Interactive map
example_context.map(Layer('brooklyn_poverty', color='poverty_per_pop'))
# Static map
example_context.map(
    Layer('brooklyn_poverty', color='poverty_per_pop'),
    interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x10c464748>

Brooklyn poverty rates at the census tract level

Taxi Data

from cartoframes.examples import read_taxi
df = read_taxi()
df.head()
dropoff_latitude dropoff_longitude extra fare_amount improvement_surcharge mta_tax passenger_count payment_type pickup_latitude pickup_longitude ratecodeid store_and_fwd_flag the_geom tip_amount tolls_amount total_amount tpep_dropoff_datetime tpep_pickup_datetime trip_distance vendorid
cartodb_id
1 40.706779 -74.012383 0.0 8.5 0.3 0.5 2 1 40.730461 -74.006706 1 False None 1.00 0.0 10.30 2016-05-01 15:00:36 2016-05-01 14:52:11 2.08 2
2 40.762779 -73.973824 0.0 13.5 0.3 0.5 1 1 40.744125 -73.924957 1 False None 2.00 0.0 16.30 2016-05-01 08:49:02 2016-05-01 08:34:08 3.00 1
3 40.740833 -73.998955 0.0 14.5 0.3 0.5 1 2 40.748501 -73.973488 1 False None 0.00 0.0 15.30 2016-05-04 10:07:09 2016-05-04 09:44:40 2.10 1
4 40.792370 -73.966362 0.5 15.0 0.3 0.5 1 2 40.743267 -73.999786 1 False None 0.00 0.0 16.30 2016-05-01 21:05:24 2016-05-01 20:50:11 4.41 2
5 40.784939 -73.956963 0.0 19.5 0.3 0.5 2 1 40.803360 -73.963631 1 False None 4.06 0.0 24.36 2016-05-02 07:53:53 2016-05-02 07:26:56 4.01 2

To visualize this data, we need a the_geom column. Using example_context, we can create one in SQL, either calling the query method to get the data or using a QueryLayer to visualize it on a map.

If we try to map the table as is, we will get an error, because this dataset doesn’t have explicit geometries (its the_geom column is empty).

example_context.map(Layer('taxi_50k'))
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-7-dbf78c20e820> in <module>()
----> 1 example_context.map(Layer('taxi_50k'))


~/git/CartoDB/cartoframes/cartoframes/context.py in map(self, layers, interactive, zoom, lat, lng, size, ax)
    913                     for stylecol, coltype in utils.dict_items(resp['fields']):
    914                         layer.style_cols[stylecol] = coltype['type']
--> 915                 layer.geom_type = self._geom_type(layer)
    916                 if not base_layers:
    917                     geoms.add(layer.geom_type)


~/git/CartoDB/cartoframes/cartoframes/context.py in _geom_type(self, source)
   1103                      common_geom=resp['rows'][0]['geom_type']))
   1104         elif resp['total_rows'] == 0:
-> 1105             raise ValueError('No geometry for layer. Check all layer tables '
   1106                              'and queries to ensure there are geometries.')
   1107         return resp['rows'][0]['geom_type']


ValueError: No geometry for layer. Check all layer tables and queries to ensure there are geometries.

Creating a geometry

There are many ways to create geometries from the lng/lat pairs in the taxi dataset. Here we draw a line from each pickup to its drop-off and compute its “crow fly” length, converting it from meters to miles.

q = '''
SELECT
  *,
  ST_Transform(the_geom, 3857) as the_geom_webmercator,
  ST_Length(the_geom::geography) / 1609 as crow_dist
FROM (
    SELECT
      ST_MakeLine(CDB_LatLng(pickup_latitude, pickup_longitude), CDB_LatLng(dropoff_latitude, dropoff_longitude)) as the_geom,
      cartodb_id,
      fare_amount,
      trip_distance
    FROM taxi_50k
    WHERE pickup_latitude <> 0 and dropoff_latitude <> 0
) as _w
'''
example_context.map(
    QueryLayer(q, color='fare_amount', opacity=0.05),
    zoom=12, lng=-73.9503, lat=40.7504
)
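CARTO's CDB_LatLng helper takes latitude first, while the point it returns, like any PostGIS point, stores longitude as x and latitude as y. A standard-library sketch that writes the same pickup-to-drop-off line as WKT for the first taxi row, to make that x/y order explicit (the helper name is hypothetical):

```python
def trip_line_wkt(pickup_lat, pickup_lng, dropoff_lat, dropoff_lng):
    """Hypothetical helper: WKT LINESTRING from pickup to drop-off.

    Arguments follow CDB_LatLng's (latitude, longitude) order, but WKT,
    like PostGIS, writes each vertex as "x y", i.e. "longitude latitude".
    """
    return (f'LINESTRING({pickup_lng} {pickup_lat}, '
            f'{dropoff_lng} {dropoff_lat})')


# First row of the taxi data shown earlier (cartodb_id 1).
print(trip_line_wkt(40.730461, -74.006706, 40.706779, -74.012383))
```

Swapping latitude and longitude is an easy mistake to make and typically sends points far from where they belong, much like the Null Island rows earlier.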
# static view
example_context.map(
    QueryLayer(q, color='fare_amount', opacity=0.05),
    zoom=12, lng=-73.9503, lat=40.7504,
    interactive=False
)
<matplotlib.axes._subplots.AxesSubplot at 0x10c40a2b0>

Taxi trips colored by fare amount

We can use that same query to get the line geometries and compare the crow fly distance with the reported trip distance.

taxi_lines = example_context.query(q)
taxi_lines.head()
crow_dist fare_amount the_geom trip_distance
cartodb_id
1 1.661395 8.5 0102000020E610000002000000020000E06D8052C00400... 2.08
2 2.869732 13.5 0102000020E610000002000000FEFFFF7F327B52C0FBFF... 3.00
3 1.437734 14.5 0102000020E610000002000000000000A04D7E52C0FFFF... 2.10
4 3.815911 15.0 0102000020E610000002000000FEFFFF7FFC7F52C00400... 4.41
5 1.318624 19.5 0102000020E610000002000000FEFFFF1FAC7D52C00500... 4.01
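ST_Length(the_geom::geography) in the query above measures the line on the WGS84 spheroid. The haversine formula on a sphere gives nearly the same answer and is handy for sanity checks outside the database. A sketch for cartodb_id 1, with coordinates taken from the taxi table shown earlier, that also computes the ratio of reported trip distance to crow-fly distance:

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; PostGIS geography uses the WGS84 spheroid
METERS_PER_MILE = 1609      # same divisor as the SQL query above


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# cartodb_id 1: pickup (40.730461, -74.006706), drop-off (40.706779, -74.012383)
crow_miles = haversine_m(40.730461, -74.006706, 40.706779, -74.012383) / METERS_PER_MILE
trip_miles = 2.08  # reported trip_distance for the same row

print(f'crow-fly: {crow_miles:.3f} mi (PostGIS reported 1.661395)')
print(f'trip / crow-fly ratio: {trip_miles / crow_miles:.2f}')
```

The small gap between the spherical estimate and the PostGIS value comes from the sphere-versus-spheroid difference. A ratio above 1 means the driven route was longer than the straight line, as expected on a street grid.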

Nat dataset

from cartoframes import styling
example_context.map(Layer('nat', color={'column': 'hr90', 'scheme': styling.sunset(7)}))

CARTOframes with Dask

This notebook recreates the workflow from https://jakevdp.github.io/blog/2015/08/14/out-of-core-dataframes-in-python/, where the author uses dask to split computations across multiple cores on a machine so that tasks complete more quickly.

Basics

You’ll need the following for this:

  1. Your CARTO username
  2. Your API key

Paste these values between the quotes ('') below.

%matplotlib inline
import pandas as pd
import cartoframes

username = ''   # <-- insert your username here
api_key = ''    # <-- insert your API key here

cc = cartoframes.CartoContext('https://{}.carto.com/'.format(username),
                              api_key)
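If you would rather not paste secrets into the notebook, a minimal alternative is to read them from environment variables. The variable names `CARTO_USERNAME` and `CARTO_API_KEY` and the helper `carto_credentials_from_env` are hypothetical choices, not something cartoframes looks for on its own.

```python
import os


def carto_credentials_from_env(user_var='CARTO_USERNAME', key_var='CARTO_API_KEY'):
    """Return (username, api_key) from environment variables (hypothetical names)."""
    username = os.environ.get(user_var, '')
    api_key = os.environ.get(key_var, '')
    if not username or not api_key:
        raise RuntimeError('Set {} and {} before running this notebook'.format(user_var, key_var))
    return username, api_key
```

You would then call `username, api_key = carto_credentials_from_env()` before constructing the `CartoContext` as above.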
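from dask import dataframe as dd
import pandas as pd

# dd.read_csv returns a lazy, partitioned DataFrame; the full file is
# not processed until .compute() is called
columns = ["name", "amenity", "Longitude", "Latitude"]
data = dd.read_csv('scratch/POIWorld.csv', usecols=columns)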
with_name = data[data.name.notnull()]
with_amenity = data[data.amenity.notnull()]

is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')

# .compute() runs the filters over all partitions and returns pandas DataFrames
starbucks = with_name[is_starbucks].compute()
dunkin = with_name[is_dunkin].compute()
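Because dask DataFrames mirror much of the pandas API, the filtering logic above can be checked on a small in-memory pandas DataFrame, without `.compute()`. The rows below are made up for illustration and are not from `POIWorld.csv`.

```python
import pandas as pd

# A tiny, made-up stand-in for POIWorld.csv (hypothetical rows)
pois = pd.DataFrame({
    'name': ['Starbucks', 'starbucks coffee', "Dunkin' Donuts", None, 'Tim Hortons'],
    'amenity': ['cafe', 'cafe', 'cafe', 'bench', None],
})

# Drop rows without a name, as in the dask version
with_name = pois[pois.name.notnull()]

# '[Ss]tarbucks' is a regex character class: matches "Starbucks" or "starbucks"
is_starbucks = with_name.name.str.contains('[Ss]tarbucks')
is_dunkin = with_name.name.str.contains('[Dd]unkin')

starbucks = with_name[is_starbucks]
dunkin = with_name[is_dunkin]
print(starbucks.name.tolist())  # ['Starbucks', 'starbucks coffee']
print(dunkin.name.tolist())     # ["Dunkin' Donuts"]
```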
starbucks['type'] = 'starbucks'
dunkin['type'] = 'dunkin'
coffee_places = pd.concat([starbucks, dunkin])
coffee_places.head(20)
name amenity Longitude Latitude type
6696 Starbucks cafe 121.035006 14.547281 starbucks
8322 Starbucks Coffee cafe 120.978371 14.578929 starbucks
9505 星巴克咖啡 Starbucks cafe 120.135887 30.271254 starbucks
12417 星巴克咖啡 Starbucks cafe 120.158464 30.253061 starbucks
12418 星巴克咖啡 Starbucks cafe 120.157341 30.256555 starbucks
28564 Starbucks cafe -76.934625 40.238707 starbucks
33017 Starbucks cafe -1.545032 53.797463 starbucks
36552 Starbucks Coffee cafe -79.390940 43.649976 starbucks
37719 Starbucks Coffee cafe -79.389462 43.645361 starbucks
40437 Starbucks Coffee cafe -79.393634 43.670435 starbucks
40439 Starbucks Coffee cafe -79.390089 43.670596 starbucks
41495 Starbucks Coffee cafe 13.388738 52.519403 starbucks
41500 Starbucks Coffee cafe 13.379787 52.516727 starbucks
41869 Starbucks Coffee cafe 13.373732 52.507931 starbucks
42941 Starbucks Coffee cafe 13.376430 52.510587 starbucks
43173 Starbucks cafe 32.878538 39.895229 starbucks
43229 Starbucks cafe 16.370443 48.203665 starbucks
45342 Starbucks cafe 13.390018 52.511072 starbucks
50742 Starbucks Coffee cafe -79.385434 43.655530 starbucks
53643 Starbucks Coffee cafe -0.135514 51.498925 starbucks

Write DataFrame to CARTO

# Specify the lng/lat columns so CARTO creates a point geometry.
# Use lowercase names: cartoframes lowercases column names in the CARTO copy.
cc.write(coffee_places,
         table_name='coffee_places',
         lnglat=('longitude', 'latitude'),
         overwrite=True)
The following columns were changed in the CARTO copy of this dataframe:
Longitude -> longitude
Latitude -> latitude
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/coffee_places
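The renaming reported above is why `lnglat` refers to `longitude` and `latitude` rather than the original capitalized names. The sketch below is a simplified, hypothetical normalizer (lowercase, then collapse non-alphanumeric runs to `_`); it is not cartoframes' exact rule set, but it reproduces the renames shown for this DataFrame.

```python
import re


def simple_normalize(column):
    """Simplified, hypothetical column-name normalizer (not cartoframes' exact rules)."""
    # Lowercase, replace runs of non-alphanumeric characters with '_', trim edges
    return re.sub(r'[^a-z0-9]+', '_', column.lower()).strip('_')


columns = ['name', 'amenity', 'Longitude', 'Latitude', 'type']
renamed = {c: simple_normalize(c) for c in columns if simple_normalize(c) != c}
print(renamed)  # {'Longitude': 'longitude', 'Latitude': 'latitude'}
```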

Let’s visualize this DataFrame

Category map of Dunkin’ Donuts vs. Starbucks (that is, color by ‘type’)

from cartoframes import Layer
cc.map(layers=Layer('coffee_places', color='type', size=3))
<matplotlib.axes._subplots.AxesSubplot at 0x160fe00f0>

Coffee places

Fast Food

is_fastfood = with_amenity.amenity.str.contains('fast_food')
fastfood = with_amenity[is_fastfood]
fastfood.name.value_counts().head(12)
McDonald's        8697
Subway            7058
Burger King       3226
KFC               2881
Wendy's           1304
Taco Bell         1282
Pizza Hut         1014
マクドナルド             927
Dairy Queen        745
Domino's Pizza     724
McDonalds          619
Arby's             609
Name: name, dtype: int64
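The counts above list McDonald's under three names: “McDonald's”, “McDonalds”, and the Japanese “マクドナルド”. A minimal sketch of merging such variants, using the twelve counts printed above and a hypothetical hand-written alias table, could look like this. With the full data you would apply the same mapping to `fastfood.name` before counting.

```python
import pandas as pd

# Top-12 counts from the output above
counts = pd.Series({
    "McDonald's": 8697, 'Subway': 7058, 'Burger King': 3226, 'KFC': 2881,
    "Wendy's": 1304, 'Taco Bell': 1282, 'Pizza Hut': 1014, 'マクドナルド': 927,
    'Dairy Queen': 745, "Domino's Pizza": 724, 'McDonalds': 619, "Arby's": 609,
}, name='name')

# Hypothetical alias table: map spelling/language variants to one brand label
aliases = {'McDonalds': "McDonald's", 'マクドナルド': "McDonald's"}

brand_counts = (counts.rename(index=lambda n: aliases.get(n, n))
                      .groupby(level=0).sum()
                      .sort_values(ascending=False))
print(brand_counts.head(3))
```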
ff = fastfood.compute()
cc.write(ff,
         table_name='fastfood_dask',
         lnglat=('longitude', 'latitude'), overwrite=True)
The following columns were changed in the CARTO copy of this dataframe:
Longitude -> longitude
Latitude -> latitude
Table successfully written to CARTO: https://cartoframes.carto.com/dataset/fastfood_dask
`the_geom` column is being populated from `('longitude', 'latitude')`. Check the status of the operation with:
    BatchJobStatus(CartoContext(), 'd2bc4e92-b1d5-4c7d-a8d7-d5a47eddea0f').status()
or try reading the table from CARTO in a couple of minutes.
Note: `CartoContext.map` will not work on this table until its geometries are created.





BatchJobStatus(job_id='d2bc4e92-b1d5-4c7d-a8d7-d5a47eddea0f', last_status='pending', created_at='2017-12-05T14:25:06.748Z')
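Instead of re-running the status check by hand, you could poll until the job reaches a final state. This is a generic sketch: it takes any zero-argument callable that returns a status string. The set of final states (`done`, `failed`, `canceled`, `unknown`) is an assumption based on the CARTO Batch SQL API, and wrapping cartoframes as `lambda: BatchJobStatus(cc, job_id).status()['status']` assumes `status()` returns a dict with a `'status'` key.

```python
import time

# Assumed final states of a CARTO Batch SQL API job
TERMINAL_STATES = {'done', 'failed', 'canceled', 'unknown'}


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=300.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError('job still {!r} after {}s'.format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.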

Number of fast food places in this OSM dump

len(ff)
152214

OSM Fast Food POIs

from cartoframes import Layer
cc.map(layers=Layer('fastfood_dask', size=2))

Adding measures from the Data Observatory

If you want to add features for building a model, you can augment the dataset with demographics from the Data Observatory for the area around each coffee place.

# DO measures: Total population
#              Population for whom poverty status is determined
#              Median household income

data_obs_measures = [{'numer_id': 'us.census.acs.B01003001'},
                     {'numer_id': 'us.census.acs.B17001001'},
                     {'numer_id': 'us.census.acs.B19013001'}]
coffee_augmented = cc.data('coffee_places', data_obs_measures)
coffee_augmented.head()
amenity index latitude longitude median_income_2010_2014 median_income_2011_2015 name pop_determined_poverty_status_per_sq_km_2010_2014 pop_determined_poverty_status_per_sq_km_2011_2015 the_geom total_pop_per_sq_km_2010_2014 total_pop_per_sq_km_2011_2015 type
cartodb_id
6 cafe 28564 40.238707 -76.934625 62842.0 63415.0 Starbucks 881.868363 873.056479 0101000020E61000007D2079E7D03B53C005EBEEF08D1E... 996.366180 991.577729 starbucks
26 cafe 65532 35.079845 -106.606866 34140.0 32838.0 Starbucks 869.520635 871.745312 0101000020E61000001CA43BE3D6A65AC0A11EEC5E388A... 929.112813 932.249243 starbucks
42 cafe 93071 44.039946 -88.541166 39996.0 40665.0 Starbucks 687.264493 686.831843 0101000020E61000004E0E9F74A22256C092CD55F31C05... 854.267385 840.552380 starbucks
61 cafe 143097 37.367685 -122.036460 122354.0 125488.0 Starbucks 3211.407801 3278.440419 0101000020E6100000CFC7105B55825EC0D45D7E4C10AF... 3238.971520 3308.291340 starbucks
69 cafe 169680 39.953015 -75.192289 18933.0 20483.0 Starbucks 4855.675567 4977.608287 0101000020E610000067B96C744ECC52C0910E0F61FCF9... 6558.800331 6725.720324 starbucks
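As one example of turning these columns into a model feature, the sketch below computes the change in median income between the two ACS windows for the five rows shown above. The feature name `income_change` is hypothetical. Keep in mind that the 2010–2014 and 2011–2015 five-year estimates share four of their five years, so the difference is small and noisy.

```python
import pandas as pd

# Median income columns for the five rows shown above
aug = pd.DataFrame({
    'name': ['Starbucks'] * 5,
    'median_income_2010_2014': [62842.0, 34140.0, 39996.0, 122354.0, 18933.0],
    'median_income_2011_2015': [63415.0, 32838.0, 40665.0, 125488.0, 20483.0],
}, index=pd.Index([6, 26, 42, 61, 69], name='cartodb_id'))

# Hypothetical model feature: change in median income between the two ACS windows
aug['income_change'] = aug['median_income_2011_2015'] - aug['median_income_2010_2014']
print(aug['income_change'])
```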