A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

What is CARTOframes?

CARTOframes is a Python package for integrating CARTO maps, analysis, and data services into data science workflows.

Python data analysis workflows often rely on the de facto standards pandas and Jupyter notebooks. Integrating CARTO into this workflow saves data scientists time and energy because they no longer have to export datasets as files or keep multiple copies of the data. To understand the fundamentals of CARTOframes, read the guides. To view the source code or contribute, browse the open-source repository on GitHub. Otherwise, read the full reference API, or find different support options.

Guides

Quick reference guides for learning how to use CARTOframes features.

Reference

Browse the interactive API documentation to search for specific CARTOframes methods, arguments, and sample code that can be used to build your applications.

Check Full Reference API
# Set your credentials.
from cartoframes import Credentials, CartoContext
cc = CartoContext(
    creds=Credentials(key='{your_api_key}', username='{your_user_name}')
)

# Save/update credentials for later use.
from cartoframes import Credentials, CartoContext
creds = Credentials(username='eschbacher', key='abcdefg')
creds.save()  # save credentials for later use (not dependent on Python session)

# Write an existing pandas DataFrame to CARTO.
import pandas as pd
import cartoframes
df = pd.read_csv('acadia_biodiversity.csv')
cc = cartoframes.CartoContext(base_url=BASEURL,
                              api_key=APIKEY)
cc.write(df, 'acadia_biodiversity')
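
The last snippet uses BASEURL and APIKEY without defining them. One possible way to supply them is sketched here. It assumes the credentials live in environment variables named CARTO_USERNAME and CARTO_API_KEY (hypothetical names chosen for this sketch, not something cartoframes reads) and that the account is a single-user cloud account with a URL of the form https://{username}.carto.com/. The helper carto_settings is likewise illustrative.

```python
import os


def carto_settings(env=os.environ):
    """Return (base_url, api_key) built from environment variables.

    CARTO_USERNAME and CARTO_API_KEY are hypothetical variable names
    chosen for this sketch; cartoframes does not read them itself.
    """
    username = env.get('CARTO_USERNAME')
    api_key = env.get('CARTO_API_KEY')
    if not username or not api_key:
        raise KeyError('Set CARTO_USERNAME and CARTO_API_KEY first')
    # Assumes a single-user cloud account URL.
    base_url = 'https://{}.carto.com/'.format(username)
    return base_url, api_key


# Example with an in-memory mapping instead of the real environment.
BASEURL, APIKEY = carto_settings({'CARTO_USERNAME': 'eschbacher',
                                  'CARTO_API_KEY': 'abcdefg'})
print(BASEURL)
```

Keeping the key out of the notebook itself means the notebook can be shared without exposing it.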

Examples

Play with real examples and learn by doing.

Basic cartoframes usage

cartoframes lets you use CARTO in a Python environment so that you can do all of your analysis and mapping in, for example, a Jupyter notebook. cartoframes allows you to use CARTO’s functionality for data analysis, storage, location services like routing and geocoding, and visualization.

This notebook is best viewed on nbviewer (https://nbviewer.jupyter.org/github/CartoDB/cartoframes/blob/master/examples/Basic%20Usage.ipynb). To explore the functionality of cartoframes more easily, download the notebook and run it on your own computer.

To get started, let’s load the required packages, and set credentials.

%matplotlib inline
import cartoframes
from cartoframes import Credentials
import pandas as pd

USERNAME = 'eschbacher'  # <-- replace with your username 
APIKEY = 'abcdefg'       # <-- your CARTO API key
creds = Credentials(username=USERNAME, 
                    key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)

cc.read

CartoContext has several methods for interacting with CARTO in a Python environment. CartoContext.read allows you to pull a dataset stored on CARTO into a pandas DataFrame. In the cell below, we use cc.read to get the table brooklyn_poverty from a CARTO account. You can get a CSV of the table here for uploading to your CARTO account:

https://cartoframes.carto.com/api/v2/sql?q=SELECT+*+FROM+brooklyn_poverty&format=csv&filename=brooklyn_poverty
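
The export URL above is a SQL query passed to the account's SQL endpoint with format=csv. As a rough sketch, the same URL can be assembled with the standard library. The helper name sql_csv_url is hypothetical, and the table name is inserted into the query as-is, so this is only suitable for trusted input.

```python
from urllib.parse import urlencode


def sql_csv_url(account, table):
    """Build a SQL API URL that downloads `table` as a CSV file."""
    params = {
        'q': 'SELECT * FROM {}'.format(table),
        'format': 'csv',
        'filename': table,
    }
    # safe='*' keeps the asterisk readable; spaces become '+'.
    query = urlencode(params, safe='*')
    return 'https://{}.carto.com/api/v2/sql?{}'.format(account, query)


url = sql_csv_url('cartoframes', 'brooklyn_poverty')
print(url)
```

The resulting string can be passed directly to pd.read_csv, which accepts URLs.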

# Get a CARTO table as a pandas DataFrame
df = cc.read('brooklyn_poverty')
df.head()
cartodb_id | commuters_16_over_2011_2015 | geoid | pop_determined_poverty_status_2011_2015 | poverty_count | poverty_per_pop | the_geom | the_geom_webmercator | total_pop_2011_2015 | total_population | walked_to_work_2011_2015
2 | 4074.192637 | 360470050003 | 3304.439797 | 23.112583 | 0.031191 | 0103000020E6100000010000000B0000006D3A02B85982... | 0103000020110F0000010000000B000000D240DAA89070... | 9624.365242 | 741 | 0.005207
585 | 5434.149852 | 360470218001 | 27809.352304 | 770.733564 | 0.250000 | 0103000020E6100000010000000B000000ACE3F8A1D27F... | 0103000020110F0000010000000B0000000354CD84456C... | 16072.338976 | 756 | 0.042990
15 | 32412.498980 | 360470514002 | 39958.419065 | 574.101597 | 0.325824 | 0103000020E610000001000000070000003DB5FAEAAA7D... | 0103000020110F00000100000007000000ADF228609C68... | 61660.046010 | 1762 | 0.008740
16 | 5135.760974 | 360470534003 | 23191.290336 | 235.858921 | 0.391142 | 0103000020E61000000100000008000000EBABAB02B57D... | 0103000020110F000001000000080000008ECED184AD68... | 14912.553653 | 603 | 0.016081
146 | 486.050087 | 360470013002 | 8739.299360 | NaN | NaN | 0103000020E6100000010000001500000005854199467F... | 0103000020110F000001000000150000003D6926A8576B... | 40739.834591 | 939 | 0.037871

Notice that:

  • the index of the DataFrame is the same as the index of the CARTO table (cartodb_id)
  • the the_geom column stores the geometry. It can be decoded by setting the decode_geom=True flag in cc.read, which requires the shapely library
  • there are several numeric columns
  • SQL null values are represented as np.nan

Other things to notice:

df.dtypes
commuters_16_over_2011_2015                float64
geoid                                       object
pop_determined_poverty_status_2011_2015    float64
poverty_count                              float64
poverty_per_pop                            float64
the_geom                                    object
the_geom_webmercator                        object
total_pop_2011_2015                        float64
total_population                             int64
walked_to_work_2011_2015                   float64
dtype: object

The dtype of each column is a mapping of the column type on CARTO. For example, numeric will map to float64, text will map to object (pandas string representation), timestamp will map to datetime64[ns], etc. The reverse happens if a DataFrame is sent to CARTO.
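
The pandas side of this mapping can be seen without a CARTO account. The sketch below builds a toy DataFrame by hand; it does not call cartoframes. The updated_at column is a made-up stand-in for a timestamp column, and None stands in for a SQL null.

```python
import pandas as pd

# Toy rows standing in for a CARTO table; None plays the role of SQL NULL.
df = pd.DataFrame({
    'poverty_count': [23.112583, 770.733564, None],   # numeric with a null
    'total_population': [741, 756, 939],              # integers, no nulls
    'updated_at': pd.to_datetime(['2016-05-01', '2016-05-02', '2016-05-04']),
})

print(df.dtypes)

# The null shows up as NaN and can be located with isna().
print(df['poverty_count'].isna().tolist())
```

A numeric column that contains nulls is stored as float64 because NaN is a floating-point value, which is why poverty_count above is a float column.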

cc.map

Now that we can inspect the data, we can map it to see how the values change over the geography. We can use the cc.map method for this purpose.

cc.map takes a layers argument which specifies the data layers that are to be visualized. They can be imported from cartoframes as below.

There are different types of layers:

  • Layer for visualizing CARTO tables
  • QueryLayer for visualizing arbitrary queries from tables in user’s CARTO account
  • BaseMap for specifying the base map to be used

Each of the layers has different styling options. Layer and QueryLayer take the same styling arguments, and BaseMap can be specified to be light/dark and options on label placement.

Maps can be interactive or static. Set the interactive argument to True or False to choose. If the map is static (not interactive), it will be embedded in the notebook as either a matplotlib axis or IPython.Image. Either way, the image will be transported with the notebook. Interactive maps are embedded as maps that you can zoom and pan.

from cartoframes import Layer, styling
l = Layer('brooklyn_poverty',
          color={'column': 'poverty_per_pop',
                 'scheme': styling.sunset(7)})
cc.map(layers=l,
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x113361160>


NYC Taxi Dataset

Let’s explore a typical cartoframes workflow using data on NYC taxis.

To get the data into CARTO, we can:

  1. Use pandas to grab the data from the cartoframes example account
  2. Send it to your CARTO account using cc.write, specifying the lng/lat columns you want to use for visualization
  3. Set overwrite=True to replace an existing dataset if it exists
  4. Refresh our df with the CARTO-fied version using cc.read
# read in a CSV of NYC taxi data from cartoframes example datasets
df = pd.read_csv('https://cartoframes.carto.com/api/v2/sql?q=SELECT+*+FROM+taxi_50k&format=csv')

# set the index of the dataframe to be the cartodb_id (database index)
df.set_index('cartodb_id', inplace=True)

# show first five rows to see what we've got
df.head()
cartodb_id | the_geom | the_geom_webmercator | vendorid | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | ratecodeid | ... | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount
1 | NaN | NaN | 2 | 2016-05-01 14:52:11+00 | 2016-05-01 15:00:36+00 | 2 | 2.08 | -74.006706 | 40.730461 | 1 | ... | -74.012383 | 40.706779 | 1 | 8.5 | 0.0 | 0.5 | 1.00 | 0.0 | 0.3 | 10.30
2 | NaN | NaN | 1 | 2016-05-01 08:34:08+00 | 2016-05-01 08:49:02+00 | 1 | 3.00 | -73.924957 | 40.744125 | 1 | ... | -73.973824 | 40.762779 | 1 | 13.5 | 0.0 | 0.5 | 2.00 | 0.0 | 0.3 | 16.30
3 | NaN | NaN | 1 | 2016-05-04 09:44:40+00 | 2016-05-04 10:07:09+00 | 1 | 2.10 | -73.973488 | 40.748501 | 1 | ... | -73.998955 | 40.740833 | 2 | 14.5 | 0.0 | 0.5 | 0.00 | 0.0 | 0.3 | 15.30
4 | NaN | NaN | 2 | 2016-05-01 20:50:11+00 | 2016-05-01 21:05:24+00 | 1 | 4.41 | -73.999786 | 40.743267 | 1 | ... | -73.966362 | 40.792370 | 2 | 15.0 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 16.30
5 | NaN | NaN | 2 | 2016-05-02 07:26:56+00 | 2016-05-02 07:53:53+00 | 2 | 4.01 | -73.963631 | 40.803360 | 1 | ... | -73.956963 | 40.784939 | 1 | 19.5 | 0.0 | 0.5 | 4.06 | 0.0 | 0.3 | 24.36

5 rows × 21 columns

# send it to carto so we can map it
# specify the columns we want to have as a point (pickup location)
cc.write(df, 'taxi_50k',
         lnglat=('pickup_longitude', 'pickup_latitude'),
         overwrite=True)

# read the fresh carto-fied version
df = cc.read('taxi_50k')
Creating geometry out of columns `pickup_longitude`/`pickup_latitude`
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/taxi_50k

Take a look at the data on a map.

from cartoframes import Layer
cc.map(layers=Layer('taxi_50k'),
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x1133b4780>


Oops, some rows have zero-valued longitude/latitude pairs, so those points end up at Null Island (0, 0). Let’s remove them.

# select only the rows which are not at (0,0)
df = df[(df['pickup_longitude'] != 0) | (df['pickup_latitude'] != 0)]
# send back up to CARTO
cc.write(df, 'taxi_sample', overwrite=True,
         lnglat=('pickup_longitude', 'pickup_latitude'))
Creating geometry out of columns `pickup_longitude`/`pickup_latitude`
Table successfully written to CARTO: https://eschbacher.carto.com/dataset/taxi_sample
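
To see exactly which rows the boolean mask in the filter keeps, here is a small sketch on made-up coordinates (the three rows are invented for illustration). A row is dropped only when both coordinates are zero; a row with a single zero coordinate is kept.

```python
import pandas as pd

# Made-up pickups: one at (0, 0), one with only a zero longitude, one real.
pickups = pd.DataFrame({
    'pickup_longitude': [0.0, 0.0, -73.963631],
    'pickup_latitude': [0.0, 40.730461, 40.803360],
})

# Keep a row when at least one coordinate is non-zero.
not_origin = (pickups['pickup_longitude'] != 0) | (pickups['pickup_latitude'] != 0)

# Equivalent form: negate "both coordinates are zero".
both_zero = (pickups['pickup_longitude'] == 0) & (pickups['pickup_latitude'] == 0)

kept = pickups[not_origin]
print(len(kept))
```

Using & instead of | in the first mask would also drop the second row, which is a common slip when writing this kind of filter.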
# Let's take a look at what's going on, styled by the fare amount
cc.map(layers=Layer('taxi_sample',
                    size=4,
                    color={'column': 'fare_amount',
                           'scheme': styling.sunset(7)}),
       interactive=True)

[Embedded interactive map, 800 × 400, with a readout of the current view: zoom=4, lng=No data, lat=No data]

We can use the zoom=..., lng=..., lat=... information in the embedded interactive map to help us get static snapshots of the regions we’re interested in. For example, JFK airport is around zoom=12, lng=-73.7880, lat=40.6629. We can paste that information as arguments in cc.map to generate a static snapshot of the data there.
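
Copying the readout by hand works, but it can also be parsed. The helper below is a hypothetical convenience, not part of cartoframes. It assumes the readout has the exact "zoom=..., lng=..., lat=..." shape shown above with a whole-number zoom, and it raises ValueError when a value is not numeric (for example "No data").

```python
def parse_view(readout):
    """Turn 'zoom=12, lng=-73.7880, lat=40.6629' into keyword arguments."""
    view = {}
    for part in readout.split(','):
        name, _, value = part.strip().partition('=')
        # zoom is a whole number; lng and lat are decimal degrees.
        view[name] = int(value) if name == 'zoom' else float(value)
    return view


jfk = parse_view('zoom=12, lng=-73.7880, lat=40.6629')
print(jfk)
```

The resulting dictionary can then be unpacked into the map call, for example cc.map(layers=..., interactive=False, **jfk).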

# Let's take a look at what's going on at JFK airport, styled by the fare amount, and STATIC
cc.map(layers=Layer('taxi_sample',
                    size=4,
                    color={'column': 'fare_amount',
                           'scheme': styling.sunset(7)}),
       zoom=12, lng=-73.7880, lat=40.6629,
       interactive=False)
<matplotlib.axes._subplots.AxesSubplot at 0x119c01240>


Support

Get help or learn about known issues.