CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

This version includes breaking changes. Check the CHANGELOG for more information.

What is CARTOframes?

CARTOframes is a Python package for integrating CARTO maps, analysis, and data services into data science workflows.

Python data analysis workflows often rely on pandas and Jupyter notebooks, the de facto standards. Integrating CARTO into this workflow saves data scientists time and energy because they do not have to export datasets as files or keep multiple copies of the data. To understand the fundamentals of CARTOframes, read the guides. To view the source code or contribute, browse the open-source repository on GitHub. You can also read the full API reference or explore the support options.

Guides

Quick reference guides for learning how to use CARTOframes features.

Reference

Browse the interactive API documentation to search for specific CARTOframes methods, arguments, and sample code that can be used to build your applications.

Check Full Reference API
from cartoframes.auth import Credentials, set_default_credentials
from cartoframes.data.observatory import Enrichment

set_default_credentials('creds.json')
enrichment = Enrichment()

# bikeshare_df is assumed to be a DataFrame of point locations loaded earlier
enriched_dataset_cdf = enrichment.enrich_points(
    bikeshare_df,
    variables=['no_cars_d19dfd10']
)

Examples

Play with real examples and learn by doing.

Create a CartoDataFrame from a CSV file

To create a CartoDataFrame from a CSV file, first load the file with the pandas library.

In [1]:
from pandas import read_csv
from cartoframes import CartoDataFrame

remote_file_path = 'http://data.sfgov.org/resource/wg3w-h783.csv'

df = read_csv(remote_file_path)

# Drop rows whose longitude or latitude is NaN
df = df.dropna(subset=['longitude', 'latitude'])

cdf = CartoDataFrame(df)

# Set a geometry column from the coordinates
cdf.set_geometry_from_xy('longitude', 'latitude', inplace=True)

cdf.head()
Out[1]:
incident_datetime incident_date incident_time incident_year incident_day_of_week report_datetime row_id incident_id incident_number cad_number ... :@computed_region_6qbp_sg9q :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md :@computed_region_nqbw_i6c3 :@computed_region_2dwj_jsy4 :@computed_region_h4ep_8xdi :@computed_region_y6ts_4iup :@computed_region_jg9y_a9du geometry
2 2019-10-04T14:25:00.000 2019-10-04T00:00:00.000 14:25 2019 Friday 2019-10-04T16:13:00.000 85442603474 854426 190746203 192772728.0 ... 8.0 8.0 4.0 29.0 NaN NaN NaN NaN NaN POINT (-122.51129 37.77508)
3 2019-10-03T19:30:00.000 2019-10-03T00:00:00.000 19:30 2019 Thursday 2019-10-03T23:25:00.000 85419706244 854197 190744514 192764437.0 ... 28.0 3.0 5.0 5.0 5.0 NaN NaN NaN NaN POINT (-122.42746 37.76877)
4 2019-10-04T16:53:00.000 2019-10-04T00:00:00.000 16:53 2019 Friday 2019-10-04T16:53:00.000 85446351040 854463 190746532 192772932.0 ... 6.0 8.0 4.0 29.0 NaN NaN NaN NaN NaN POINT (-122.50309 37.78118)
5 2019-10-02T14:10:00.000 2019-10-02T00:00:00.000 14:10 2019 Wednesday 2019-10-02T22:59:00.000 85425706224 854257 196208142 NaN ... 99.0 6.0 3.0 23.0 NaN NaN NaN NaN NaN POINT (-122.41050 37.80696)
6 2018-10-05T16:15:00.000 2018-10-05T00:00:00.000 16:15 2018 Friday 2018-10-05T16:15:00.000 72248630200 722486 180755377 182782753.0 ... 112.0 7.0 5.0 3.0 NaN 31.0 NaN NaN NaN POINT (-122.44281 37.76601)

5 rows × 36 columns
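
The cell above drops rows whose longitude or latitude is NaN before building the geometry column. As a minimal sketch of the underlying rule, independent of pandas, the snippet below filters a few hypothetical rows (the `incident_id` values and the missing entries are made up for illustration). It relies on the fact that NaN is the only float that does not compare equal to itself.

```python
import math

# Hypothetical rows standing in for CSV records; the ids and the NaN
# placements are invented for this illustration.
rows = [
    {"incident_id": 1, "longitude": -122.51129, "latitude": 37.77508},
    {"incident_id": 2, "longitude": math.nan, "latitude": 37.76877},
    {"incident_id": 3, "longitude": -122.50309, "latitude": math.nan},
]


def has_coordinates(row):
    """Return True when both coordinates are real numbers (not NaN)."""
    return not (math.isnan(row["longitude"]) or math.isnan(row["latitude"]))


# Keep only the rows that can be turned into a point.
clean_rows = [row for row in rows if has_coordinates(row)]

# NaN never compares equal to itself, which is why a self-comparison
# can also be used as a NaN filter.
nan_equals_itself = math.nan == math.nan

print(len(clean_rows), nan_equals_itself)
```

Only the first row survives, because each of the other two is missing one coordinate.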

In [2]:
cdf.viz()
Create a CartoDataFrame from a JSON file

This example illustrates how to create a CartoDataFrame from a remote JSON file using pandas, including the data cleaning steps.

In [1]:
import requests

# Download the JSON file
remote_file_path = 'http://opendata.paris.fr/api/records/1.0/search/?dataset=arbresremarquablesparis&rows=200'
data_json = requests.get(remote_file_path).json()['records']
data_json[0].keys()
Out[1]:
dict_keys(['datasetid', 'recordid', 'fields', 'geometry', 'record_timestamp'])
In [2]:
from pandas.io.json import json_normalize

# Normalize the data
df = json_normalize(data_json)
df.head()
Out[2]:
datasetid recordid record_timestamp fields.geom_x_y fields.libellefrancais fields.objectid fields.idemplacement fields.arrondissement fields.circonferenceencm fields.hauteurenm ... fields.stadedeveloppement fields.remarquable fields.idbase fields.genre fields.complementadresse fields.typeemplacement fields.dateplantation geometry.type geometry.coordinates fields.varieteoucultivar
0 arbresremarquablesparis 1e31d3f0c53902b8852d1406cbb485ab5a591688 2019-11-29T11:01:13.717000+00:00 [48.8460598206, 2.25295516084] Ailante 11972 00060035 PARIS 16E ARRDT 495.0 22.0 ... M 1 112968.0 Ailanthus 16-47 Arbre 1700-01-01T00:09:21+00:00 Point [2.25295516084, 48.8460598206] NaN
1 arbresremarquablesparis 790790e4517a8ff950ea800c14b9b41a619f39f7 2019-11-29T11:01:13.717000+00:00 [48.8462426895, 2.25137540594] Erable 15499 00040005 PARIS 16E ARRDT 210.0 16.0 ... M 1 123330.0 Acer NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.25137540594, 48.8462426895] NaN
2 arbresremarquablesparis 6f417a150db8f783294fea063fe39bb8e324e68a 2019-11-29T11:01:13.717000+00:00 [48.8460398085, 2.25406628275] Micocoulier 24985 00060082 PARIS 16E ARRDT 172.0 9.0 ... A 1 114452.0 Celtis NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.25406628275, 48.8460398085] NaN
3 arbresremarquablesparis 3a381464d5fac7746b2d3fe1ed63d3614a236099 2019-11-29T11:01:13.717000+00:00 [48.8709369341, 2.24803445349] If 172685 000701001 BOIS DE BOULOGNE 246.0 15.0 ... A 1 2002352.0 Taxus 16-21 Arbre 1772-01-01T00:09:21+00:00 Point [2.24803445349, 48.8709369341] NaN
4 arbresremarquablesparis 6f5f941f740f94230b51dec33c63f7cd155e0feb 2019-11-29T11:01:13.717000+00:00 [48.876921106, 2.34671864206] Platane 166794 000108001 PARIS 9E ARRDT 465.0 25.0 ... M 1 317250.0 Platanus NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.34671864206, 48.876921106] NaN

5 rows × 24 columns
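
The normalized output above has dotted column names such as fields.genre and geometry.coordinates, one per nested key. The following is a rough, standard-library-only sketch of that kind of flattening. It is not pandas' implementation; the `flatten` helper is hypothetical, and the record is an abbreviated version of the first one shown above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys.

    Lists (such as coordinate pairs) are kept as single values.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts and merge their dotted keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Abbreviated record shaped like the API response (most fields omitted).
record = {
    "datasetid": "arbresremarquablesparis",
    "fields": {"genre": "Ailanthus", "hauteurenm": 22.0},
    "geometry": {"type": "Point", "coordinates": [2.25295516084, 48.8460598206]},
}

flat_record = flatten(record)
print(sorted(flat_record))
```

Each nested key becomes one column, while the coordinate list stays in a single column, which is why the next step splits it into separate values.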

In [3]:
# Add longitude and latitude columns (GeoJSON coordinates are ordered [lng, lat])
df['lng'] = df.apply(lambda row: row['geometry.coordinates'][0], axis=1)
df['lat'] = df.apply(lambda row: row['geometry.coordinates'][1], axis=1)
df.head()
Out[3]:
datasetid recordid record_timestamp fields.geom_x_y fields.libellefrancais fields.objectid fields.idemplacement fields.arrondissement fields.circonferenceencm fields.hauteurenm ... fields.idbase fields.genre fields.complementadresse fields.typeemplacement fields.dateplantation geometry.type geometry.coordinates fields.varieteoucultivar lng lat
0 arbresremarquablesparis 1e31d3f0c53902b8852d1406cbb485ab5a591688 2019-11-29T11:01:13.717000+00:00 [48.8460598206, 2.25295516084] Ailante 11972 00060035 PARIS 16E ARRDT 495.0 22.0 ... 112968.0 Ailanthus 16-47 Arbre 1700-01-01T00:09:21+00:00 Point [2.25295516084, 48.8460598206] NaN 2.252955 48.846060
1 arbresremarquablesparis 790790e4517a8ff950ea800c14b9b41a619f39f7 2019-11-29T11:01:13.717000+00:00 [48.8462426895, 2.25137540594] Erable 15499 00040005 PARIS 16E ARRDT 210.0 16.0 ... 123330.0 Acer NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.25137540594, 48.8462426895] NaN 2.251375 48.846243
2 arbresremarquablesparis 6f417a150db8f783294fea063fe39bb8e324e68a 2019-11-29T11:01:13.717000+00:00 [48.8460398085, 2.25406628275] Micocoulier 24985 00060082 PARIS 16E ARRDT 172.0 9.0 ... 114452.0 Celtis NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.25406628275, 48.8460398085] NaN 2.254066 48.846040
3 arbresremarquablesparis 3a381464d5fac7746b2d3fe1ed63d3614a236099 2019-11-29T11:01:13.717000+00:00 [48.8709369341, 2.24803445349] If 172685 000701001 BOIS DE BOULOGNE 246.0 15.0 ... 2002352.0 Taxus 16-21 Arbre 1772-01-01T00:09:21+00:00 Point [2.24803445349, 48.8709369341] NaN 2.248034 48.870937
4 arbresremarquablesparis 6f5f941f740f94230b51dec33c63f7cd155e0feb 2019-11-29T11:01:13.717000+00:00 [48.876921106, 2.34671864206] Platane 166794 000108001 PARIS 9E ARRDT 465.0 25.0 ... 317250.0 Platanus NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.34671864206, 48.876921106] NaN 2.346719 48.876921

5 rows × 26 columns
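
Axis order is easy to get wrong here: in the output above, geometry.coordinates holds [lng, lat] (the GeoJSON order), while fields.geom_x_y holds the same point as [lat, lng]. The sketch below uses the values from the first row to show the unpacking. The `is_valid_lng_lat` helper is a hypothetical sanity check, not part of CARTOframes.

```python
# Values copied from the first row of the output above.
row = {
    "fields.geom_x_y": [48.8460598206, 2.25295516084],       # [lat, lng]
    "geometry.coordinates": [2.25295516084, 48.8460598206],  # [lng, lat]
}

# GeoJSON positions list longitude first, then latitude.
lng, lat = row["geometry.coordinates"]

# The two columns describe the same point with the axes swapped.
same_point = [lat, lng] == row["fields.geom_x_y"]


def is_valid_lng_lat(lng, lat):
    """Hypothetical range check that catches most swapped-axis mistakes."""
    return -180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0


print(lng, lat, same_point, is_valid_lng_lat(lng, lat))
```

A range check like this cannot catch every swap (both values here fall within the latitude range), so comparing against a known location remains the more reliable test.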

In [4]:
from cartoframes import CartoDataFrame

cdf = CartoDataFrame(df)

# Set a geometry column from the coordinates
cdf.set_geometry_from_xy('lng', 'lat', inplace=True)

cdf.head()
Out[4]:
datasetid recordid record_timestamp fields.geom_x_y fields.libellefrancais fields.objectid fields.idemplacement fields.arrondissement fields.circonferenceencm fields.hauteurenm ... fields.genre fields.complementadresse fields.typeemplacement fields.dateplantation geometry.type geometry.coordinates fields.varieteoucultivar lng lat geometry
0 arbresremarquablesparis 1e31d3f0c53902b8852d1406cbb485ab5a591688 2019-11-29T11:01:13.717000+00:00 [48.8460598206, 2.25295516084] Ailante 11972 00060035 PARIS 16E ARRDT 495.0 22.0 ... Ailanthus 16-47 Arbre 1700-01-01T00:09:21+00:00 Point [2.25295516084, 48.8460598206] NaN 2.252955 48.846060 POINT (2.25296 48.84606)
1 arbresremarquablesparis 790790e4517a8ff950ea800c14b9b41a619f39f7 2019-11-29T11:01:13.717000+00:00 [48.8462426895, 2.25137540594] Erable 15499 00040005 PARIS 16E ARRDT 210.0 16.0 ... Acer NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.25137540594, 48.8462426895] NaN 2.251375 48.846243 POINT (2.25138 48.84624)
2 arbresremarquablesparis 6f417a150db8f783294fea063fe39bb8e324e68a 2019-11-29T11:01:13.717000+00:00 [48.8460398085, 2.25406628275] Micocoulier 24985 00060082 PARIS 16E ARRDT 172.0 9.0 ... Celtis NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.25406628275, 48.8460398085] NaN 2.254066 48.846040 POINT (2.25407 48.84604)
3 arbresremarquablesparis 3a381464d5fac7746b2d3fe1ed63d3614a236099 2019-11-29T11:01:13.717000+00:00 [48.8709369341, 2.24803445349] If 172685 000701001 BOIS DE BOULOGNE 246.0 15.0 ... Taxus 16-21 Arbre 1772-01-01T00:09:21+00:00 Point [2.24803445349, 48.8709369341] NaN 2.248034 48.870937 POINT (2.24803 48.87094)
4 arbresremarquablesparis 6f5f941f740f94230b51dec33c63f7cd155e0feb 2019-11-29T11:01:13.717000+00:00 [48.876921106, 2.34671864206] Platane 166794 000108001 PARIS 9E ARRDT 465.0 25.0 ... Platanus NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.34671864206, 48.876921106] NaN 2.346719 48.876921 POINT (2.34672 48.87692)

5 rows × 27 columns
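
The geometry column in the output above shows each point as text of the form POINT (x y), with longitude as x and latitude as y, apparently rounded to five decimals for display. As an illustration only (this is not how CARTOframes builds geometries internally), the hypothetical `point_wkt` helper below reproduces that text for the first row.

```python
def point_wkt(lng, lat, precision=5):
    """Format a longitude/latitude pair as WKT-style point text."""
    return f"POINT ({lng:.{precision}f} {lat:.{precision}f})"


# Coordinates of the first row in the output above.
wkt = point_wkt(2.25295516084, 48.8460598206)
print(wkt)  # POINT (2.25296 48.84606)
```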

In [5]:
cdf.viz()
Create a CartoDataFrame from a GeoJSON file

This example illustrates how to create a CartoDataFrame from a GeoJSON file using geopandas.

In [1]:
from cartoframes import CartoDataFrame

cdf = CartoDataFrame.from_file('../files/sustainable_palm_oil_production_mills.geojson')
cdf.head()
Out[1]:
objectid cartodb_id entity_id latitude longitude audit_stat legal_radi illegal_ra radius_umd radius_for ... carbon_r_3 peat_for_2 peat_for_3 primary_10 primary_11 mill_name parent_com rspo_certi date_updat geometry
0 1 59 ID1822 -1.585833 103.205556 ASA 1 0 0.321764 0.225759 1099 ... 1224 0 29 0.004508 0.135391 Muara Bulian Mill PT Inti Indosawit Subur yes 14-Aug POINT (103.20556 -1.58583)
1 2 153 ID1847 0.077043 102.030838 Renewal Certification 0 0.445960 0.258855 1979 ... 2038 7 523 0.017304 0.321418 Pabrik Kelapa Sawit Batang Kulim POM PT Musim Mas yes 14-Aug POINT (102.03084 0.07704)
2 3 103 ID1720 1.660222 100.590611 Initial Certification 0 0.498531 0.248520 1432 ... 1518 0 476 0.000811 0.193365 Kayangan and Kencana POM PT Salim Ivomas Pratama Tbk yes 14-Aug POINT (100.59061 1.66022)
3 4 216 ID1945 -2.894444 112.543611 ASA 1 0 0.662863 0.186332 226 ... 269 59 66 0.124882 0.189169 PT Sarana Titian Permata POM Wilmar International Ltd yes 14-Aug POINT (112.54361 -2.89444)
4 5 156 ID1553 3.593333 98.947222 Initial Certification 0 0.533668 0.028972 382 ... 412 0 0 0.000216 0.009296 Adolina POM PT Perkebunan Nusantara IV (PERSERO) yes 14-Aug POINT (98.94722 3.59333)

5 rows × 73 columns
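
For readers unfamiliar with the format, a GeoJSON file is a collection of features, each with a properties object and a geometry. The sketch below parses a tiny, hand-written FeatureCollection with the standard library and turns it into flat rows. The document is an assumption modeled on the first row above (only two properties are kept), and the `lng` and `lat` keys are introduced here for illustration.

```python
import json

# Hand-written GeoJSON modeled on the first row above; the real file
# contains many more properties per feature.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"mill_name": "Muara Bulian Mill", "rspo_certi": "yes"},
      "geometry": {"type": "Point", "coordinates": [103.205556, -1.585833]}
    }
  ]
}
"""

collection = json.loads(geojson_text)

rows = []
for feature in collection["features"]:
    # Start from the feature's attributes, then add the point coordinates.
    row = dict(feature["properties"])
    lng, lat = feature["geometry"]["coordinates"]  # GeoJSON order: [lng, lat]
    row["lng"], row["lat"] = lng, lat
    rows.append(row)

print(rows)
```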

In [2]:
cdf.viz()
Support

Get help or learn about known issues.