CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

CARTOframes v1.0.1 includes breaking changes from the betas and version 0.10. Check the migration guide to learn how to update your code.

What is CARTOframes?

CARTOframes is a Python package for integrating CARTO maps, analysis, and data services into data science workflows.

Python data analysis workflows often rely on pandas and Jupyter Notebook, the de facto standards. Integrating CARTO into this workflow saves data scientists time and energy, because they do not have to export datasets as files or keep multiple copies of the data. To understand the fundamentals of CARTOframes, read the guides. To view the source code or contribute, browse the open-source repository on GitHub. You can also read the full reference API or find different support options.

Guides

Quick reference guides for learning how to use CARTOframes features.

Reference

Browse the interactive API documentation to search for specific CARTOframes methods, arguments, and sample code that can be used to build your applications.

Check Full Reference API
from cartoframes.auth import set_default_credentials
from cartoframes.data.observatory import Enrichment

set_default_credentials('creds.json')
enrichment = Enrichment()

enriched_df = enrichment.enrich_points(
    df,
    variables=['total_pop_3cf008b3']
)
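The snippet above reads credentials from a file named creds.json. As a minimal sketch, assuming that file is a JSON object with `username` and `api_key` keys (an assumption worth checking against the authentication guide), it could be created with the standard library as shown here. The placeholder values and the temporary directory are illustrative only.

```python
import json
import os
import tempfile

# Placeholder values (assumed key names); replace them with your own
# CARTO username and API key.
credentials = {"username": "your_username", "api_key": "your_api_key"}

# Written to a temporary directory so the sketch does not touch your project.
path = os.path.join(tempfile.mkdtemp(), "creds.json")
with open(path, "w") as f:
    json.dump(credentials, f, indent=2)

# Read the file back to show what was written.
with open(path) as f:
    print(f.read())
```

Keeping the API key in a separate file rather than in the notebook makes it easier to share the notebook without sharing the key.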

Examples

Play with real examples and learn by doing.

Load data from a CSV file

These examples illustrate how to load data from a CSV file using the pandas and GeoPandas libraries.

From latitude and longitude columns

In [1]:
from pandas import read_csv
from geopandas import GeoDataFrame, points_from_xy

remote_file_path = 'http://data.sfgov.org/resource/wg3w-h783.csv'

df = read_csv(remote_file_path)

# Clean rows where the `longitude` column is NULL
df = df[df['longitude'].notna()]

gdf = GeoDataFrame(df, geometry=points_from_xy(df['longitude'], df['latitude']))
gdf.head()
Out[1]:
incident_datetime incident_date incident_time incident_year incident_day_of_week report_datetime row_id incident_id incident_number cad_number ... :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md :@computed_region_nqbw_i6c3 :@computed_region_2dwj_jsy4 :@computed_region_h4ep_8xdi :@computed_region_y6ts_4iup :@computed_region_jg9y_a9du :@computed_region_6pnf_4xz7 geometry
0 2020-02-03T14:45:00.000 2020-02-03T00:00:00.000 14:45 2020 Monday 2020-02-03T17:50:00.000 89881675000 898816 200085557 200342870.0 ... 10.0 8.0 16.0 NaN NaN NaN NaN NaN 2.0 POINT (-122.47604 37.72695)
1 2020-02-03T03:45:00.000 2020-02-03T00:00:00.000 03:45 2020 Monday 2020-02-03T03:45:00.000 89860711012 898607 200083749 200340316.0 ... 3.0 2.0 20.0 3.0 NaN NaN NaN NaN 2.0 POINT (-122.41517 37.75244)
2 2020-02-03T10:00:00.000 2020-02-03T00:00:00.000 10:00 2020 Monday 2020-02-03T10:06:00.000 89867264015 898672 200084060 200340808.0 ... 5.0 3.0 8.0 NaN 35.0 NaN NaN NaN 2.0 POINT (-122.40734 37.78456)
4 2020-01-05T00:00:00.000 2020-01-05T00:00:00.000 00:00 2020 Sunday 2020-02-03T16:09:00.000 89877368020 898773 200085193 200342341.0 ... 4.0 6.0 30.0 NaN NaN NaN NaN NaN 1.0 POINT (-122.44025 37.78711)
5 2020-02-03T08:36:00.000 2020-02-03T00:00:00.000 08:36 2020 Monday 2020-02-03T08:36:00.000 89876268020 898762 200083909 200340826.0 ... 6.0 3.0 8.0 NaN NaN NaN NaN NaN 1.0 POINT (-122.39951 37.79693)

5 rows × 37 columns
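The cell above drops rows with a missing longitude and then builds point geometries with x = longitude and y = latitude. The following is a minimal standard-library sketch of the same idea, not the pandas or GeoPandas implementation. The inline CSV and the helper name `rows_with_points` are made up for illustration; two of the sample coordinates are copied from the output above.

```python
import csv
import io

# Made-up sample rows; the second one has no coordinates.
SAMPLE_CSV = """incident_id,latitude,longitude
898816,37.72695,-122.47604
898700,,
898607,37.75244,-122.41517
"""

def rows_with_points(csv_text):
    """Yield rows that have a longitude, with a WKT point string added."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty string means the value is missing. As in the cell above,
        # only the longitude column is checked.
        if not row["longitude"]:
            continue
        lng, lat = float(row["longitude"]), float(row["latitude"])
        # x is longitude and y is latitude, the order points_from_xy is given.
        row["geometry"] = f"POINT ({lng} {lat})"
        yield row

points = list(rows_with_points(SAMPLE_CSV))
for row in points:
    print(row["incident_id"], row["geometry"])
```

Swapping the two coordinates is a common mistake; it places points far from where they belong, so it is worth checking a few rows on a map.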

In [2]:
from cartoframes.viz import Layer

Layer(gdf)
Out[2]:

From a WKT column

In [3]:
from cartoframes.utils import decode_geometry

df = read_csv(remote_file_path)

# Clean rows where the `point` column, the WKT representation of each point, is NULL
df = df[df['point'].notna()]

gdf = GeoDataFrame(df, geometry=decode_geometry(df['point']))
gdf.head()
Out[3]:
incident_datetime incident_date incident_time incident_year incident_day_of_week report_datetime row_id incident_id incident_number cad_number ... :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md :@computed_region_nqbw_i6c3 :@computed_region_2dwj_jsy4 :@computed_region_h4ep_8xdi :@computed_region_y6ts_4iup :@computed_region_jg9y_a9du :@computed_region_6pnf_4xz7 geometry
0 2020-02-03T14:45:00.000 2020-02-03T00:00:00.000 14:45 2020 Monday 2020-02-03T17:50:00.000 89881675000 898816 200085557 200342870.0 ... 10.0 8.0 16.0 NaN NaN NaN NaN NaN 2.0 POINT (-122.47604 37.72695)
1 2020-02-03T03:45:00.000 2020-02-03T00:00:00.000 03:45 2020 Monday 2020-02-03T03:45:00.000 89860711012 898607 200083749 200340316.0 ... 3.0 2.0 20.0 3.0 NaN NaN NaN NaN 2.0 POINT (-122.41517 37.75244)
2 2020-02-03T10:00:00.000 2020-02-03T00:00:00.000 10:00 2020 Monday 2020-02-03T10:06:00.000 89867264015 898672 200084060 200340808.0 ... 5.0 3.0 8.0 NaN 35.0 NaN NaN NaN 2.0 POINT (-122.40734 37.78456)
4 2020-01-05T00:00:00.000 2020-01-05T00:00:00.000 00:00 2020 Sunday 2020-02-03T16:09:00.000 89877368020 898773 200085193 200342341.0 ... 4.0 6.0 30.0 NaN NaN NaN NaN NaN 1.0 POINT (-122.44025 37.78711)
5 2020-02-03T08:36:00.000 2020-02-03T00:00:00.000 08:36 2020 Monday 2020-02-03T08:36:00.000 89876268020 898762 200083909 200340826.0 ... 6.0 3.0 8.0 NaN NaN NaN NaN NaN 1.0 POINT (-122.39951 37.79693)

5 rows × 37 columns
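To make the WKT format concrete, here is a simplified sketch that parses a two-dimensional WKT point with a regular expression. It is not how `decode_geometry` works internally, and it ignores every other WKT type. The helper name `parse_wkt_point` is hypothetical.

```python
import re

# Matches 'POINT (x y)' with plain decimal numbers only.
# Other WKT types and scientific notation are not handled.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)

def parse_wkt_point(text):
    """Return (x, y) as floats for a 2D WKT point, or None if it does not match."""
    match = _POINT_RE.match(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_wkt_point("POINT (-122.47604 37.72695)"))
print(parse_wkt_point("not a point"))
```

In WKT the first number is x (longitude) and the second is y (latitude), which matches the geometry column in the output above.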

In [4]:
Layer(gdf)
Out[4]:

Load data from a JSON file

This example illustrates how to load data from a remote JSON file using pandas and the process of preparing the data for spatial operations.

In [1]:
import requests

# Download the JSON file
remote_file_path = 'http://opendata.paris.fr/api/records/1.0/search/?dataset=arbresremarquablesparis&rows=200'
data_json = requests.get(remote_file_path).json()['records']
data_json[0].keys()
Out[1]:
dict_keys(['datasetid', 'recordid', 'fields', 'geometry', 'record_timestamp'])
In [2]:
# json_normalize is available at the top level since pandas 1.0
from pandas import json_normalize

# Normalize the data
df = json_normalize(data_json)
df.head()
Out[2]:
datasetid recordid record_timestamp fields.geom_x_y fields.libellefrancais fields.objectid fields.idemplacement fields.arrondissement fields.circonferenceencm fields.hauteurenm ... fields.stadedeveloppement fields.remarquable fields.idbase fields.genre fields.complementadresse fields.typeemplacement fields.dateplantation geometry.type geometry.coordinates fields.varieteoucultivar
0 arbresremarquablesparis ac5b2e74ad08eab0eeb5f8ce5c1722e776ad9e62 2020-01-17T11:00:47.389000+00:00 [48.8558090006, 2.30029407426] Plaqueminier 16637 P0090571 PARIS 7E ARRDT 117.0 14.0 ... A 1 106715.0 Diospyros 07-05 Arbre 1700-01-01T00:09:21+00:00 Point [2.30029407426, 48.8558090006] NaN
1 arbresremarquablesparis ef7829a6dd75472693edaf90d8deec7102f53d1a 2020-01-17T11:00:47.389000+00:00 [48.8428258374, 2.29728541016] Aulne 26200 00000087 PARIS 15E ARRDT 223.0 19.0 ... A 1 111640.0 Alnus 15-02 Arbre 1700-01-01T00:09:21+00:00 Point [2.29728541016, 48.8428258374] NaN
2 arbresremarquablesparis 92ab6f2fdc6381d9bfea47b70f658f215bae582d 2020-01-17T11:00:47.389000+00:00 [48.8606050739, 2.25977704939] Cyprès Chauve 46910 000101003 BOIS DE BOULOGNE 354.0 32.0 ... M 1 2002397.0 Taxodium 16-03 Arbre 1862-01-01T00:09:21+00:00 Point [2.25977704939, 48.8606050739] NaN
3 arbresremarquablesparis e8b0534eb9fa43e6cb1234ba666411099ec49c10 2020-01-17T11:00:47.389000+00:00 [48.8727963278, 2.2875405986] Noisetier de Byzance 62060 000105001 / 16-25 PARIS 16E ARRDT 208.0 12.0 ... M 1 114779.0 Corylus PELOUSE 15 - 27 à 29 Arbre 1700-01-01T00:09:21+00:00 Point [2.2875405986, 48.8727963278] NaN
4 arbresremarquablesparis b3592a53c7e7289a66beed8500410b7aabcf6b37 2020-01-17T11:00:47.389000+00:00 [48.8800797215, 2.38570950523] Marronnier 179572 G0380001 PARIS 19E ARRDT 330.0 22.0 ... M 1 102027.0 Aesculus NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.38570950523, 48.8800797215] NaN

5 rows × 24 columns
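The dotted column names in the output above (`fields.genre`, `geometry.coordinates`) come from flattening the nested records. The following standard-library sketch shows that flattening idea for a single record. It is a simplification and not the pandas implementation; the helper name `flatten` is hypothetical, and the sample record is a trimmed version of the first row.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists stay as values."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Trimmed version of the first record shown above.
record = {
    "datasetid": "arbresremarquablesparis",
    "fields": {"genre": "Diospyros", "hauteurenm": 14.0},
    "geometry": {"type": "Point", "coordinates": [2.30029407426, 48.8558090006]},
}

flat = flatten(record)
print(sorted(flat))
```

Because lists are left untouched, the coordinates remain a single two-element value, which is why a later step is needed to split them into separate columns.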

In [3]:
# Add Latitude and Longitude columns
df['lng'] = df.apply(lambda row: row['geometry.coordinates'][0], axis=1)
df['lat'] = df.apply(lambda row: row['geometry.coordinates'][1], axis=1)
df.head()
Out[3]:
datasetid recordid record_timestamp fields.geom_x_y fields.libellefrancais fields.objectid fields.idemplacement fields.arrondissement fields.circonferenceencm fields.hauteurenm ... fields.idbase fields.genre fields.complementadresse fields.typeemplacement fields.dateplantation geometry.type geometry.coordinates fields.varieteoucultivar lng lat
0 arbresremarquablesparis ac5b2e74ad08eab0eeb5f8ce5c1722e776ad9e62 2020-01-17T11:00:47.389000+00:00 [48.8558090006, 2.30029407426] Plaqueminier 16637 P0090571 PARIS 7E ARRDT 117.0 14.0 ... 106715.0 Diospyros 07-05 Arbre 1700-01-01T00:09:21+00:00 Point [2.30029407426, 48.8558090006] NaN 2.300294 48.855809
1 arbresremarquablesparis ef7829a6dd75472693edaf90d8deec7102f53d1a 2020-01-17T11:00:47.389000+00:00 [48.8428258374, 2.29728541016] Aulne 26200 00000087 PARIS 15E ARRDT 223.0 19.0 ... 111640.0 Alnus 15-02 Arbre 1700-01-01T00:09:21+00:00 Point [2.29728541016, 48.8428258374] NaN 2.297285 48.842826
2 arbresremarquablesparis 92ab6f2fdc6381d9bfea47b70f658f215bae582d 2020-01-17T11:00:47.389000+00:00 [48.8606050739, 2.25977704939] Cyprès Chauve 46910 000101003 BOIS DE BOULOGNE 354.0 32.0 ... 2002397.0 Taxodium 16-03 Arbre 1862-01-01T00:09:21+00:00 Point [2.25977704939, 48.8606050739] NaN 2.259777 48.860605
3 arbresremarquablesparis e8b0534eb9fa43e6cb1234ba666411099ec49c10 2020-01-17T11:00:47.389000+00:00 [48.8727963278, 2.2875405986] Noisetier de Byzance 62060 000105001 / 16-25 PARIS 16E ARRDT 208.0 12.0 ... 114779.0 Corylus PELOUSE 15 - 27 à 29 Arbre 1700-01-01T00:09:21+00:00 Point [2.2875405986, 48.8727963278] NaN 2.287541 48.872796
4 arbresremarquablesparis b3592a53c7e7289a66beed8500410b7aabcf6b37 2020-01-17T11:00:47.389000+00:00 [48.8800797215, 2.38570950523] Marronnier 179572 G0380001 PARIS 19E ARRDT 330.0 22.0 ... 102027.0 Aesculus NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.38570950523, 48.8800797215] NaN 2.385710 48.880080

5 rows × 26 columns
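The index choice in the cell above matters because the two coordinate columns use opposite orders: `geometry.coordinates` follows the GeoJSON convention of [longitude, latitude], while `fields.geom_x_y` in this dataset appears to hold [latitude, longitude]. A small check using the values from the first row of the output illustrates this.

```python
# Values copied from the first row of the output above.
row = {
    "fields.geom_x_y": [48.8558090006, 2.30029407426],       # [latitude, longitude]
    "geometry.coordinates": [2.30029407426, 48.8558090006],  # [longitude, latitude]
}

lng, lat = row["geometry.coordinates"]
lat_alt, lng_alt = row["fields.geom_x_y"]

# Both columns describe the same location once the order is accounted for.
same_location = (lng, lat) == (lng_alt, lat_alt)
print(same_location)
```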

In [4]:
from geopandas import GeoDataFrame, points_from_xy

gdf = GeoDataFrame(df, geometry=points_from_xy(df['lng'], df['lat']))
gdf.head()
Out[4]:
datasetid recordid record_timestamp fields.geom_x_y fields.libellefrancais fields.objectid fields.idemplacement fields.arrondissement fields.circonferenceencm fields.hauteurenm ... fields.genre fields.complementadresse fields.typeemplacement fields.dateplantation geometry.type geometry.coordinates fields.varieteoucultivar lng lat geometry
0 arbresremarquablesparis ac5b2e74ad08eab0eeb5f8ce5c1722e776ad9e62 2020-01-17T11:00:47.389000+00:00 [48.8558090006, 2.30029407426] Plaqueminier 16637 P0090571 PARIS 7E ARRDT 117.0 14.0 ... Diospyros 07-05 Arbre 1700-01-01T00:09:21+00:00 Point [2.30029407426, 48.8558090006] NaN 2.300294 48.855809 POINT (2.30029 48.85581)
1 arbresremarquablesparis ef7829a6dd75472693edaf90d8deec7102f53d1a 2020-01-17T11:00:47.389000+00:00 [48.8428258374, 2.29728541016] Aulne 26200 00000087 PARIS 15E ARRDT 223.0 19.0 ... Alnus 15-02 Arbre 1700-01-01T00:09:21+00:00 Point [2.29728541016, 48.8428258374] NaN 2.297285 48.842826 POINT (2.29729 48.84283)
2 arbresremarquablesparis 92ab6f2fdc6381d9bfea47b70f658f215bae582d 2020-01-17T11:00:47.389000+00:00 [48.8606050739, 2.25977704939] Cyprès Chauve 46910 000101003 BOIS DE BOULOGNE 354.0 32.0 ... Taxodium 16-03 Arbre 1862-01-01T00:09:21+00:00 Point [2.25977704939, 48.8606050739] NaN 2.259777 48.860605 POINT (2.25978 48.86061)
3 arbresremarquablesparis e8b0534eb9fa43e6cb1234ba666411099ec49c10 2020-01-17T11:00:47.389000+00:00 [48.8727963278, 2.2875405986] Noisetier de Byzance 62060 000105001 / 16-25 PARIS 16E ARRDT 208.0 12.0 ... Corylus PELOUSE 15 - 27 à 29 Arbre 1700-01-01T00:09:21+00:00 Point [2.2875405986, 48.8727963278] NaN 2.287541 48.872796 POINT (2.28754 48.87280)
4 arbresremarquablesparis b3592a53c7e7289a66beed8500410b7aabcf6b37 2020-01-17T11:00:47.389000+00:00 [48.8800797215, 2.38570950523] Marronnier 179572 G0380001 PARIS 19E ARRDT 330.0 22.0 ... Aesculus NaN Arbre 1700-01-01T00:09:21+00:00 Point [2.38570950523, 48.8800797215] NaN 2.385710 48.880080 POINT (2.38571 48.88008)

5 rows × 27 columns

In [5]:
from cartoframes.viz import Layer

Layer(gdf)
Out[5]:

Load data from a GeoJSON file

This example illustrates how to load data from a GeoJSON file using GeoPandas.

In [1]:
from geopandas import read_file

gdf = read_file('http://libs.cartocdn.com/cartoframes/files/sustainable_palm_oil_production_mills.geojson')
gdf.head()
Out[1]:
objectid cartodb_id entity_id latitude longitude audit_stat legal_radi illegal_ra radius_umd radius_for ... carbon_r_3 peat_for_2 peat_for_3 primary_10 primary_11 mill_name parent_com rspo_certi date_updat geometry
0 1 59 ID1822 -1.585833 103.205556 ASA 1 0 0.321764 0.225759 1099 ... 1224 0 29 0.004508 0.135391 Muara Bulian Mill PT Inti Indosawit Subur yes 14-Aug POINT (103.20556 -1.58583)
1 2 153 ID1847 0.077043 102.030838 Renewal Certification 0 0.445960 0.258855 1979 ... 2038 7 523 0.017304 0.321418 Pabrik Kelapa Sawit Batang Kulim POM PT Musim Mas yes 14-Aug POINT (102.03084 0.07704)
2 3 103 ID1720 1.660222 100.590611 Initial Certification 0 0.498531 0.248520 1432 ... 1518 0 476 0.000811 0.193365 Kayangan and Kencana POM PT Salim Ivomas Pratama Tbk yes 14-Aug POINT (100.59061 1.66022)
3 4 216 ID1945 -2.894444 112.543611 ASA 1 0 0.662863 0.186332 226 ... 269 59 66 0.124882 0.189169 PT Sarana Titian Permata POM Wilmar International Ltd yes 14-Aug POINT (112.54361 -2.89444)
4 5 156 ID1553 3.593333 98.947222 Initial Certification 0 0.533668 0.028972 382 ... 412 0 0 0.000216 0.009296 Adolina POM PT Perkebunan Nusantara IV (PERSERO) yes 14-Aug POINT (98.94722 3.59333)

5 rows × 73 columns
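For readers unfamiliar with the GeoJSON layout, the following standard-library sketch shows how a FeatureCollection maps to rows: each feature contributes its properties plus a geometry. It is not how GeoPandas reads files. The inline collection is hand-written and heavily trimmed, with values taken from the output above, and the helper name `features_to_rows` is hypothetical.

```python
import json

# Hand-written, trimmed collection; only two features and two properties each.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"mill_name": "Muara Bulian Mill", "rspo_certi": "yes"},
      "geometry": {"type": "Point", "coordinates": [103.20556, -1.58583]}
    },
    {
      "type": "Feature",
      "properties": {"mill_name": "Adolina POM", "rspo_certi": "yes"},
      "geometry": {"type": "Point", "coordinates": [98.94722, 3.59333]}
    }
  ]
}
"""

def features_to_rows(geojson_text):
    """Turn each point feature into a flat dict: its properties plus lng and lat."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        row = dict(feature["properties"])
        # GeoJSON positions are [longitude, latitude], optionally followed by elevation.
        lng, lat = feature["geometry"]["coordinates"][:2]
        row["lng"], row["lat"] = lng, lat
        rows.append(row)
    return rows

rows = features_to_rows(GEOJSON_TEXT)
for row in rows:
    print(row["mill_name"], row["lng"], row["lat"])
```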

In [2]:
from cartoframes.viz import Layer

Layer(gdf)
Out[2]:

Support

Get help or learn about known issues.