View source Data Management
In this guide you will learn how to load different data files into DataFrames and how to interact with the CARTO platform to upload DataFrames into tables and download tables or SQL queries into DataFrames.
CARTOframes is built on top of Pandas and GeoPandas. Therefore, it’s compatible with all the data formats supported in those projects like GeoJSON, Shapefile, CSV, etc.
There are two main concepts we should know before continuing with the guide:
- A DataFrame is a two-dimensional data structure for generic data. It can be thought of as a table with rows and columns. It’s composed of Series objects, which are one-dimensional data structures.
- A GeoDataFrame is a DataFrame with an extra geometry column. This geometry column is a GeoSeries object.
Every time we manage geographic data, a GeoDataFrame should be used. In case a DataFrame with an encoded geometry column is used (WKB, WKT, etc.), every method contains a geom_col
param to provide the name of that column and decode the geometry internally.
For further learning you can checkout the Data Management examples.
Read a GeoJSON file
This is how to load geographic data from a GeoJSON file using GeoPandas. To read pure JSON files check this example.
1
2
3
4
| from geopandas import read_file
gdf = read_file('https://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn_geocoded.geojson')
gdf.head()
|
| cartodb_id | field_1 | name | address | revenue | geometry |
0 | 1 | 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | POINT (-73.95901 40.67109) |
1 | 2 | 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | POINT (-73.96122 40.57796) |
2 | 3 | 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | POINT (-73.98976 40.61912) |
3 | 4 | 3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | POINT (-74.02744 40.63152) |
4 | 5 | 4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | POINT (-74.00098 40.59321) |
Read a Shapefile
Shapefile is a complex format, compared to CSV or GeoJSON. To learn more about this format check GeoPandas documentation.
1
2
3
4
| from geopandas import read_file
gdf = read_file('https://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn_geocoded.zip')
gdf.head()
|
| cartodb_id | field_1 | name | address | revenue | geometry |
0 | 1 | 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | POINT (-73.95901 40.67109) |
1 | 2 | 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | POINT (-73.96122 40.57796) |
2 | 3 | 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | POINT (-73.98976 40.61912) |
3 | 4 | 3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | POINT (-74.02744 40.63152) |
4 | 5 | 4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | POINT (-74.00098 40.59321) |
Read a CSV file
Compute geometry from longitude and latitude
1
2
3
4
5
6
7
| from pandas import read_csv
from geopandas import GeoDataFrame, points_from_xy
df = read_csv('https://libs.cartocdn.com/cartoframes/samples/sf_incidents.csv')
gdf = GeoDataFrame(df, geometry=points_from_xy(df['longitude'], df['latitude']))
gdf.head()
|
| incident_datetime | incident_date | incident_time | incident_year | incident_day_of_week | report_datetime | row_id | incident_id | incident_number | cad_number | ... | :@computed_region_qgnn_b9vv | :@computed_region_26cr_cadq | :@computed_region_ajp5_b2md | :@computed_region_nqbw_i6c3 | :@computed_region_2dwj_jsy4 | :@computed_region_h4ep_8xdi | :@computed_region_y6ts_4iup | :@computed_region_jg9y_a9du | :@computed_region_6pnf_4xz7 | geometry |
0 | 2020-08-16T03:13:00.000 | 2020-08-16T00:00:00.000 | 03:13 | 2020 | Sunday | 2020-08-16T03:14:00.000 | 95319604083 | 953196 | 200491669 | 202290313.0 | ... | 2.0 | 9 | 26.0 | NaN | NaN | NaN | NaN | NaN | 2.0 | POINT (-122.39773 37.75483) |
1 | 2020-08-16T03:38:00.000 | 2020-08-16T00:00:00.000 | 03:38 | 2020 | Sunday | 2020-08-16T04:56:00.000 | 95326228100 | 953262 | 200491738 | 202290404.0 | ... | 3.0 | 2 | 20.0 | 3.0 | NaN | NaN | NaN | NaN | 2.0 | POINT (-122.42204 37.76654) |
2 | 2020-08-16T13:40:00.000 | 2020-08-16T00:00:00.000 | 13:40 | 2020 | Sunday | 2020-08-16T13:56:00.000 | 95336264020 | 953362 | 200492463 | 202291631.0 | ... | 1.0 | 10 | 8.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | POINT (-122.40371 37.78404) |
3 | 2020-08-16T16:18:00.000 | 2020-08-16T00:00:00.000 | 16:18 | 2020 | Sunday | 2020-08-16T16:18:00.000 | 95335012010 | 953350 | 200492792 | 202292091.0 | ... | 10.0 | 7 | 35.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | POINT (-122.50742 37.75100) |
4 | 2020-08-12T22:00:00.000 | 2020-08-12T00:00:00.000 | 22:00 | 2020 | Wednesday | 2020-08-15T08:30:00.000 | 95300674000 | 953006 | 200489880 | 202280827.0 | ... | 4.0 | 11 | 39.0 | NaN | NaN | NaN | NaN | NaN | 2.0 | POINT (-122.43214 37.78050) |
5 rows × 37 columns
Compute geometry from WKT/WKB
1
2
3
4
5
6
7
8
| from pandas import read_csv
from geopandas import GeoDataFrame
from cartoframes.utils import decode_geometry
df = read_csv('https://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn_geocoded.csv')
gdf = GeoDataFrame(df, geometry=decode_geometry(df['the_geom']))
gdf.head()
|
| the_geom | cartodb_id | field_1 | name | address | revenue | id_store | geometry |
0 | 0101000020E61000005EA27A6B607D52C01956F146E655... | 1 | 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | A | POINT (-73.95901 40.67109) |
1 | 0101000020E6100000B610E4A0847D52C0B532E197FA49... | 2 | 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | B | POINT (-73.96122 40.57796) |
2 | 0101000020E6100000E5B8533A587F52C05726FC523F4F... | 3 | 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | C | POINT (-73.98976 40.61912) |
3 | 0101000020E61000008BA6B393C18152C08D62B9A5D550... | 4 | 3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | D | POINT (-74.02744 40.63152) |
4 | 0101000020E6100000CEFC6A0E108052C080D4264EEE4B... | 5 | 4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | E | POINT (-74.00098 40.59321) |
Read data from a CARTO table
Note: You’ll need your CARTO Account credentials to perform this action.
1
2
3
| from cartoframes.auth import set_default_credentials
set_default_credentials('cartoframes')
|
1
2
3
4
| from cartoframes import read_carto
gdf = read_carto('starbucks_brooklyn')
gdf.head()
|
| cartodb_id | the_geom | field_1 | name | address | revenue |
0 | 1 | POINT (-73.95901 40.67109) | 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 |
1 | 2 | POINT (-73.96122 40.57796) | 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 |
2 | 3 | POINT (-73.98976 40.61912) | 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 |
3 | 4 | POINT (-74.02744 40.63152) | 3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 |
4 | 5 | POINT (-74.00098 40.59321) | 4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 |
Read data from a CARTO SQL Query
Note: You’ll need your CARTO Account credentials to perform this action.
1
2
3
| from cartoframes.auth import set_default_credentials
set_default_credentials('cartoframes')
|
1
2
3
4
| from cartoframes import read_carto
gdf = read_carto("SELECT * FROM starbucks_brooklyn WHERE revenue > 1200000")
gdf.head()
|
| cartodb_id | the_geom | field_1 | name | address | revenue |
0 | 1 | POINT (-73.95901 40.67109) | 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 |
1 | 2 | POINT (-73.96122 40.57796) | 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 |
2 | 3 | POINT (-73.98976 40.61912) | 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 |
Upload data to CARTO
Note: You’ll need your CARTO Account credentials to perform this action.
1
2
3
| from cartoframes.auth import set_default_credentials
set_default_credentials('creds.json')
|
1
2
3
| from cartoframes import to_carto
to_carto(gdf, 'starbucks_brooklyn_filtered', if_exists='replace')
|
1
2
3
4
5
6
7
| Success! Data uploaded to table "starbucks_brooklyn_filtered" correctly
'starbucks_brooklyn_filtered'
|