A Python package for integrating CARTO maps, analysis, and data services into data science workflows.


Data Services

You can connect to the CARTO Data Services API directly from CARTOframes. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook. For instance, you can geocode a pandas DataFrame with addresses on the fly, and then perform a trade area analysis by computing isodistances or isochrones programmatically.

Using Data Services requires authentication. For more information about how to authenticate, read the Authentication guide. For further learning, you can also check out the Data Services examples.

from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')
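set_default_credentials('creds.json') loads your credentials from a JSON file. The sketch below creates such a file with the Python standard library, assuming the file holds username and api_key keys. The values are hypothetical placeholders, and the Authentication guide remains the reference for the exact format your CARTOframes version expects.

```python
import json

# Hypothetical placeholders: replace with your own CARTO username and API key.
creds = {
    'username': 'your_username',
    'api_key': 'your_api_key',
}

# Write the file that set_default_credentials('creds.json') reads.
with open('creds.json', 'w') as f:
    json.dump(creds, f, indent=2)
```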

Depending on your CARTO account plan, some of these data services are subject to different quota limitations.

Geocoding

To get started, let’s read in and explore the Starbucks location data we have. With the Starbucks store data in a DataFrame, we can see that there are two columns that can be used in the geocoding service: name and address. There’s also a third column that reflects the annual revenue of the store.

import pandas as pd

df = pd.read_csv('http://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn.csv')
df.head()
   name                          address                                       revenue
0  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772
1  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418
2  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699
3  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676
4  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411

Quota consumption

Each time you run Data Services, quota is consumed. For this reason, you can check in advance how many credits an operation will consume by setting the dry_run parameter to True when calling the service function.

You can also check your available quota by calling the available_quota method.

from cartoframes.data.services import Geocoding

geo_service = Geocoding()

city_ny = {'value': 'New York'}
country_usa = {'value': 'USA'}

_, geo_dry_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa, dry_run=True)
geo_dry_metadata
{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0}
geo_service.available_quota()
4999402
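Before running the real geocoding, the dry-run metadata and the available quota can be combined into a simple check. This is a minimal sketch: has_enough_quota is a hypothetical helper, not part of CARTOframes, and the input values are the ones printed above. In the notebook you would pass geo_dry_metadata and geo_service.available_quota() directly.

```python
def has_enough_quota(dry_run_metadata, available_quota):
    """Return True if the credits a dry run reports as required fit in the available quota."""
    required = dry_run_metadata.get('required_quota', 0)
    return required <= available_quota


# Values copied from the dry-run output and the available_quota() call above.
geo_dry_metadata = {'total_rows': 10, 'required_quota': 10, 'previously_geocoded': 0,
                    'previously_failed': 0, 'records_with_geometry': 0}
available = 4999402

if has_enough_quota(geo_dry_metadata, available):
    print('OK to geocode: {0} credits required, {1} available'.format(
        geo_dry_metadata['required_quota'], available))
```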
geo_gdf, geo_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa)
Success! Data geocoded correctly

Let’s compare geo_dry_metadata and geo_metadata to see how the information returned differs with and without the dry_run option. The real run adds fields showing that all the locations were geocoded successfully, consuming 10 credits of quota.

geo_metadata
{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0,
 'final_records_with_geometry': 10,
 'geocoded_increment': 10,
 'successfully_geocoded': 10,
 'failed_geocodings': 0}
geo_service.available_quota()
4999392
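The two available_quota() readings confirm the cost reported in the metadata: the difference between them equals required_quota. A quick check with the values printed above:

```python
# Readings of available_quota() before and after the real geocoding run, as shown above.
quota_before = 4999402
quota_after = 4999392

# required_quota reported by both the dry run and the real run.
required_quota = 10

consumed = quota_before - quota_after
print(consumed)                    # 10
print(consumed == required_quota)  # True
```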

If the input data file ever changes, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records. In order to use cached results, we have to save the results to a CARTO table using the table_name and cached=True parameters.
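The caching behavior described above can be illustrated with one hash per record: if the hash of a row's geocoding inputs matches the stored one, the row is skipped. This is a minimal sketch of the idea only. The input_hash and rows_to_geocode helpers are hypothetical, and CARTO's actual carto_geocode_hash may be computed from different inputs.

```python
import hashlib

import pandas as pd


def input_hash(row, columns):
    """Hash the values of the geocoding input columns for one record."""
    text = '|'.join(str(row[c]) for c in columns)
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def rows_to_geocode(df, columns, hash_column='carto_geocode_hash'):
    """Return a boolean mask of records that are new or whose inputs changed."""
    if hash_column not in df.columns:
        # Nothing cached yet: every record needs geocoding.
        return pd.Series(True, index=df.index)
    current = df.apply(lambda row: input_hash(row, columns), axis=1)
    # Records without a stored hash (NaN) also compare as changed.
    return current != df[hash_column]


records = pd.DataFrame({'address': ['341 Eastern Pkwy,Brooklyn, NY 11238',
                                    '607 Brighton Beach Avenue,Brooklyn, NY 11235']})
records['carto_geocode_hash'] = records.apply(lambda row: input_hash(row, ['address']), axis=1)

# Modify one address: only that record should be geocoded again.
records.loc[1, 'address'] = '610 Brighton Beach Avenue,Brooklyn, NY 11235'
print(rows_to_geocode(records, ['address']).tolist())  # [False, True]
```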

The resulting data is a GeoDataFrame that contains three new columns:

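  • the_geom: The resulting point geometry
  • gc_status_rel: A score between 0 and 1 indicating the accuracy of each geocoded location
  • carto_geocode_hash: A hash of the geocoding inputs, used to skip records that have already been geocoded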
geo_gdf.head()
   the_geom                    name                          address                                       revenue      gc_status_rel  carto_geocode_hash
0  POINT (-73.95746 40.67102)  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772  0.97           c834a8e289e5bce280775a9bf1f833f1
1  POINT (-73.96122 40.57796)  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418  0.99           7d39a3fff93efd9034da88aa9ad2da79
2  POINT (-73.98978 40.61944)  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699  0.98           1a2312049ddea753ba42bf77f5ccf718
3  POINT (-74.02750 40.63202)  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676  0.98           827ab4dcc2d49d5fd830749597976d4a
4  POINT (-74.00098 40.59321)  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411  0.98           119a38c7b51195cd4153fc81605a8495

In addition, to avoid geocoding the same records again and spending quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.

This happens automatically when:

  1. Your input is a CARTO table processed in place (without a table_name parameter).
  2. You save your results to a CARTO table using the table_name parameter, and use only the resulting table for any further geocoding.
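Before re-running the geocoder on a DataFrame, it can help to verify that the columns needed to reuse previous results are still present, for example after a merge or a column selection that might have dropped them. A minimal sketch; missing_geocoding_columns is a hypothetical helper, not part of CARTOframes:

```python
import pandas as pd

# Columns that must survive for previously geocoded records to be skipped.
REQUIRED_COLUMNS = ('the_geom', 'carto_geocode_hash')


def missing_geocoding_columns(df):
    """Return the required columns that are absent from df."""
    return [c for c in REQUIRED_COLUMNS if c not in df.columns]


full = pd.DataFrame({'address': ['341 Eastern Pkwy,Brooklyn, NY 11238'],
                     'the_geom': ['POINT (-73.95746 40.67102)'],
                     'carto_geocode_hash': ['c834a8e289e5bce280775a9bf1f833f1']})
trimmed = full[['address', 'the_geom']]  # hash column dropped by mistake

print(missing_geocoding_columns(full))     # []
print(missing_geocoding_columns(trimmed))  # ['carto_geocode_hash']
```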

If you try to geocode this DataFrame now that it contains both the the_geom and carto_geocode_hash columns, you will see that the required quota is 0 because it has already been geocoded.

_, geo_metadata = geo_service.geocode(geo_gdf, street='address', city=city_ny, country=country_usa, dry_run=True)
geo_metadata.get('required_quota')
0

Precision

The address column is more complete than the name column, so the coordinates the service computes from it are more accurate. Geocoding with the name column confirms this: its precision values (gc_status_rel) are lower than the ones obtained with the address column.

geo_name_gdf, geo_name_metadata = geo_service.geocode(df, street='name', city=city_ny, country=country_usa)
Success! Data geocoded correctly
geo_name_gdf.gc_status_rel.unique()
array([0.93, 0.96, 0.85, 0.83, 0.74, 0.87])
geo_gdf.gc_status_rel.unique()
array([0.97, 0.99, 0.98])
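The two arrays can be compared directly. A small sketch with NumPy, using the distinct gc_status_rel values printed above (so these summarize distinct values, not individual rows): even the best name-based score is below the worst address-based one.

```python
import numpy as np

# Distinct gc_status_rel values printed above for each geocoding input column.
by_name = np.array([0.93, 0.96, 0.85, 0.83, 0.74, 0.87])
by_address = np.array([0.97, 0.99, 0.98])

print(by_name.min(), by_address.min())  # 0.74 0.97
print(by_name.max() < by_address.min())  # True
```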

Visualize the results

Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.

from cartoframes.viz import Layer, color_bins_style, popup_element

Layer(
    geo_gdf,
    color_bins_style('gc_status_rel', method='equal', bins=geo_gdf.gc_status_rel.unique().size),
    popup_hover=[popup_element('address', 'Address'), popup_element('gc_status_rel', 'Precision')],
    title='Geocoding Precision'
)
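With method='equal', color_bins_style classifies values into equal-interval bins, and here the number of bins is set to the number of distinct precision values. A sketch of how equal-interval breakpoints are computed with NumPy; this illustrates the technique, and the exact breakpoints CARTOframes generates may differ.

```python
import numpy as np

# Distinct gc_status_rel values of geo_gdf, as printed in the Precision section.
values = np.array([0.97, 0.99, 0.98])
bins = values.size  # same as geo_gdf.gc_status_rel.unique().size

# Equal-interval classification: split [min, max] into `bins` ranges of equal width.
edges = np.linspace(values.min(), values.max(), bins + 1)
print(edges)
```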

Isolines

There are two isoline functions: isochrones and isodistances. In this guide we will use the isochrones function to calculate walking areas by time around each Starbucks store, and the isodistances function to calculate walking areas by distance.

Isolines are concentric polygons, each enclosing the area reachable from an origin point within a given range, measured by:

  • Time in the case of isochrones
  • Distance in the case of isodistances

Isochrones

For isochrones, let’s calculate the time ranges of 5, 15 and 30 minutes. These ranges are input in seconds, so they will be 300, 900, and 1800 respectively.
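The conversion is a multiplication by 60. Computing the list instead of typing it avoids mistakes when you change the ranges:

```python
# Isochrone ranges are given in seconds; convert the target walking times from minutes.
minutes = [5, 15, 30]
ranges = [m * 60 for m in minutes]
print(ranges)  # [300, 900, 1800]
```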

from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)

Remember to always check the quota with the dry_run parameter and the available_quota method before running the service!

print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)
available 115394, required 30
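The required quota of 30 matches one credit per isoline: 10 stores times 3 ranges. This rule is inferred from the outputs in this guide, not from a documented pricing rule, so rely on dry_run for the authoritative figure. The drop between this quota reading and the one taken before the isodistances run later in this guide is also 30.

```python
# Assumption inferred from this guide's outputs: one credit per (store, range) pair.
n_stores = 10  # total_rows reported by the geocoding dry run
ranges = [300, 900, 1800]

estimated_quota = n_stores * len(ranges)
print(estimated_quota)  # 30

# available_quota() before the isochrones run and before the isodistances run.
print(115394 - 115364)  # 30
```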
isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')
Success! Isolines created correctly
isochrones_gdf.head()
   source_id  data_range  the_geom
0  0          300         MULTIPOLYGON (((-73.95936 40.67087, -73.95910 ...
1  0          900         MULTIPOLYGON (((-73.96743 40.67345, -73.96683 ...
2  0          1800        MULTIPOLYGON (((-73.97584 40.67499, -73.97558 ...
3  1          300         MULTIPOLYGON (((-73.96417 40.57749, -73.96391 ...
4  1          900         MULTIPOLYGON (((-73.97035 40.57749, -73.97009 ...
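Each isoline row has a source_id pointing back to the store it was computed for. In the output above it matches the row index of the input (0 for Franklin Ave & Eastern Pkwy, 1 for 607 Brighton Beach Ave). Assuming that correspondence, here is a sketch of attaching store attributes to each isoline with a pandas merge, using small stand-in DataFrames in place of geo_gdf and isochrones_gdf:

```python
import pandas as pd

# Stand-ins for geo_gdf (stores) and isochrones_gdf (isolines), with values from this guide.
stores = pd.DataFrame({'name': ['Franklin Ave & Eastern Pkwy', '607 Brighton Beach Ave'],
                       'revenue': [1321040.772, 1268080.418]})
isolines = pd.DataFrame({'source_id': [0, 0, 0, 1, 1],
                         'data_range': [300, 900, 1800, 300, 900]})

# Assumption: source_id refers to the index of the input row.
stores_keyed = stores.rename_axis('source_id').reset_index()
joined = isolines.merge(stores_keyed, on='source_id', how='left')
print(joined[['source_id', 'data_range', 'name']])
```

With the real GeoDataFrames, select only attribute columns from geo_gdf (for example name and revenue) so that the result keeps the isoline geometry.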
from cartoframes.viz import Layer, basic_style, basic_legend

Layer(isochrones_gdf, basic_style(opacity=0.5), basic_legend('Isochrones'))

Isodistances

For isodistances, let’s calculate the distance ranges of 100, 500, and 1000 meters. Ranges are specified in meters, so these values are passed unchanged.

_, isodistances_dry_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk', dry_run=True)
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)
available 115364, required 30
isodistances_gdf, isodistances_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk')
Success! Isolines created correctly
isodistances_gdf.head()
   source_id  data_range  the_geom
0  0          100         MULTIPOLYGON (((-73.95782 40.67139, -73.95721 ...
1  0          500         MULTIPOLYGON (((-73.96262 40.67276, -73.96219 ...
2  0          1000        MULTIPOLYGON (((-73.96880 40.67345, -73.96820 ...
3  1          100         MULTIPOLYGON (((-73.96125 40.57800, -73.96065 ...
4  1          500         MULTIPOLYGON (((-73.96605 40.57800, -73.96563 ...
from cartoframes.viz import Layer, basic_style, basic_legend

Layer(isodistances_gdf, basic_style(opacity=0.5), basic_legend('Isodistances'))
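Because every store gets one polygon per range, the bands of each store overlap on the map. To look at a single distance band, filter on data_range before building the layer. A sketch with a stand-in DataFrame; with the real data you would filter isodistances_gdf the same way and pass the result to Layer.

```python
import pandas as pd

# Stand-in for isodistances_gdf with the source_id/data_range pairs shown above.
isodistances = pd.DataFrame({'source_id': [0, 0, 0, 1, 1],
                             'data_range': [100, 500, 1000, 100, 500]})

# Keep only the 500 m walking areas.
walk_500m = isodistances[isodistances['data_range'] == 500]
print(walk_500m['source_id'].tolist())  # [0, 1]
```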