CARTOframes

Data Services

You can connect to the CARTO Data Services API directly from CARTOframes. The API provides a set of location-based functions that you can apply to your data to perform geospatial analyses without leaving your notebook. For instance, you can geocode a pandas DataFrame of addresses on the fly, and then perform a trade area analysis by computing isochrones or isodistances programmatically.

Using Data Services requires authentication. For more information about how to authenticate, read the Authentication guide. For further learning, you can also check out the Data Services examples.

from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')

Depending on your CARTO account plan, some of these data services are subject to different quota limitations.

Geocoding

To get started, let’s load and explore the Starbucks store data. The DataFrame has two columns that can be passed to the geocoding service, name and address, plus a third column, revenue, that holds each store's annual revenue.

import pandas as pd

df = pd.read_csv('http://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn.csv')
df.head()

	name	address	revenue
0	Franklin Ave & Eastern Pkwy	341 Eastern Pkwy,Brooklyn, NY 11238	1321040.772
1	607 Brighton Beach Ave	607 Brighton Beach Avenue,Brooklyn, NY 11235	1268080.418
2	65th St & 18th Ave	6423 18th Avenue,Brooklyn, NY 11204	1248133.699
3	Bay Ridge Pkwy & 3rd Ave	7419 3rd Avenue,Brooklyn, NY 11209	1185702.676
4	Caesar's Bay Shopping Center	8973 Bay Parkway,Brooklyn, NY 11214	1148427.411
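Each address value combines street, borough, state, and ZIP code in one comma-separated string. That is part of why this column gives the geocoder more to work with than the name column. The sketch below splits one of these strings into its parts. It assumes the exact "street,borough, STATE ZIP" layout of these sample rows, and split_address is a hypothetical helper, not part of CARTOframes.

```python
# Hypothetical helper, not part of CARTOframes: splits one sample address
# of the form "street,borough, STATE ZIP" into its parts. Real-world
# addresses are far less regular than this.
def split_address(address: str) -> dict:
    street, borough, state_zip = (part.strip() for part in address.split(","))
    state, zip_code = state_zip.split()
    return {"street": street, "borough": borough, "state": state, "zip": zip_code}


print(split_address("341 Eastern Pkwy,Brooklyn, NY 11238"))
# {'street': '341 Eastern Pkwy', 'borough': 'Brooklyn', 'state': 'NY', 'zip': '11238'}
```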

Quota consumption

Every Data Services call consumes quota. To check in advance how many credits an operation will consume, pass dry_run=True to the service function. This returns the operation's metadata without running it.

You can also check your available quota with the available_quota method.

from cartoframes.data.services import Geocoding

geo_service = Geocoding()

city_ny = {'value': 'New York'}
country_usa = {'value': 'USA'}

_, geo_dry_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa, dry_run=True)

geo_dry_metadata

{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0}

geo_service.available_quota()

4999940
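You can combine the dry-run metadata and the available_quota reading into a simple guard that runs before the real call. The sketch below shows one way to do this. has_enough_quota is a hypothetical helper, and it reads only the required_quota key shown in the dry-run output above.

```python
# Hypothetical helper: decide whether a service call fits in the remaining
# quota, using the 'required_quota' key reported by a dry run.
def has_enough_quota(dry_run_metadata: dict, available_quota: int) -> bool:
    required = dry_run_metadata.get("required_quota", 0)
    return required <= available_quota


# Values copied from the geocoding dry run and quota reading above.
dry_metadata = {"total_rows": 10, "required_quota": 10, "previously_geocoded": 0,
                "previously_failed": 0, "records_with_geometry": 0}
print(has_enough_quota(dry_metadata, 4999940))  # True
```

In the notebook, you would call it as has_enough_quota(geo_dry_metadata, geo_service.available_quota()) before running geocode without dry_run.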

geo_gdf, geo_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa)

Success! Data geocoded correctly

Let’s compare geo_dry_metadata with geo_metadata to see how the information returned by a dry run differs from that of a real run. The real run adds keys that report the outcome. Here they show that all 10 locations were geocoded successfully and that the operation consumed 10 credits of quota.

geo_metadata

{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0,
 'final_records_with_geometry': 10,
 'geocoded_increment': 10,
 'successfully_geocoded': 10,
 'failed_geocodings': 0}

geo_service.available_quota()

4999930
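You can check the two metadata dictionaries and the two available_quota readings against each other directly. The sketch below copies the printed values from above into plain dictionaries. It lists the keys that only a real run reports, and it confirms that the quota actually consumed matches the dry-run estimate in this case.

```python
# Metadata dictionaries copied from the outputs above.
geo_dry_metadata = {"total_rows": 10, "required_quota": 10, "previously_geocoded": 0,
                    "previously_failed": 0, "records_with_geometry": 0}
geo_metadata = {"total_rows": 10, "required_quota": 10, "previously_geocoded": 0,
                "previously_failed": 0, "records_with_geometry": 0,
                "final_records_with_geometry": 10, "geocoded_increment": 10,
                "successfully_geocoded": 10, "failed_geocodings": 0}

# Keys that only appear after a real (non-dry) run.
extra_keys = sorted(set(geo_metadata) - set(geo_dry_metadata))
print(extra_keys)
# ['failed_geocodings', 'final_records_with_geometry', 'geocoded_increment', 'successfully_geocoded']

# Quota actually consumed, from the two available_quota() readings.
consumed = 4999940 - 4999930
print(consumed == geo_metadata["required_quota"])  # True
```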

Geocoding results can also be cached. If the input data changes, cached results are reused only for unmodified records, and only new or changed records are geocoded again. To use cached results, save the output to a CARTO table by passing the table_name parameter together with cached=True.

The result is a GeoDataFrame with three new columns:

the_geom: The resulting point geometry
gc_status_rel: The accuracy of each geocoded location, as a value between 0 and 1
carto_geocode_hash: A hash of the geocoding inputs, used to detect records that were already geocoded

geo_gdf.head()

	the_geom	name	address	revenue	gc_status_rel	carto_geocode_hash
0	POINT (-73.95746 40.67102)	Franklin Ave & Eastern Pkwy	341 Eastern Pkwy,Brooklyn, NY 11238	1321040.772	0.97	c834a8e289e5bce280775a9bf1f833f1
1	POINT (-73.96122 40.57796)	607 Brighton Beach Ave	607 Brighton Beach Avenue,Brooklyn, NY 11235	1268080.418	0.99	7d39a3fff93efd9034da88aa9ad2da79
2	POINT (-73.98978 40.61944)	65th St & 18th Ave	6423 18th Avenue,Brooklyn, NY 11204	1248133.699	0.98	1a2312049ddea753ba42bf77f5ccf718
3	POINT (-74.02750 40.63202)	Bay Ridge Pkwy & 3rd Ave	7419 3rd Avenue,Brooklyn, NY 11209	1185702.676	0.98	827ab4dcc2d49d5fd830749597976d4a
4	POINT (-74.00098 40.59321)	Caesar's Bay Shopping Center	8973 Bay Parkway,Brooklyn, NY 11214	1148427.411	0.98	119a38c7b51195cd4153fc81605a8495
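The the_geom values are printed as WKT. A WKT point lists x first and y second, which for these geographic coordinates means longitude first and latitude second. The standard-library sketch below reads one of the points above. parse_point_wkt is a hypothetical helper that only handles simple 2D POINT strings.

```python
import re
from typing import Tuple

# Matches simple 2D WKT points such as "POINT (-73.95746 40.67102)".
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point_wkt(wkt: str) -> Tuple[float, float]:
    """Return (longitude, latitude) from a 2D WKT POINT string."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a simple 2D WKT point: {wkt!r}")
    x, y = (float(value) for value in match.groups())
    return x, y  # WKT order is x (longitude) then y (latitude)


lon, lat = parse_point_wkt("POINT (-73.95746 40.67102)")
print(f"lat={lat}, lon={lon}")  # lat=40.67102, lon=-73.95746
```

With GeoPandas, you would normally read coordinates through the geometry's x and y attributes instead, assuming the_geom is the GeoDataFrame's active geometry column.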

To avoid geocoding the same records again and spending quota unnecessarily, always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.

This happens automatically in these cases:

Your input is a CARTO table processed in place (without a table_name parameter).
You save your results to a CARTO table using the table_name parameter, and use only the resulting table for any further geocoding.

If you now do a dry run on geo_gdf, which contains both the_geom and carto_geocode_hash, the required quota is 0 because every record has already been geocoded.

_, geo_metadata = geo_service.geocode(geo_gdf, street='address', city=city_ny, country=country_usa, dry_run=True)

geo_metadata.get('required_quota')

0
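The reuse rule described above can be modeled in plain Python. A record is skipped only if it already has a geometry and its stored hash still matches a hash of its current inputs. Everything in the sketch below is hypothetical and only illustrates the idea: the helper names, the SHA-256 choice, and the "stale-hash-placeholder" value. CARTO computes carto_geocode_hash its own way.

```python
import hashlib
from typing import Dict, List, Optional

CITY, COUNTRY = "New York", "USA"


def input_hash(street: str, city: str, country: str) -> str:
    # Hypothetical fingerprint of the geocoding inputs. SHA-256 is an
    # arbitrary choice here; CARTO's carto_geocode_hash is computed its own way.
    key = "|".join((street, city, country))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def rows_to_geocode(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Keep only rows with no geometry or whose stored hash no longer matches."""
    pending = []
    for row in rows:
        current = input_hash(row["address"], CITY, COUNTRY)
        if row.get("the_geom") is None or row.get("carto_geocode_hash") != current:
            pending.append(row)
    return pending


unchanged = "341 Eastern Pkwy,Brooklyn, NY 11238"
rows = [
    # Geocoded earlier and unmodified: skipped, so it costs no quota.
    {"address": unchanged, "the_geom": "POINT (-73.95746 40.67102)",
     "carto_geocode_hash": input_hash(unchanged, CITY, COUNTRY)},
    # Address edited after geocoding, so the stored hash is stale: re-geocoded.
    {"address": "607 Brighton Beach Ave,Brooklyn, NY 11235",
     "the_geom": "POINT (-73.96122 40.57796)",
     "carto_geocode_hash": "stale-hash-placeholder"},
    # New record with no geometry yet: geocoded.
    {"address": "6423 18th Avenue,Brooklyn, NY 11204"},
]

print(len(rows_to_geocode(rows)))  # 2
```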

Precision

The address column is more complete than the name column, so the coordinates the service returns from it should be more accurate. We can check this: geocoding by the name column yields lower accuracy values than geocoding by the address column.

geo_name_gdf, geo_name_metadata = geo_service.geocode(df, street='name', city=city_ny, country=country_usa)

Success! Data geocoded correctly

geo_name_gdf.gc_status_rel.unique()

array([0.93, 0.96, 0.85, 0.83, 0.74, 0.87])

geo_gdf.gc_status_rel.unique()

array([0.97, 0.99, 0.98])
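The two arrays above contain unique values, not one value per row. Comparing their minimums is therefore meaningful, but averaging them would not give a per-row mean. The sketch below flags the scores that fall under a review threshold. The 0.9 cutoff and the flag_low helper are hypothetical choices made for illustration.

```python
# Unique gc_status_rel values printed above for each geocoding run.
name_scores = [0.93, 0.96, 0.85, 0.83, 0.74, 0.87]
address_scores = [0.97, 0.99, 0.98]

THRESHOLD = 0.9  # hypothetical cutoff for results worth reviewing manually


def flag_low(scores, threshold=THRESHOLD):
    """Return the scores below the threshold, in their original order."""
    return [score for score in scores if score < threshold]


print(min(name_scores), min(address_scores))  # 0.74 0.97
print(flag_low(name_scores))     # [0.85, 0.83, 0.74, 0.87]
print(flag_low(address_scores))  # []
```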

Visualize the results

Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.

from cartoframes.viz import Layer, color_bins_style, popup_element

Layer(
    geo_gdf,
    color_bins_style('gc_status_rel', method='equal', bins=geo_gdf.gc_status_rel.unique().size),
    popup_hover=[popup_element('address', 'Address'), popup_element('gc_status_rel', 'Precision')],
    title='Geocoding Precision'
)
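Assume method='equal' refers to equal-interval classification, where the range between the minimum and maximum values is split into classes of the same width. Under that assumption, the sketch below shows why setting bins to the number of unique values works for this dataset. With only three distinct values (0.97, 0.98, 0.99), three equal-width classes place each value in its own class. In general, this does not always hold. The styling layer computes its own breaks, which may differ in detail from this sketch.

```python
def equal_interval_edges(values, bins):
    """Return bins + 1 edges splitting [min, max] into equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [hi]


def bin_index(value, edges):
    """Index of the class containing value; the top edge is inclusive."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


scores = [0.97, 0.99, 0.98]  # unique gc_status_rel values from geo_gdf
edges = equal_interval_edges(scores, len(scores))
print([round(edge, 4) for edge in edges])                # [0.97, 0.9767, 0.9833, 0.99]
print([bin_index(v, edges) for v in (0.97, 0.98, 0.99)])  # [0, 1, 2]
```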

Isolines

There are two isoline functions: isochrones and isodistances. In this guide, we use isochrones to calculate walking areas by time around each Starbucks store, and isodistances to calculate walking areas by distance.

Isolines are concentric polygons around an origin point. Each polygon encloses the area that can be reached from that point within a given limit, measured by:

Time in the case of isochrones
Distance in the case of isodistances

Isochrones

For isochrones, let’s calculate time ranges of 5, 15, and 30 minutes. Ranges are given in seconds, so these become 300, 900, and 1800.
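The conversion from minutes to the seconds the isochrones ranges expect is a single multiplication. minutes_to_seconds is a hypothetical convenience helper, not a CARTOframes function.

```python
def minutes_to_seconds(minutes):
    """Convert a list of minute values into whole seconds."""
    return [int(m * 60) for m in minutes]


print(minutes_to_seconds([5, 15, 30]))  # [300, 900, 1800]
```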

from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)

Remember to always check the quota with the dry_run parameter and the available_quota method before running the service!

print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)

available 115540, required 30
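The required quota of 30 matches one credit per store per range: 10 stores times 3 ranges. The sketch below reproduces that estimate under the assumption that this pricing holds. The dry run remains the authoritative figure. Subtracting the estimate from the current reading predicts 115510, which is the value available_quota reports before the isodistances run later in this guide.

```python
# Assumption: one credit per source point per range, which matches the
# dry-run figure above but is not a documented pricing rule.
n_sources = 10                 # stores geocoded earlier
ranges = [300, 900, 1800]      # isochrone ranges in seconds
available_before = 115540      # available_quota() reading above

estimated = n_sources * len(ranges)
print(estimated)                     # 30
print(available_before - estimated)  # 115510
```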

isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')

Success! Isolines created correctly

isochrones_gdf.head()

	source_id	data_range	the_geom
0	0	300	MULTIPOLYGON (((-73.96279 40.67224, -73.96254 ...
1	0	900	MULTIPOLYGON (((-73.96897 40.67430, -73.96872 ...
2	0	1800	MULTIPOLYGON (((-73.97653 40.67842, -73.97627 ...
3	1	300	MULTIPOLYGON (((-73.96537 40.57869, -73.96494 ...
4	1	900	MULTIPOLYGON (((-73.97223 40.57800, -73.97181 ...
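Each isochrone row carries a source_id and a data_range rather than the store name. The sketch below groups the ranges by store. It assumes that source_id is the row index of the input geo_gdf, so it maps 0 and 1 to the first two store names. The rows are copied from the head() output above with the geometry omitted.

```python
from collections import defaultdict

# (source_id, data_range) pairs from isochrones_gdf.head() above.
isochrone_rows = [(0, 300), (0, 900), (0, 1800), (1, 300), (1, 900)]
# Assumption: source_id refers to the row index of geo_gdf.
store_names = {0: "Franklin Ave & Eastern Pkwy", 1: "607 Brighton Beach Ave"}

ranges_by_store = defaultdict(list)
for source_id, data_range in isochrone_rows:
    ranges_by_store[store_names[source_id]].append(data_range)

print(dict(ranges_by_store))
# {'Franklin Ave & Eastern Pkwy': [300, 900, 1800], '607 Brighton Beach Ave': [300, 900]}
```

Under the same assumption, pandas can do this join in one step with isochrones_gdf.merge(geo_gdf[['name']], left_on='source_id', right_index=True).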

from cartoframes.viz import Layer, basic_style, basic_legend

Layer(isochrones_gdf, basic_style(opacity=0.5), basic_legend('Isochrones'))

Isodistances

For isodistances, let’s calculate distance ranges of 100, 500, and 1000 meters. Ranges are given in meters, so these values are passed as they are.
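To get a rough sense of how these distance rings compare with the time rings above, you can convert each distance into a walking time. The sketch below assumes an average walking speed of 1.4 m/s. The routing service uses its own model, so its isochrones and isodistances will not line up exactly with this estimate.

```python
WALK_SPEED_M_PER_S = 1.4  # assumed average walking speed, not the service's model


def walking_minutes(meters, speed=WALK_SPEED_M_PER_S):
    """Approximate minutes needed to walk the given distance."""
    return meters / speed / 60


for meters in [100, 500, 1000]:
    print(f"{meters} m ~ {walking_minutes(meters):.1f} min")
# 100 m ~ 1.2 min
# 500 m ~ 6.0 min
# 1000 m ~ 11.9 min
```

Under this assumption, even the 1000 m ring corresponds to less than the 15-minute isochrone.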

_, isodistances_dry_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk', dry_run=True)

print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)

available 115510, required 30

isodistances_gdf, isodistances_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk')

Success! Isolines created correctly

isodistances_gdf.head()

	source_id	data_range	the_geom
0	0	100	MULTIPOLYGON (((-73.95850 40.67139, -73.95721 ...
1	0	500	MULTIPOLYGON (((-73.96262 40.67276, -73.96219 ...
2	0	1000	MULTIPOLYGON (((-73.96880 40.67345, -73.96820 ...
3	1	100	MULTIPOLYGON (((-73.96125 40.57800, -73.96065 ...
4	1	500	MULTIPOLYGON (((-73.96605 40.57800, -73.96563 ...

from cartoframes.viz import Layer, basic_style, basic_legend

Layer(isodistances_gdf, basic_style(opacity=0.5), basic_legend('Isodistances'))
