Data Services
You can connect to the CARTO Data Services API directly from CARTOframes. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook. For instance, you can geocode a pandas DataFrame with addresses on the fly, and then perform a trade area analysis by computing isodistances or isochrones programmatically.
Using Data Services requires authentication. For more information about how to authenticate, please read the Authentication guide. For further learning, you can also check out the Data Services examples.
| from cartoframes.auth import set_default_credentials
set_default_credentials('creds.json')
|
Depending on your CARTO account plan, some of these data services are subject to different quota limitations.
Geocoding
To get started, let’s read in and explore the Starbucks location data we have. With the Starbucks store data in a DataFrame, we can see that there are two columns that can be used in the geocoding service: name and address. There’s also a third column, revenue, that reflects the annual revenue of the store.
| import pandas as pd
df = pd.read_csv('http://libs.cartocdn.com/cartoframes/samples/starbucks_brooklyn.csv')
df.head()
|
|   | name | address | revenue |
| 0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 |
| 1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 |
| 2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 |
| 3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 |
| 4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 |
Quota consumption
Each time you run Data Services, quota is consumed. For this reason, we provide the ability to check in advance the amount of credits an operation will consume by using the dry_run parameter when running the service function.
It is also possible to check your available quota by running the available_quota function.
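As a minimal sketch of the check described above, the helper below compares the required_quota value from a dry-run metadata dictionary with the number returned by available_quota. The function name can_afford and the available figures are hypothetical; only required_quota, dry_run and available_quota come from this guide.

```python
def can_afford(dry_run_metadata, available_quota):
    """Return True when the dry-run estimate fits in the available quota."""
    required = dry_run_metadata.get('required_quota', 0)
    return required <= available_quota


# Metadata shaped like the dictionary a dry run returns in this guide.
sample_metadata = {'total_rows': 10, 'required_quota': 10}

# The available figures below are made up for illustration.
print(can_afford(sample_metadata, available_quota=500))
print(can_afford(sample_metadata, available_quota=5))
```

With a real service, the second argument would be the value returned by the service's available_quota call.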
| from cartoframes.data.services import Geocoding
geo_service = Geocoding()
city_ny = {'value': 'New York'}
country_usa = {'value': 'USA'}
_, geo_dry_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa, dry_run=True)
|
| {'total_rows': 10,
'required_quota': 10,
'previously_geocoded': 0,
'previously_failed': 0,
'records_with_geometry': 0}
|
| geo_service.available_quota()
|
| geo_gdf, geo_metadata = geo_service.geocode(df, street='address', city=city_ny, country=country_usa)
|
| Success! Data geocoded correctly
|
Let’s compare geo_dry_metadata and geo_metadata to see how the information returned differs with and without the dry_run option. The geo_metadata output below shows that all 10 locations were geocoded successfully and that the operation consumed 10 credits of quota.
| {'total_rows': 10,
'required_quota': 10,
'previously_geocoded': 0,
'previously_failed': 0,
'records_with_geometry': 0,
'final_records_with_geometry': 10,
'geocoded_increment': 10,
'successfully_geocoded': 10,
'failed_geocodings': 0}
|
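To make the comparison concrete, the sketch below diffs the two metadata dictionaries exactly as they are printed in this guide. The variable names dry_metadata and run_metadata are stand-ins for geo_dry_metadata and geo_metadata so that the snippet runs without calling the service.

```python
# Values copied from the dry-run output shown earlier in this guide.
dry_metadata = {
    'total_rows': 10,
    'required_quota': 10,
    'previously_geocoded': 0,
    'previously_failed': 0,
    'records_with_geometry': 0,
}
# The real run repeats those keys and adds four more.
run_metadata = dict(
    dry_metadata,
    final_records_with_geometry=10,
    geocoded_increment=10,
    successfully_geocoded=10,
    failed_geocodings=0,
)

# Keys reported only by the real run.
added_keys = sorted(set(run_metadata) - set(dry_metadata))
# Keys present in both runs whose values differ.
changed_keys = sorted(k for k in dry_metadata if dry_metadata[k] != run_metadata[k])

print(added_keys)
print(changed_keys)
```

In this example the shared keys carry the same values in both runs, and the real run only adds the result counters.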
| geo_service.available_quota()
|
If the input data file ever changes, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records. In order to use cached results, we have to save the results to a CARTO table using the table_name and cached=True parameters.
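A hedged sketch of such a call follows. It passes table_name and cached=True as described above; the wrapper function geocode_with_cache and the table name starbucks_geocoded are hypothetical, and the snippet uses a small stand-in object instead of a real Geocoding service so that it runs without credentials or quota.

```python
def geocode_with_cache(service, df, table_name):
    """Geocode df, saving results to a CARTO table so they can be reused."""
    return service.geocode(
        df,
        street='address',
        city={'value': 'New York'},
        country={'value': 'USA'},
        table_name=table_name,  # results are saved to this CARTO table
        cached=True,            # reuse results for unmodified records
    )


class FakeGeocoding:
    """Stand-in for the Geocoding service (illustration only)."""

    def geocode(self, source, **kwargs):
        # Record the keyword arguments instead of calling the API.
        self.last_kwargs = kwargs
        return source, {}


fake_service = FakeGeocoding()
result, metadata = geocode_with_cache(fake_service, [], 'starbucks_geocoded')
print(fake_service.last_kwargs['table_name'], fake_service.last_kwargs['cached'])
```

With a real service, you would pass the Geocoding instance and the DataFrame in place of the stand-ins.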
The resulting data is a GeoDataFrame that contains three new columns:
- geometry: The resulting geometry
- gc_status_rel: The percentage of accuracy of each location
- carto_geocode_hash: Geocode information
|   | the_geom | name | address | revenue | gc_status_rel | carto_geocode_hash |
| 0 | POINT (-73.95746 40.67102) | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.97 | c834a8e289e5bce280775a9bf1f833f1 |
| 1 | POINT (-73.96122 40.57796) | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.99 | 7d39a3fff93efd9034da88aa9ad2da79 |
| 2 | POINT (-73.98978 40.61944) | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.98 | 1a2312049ddea753ba42bf77f5ccf718 |
| 3 | POINT (-74.02750 40.63202) | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.98 | 827ab4dcc2d49d5fd830749597976d4a |
| 4 | POINT (-74.00098 40.59321) | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.98 | 119a38c7b51195cd4153fc81605a8495 |
In addition, to avoid geocoding records that have already been geocoded, and thus spending quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.
This will happen automatically in these cases:
- Your input is a table from CARTO processed in place (without a table_name parameter)
- You save your results to a CARTO table using the table_name parameter, and only use the resulting table for any further geocoding.
If you try to geocode this DataFrame now that it contains both the_geom and carto_geocode_hash, you will see that the required quota is 0 because it has already been geocoded.
| _, geo_metadata = geo_service.geocode(geo_gdf, street='address', city=city_ny, country=country_usa, dry_run=True)
|
| geo_metadata.get('required_quota')
|
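One way to picture why already-geocoded records need no quota is a hash comparison: if a record's stored hash still matches a hash of its current address fields, it does not need geocoding again. The sketch below illustrates that idea only. The use of SHA-256, the field layout, the stored_hash key and the function names are assumptions made for illustration and are not taken from CARTO's implementation.

```python
import hashlib


def address_hash(street, city, country):
    """Hash the address fields of one record (illustrative scheme)."""
    key = '|'.join([street, city, country])
    return hashlib.sha256(key.encode('utf-8')).hexdigest()


def rows_needing_geocoding(rows):
    """Return rows whose stored hash is missing or no longer matches."""
    return [
        row for row in rows
        if row.get('stored_hash') != address_hash(row['street'], row['city'], row['country'])
    ]


unchanged = {'street': '341 Eastern Pkwy,Brooklyn, NY 11238', 'city': 'New York', 'country': 'USA'}
unchanged['stored_hash'] = address_hash(unchanged['street'], unchanged['city'], unchanged['country'])

# Same stored hash, but the street was edited afterwards.
edited = dict(unchanged, street='6423 18th Avenue,Brooklyn, NY 11204')

pending = rows_needing_geocoding([unchanged, edited])
print(len(pending))
```

Only the edited record is returned, which mirrors the behavior described above where unmodified records are skipped.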
Precision
The address column is more complete than the name column, so the coordinates calculated by the service will be more accurate. We can check this: the accuracy values obtained with the name column are lower than the ones we get by using the address column for geocoding.
| geo_name_gdf, geo_name_metadata = geo_service.geocode(df, street='name', city=city_ny, country=country_usa)
|
| Success! Data geocoded correctly
|
| geo_name_gdf.gc_status_rel.unique()
|
| array([0.93, 0.96, 0.85, 0.83, 0.74, 0.87])
|
| geo_gdf.gc_status_rel.unique()
|
| array([0.97, 0.99, 0.98])
|
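The two outputs can be summarized side by side. The sketch below uses only the distinct gc_status_rel values printed above, so the means describe those distinct values rather than the full columns.

```python
from statistics import mean

# Distinct gc_status_rel values shown in the two outputs above.
name_scores = [0.93, 0.96, 0.85, 0.83, 0.74, 0.87]
address_scores = [0.97, 0.99, 0.98]

for label, scores in (('name', name_scores), ('address', address_scores)):
    print('{0}: min={1:.2f} mean={2:.2f} max={3:.2f}'.format(
        label, min(scores), mean(scores), max(scores)))

# Every address-based score is higher than every name-based score.
print(min(address_scores) > max(name_scores))
```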
Visualize the results
Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.
| from cartoframes.viz import Layer, color_bins_style, popup_element

Layer(
    geo_gdf,
    color_bins_style('gc_status_rel', method='equal', bins=geo_gdf.gc_status_rel.unique().size),
    popup_hover=[popup_element('address', 'Address'), popup_element('gc_status_rel', 'Precision')],
    title='Geocoding Precision'
)
|
Isolines
There are two Isoline functions: isochrones and isodistances. In this guide we will use the isochrones function to calculate walking areas by time for each Starbucks store and the isodistances function to calculate the walking area by distance.
By definition, isolines are concentric polygons that display equally calculated levels over a given surface area. They are computed as the areas reachable from the origin point, measured by:
- Time in the case of isochrones
- Distance in the case of isodistances
Isochrones
For isochrones, let’s calculate the time ranges of 5, 15 and 30 minutes. These ranges are input in seconds, so they will be 300, 900, and 1800 respectively.
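The conversion is a multiplication by 60. A small helper, whose name minutes_to_seconds is hypothetical, keeps the ranges readable in minutes while producing the values the service expects:

```python
def minutes_to_seconds(minutes):
    """Convert a list of durations in minutes to seconds."""
    return [m * 60 for m in minutes]


ranges = minutes_to_seconds([5, 15, 30])
print(ranges)  # [300, 900, 1800]
```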
| from cartoframes.data.services import Isolines
iso_service = Isolines()
_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)
|
Remember to always check the quota using the dry_run parameter and the available_quota method before running the service!
| print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)
|
| available 115540, required 30
|
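The required figure of 30 is consistent with one credit per origin point per range: 10 geocoded stores times 3 ranges. That rule is inferred from the numbers in this guide rather than from a documented pricing formula, and the helper name below is hypothetical.

```python
def estimated_isoline_quota(n_points, ranges):
    """Estimate credits, assuming one credit per point per range."""
    return n_points * len(ranges)


print(estimated_isoline_quota(10, [300, 900, 1800]))  # 30
```

A dry run remains the reliable way to get the actual figure for your account.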
| isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')
|
| Success! Isolines created correctly
|
|   | source_id | data_range | the_geom |
| 0 | 0 | 300 | MULTIPOLYGON (((-73.96279 40.67224, -73.96254 ... |
| 1 | 0 | 900 | MULTIPOLYGON (((-73.96897 40.67430, -73.96872 ... |
| 2 | 0 | 1800 | MULTIPOLYGON (((-73.97653 40.67842, -73.97627 ... |
| 3 | 1 | 300 | MULTIPOLYGON (((-73.96537 40.57869, -73.96494 ... |
| 4 | 1 | 900 | MULTIPOLYGON (((-73.97223 40.57800, -73.97181 ... |
| from cartoframes.viz import Layer, basic_style, basic_legend
Layer(isochrones_gdf, basic_style(opacity=0.5), basic_legend('Isochrones'))
|
Isodistances
For isodistances, let’s calculate the distance ranges of 100, 500 and 1000 meters. These ranges are input in meters, so they are passed as 100, 500, and 1000 without conversion.
| _, isodistances_dry_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk', dry_run=True)
|
| print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)
|
| available 115510, required 30
|
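The available quota printed here is 30 credits lower than the 115540 printed before the isochrones run, which matches that run's required quota. A small check using only figures from this guide (the variable names are illustrative):

```python
# Figures copied from the two quota printouts in this guide.
available_before_isochrones = 115540
available_before_isodistances = 115510
isochrones_required = 30

consumed = available_before_isochrones - available_before_isodistances
print(consumed == isochrones_required)  # True
```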
| isodistances_gdf, isodistances_metadata = iso_service.isodistances(geo_gdf, [100, 500, 1000], mode='walk')
|
| Success! Isolines created correctly
|
| isodistances_gdf.head()
|
|   | source_id | data_range | the_geom |
| 0 | 0 | 100 | MULTIPOLYGON (((-73.95850 40.67139, -73.95721 ... |
| 1 | 0 | 500 | MULTIPOLYGON (((-73.96262 40.67276, -73.96219 ... |
| 2 | 0 | 1000 | MULTIPOLYGON (((-73.96880 40.67345, -73.96820 ... |
| 3 | 1 | 100 | MULTIPOLYGON (((-73.96125 40.57800, -73.96065 ... |
| 4 | 1 | 500 | MULTIPOLYGON (((-73.96605 40.57800, -73.96563 ... |
| from cartoframes.viz import Layer, basic_style, basic_legend
Layer(isodistances_gdf, basic_style(opacity=0.5), basic_legend('Isodistances'))
|