CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

CARTOframes v1.0.0 includes breaking changes from the betas and version 0.10. Check the migration guide to learn how to update your code.

Location Data Services

Introduction

CARTOframes provides the functionality to use the CARTO Data Services API. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook.

For instance, you can geocode a pandas DataFrame with addresses on the fly, and then perform a trade areas analysis by computing isodistances or isochrones programmatically.

Given a set of ten simulated Starbucks store addresses, this guide walks through the use case of finding good location candidates to open an additional store.

Depending on your account plan, some of these location data services are subject to different quota limitations.

Data

This guide uses the same dataset of simulated Starbucks locations that has been used in the other guides and can be downloaded here.

Authentication

Using Location Data Services requires authentication. For more information about how to authenticate, please read the Login to CARTO Platform guide.

from cartoframes.auth import Credentials, set_default_credentials

set_default_credentials('creds.json')

Geocoding

To get started, let’s read in and explore the Starbucks location data we have. With the Starbucks store data in a DataFrame, we can see that there are two columns that can be used in the geocoding service: name and address. There’s also a third column that reflects the annual revenue of the store.

import pandas as pd

df = pd.read_csv('http://libs.cartocdn.com/cartoframes/files/starbucks_brooklyn.csv')
df
   name                           address                                       revenue
0  Franklin Ave & Eastern Pkwy    341 Eastern Pkwy,Brooklyn, NY 11238           1.321041e+06
1  607 Brighton Beach Ave         607 Brighton Beach Avenue,Brooklyn, NY 11235  1.268080e+06
2  65th St & 18th Ave             6423 18th Avenue,Brooklyn, NY 11204           1.248134e+06
3  Bay Ridge Pkwy & 3rd Ave       7419 3rd Avenue,Brooklyn, NY 11209            1.185703e+06
4  Caesar's Bay Shopping Center   8973 Bay Parkway,Brooklyn, NY 11214           1.148427e+06
5  Court St & Dean St             167 Court Street,Brooklyn, NY 11201           1.144067e+06
6  Target Gateway T-1401          519 Gateway Dr,Brooklyn, NY 11239             1.021083e+06
7  3rd Ave & 92nd St              9202 Third Avenue,Brooklyn, NY 11209          9.257073e+05
8  Lam Group @ Sheraton Brooklyn  228 Duffield st,Brooklyn, NY 11201            7.657935e+05
9  33-42 Hillel Place             33-42 Hillel Place,Brooklyn, NY 11210         7.492163e+05

Quota consumption

Each time you run Location Data Services, you consume quota. For this reason, we provide the ability to check in advance the amount of credits an operation will consume by using the dry_run parameter when running the service function.

It is also possible to check the available quota by running the available_quota function.

from cartoframes.data.services import Geocoding

geo_service = Geocoding()

_, geo_dry_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)
geo_dry_metadata
{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0}
geo_service.available_quota()
4998758
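
The two checks above can be combined into a single guard before spending quota. The following is a minimal sketch: has_enough_quota is a hypothetical helper, not part of CARTOframes, and the dictionary and quota value are copied from the outputs shown above.

```python
def has_enough_quota(dry_metadata, available_quota):
    """Return True when the dry-run estimate fits in the remaining quota."""
    required = dry_metadata.get('required_quota', 0)
    return required <= available_quota


# Values copied from the dry run and the quota check shown above.
geo_dry_metadata = {
    'total_rows': 10,
    'required_quota': 10,
    'previously_geocoded': 0,
    'previously_failed': 0,
    'records_with_geometry': 0,
}
available = 4998758

enough = has_enough_quota(geo_dry_metadata, available)
print(enough)  # True
```

In a notebook, the same check could be fed directly with the metadata returned by the dry run and the value returned by available_quota.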
geo_gdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
Success! Data geocoded correctly

If the input data file should ever change, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records.

In order to use cached results, we have to save the results to a CARTO table using the table_name and cached=True parameters.

geo_gdf_cached, geo_metadata_cached = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    table_name='starbucks_cache',
    cached=True
)
Success! Data geocoded correctly

Let’s compare geo_dry_metadata and geo_metadata to see the differences between the information returned with and without the dry_run option. As we can see, this information reflects that all the locations have been geocoded successfully and that it has consumed 10 credits of quota.

geo_metadata
{'total_rows': 10,
 'required_quota': 10,
 'previously_geocoded': 0,
 'previously_failed': 0,
 'records_with_geometry': 0,
 'final_records_with_geometry': 10,
 'geocoded_increment': 10,
 'successfully_geocoded': 10,
 'failed_geocodings': 0}
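
To make the comparison concrete, the sketch below diffs the two dictionaries shown in this guide. It is plain Python, with the values copied from the outputs above, and lists the keys that only appear once the service has actually run.

```python
# Metadata copied from the dry run and from the real run shown above.
geo_dry_metadata = {
    'total_rows': 10,
    'required_quota': 10,
    'previously_geocoded': 0,
    'previously_failed': 0,
    'records_with_geometry': 0,
}
geo_metadata = {
    'total_rows': 10,
    'required_quota': 10,
    'previously_geocoded': 0,
    'previously_failed': 0,
    'records_with_geometry': 0,
    'final_records_with_geometry': 10,
    'geocoded_increment': 10,
    'successfully_geocoded': 10,
    'failed_geocodings': 0,
}

# Keys reported only after the geocoding has really been executed.
only_in_real_run = [key for key in geo_metadata if key not in geo_dry_metadata]

# Shared keys whose values differ between the two runs (none in this case).
changed = {
    key: (geo_dry_metadata[key], geo_metadata[key])
    for key in geo_dry_metadata
    if geo_dry_metadata[key] != geo_metadata[key]
}

print(only_in_real_run)
print(changed)  # {}
```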

The resulting data is a GeoDataFrame that contains three new columns:

  • the_geom: The resulting geometry
  • gc_status_rel: The estimated accuracy of each location, expressed as a fraction (for example, 0.97)
  • carto_geocode_hash: Geocode information
geo_gdf.head()
   the_geom                    name                          address                                       revenue      gc_status_rel  carto_geocode_hash
0  POINT (-73.96561 40.67241)  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772  0.97           c834a8e289e5bce280775a9bf1f833f1
1  POINT (-73.96122 40.57796)  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418  0.99           7d39a3fff93efd9034da88aa9ad2da79
2  POINT (-73.98981 40.61950)  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699  0.98           1a2312049ddea753ba42bf77f5ccf718
3  POINT (-74.02767 40.63204)  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676  0.98           827ab4dcc2d49d5fd830749597976d4a
4  POINT (-74.00098 40.59321)  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411  0.98           119a38c7b51195cd4153fc81605a8495

In addition, to prevent geocoding records that have been previously geocoded, and thus spend quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.

This will happen automatically in these cases:

  1. Your input is a table from CARTO processed in place (without a table_name parameter).
  2. You save your results to a CARTO table using the table_name parameter and use only the resulting table for any further geocoding.

If you now try to geocode this DataFrame, which contains both the the_geom and carto_geocode_hash columns, you will see that the required quota is 0 because it has already been geocoded.

_, repeat_geo_metadata = geo_service.geocode(
    geo_gdf,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)
repeat_geo_metadata.get('required_quota')
0
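
This guide does not describe how the service decides that a record is unchanged. As a rough mental model only, one can imagine fingerprinting the address fields and re-geocoding just the rows whose fingerprint differs from the stored one. Everything in the sketch below is an assumption made for illustration: the function names are hypothetical, MD5 is an arbitrary choice, and the edited address is invented. It is not CARTO's actual scheme.

```python
import hashlib


def address_fingerprint(street, city, country):
    """Hypothetical fingerprint of the fields sent to the geocoder."""
    key = '|'.join(part.strip().lower() for part in (street, city, country))
    return hashlib.md5(key.encode('utf-8')).hexdigest()


def rows_needing_geocoding(rows, cached_fingerprints):
    """Return indexes of rows whose fingerprint differs from the cached one."""
    return [
        i for i, row in enumerate(rows)
        if address_fingerprint(*row) != cached_fingerprints.get(i)
    ]


rows = [
    ('341 Eastern Pkwy,Brooklyn, NY 11238', 'New York', 'USA'),
    ('607 Brighton Beach Avenue,Brooklyn, NY 11235', 'New York', 'USA'),
]
cached = {i: address_fingerprint(*row) for i, row in enumerate(rows)}

# Simulate an edit to the second address after the first geocoding run
# (the new street number is made up for this example).
rows[1] = ('609 Brighton Beach Avenue,Brooklyn, NY 11235', 'New York', 'USA')

stale = rows_needing_geocoding(rows, cached)
print(stale)  # [1]
```

Only the modified row is flagged, which mirrors the behavior described above: unchanged records consume no additional quota.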

Precision

The address column is more complete than the name column, so the coordinates that the service calculates from it are more accurate. Checking this, the accuracy values obtained with the name column (0.95, 0.93, 0.96, 0.83, 0.78, 0.9) are lower than the ones obtained with the address column (0.97, 0.99, 0.98).

geo_name_gdf, geo_name_metadata = geo_service.geocode(
    df,
    street='name',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
Success! Data geocoded correctly
geo_name_gdf.head()
   the_geom                    name                          address                                       revenue      gc_status_rel  carto_geocode_hash
0  POINT (-76.56478 39.30853)  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772  0.95           0be7693fc688eca36e1077656dcb00a5
1  POINT (-73.96122 40.57796)  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418  0.95           084a5c4d42ccf3c3c8e69426619f270e
2  POINT (-73.99018 40.61914)  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699  0.93           1d9a17c20c11d0454aff10548a328c47
3  POINT (-74.02778 40.63146)  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676  0.96           d531df27fc02336dc722cb4e7028b244
4  POINT (-95.45238 29.83378)  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411  0.93           9d8c13b5b4a93591f427d3ce0b5b4ead
geo_name_gdf.gc_status_rel.unique()
array([0.95, 0.93, 0.96, 0.83, 0.78, 0.9 ])
geo_gdf.head()
   the_geom                    name                          address                                       revenue      gc_status_rel  carto_geocode_hash
0  POINT (-73.96561 40.67241)  Franklin Ave & Eastern Pkwy   341 Eastern Pkwy,Brooklyn, NY 11238           1321040.772  0.97           c834a8e289e5bce280775a9bf1f833f1
1  POINT (-73.96122 40.57796)  607 Brighton Beach Ave        607 Brighton Beach Avenue,Brooklyn, NY 11235  1268080.418  0.99           7d39a3fff93efd9034da88aa9ad2da79
2  POINT (-73.98981 40.61950)  65th St & 18th Ave            6423 18th Avenue,Brooklyn, NY 11204           1248133.699  0.98           1a2312049ddea753ba42bf77f5ccf718
3  POINT (-74.02767 40.63204)  Bay Ridge Pkwy & 3rd Ave      7419 3rd Avenue,Brooklyn, NY 11209            1185702.676  0.98           827ab4dcc2d49d5fd830749597976d4a
4  POINT (-74.00098 40.59321)  Caesar's Bay Shopping Center  8973 Bay Parkway,Brooklyn, NY 11214           1148427.411  0.98           119a38c7b51195cd4153fc81605a8495
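
The accuracy score alone does not tell the whole story: in the name-based results above, rows 0 and 4 have scores of 0.95 and 0.93, yet their coordinates are far from Brooklyn. One simple sanity check is to test each point against a bounding box for the expected area. The sketch below uses the five name-based points shown above; the bounding box is a rough, hand-picked assumption for the New York City area, not a value provided by CARTO.

```python
# Rough bounding box around New York City (assumption for illustration):
# (min_lon, min_lat, max_lon, max_lat)
NYC_BBOX = (-74.3, 40.4, -73.6, 41.0)

# (lon, lat) pairs copied from the name-based geocoding output above.
name_points = [
    (-76.56478, 39.30853),
    (-73.96122, 40.57796),
    (-73.99018, 40.61914),
    (-74.02778, 40.63146),
    (-95.45238, 29.83378),
]


def in_bbox(lon, lat, bbox):
    """Return True if the point lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


outside = [
    i for i, (lon, lat) in enumerate(name_points)
    if not in_bbox(lon, lat, NYC_BBOX)
]
print(outside)  # [0, 4]
```

All five address-based points shown above fall inside the same box, which supports using the address column for geocoding.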

Visualize the results

Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.

from cartoframes.viz import Layer, color_bins_style, popup_element

Layer(
    geo_gdf,
    color_bins_style(
        'gc_status_rel',
        method='equal',
        bins=geo_gdf.gc_status_rel.unique().size,
    ),
    popup_hover=[
        popup_element('address', 'Address'),
        popup_element('gc_status_rel', 'Precision')
    ],
    title='Geocoding Precision'
)

Isolines

There are two Isoline functions: isochrones and isodistances. In this guide we will use the isochrones function to calculate walking areas by time for each Starbucks store and the isodistances function to calculate the walking area by distance.

By definition, isolines are concentric polygons that display equally calculated levels over a given surface area, and they are calculated as the intersection areas from the origin point, measured by:

  • Time in the case of isochrones
  • Distance in the case of isodistances

Isochrones

For isochrones, let’s calculate the time ranges of 5, 15, and 30 minutes. These ranges are input in seconds, so they will be 300, 900, and 1800, respectively.
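
The conversion is simple arithmetic, but deriving the values in code keeps the intent readable. A minimal sketch (the variable names are illustrative):

```python
# Walking times in minutes, converted to the seconds the service expects.
minutes = [5, 15, 30]
ranges_in_seconds = [m * 60 for m in minutes]

print(ranges_in_seconds)  # [300, 900, 1800]
```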

from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk', dry_run=True)

Remember to always check the quota using the dry_run parameter and the available_quota method before running the service!

print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)
available 115082, required 30
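
The required quota of 30 is consistent with one credit per source point per range: 10 stores times 3 ranges. That rule is inferred from the numbers in this guide rather than taken from a specification, so treat the helper below (a hypothetical function, not part of CARTOframes) as a back-of-the-envelope estimate and rely on dry_run for the real figure.

```python
def estimate_isoline_quota(n_points, ranges):
    """Estimate credits as one per source point per range (an assumption)."""
    return n_points * len(ranges)


required = estimate_isoline_quota(10, [300, 900, 1800])
remaining = 115082 - required

print(required)   # 30
print(remaining)  # 115052
```

The remaining value matches the available quota reported later in this guide, before the isodistances are computed.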
isochrones_gdf, isochrones_metadata = iso_service.isochrones(geo_gdf, [300, 900, 1800], mode='walk')
Success! Isolines created correctly
isochrones_gdf.head()
   source_id  data_range  lower_data_range  the_geom                                           range_label
0  0          300         0                 MULTIPOLYGON (((-73.96571 40.67339, -73.96582 ...  5 min.
1  0          900         300               POLYGON ((-73.96575 40.68146, -73.96461 40.682...  15 min.
2  0          1800        900               POLYGON ((-73.96688 40.69334, -73.96353 40.689...  30 min.
3  1          300         0                 MULTIPOLYGON (((-73.96176 40.58099, -73.96195 ...  5 min.
4  1          900         300               POLYGON ((-73.96228 40.57503, -73.96273 40.575...  15 min.
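
In the output above, each row covers the band between lower_data_range and data_range for one source point. The sketch below reproduces how those three columns relate for the ranges used here. It only illustrates the pattern visible in the table and is not the service's implementation; range_bands is a hypothetical helper.

```python
def range_bands(ranges):
    """Build (lower, upper, label) bands from a list of ranges in seconds."""
    bands = []
    lower = 0
    for upper in sorted(ranges):
        bands.append({
            'data_range': upper,
            'lower_data_range': lower,
            'range_label': '{0} min.'.format(upper // 60),
        })
        lower = upper  # the next band starts where this one ends
    return bands


bands = range_bands([300, 900, 1800])
for band in bands:
    print(band)
```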

The isolines helper

The most straightforward way of visualizing the resulting geometries is to use the isolines_style visualization layer. This visualization layer uses the range_label column that is automatically added by the service to classify each polygon by category.

from cartoframes.viz import isolines_style

Layer(isochrones_gdf, isolines_style())

Isodistances

The isoline service accepts several options to manually change the resolution or the quality of the polygons. There’s more information about these settings in the Isolines Reference.

_, isodistances_dry_metadata = iso_service.isodistances(
    geo_gdf,
    [900, 1800, 3600],
    mode='walk',
    resolution=16.0,
    quality=1,
    dry_run=True
)
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)
available 115052, required 30
isodistances_gdf, isodistances_metadata = iso_service.isodistances(
    geo_gdf,
    [900, 1800, 3600],
    mode='walk',
    mode_traffic='enabled',
    resolution=16.0,
    quality=2
)
Success! Isolines created correctly
isodistances_gdf.head()
   source_id  data_range  lower_data_range  the_geom                                           range_label
0  0          900         0                 MULTIPOLYGON (((-73.96563 40.67265, -73.96566 ...  15 min.
1  0          1800        900               POLYGON ((-73.96569 40.67198, -73.96679 40.669...  30 min.
2  0          3600        1800              POLYGON ((-73.96598 40.68382, -73.96506 40.684...  60 min.
3  1          900         0                 MULTIPOLYGON (((-73.96162 40.58063, -73.96221 ...  15 min.
4  1          1800        900               POLYGON ((-73.96155 40.57507, -73.96193 40.575...  30 min.
Layer(isodistances_gdf, isolines_style())

All together

Let’s visualize the data in one map to see what insights we can find.

from cartoframes.viz import Map, Layer, isolines_style, size_continuous_style

Map([
    Layer(
        isochrones_gdf,
        isolines_style(),
        title='Walking Time'
    ),
    Layer(
        geo_gdf,
        size_continuous_style(
            'revenue',
            color='white',
            opacity='0.2',
            stroke_color='blue',
            size_range=[20, 80],
        ),
        popup_hover=[
            popup_element('address', 'Address'),
            popup_element('gc_status_rel', 'Precision'),
            popup_element('revenue', 'Revenue')
        ],
        title='Revenue $',
    )
])

Looking at the map above, we can see the store at 228 Duffield St, Brooklyn, NY 11201 is really close to another store with higher revenue, which means we could even think about closing that one in favor of another one with a better location.
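
Visual proximity on a map can be backed up with a number. The sketch below computes the great-circle (haversine) distance between two geocoded points. Because the coordinates of the Duffield St store are not printed in this guide, the example uses rows 1 and 4 of the geocoded output shown earlier; haversine_km is a hypothetical helper and the Earth radius is the usual mean value of 6371 km.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Rows 1 and 4 of the address-based geocoding output shown earlier.
brighton_beach = (-73.96122, 40.57796)
caesars_bay = (-74.00098, 40.59321)

distance_km = haversine_km(*brighton_beach, *caesars_bay)
print(round(distance_km, 1))
```

This is a straight-line distance, so it is a lower bound on the walking distance that the isolines represent.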

We could try to calculate where to place a new store between stores that have lower revenue than the rest and that are located far from each other.

Now, let’s calculate the centroid of three different stores that we’ve identified previously and use it as a possible location for a new spot:

from shapely import geometry

new_store_location = [
    geo_gdf.iloc[6].the_geom,
    geo_gdf.iloc[9].the_geom,
    geo_gdf.iloc[1].the_geom
]

# Create a polygon using three points from the geo_gdf
polygon = geometry.Polygon([[p.x, p.y] for p in new_store_location])
from geopandas import GeoDataFrame, points_from_xy

new_store_gdf = GeoDataFrame({
    'name': ['New Store'],
    'geometry': points_from_xy([polygon.centroid.x], [polygon.centroid.y])
})
    
isochrones_new_gdf, isochrones_new_metadata = iso_service.isochrones(new_store_gdf, [300, 900, 1800], mode='walk')
Success! Isolines created correctly
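
For a triangle, the polygon centroid used above is simply the average of the three vertices. The sketch below checks this with the standard shoelace-based centroid formula in plain Python. The coordinates are illustrative values in the Brooklyn area, not the actual rows of geo_gdf, and polygon_centroid is a hypothetical helper rather than the shapely implementation.

```python
def polygon_centroid(vertices):
    """Centroid of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Area is twice_area / 2, so the 6 * area divisor becomes 3 * twice_area.
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Illustrative (lon, lat) vertices, not taken from geo_gdf.
triangle = [(-73.88, 40.65), (-73.95, 40.63), (-73.96, 40.58)]

centroid = polygon_centroid(triangle)
vertex_mean = (
    sum(x for x, _ in triangle) / 3,
    sum(y for _, y in triangle) / 3,
)

print(centroid)
print(vertex_mean)
```

Both results agree up to floating-point error, so for three stores the candidate location is the mean of their coordinates.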
from cartoframes.viz import Map, Layer, isolines_style, size_continuous_style

Map([
    Layer(
        isochrones_gdf,
        isolines_style(opacity='0.2'),
        title='Walking Time - Current'
    ),
    Layer(
        isochrones_new_gdf,
        isolines_style(),
        title='Walking Time - New'
    ),
    Layer(
        geo_gdf,
        size_continuous_style(
            'revenue',
            color='white',
            opacity='0.2',
            stroke_color='blue',
            size_range=[20, 80]
        ),
        popup_hover=[
            popup_element('address', 'Address'),
            popup_element('gc_status_rel', 'Precision'),
            popup_element('revenue', 'Revenue')
        ],
        title='Revenue $',
    ),
    Layer(new_store_gdf)
])

Conclusion

In this example you’ve seen how to use Location Data Services to perform a trade area analysis using CARTOframes built-in functionality without leaving the notebook.

Using the results, we’ve calculated a possible new location for a store and used the isoline areas to help in the decision-making process.

Take into account that finding optimal spots for new stores is not an easy task and requires more analysis, but this is a great first step!