CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

Location Data Services

Introduction

CARTOframes provides the functionality to use the CARTO Data Services API. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook.

For instance, you can geocode a pandas DataFrame of addresses on the fly, and then perform a trade area analysis by computing isodistances or isochrones programmatically.

Given a set of ten simulated Starbucks store addresses, this guide walks through the use case of finding good location candidates to open an additional store.

Depending on your account plan, some of these Location Data Services are subject to different quota limits.

Data

This guide uses the same dataset of simulated Starbucks locations that has been used in the other guides and can be downloaded here.

Authentication

Using Location Data Services requires CARTO authentication. For more information about how to authenticate, please read the Authentication guide.

from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')
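The creds.json file passed above holds your CARTO username and API key. A minimal sketch of creating such a file with the standard library, assuming it is a JSON object with username and api_key keys (check the Authentication guide for the exact format your CARTOframes version expects). The values are placeholders, and the sketch writes to creds_example.json so an existing creds.json is not overwritten:

```python
import json
from pathlib import Path

# Placeholder values: replace them with your own CARTO username and API key.
# Assumption: the credentials file is a JSON object with these two keys.
creds = {
    "username": "your_username",
    "api_key": "your_api_key",
}

# Separate file name so that a real creds.json is not overwritten.
path = Path("creds_example.json")
path.write_text(json.dumps(creds, indent=2))

# Read the file back to check that it is valid JSON with the expected keys.
loaded = json.loads(path.read_text())
print(sorted(loaded))  # ['api_key', 'username']
```

Once filled in with real values and renamed, the file should be usable with set_default_credentials('creds.json'). Keep it out of version control, since the API key is a secret.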

Geocoding

To get started, let’s read in and explore the Starbucks location data we have. With the Starbucks store data in a DataFrame, we can see that there are two columns that can be used in the geocoding service: name and address. There’s also a third column that reflects the annual revenue of the store.

import pandas

df = pandas.read_csv('../files/starbucks_brooklyn.csv')
df
  | name | address | revenue
0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1.321041e+06
1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1.268080e+06
2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1.248134e+06
3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1.185703e+06
4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1.148427e+06
5 | Court St & Dean St | 167 Court Street,Brooklyn, NY 11201 | 1.144067e+06
6 | Target Gateway T-1401 | 519 Gateway Dr,Brooklyn, NY 11239 | 1.021083e+06
7 | 3rd Ave & 92nd St | 9202 Third Avenue,Brooklyn, NY 11209 | 9.257073e+05
8 | Lam Group @ Sheraton Brooklyn | 228 Duffield st,Brooklyn, NY 11201 | 7.657935e+05
9 | 33-42 Hillel Place | 33-42 Hillel Place,Brooklyn, NY 11210 | 7.492163e+05

Quota consumption

Each time you run Location Data Services, you consume quota. For this reason, you can check in advance how many credits an operation will consume by passing dry_run=True to the service function.

You can also check the available quota by calling the service's available_quota method.

from cartoframes.data.services import Geocoding

geo_service = Geocoding()

_, geo_dry_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)
geo_dry_metadata
{'total_rows': 10,
  'required_quota': 10,
  'previously_geocoded': 0,
  'previously_failed': 0,
  'records_with_geometry': 0}
geo_service.available_quota()
1470
geo_cdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
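The dry run metadata and the available quota can be combined into a simple guard before running an operation that consumes credits. A minimal sketch; has_enough_quota is a hypothetical helper (not part of CARTOframes), and the values are the ones printed above:

```python
def has_enough_quota(dry_run_metadata, available_quota):
    """Hypothetical helper: True if the quota required by a dry run fits in
    the available quota. A missing 'required_quota' key is treated as 0."""
    required = dry_run_metadata.get("required_quota", 0)
    return required <= available_quota


# Values taken from the dry run and available_quota() outputs above
geo_dry_metadata = {
    "total_rows": 10,
    "required_quota": 10,
    "previously_geocoded": 0,
    "previously_failed": 0,
    "records_with_geometry": 0,
}
available = 1470

if has_enough_quota(geo_dry_metadata, available):
    print("OK to geocode: {} of {} credits".format(
        geo_dry_metadata["required_quota"], available))
else:
    print("Not enough quota")
```

In the notebook, you would pass geo_dry_metadata and geo_service.available_quota() directly instead of the copied values.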

If the input data ever changes, cached results are applied only to unmodified records, and new geocoding is performed only on new or changed records.

To use cached results, save the results to a CARTO table by passing the table_name and cached=True parameters.

geo_cdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    table_name='starbucks_cache',
    cached=True
)

Let’s compare geo_dry_metadata and geo_metadata to see the differences between the information returned with and without the dry_run option. The result shows that all the locations were geocoded successfully and that the operation consumed 10 credits of quota.

geo_metadata
{'total_rows': 10,
  'required_quota': 10,
  'previously_geocoded': 0,
  'previously_failed': 0,
  'records_with_geometry': 0,
  'final_records_with_geometry': 10,
  'geocoded_increment': 10,
  'successfully_geocoded': 10,
  'failed_geocodings': 0}
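The extra information returned without dry_run can be listed by comparing the keys of the two dictionaries. A small sketch using the values printed above:

```python
# Metadata dictionaries as printed above
geo_dry_metadata = {
    "total_rows": 10,
    "required_quota": 10,
    "previously_geocoded": 0,
    "previously_failed": 0,
    "records_with_geometry": 0,
}
geo_metadata = {
    "total_rows": 10,
    "required_quota": 10,
    "previously_geocoded": 0,
    "previously_failed": 0,
    "records_with_geometry": 0,
    "final_records_with_geometry": 10,
    "geocoded_increment": 10,
    "successfully_geocoded": 10,
    "failed_geocodings": 0,
}

# Keys that only the real (non-dry-run) call reports
extra_keys = sorted(set(geo_metadata) - set(geo_dry_metadata))
print(extra_keys)
# ['failed_geocodings', 'final_records_with_geometry',
#  'geocoded_increment', 'successfully_geocoded']

# The keys both calls share carry the same values here
shared_match = all(geo_metadata[k] == geo_dry_metadata[k] for k in geo_dry_metadata)

# Share of rows that were geocoded successfully
success_rate = geo_metadata["successfully_geocoded"] / geo_metadata["total_rows"]
print(shared_match, success_rate)  # True 1.0
```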

The resulting data is a CartoDataFrame that contains three new columns:

  • geometry: The resulting geometry
  • gc_status_rel: A score between 0 and 1 indicating the accuracy of each geocoded location
  • carto_geocode_hash: A hash of the geocoded record, used to avoid geocoding it again
geo_cdf.head()
  | name | address | revenue | gc_status_rel | carto_geocode_hash | geometry
1 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.97 | c834a8e289e5bce280775a9bf1f833f1 | POINT (-73.95901 40.67109)
2 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.99 | 7d39a3fff93efd9034da88aa9ad2da79 | POINT (-73.96122 40.57796)
3 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.98 | 1a2312049ddea753ba42bf77f5ccf718 | POINT (-73.98976 40.61912)
4 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.98 | 827ab4dcc2d49d5fd830749597976d4a | POINT (-74.02744 40.63152)
5 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.98 | 119a38c7b51195cd4153fc81605a8495 | POINT (-74.00098 40.59321)
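Because gc_status_rel is a score between 0 and 1, it can be used to flag results that may need a manual review. A minimal sketch in plain Python using the five rows shown above; the 0.98 threshold is an arbitrary choice for illustration. On the CartoDataFrame itself, the equivalent pandas filter would be geo_cdf[geo_cdf.gc_status_rel < 0.98].

```python
# (name, gc_status_rel) pairs copied from geo_cdf.head() above
rows = [
    ("Franklin Ave & Eastern Pkwy", 0.97),
    ("607 Brighton Beach Ave", 0.99),
    ("65th St & 18th Ave", 0.98),
    ("Bay Ridge Pkwy & 3rd Ave", 0.98),
    ("Caesar's Bay Shopping Center", 0.98),
]

THRESHOLD = 0.98  # arbitrary cut-off chosen for illustration

# Keep the names whose score falls below the threshold
needs_review = [name for name, rel in rows if rel < THRESHOLD]
print(needs_review)  # ['Franklin Ave & Eastern Pkwy']
```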

In addition, to avoid geocoding records that have already been geocoded, and thus spending quota unnecessarily, you should always preserve the geometry (the_geom) and carto_geocode_hash columns generated by the geocoding process.

This will happen automatically in these cases:

  1. Your input is a table from CARTO processed in place (without a table_name parameter).
  2. You save your results to a CARTO table using the table_name parameter and only use the resulting table for any further geocoding.

If you try to geocode this DataFrame again now that it contains both the geometry and carto_geocode_hash columns, you will see that the required quota is 0 because it has already been geocoded.

_, repeat_geo_metadata = geo_service.geocode(
    geo_cdf,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)

repeat_geo_metadata.get('required_quota')
0
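The idea behind carto_geocode_hash can be illustrated with a small sketch: hash the input fields of each record and only send records whose stored hash is missing or no longer matches. This shows the general technique under our own assumptions (MD5 over the joined street, city, and country values, with hypothetical helper names); CARTOframes computes its own hash, which may differ.

```python
import hashlib


def geocode_hash(street, city, country):
    """Illustrative hash of the geocoding inputs (not CARTOframes' exact scheme)."""
    key = "|".join([street, city, country])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def records_to_geocode(records):
    """Return records whose stored hash is missing or no longer matches
    their current inputs: the ones that would consume quota."""
    pending = []
    for rec in records:
        current = geocode_hash(rec["street"], rec["city"], rec["country"])
        if rec.get("stored_hash") != current:
            pending.append(rec)
    return pending


records = [
    # Geocoded before and unchanged: the stored hash matches the inputs.
    {
        "street": "341 Eastern Pkwy,Brooklyn, NY 11238",
        "city": "New York",
        "country": "USA",
        "stored_hash": geocode_hash(
            "341 Eastern Pkwy,Brooklyn, NY 11238", "New York", "USA"),
    },
    # Never geocoded: no stored hash yet.
    {
        "street": "607 Brighton Beach Avenue,Brooklyn, NY 11235",
        "city": "New York",
        "country": "USA",
    },
]

pending = records_to_geocode(records)
print(len(pending))  # 1

# Hypothetical edit: changing an address invalidates its stored hash.
records[0]["street"] = "343 Eastern Pkwy,Brooklyn, NY 11238"
print(len(records_to_geocode(records)))  # 2
```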

Precision

The address column is more complete than the name column, so geocoding with it should produce more accurate coordinates. The results confirm this: the accuracy values obtained with the name column (0.95, 0.93, 0.96, 0.83, 0.78, 0.9) are lower than the ones obtained with the address column (0.97, 0.99, 0.98).

geo_name_cdf, geo_name_metadata = geo_service.geocode(
    df,
    street='name',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
geo_name_cdf.head()
  | name | address | revenue | gc_status_rel | carto_geocode_hash | geometry
1 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.95 | 0be7693fc688eca36e1077656dcb00a5 | POINT (-76.56478 39.30853)
2 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.95 | 084a5c4d42ccf3c3c8e69426619f270e | POINT (-73.96122 40.57796)
3 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.93 | 1d9a17c20c11d0454aff10548a328c47 | POINT (-73.99018 40.61914)
4 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.96 | d531df27fc02336dc722cb4e7028b244 | POINT (-74.02778 40.63146)
5 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.93 | 9d8c13b5b4a93591f427d3ce0b5b4ead | POINT (-95.45238 29.83378)
geo_name_cdf.gc_status_rel.unique()
array([0.95, 0.93, 0.96, 0.83, 0.78, 0.9 ])
geo_cdf.head()
  | name | address | revenue | gc_status_rel | carto_geocode_hash | geometry
1 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.97 | c834a8e289e5bce280775a9bf1f833f1 | POINT (-73.95901 40.67109)
2 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.99 | 7d39a3fff93efd9034da88aa9ad2da79 | POINT (-73.96122 40.57796)
3 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.98 | 1a2312049ddea753ba42bf77f5ccf718 | POINT (-73.98976 40.61912)
4 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.98 | 827ab4dcc2d49d5fd830749597976d4a | POINT (-74.02744 40.63152)
5 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.98 | 119a38c7b51195cd4153fc81605a8495 | POINT (-74.00098 40.59321)
geo_cdf.gc_status_rel.unique()
array([0.97, 0.99, 0.98])
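Scores alone can hide badly misplaced points: in the name-based results, rows 1 and 5 fall far outside Brooklyn. A quick sanity check is to parse the POINT values and test them against a rough bounding box. A minimal sketch; the bounding box is a hand-picked approximation of Brooklyn, not an official boundary, and parse_point only handles simple "POINT (lon lat)" strings.

```python
import re

# Rough, hand-picked Brooklyn bounding box (assumption, not an official boundary)
MIN_LON, MAX_LON = -74.05, -73.83
MIN_LAT, MAX_LAT = 40.56, 40.74

POINT_RE = re.compile(r"POINT \(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)")


def parse_point(wkt):
    """Parse a simple 'POINT (lon lat)' WKT string into a (lon, lat) tuple."""
    match = POINT_RE.fullmatch(wkt.strip())
    if match is None:
        raise ValueError("not a simple POINT: {!r}".format(wkt))
    return float(match.group(1)), float(match.group(2))


def in_bbox(lon, lat):
    return MIN_LON <= lon <= MAX_LON and MIN_LAT <= lat <= MAX_LAT


# Geometries from geo_name_cdf.head() (geocoded using the name column)
name_points = {
    1: "POINT (-76.56478 39.30853)",
    2: "POINT (-73.96122 40.57796)",
    3: "POINT (-73.99018 40.61914)",
    4: "POINT (-74.02778 40.63146)",
    5: "POINT (-95.45238 29.83378)",
}

outside = [idx for idx, wkt in name_points.items() if not in_bbox(*parse_point(wkt))]
print(outside)  # [1, 5]
```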

Visualize the results

Finally, we can visualize the precision of the geocoded results using a CARTOframes visualization layer.

from cartoframes.viz.helpers import color_bins_layer
from cartoframes.viz import Popup

color_bins_layer(
    geo_cdf,
    'gc_status_rel',
    method='equal',
    bins=geo_cdf.gc_status_rel.unique().size,
    title='Geocoding Precision',
    popup=Popup({
        'hover': [{
                'title': 'Address',
                'value': '$address'
            }, {
                'title': 'Precision',
                'value': '$gc_status_rel'
            }]
    })
)
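With method='equal', the gc_status_rel range is split into bins of equal width, and bins is set to the number of distinct values (three here). A plain-Python sketch of equal-interval classification to illustrate the idea; the exact break computation used by the CARTOframes layer may differ.

```python
def equal_interval_breaks(values, bins):
    """Return the bins + 1 edges that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def assign_bin(value, edges):
    """Index of the bin containing value; the maximum falls in the last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


values = [0.97, 0.99, 0.98]  # geo_cdf.gc_status_rel.unique()
edges = equal_interval_breaks(values, bins=len(set(values)))
print([round(e, 4) for e in edges])  # [0.97, 0.9767, 0.9833, 0.99]
print({v: assign_bin(v, edges) for v in values})  # {0.97: 0, 0.99: 2, 0.98: 1}
```

With one bin per distinct value, each precision level ends up in its own color class.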

Isolines

There are two isoline functions: isochrones and isodistances. In this guide, we use the isochrones function to calculate walking areas by time for each Starbucks store, and the isodistances function to calculate walking areas by distance.

Isolines are concentric polygons, each enclosing the area that can be reached from the origin point within a given range, measured by:

  • Time in the case of isochrones
  • Distance in the case of isodistances

Isochrones

For isochrones, let’s calculate time ranges of 5, 15, and 30 minutes. Ranges are specified in seconds, so they become 300, 900, and 1800, respectively.

from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_cdf, [300, 900, 1800], mode='walk', dry_run=True)

Remember to always check the quota using the dry_run parameter and the available_quota method before running the service!

print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)
available 1437, required 30
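The required quota of 30 is consistent with one credit per isoline: 10 stores times 3 ranges. A back-of-the-envelope sketch of that estimate; the one-credit-per-isoline cost model is an assumption inferred from these numbers (estimate_isoline_quota is a hypothetical helper), so rely on dry_run for the authoritative figure.

```python
def estimate_isoline_quota(n_sources, ranges):
    """Assumed cost model: one credit per (source point, range) pair."""
    return n_sources * len(ranges)


ranges_seconds = [300, 900, 1800]
n_stores = 10  # rows in geo_cdf

required = estimate_isoline_quota(n_stores, ranges_seconds)
available = 1437  # value printed above
print(required, available - required)  # 30 1407
```

The 1407 credits reported later, before the isodistances call, match 1437 minus 30.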
isochrones_cdf, isochrones_metadata = iso_service.isochrones(geo_cdf, [300, 900, 1800], mode='walk')
isochrones_cdf.head()
  | source_id | data_range | lower_data_range | geometry | range_label
1 | 1 | 300 | 0 | MULTIPOLYGON (((-73.95911 40.67183, -73.95917 ... | 5 min.
2 | 2 | 900 | 300 | POLYGON ((-73.95934 40.68011, -73.95839 40.680... | 15 min.
3 | 3 | 1800 | 900 | POLYGON ((-73.95949 40.69066, -73.95753 40.692... | 30 min.
4 | 4 | 300 | 0 | MULTIPOLYGON (((-73.96177 40.58098, -73.96201 ... | 5 min.
5 | 5 | 900 | 300 | POLYGON ((-73.96229 40.57502, -73.96232 40.575... | 15 min.
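Each row of the result holds one ring: data_range is the upper limit of the range in seconds, lower_data_range is the previous range (or 0 for the innermost ring), and range_label is a readable version of data_range. A sketch that reproduces these three columns for one store from the minute values; the logic is inferred from the output above, not taken from the CARTOframes source, and isochrone_rows is a hypothetical helper.

```python
def isochrone_rows(minutes):
    """Rebuild (data_range, lower_data_range, range_label) tuples as they
    appear in the isochrones output above (inferred, not the library code)."""
    ranges = sorted(m * 60 for m in minutes)  # the service expects seconds
    rows = []
    lower = 0
    for upper in ranges:
        rows.append((upper, lower, "{} min.".format(upper // 60)))
        lower = upper  # the next ring starts where this one ends
    return rows


for row in isochrone_rows([5, 15, 30]):
    print(row)
# (300, 0, '5 min.')
# (900, 300, '15 min.')
# (1800, 900, '30 min.')
```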

Visualize with the isolines layer

The most straightforward way of visualizing the resulting geometries is to use the isolines_layer visualization layer. This visualization layer uses the range_label column that is automatically added by the service to classify each polygon by category.

from cartoframes.viz.helpers import isolines_layer

isolines_layer(isochrones_cdf)

Isodistances

The isoline services accept several options to control the resolution and quality of the polygons. There’s more information about these settings in the Isolines Reference. For isodistances, the ranges are distances in meters rather than seconds.

_, isodistances_dry_metadata = iso_service.isodistances(
    geo_cdf,
    [900, 1800, 3600],
    mode='walk',
    resolution=16.0,
    quality=1,
    dry_run=True
)
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)
available 1407, required 30
isodistances_cdf, isodistances_metadata = iso_service.isodistances(
    geo_cdf,
    [900, 1800, 3600],
    mode='walk',
    mode_traffic='enabled',
    resolution=16.0,
    quality=2
)
isodistances_cdf.head()
  | source_id | data_range | lower_data_range | geometry | range_label
1 | 1 | 900 | 0 | MULTIPOLYGON (((-73.95911 40.67183, -73.95917 ... | 15 min.
2 | 2 | 1800 | 900 | POLYGON ((-73.95953 40.67636, -73.95895 40.675... | 30 min.
3 | 3 | 3600 | 1800 | POLYGON ((-73.95968 40.68235, -73.95882 40.682... | 60 min.
4 | 4 | 900 | 0 | MULTIPOLYGON (((-73.96163 40.58063, -73.96221 ... | 15 min.
5 | 5 | 1800 | 900 | POLYGON ((-73.96180 40.57505, -73.96193 40.575... | 30 min.
from cartoframes.viz.helpers import isolines_layer

isolines_layer(isodistances_cdf)

All together

Let’s visualize the data in one map to see what insights we can find.

from cartoframes.viz import Map
from cartoframes.viz.helpers import size_continuous_layer

Map([
    isolines_layer(
        isochrones_cdf,
        title='Walking Time'
    ),
    size_continuous_layer(
        geo_cdf,
        'revenue',
        title='Revenue $',
        color='white',
        opacity='0.2',
        stroke_color='blue',
        size=[20, 80],
        popup=Popup({
            'hover': [{
                    'title': 'Address',
                    'value': '$address'
                }, {
                    'title': 'Precision',
                    'value': '$gc_status_rel'
                }, {
                    'title': 'Revenue',
                    'value': '$revenue'
                }]
        })
    )
])

Looking at the map above, we can see that the store at 228 Duffield St, Brooklyn, NY 11201 is very close to another store with higher revenue, which suggests we could even consider closing it in favor of one in a better location.

We could also look for a spot for a new store between stores that are far apart from each other and have lower revenue than the rest.
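To put a number on how close two stores are, you can compute the great-circle distance between their geocoded points. A minimal haversine sketch, assuming a spherical Earth with a mean radius of 6371 km; the example uses the 65th St & 18th Ave and Bay Ridge Pkwy & 3rd Ave coordinates from geo_cdf.head(), since the Duffield St coordinates are not printed in this guide.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# 65th St & 18th Ave -> Bay Ridge Pkwy & 3rd Ave (POINT values from geo_cdf.head())
d = haversine_km(-73.98976, 40.61912, -74.02744, 40.63152)
print("{:.2f} km".format(d))
```

Note that this is a straight-line distance; walking distance along streets, as computed by the isoline services, will be longer.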

Now, let’s take three of the stores identified on the map (rows 1, 6, and 9 of geo_cdf), calculate the centroid of the triangle they form, and use it as a possible location for a new store:

from shapely import geometry

new_store_location = [
    geo_cdf.iloc[6].geometry,
    geo_cdf.iloc[9].geometry,
    geo_cdf.iloc[1].geometry
]

# Create a polygon using three points from the geo_cdf
polygon = geometry.Polygon([[p.x, p.y] for p in new_store_location])
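For a triangle, the centroid Shapely returns is the average of the three vertices; for general polygons it comes from the shoelace formula. A plain-Python sketch for checking the result, assuming a simple (non-self-intersecting) polygon and treating lon/lat as planar coordinates, which is a reasonable approximation over an area the size of Brooklyn. polygon_centroid is a hypothetical helper; in the notebook, polygon.centroid already does this.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...],
    using the shoelace formula. The ring is closed automatically."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # e.g. three collinear stores: no triangle, no area centroid
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * area2), cy / (3 * area2)


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
vertex_mean = (sum(x for x, _ in triangle) / 3, sum(y for _, y in triangle) / 3)
# Both are approximately (1.333, 1.0)
print(polygon_centroid(triangle), vertex_mean)
```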
from cartoframes import CartoDataFrame

new_store_cdf = CartoDataFrame(
    [['New Store', polygon.centroid.y, polygon.centroid.x]],
    columns=['name', 'lat', 'lon'])

new_store_cdf.set_geometry_from_xy('lon', 'lat', inplace=True)

isochrones_new_cdf, isochrones_new_metadata = iso_service.isochrones(new_store_cdf, [300, 900, 1800], mode='walk')
from cartoframes.viz import Layer

Map([
    isolines_layer(
        isochrones_cdf,
        title='Walking Time - Current',
        opacity='0.2'
    ),
    isolines_layer(
        isochrones_new_cdf,
        title='Walking Time - New',
    ),
    size_continuous_layer(
        geo_cdf,
        'revenue',
        title='Revenue $',
        color='white',
        opacity='0.2',
        stroke_color='blue',
        size=[20, 80],
        popup=Popup({
            'hover': [{
                    'title': 'Address',
                    'value': '$address'
                }, {
                    'title': 'Precision',
                    'value': '$gc_status_rel'
                }, {
                    'title': 'Revenue',
                    'value': '$revenue'
                }]
        })
    )
])

Conclusion

In this example, you’ve seen how to use Location Data Services to perform a trade area analysis with CARTOframes’ built-in functionality, without leaving the notebook.

Using the results, we’ve calculated a possible new location for a store and used the isoline areas to help in the decision-making process.

Keep in mind that finding optimal locations for new stores is not an easy task and requires further analysis, but this is a good first step!