CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

This version includes breaking changes. Check the CHANGELOG for more information.

Location Data Services

Introduction

CARTOframes provides access to the CARTO Data Services API. This API consists of a set of location-based functions that can be applied to your data to perform geospatial analyses without leaving the context of your notebook.

For instance, you can geocode a pandas DataFrame with addresses on the fly, and then perform trade area analysis by computing isodistances or isochrones programmatically.

In this guide, we walk through the following use case: given a set of ten Starbucks store addresses, find good candidate locations to open another store.

Depending on your account plan, some of these Location Data Services are subject to different quota limitations.

Data

We will use the same dataset of fake locations used throughout these guides: starbucks_brooklyn.csv.

Authentication

Using Location Data Services requires authentication. For more information about how to authenticate, please read the Login to CARTO Platform guide.

from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')

Geocoding

The first step is to read and understand the data we have. Once we have the Starbucks store data in a DataFrame, we can see two columns that can be used in the geocoding service: name and address. There’s also a third column that reflects the annual revenue of the store.

import pandas

df = pandas.read_csv('../files/starbucks_brooklyn.csv')
df
  | name | address | revenue
0 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1.321041e+06
1 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1.268080e+06
2 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1.248134e+06
3 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1.185703e+06
4 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1.148427e+06
5 | Court St & Dean St | 167 Court Street,Brooklyn, NY 11201 | 1.144067e+06
6 | Target Gateway T-1401 | 519 Gateway Dr,Brooklyn, NY 11239 | 1.021083e+06
7 | 3rd Ave & 92nd St | 9202 Third Avenue,Brooklyn, NY 11209 | 9.257073e+05
8 | Lam Group @ Sheraton Brooklyn | 228 Duffield st,Brooklyn, NY 11201 | 7.657935e+05
9 | 33-42 Hillel Place | 33-42 Hillel Place,Brooklyn, NY 11210 | 7.492163e+05

Quota consumption

Each time you run a Location Data Service, you consume quota. For this reason, you can check in advance how many credits an operation will consume by passing the dry_run parameter to the service function.

You can also check your available quota with the available_quota method.
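As a minimal sketch of this check, the hypothetical helper below (not part of CARTOframes) compares the required_quota reported by a dry run with the available quota. It assumes only the metadata shape shown in this guide:

```python
def has_enough_quota(dry_run_metadata, available):
    """Return True if the credits reported by a dry run fit in the available quota.

    Hypothetical helper: `dry_run_metadata` is assumed to be the metadata dict
    returned with dry_run=True, and `available` the value of available_quota().
    A missing 'required_quota' key is treated as 0.
    """
    required = dry_run_metadata.get('required_quota', 0)
    return required <= available


# Values taken from the geocoding dry run and quota check in this guide
geo_dry_metadata = {'total_rows': 10, 'required_quota': 10}
print(has_enough_quota(geo_dry_metadata, 1470))  # True
```

In a notebook, you would call it as has_enough_quota(geo_dry_metadata, geo_service.available_quota()) before running the real geocoding.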

from cartoframes.data.services import Geocoding

geo_service = Geocoding()

_, geo_dry_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)
geo_dry_metadata
{'total_rows': 10,
  'required_quota': 10,
  'previously_geocoded': 0,
  'previously_failed': 0,
  'records_with_geometry': 0}
geo_service.available_quota()
1470
geo_cdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'}
)

If the CSV file ever changes, cached results will be applied only to unmodified records, and new geocoding will be performed only on new or changed records.

To use cached results, we have to save the results in a CARTO table using the table_name and cached=True parameters.
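To illustrate the idea behind this reuse, here is a minimal sketch of hash-based change detection in plain Python. It is an illustration under stated assumptions, not CARTOframes' implementation: the MD5-of-joined-fields scheme and the helpers input_hash and rows_to_geocode are hypothetical.

```python
import hashlib


def input_hash(street, city, country):
    """Hash the geocoding inputs of one record (hypothetical scheme)."""
    key = '|'.join([street, city, country])
    return hashlib.md5(key.encode('utf-8')).hexdigest()


def rows_to_geocode(records, cached_hashes):
    """Return the indexes of records whose inputs are new or have changed.

    `records` is a list of (street, city, country) tuples and `cached_hashes`
    holds the hash stored for each record by a previous run. Records beyond
    the end of `cached_hashes` are treated as new.
    """
    pending = []
    for i, (street, city, country) in enumerate(records):
        previous = cached_hashes[i] if i < len(cached_hashes) else None
        if previous != input_hash(street, city, country):
            pending.append(i)
    return pending


records = [
    ('341 Eastern Pkwy,Brooklyn, NY 11238', 'New York', 'USA'),
    ('607 Brighton Beach Avenue,Brooklyn, NY 11235', 'New York', 'USA'),
]
cached = [input_hash(*r) for r in records]

# Edit the second address: only that record needs to be geocoded again
records[1] = ('607 Brighton Beach Ave,Brooklyn, NY 11235', 'New York', 'USA')
print(rows_to_geocode(records, cached))  # [1]
```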

geo_cdf, geo_metadata = geo_service.geocode(
    df,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    table_name='starbucks_cache',
    cached=True
)

Let’s compare geo_dry_metadata and geo_metadata to see how the information differs with and without the dry_run option. The geo_metadata shows that all the locations have been geocoded successfully and that the operation consumed 10 credits of quota.

geo_metadata
{'total_rows': 10,
  'required_quota': 10,
  'previously_geocoded': 0,
  'previously_failed': 0,
  'records_with_geometry': 0,
  'final_records_with_geometry': 10,
  'geocoded_increment': 10,
  'successfully_geocoded': 10,
  'failed_geocodings': 0}

The resulting data is a CartoDataFrame that contains three new columns:

  • geometry: The resulting geometry
  • gc_status_rel: The precision score of each location, from 0 to 1
  • carto_geocode_hash: Geocoding metadata used to detect records that have already been geocoded
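Because gc_status_rel is numeric, it can be used to flag results worth reviewing. A minimal pandas sketch, using a stand-in DataFrame built from the name and gc_status_rel values of the first five rows returned by geo_cdf.head(); the 0.98 threshold is an arbitrary choice for illustration:

```python
import pandas as pd

# Stand-in for geo_cdf: names and gc_status_rel values from geo_cdf.head()
results = pd.DataFrame({
    'name': [
        'Franklin Ave & Eastern Pkwy',
        '607 Brighton Beach Ave',
        '65th St & 18th Ave',
        'Bay Ridge Pkwy & 3rd Ave',
        "Caesar's Bay Shopping Center",
    ],
    'gc_status_rel': [0.97, 0.99, 0.98, 0.98, 0.98],
})

# Arbitrary review threshold, chosen only for this example
THRESHOLD = 0.98

# Keep the rows whose geocoding precision falls below the threshold
to_review = results[results['gc_status_rel'] < THRESHOLD]
print(to_review['name'].tolist())  # ['Franklin Ave & Eastern Pkwy']
```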
geo_cdf.head()
  | name | address | revenue | gc_status_rel | carto_geocode_hash | geometry
1 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.97 | c834a8e289e5bce280775a9bf1f833f1 | POINT (-73.95901 40.67109)
2 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.99 | 7d39a3fff93efd9034da88aa9ad2da79 | POINT (-73.96122 40.57796)
3 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.98 | 1a2312049ddea753ba42bf77f5ccf718 | POINT (-73.98976 40.61912)
4 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.98 | 827ab4dcc2d49d5fd830749597976d4a | POINT (-74.02744 40.63152)
5 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.98 | 119a38c7b51195cd4153fc81605a8495 | POINT (-74.00098 40.59321)

In addition, to avoid geocoding records that have already been geocoded, and thus spending quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process.

This will happen automatically in these cases:

  1. Your input is a table from CARTO that is processed in place (without a table_name parameter).
  2. You save your results in a CARTO table using the table_name parameter, and only use the resulting table for any further geocoding.

If we now try to geocode this DataFrame, which contains both the geometry and the carto_geocode_hash columns, we can see that the required quota is 0 because it has already been geocoded.

_, repeat_geo_metadata = geo_service.geocode(
    geo_cdf,
    street='address',
    city={'value': 'New York'},
    country={'value': 'USA'},
    dry_run=True
)

repeat_geo_metadata.get('required_quota')
0

Precision

The address column is more complete than the name column, so the coordinates calculated from it should be more accurate. We can verify this: the precision values obtained by geocoding the name column (0.95, 0.93, 0.96, 0.83, 0.78, 0.9) are lower than those obtained by geocoding the address column (0.97, 0.99, 0.98).

geo_name_cdf, geo_name_metadata = geo_service.geocode(
    df,
    street='name',
    city={'value': 'New York'},
    country={'value': 'USA'}
)
geo_name_cdf.head()
  | name | address | revenue | gc_status_rel | carto_geocode_hash | geometry
1 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.95 | 0be7693fc688eca36e1077656dcb00a5 | POINT (-76.56478 39.30853)
2 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.95 | 084a5c4d42ccf3c3c8e69426619f270e | POINT (-73.96122 40.57796)
3 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.93 | 1d9a17c20c11d0454aff10548a328c47 | POINT (-73.99018 40.61914)
4 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.96 | d531df27fc02336dc722cb4e7028b244 | POINT (-74.02778 40.63146)
5 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.93 | 9d8c13b5b4a93591f427d3ce0b5b4ead | POINT (-95.45238 29.83378)
geo_name_cdf.gc_status_rel.unique()
array([0.95, 0.93, 0.96, 0.83, 0.78, 0.9 ])
geo_cdf.head()
  | name | address | revenue | gc_status_rel | carto_geocode_hash | geometry
1 | Franklin Ave & Eastern Pkwy | 341 Eastern Pkwy,Brooklyn, NY 11238 | 1321040.772 | 0.97 | c834a8e289e5bce280775a9bf1f833f1 | POINT (-73.95901 40.67109)
2 | 607 Brighton Beach Ave | 607 Brighton Beach Avenue,Brooklyn, NY 11235 | 1268080.418 | 0.99 | 7d39a3fff93efd9034da88aa9ad2da79 | POINT (-73.96122 40.57796)
3 | 65th St & 18th Ave | 6423 18th Avenue,Brooklyn, NY 11204 | 1248133.699 | 0.98 | 1a2312049ddea753ba42bf77f5ccf718 | POINT (-73.98976 40.61912)
4 | Bay Ridge Pkwy & 3rd Ave | 7419 3rd Avenue,Brooklyn, NY 11209 | 1185702.676 | 0.98 | 827ab4dcc2d49d5fd830749597976d4a | POINT (-74.02744 40.63152)
5 | Caesar's Bay Shopping Center | 8973 Bay Parkway,Brooklyn, NY 11214 | 1148427.411 | 0.98 | 119a38c7b51195cd4153fc81605a8495 | POINT (-74.00098 40.59321)
geo_cdf.gc_status_rel.unique()
array([0.97, 0.99, 0.98])
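The precision score is not the whole story: the head() outputs above also show that, when geocoded by name, Franklin Ave & Eastern Pkwy lands at POINT (-76.56478 39.30853) and Caesar's Bay Shopping Center at POINT (-95.45238 29.83378), far from Brooklyn. A minimal sketch that quantifies this with the haversine formula, assuming a spherical Earth with a mean radius of 6371 km (so the distances are approximate):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (lon, lat) pairs copied from the two head() outputs: by address vs. by name
comparisons = {
    'Franklin Ave & Eastern Pkwy': ((-73.95901, 40.67109), (-76.56478, 39.30853)),
    '607 Brighton Beach Ave': ((-73.96122, 40.57796), (-73.96122, 40.57796)),
    "Caesar's Bay Shopping Center": ((-74.00098, 40.59321), (-95.45238, 29.83378)),
}

for name, (by_address, by_name) in comparisons.items():
    print(f'{name}: {haversine_km(*by_address, *by_name):.1f} km apart')
```

A large distance between the two results for the same store is a strong hint that one of them, here the name-based one, is wrong.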

Visualize the results

Finally, we can use the CARTOframes visualization helpers to display the geocoding results by precision.

from cartoframes.viz.helpers import color_bins_layer
from cartoframes.viz import Popup

color_bins_layer(
    geo_cdf,
    'gc_status_rel',
    method='equal',
    bins=geo_cdf.gc_status_rel.unique().size,
    title='Geocoding Precision',
    popup=Popup({
        'hover': [{
                'title': 'Address',
                'value': '$address'
            }, {
                'title': 'Precision',
                'value': '$gc_status_rel'
            }]
    })
)
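The method='equal' option groups the values into bins of equal width. As an illustration of that classification (a sketch, not CARTOframes' own code), the breaks for the three distinct gc_status_rel values of the address-based geocoding can be computed as follows; equal_interval_breaks and assign_bin are hypothetical helpers:

```python
def equal_interval_breaks(values, bins):
    """Upper bounds of `bins` equal-width intervals spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    # Use `high` itself as the last bound to avoid floating-point drift
    return [low + width * i for i in range(1, bins)] + [high]


def assign_bin(value, breaks):
    """Index of the first interval whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # values above the range go to the last bin


values = [0.97, 0.99, 0.98]  # geo_cdf.gc_status_rel.unique()
breaks = equal_interval_breaks(values, bins=len(set(values)))
print([round(b, 4) for b in breaks])
print([assign_bin(v, breaks) for v in values])
```

With as many bins as distinct values, as in the color_bins_layer call above, each precision level falls into its own bin.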

Isolines

There are two Isolines functions: isochrones and isodistances. In this guide, we use isochrones to compute the walking area around each Starbucks store by time, and isodistances to compute the walking area by distance.

By definition, isolines are concentric polygons around an origin point, each enclosing the area that can be reached from that point within a given range, measured by:

  • Time in the case of isochrones
  • Distance in the case of isodistances

Isolines: Isochrones

We’re going to use 5, 15, and 30 minutes as ranges. The ranges are expressed in seconds, so they become 300, 900, and 1800, respectively.
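The conversion is simple arithmetic; a short sketch (the variable names are illustrative):

```python
# Walking-time ranges in minutes, as chosen above
ranges_min = [5, 15, 30]

# The isochrones ranges are passed in seconds
ranges_sec = [minutes * 60 for minutes in ranges_min]
print(ranges_sec)  # [300, 900, 1800]
```

The resulting list is the one passed as the second argument of iso_service.isochrones.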

from cartoframes.data.services import Isolines

iso_service = Isolines()

_, isochrones_dry_metadata = iso_service.isochrones(geo_cdf, [300, 900, 1800], mode='walk', dry_run=True)

Remember to always check the quota using the dry_run parameter and the available_quota method before running the service!

print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isochrones_dry_metadata.get('required_quota'))
)
available 1437, required 30
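The required quota of 30 matches 10 stores × 3 ranges. Assuming, based only on the figures in this guide, that the isoline services charge one credit per input row and range (the dry run remains the authoritative source), a rough pre-check can be sketched as follows; estimate_isoline_quota is a hypothetical helper:

```python
def estimate_isoline_quota(n_rows, ranges):
    """Rough credit estimate: one credit per (row, range) pair.

    This rule is an assumption inferred from the dry runs in this guide;
    always confirm with dry_run=True before running the service.
    """
    return n_rows * len(ranges)


n_stores = 10  # rows in geo_cdf
print(estimate_isoline_quota(n_stores, [300, 900, 1800]))   # 30
print(estimate_isoline_quota(n_stores, [900, 1800, 3600]))  # 30
```

This is consistent with the available quota dropping from 1437 to 1407 by the time of the isodistances dry run.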
isochrones_cdf, isochrones_metadata = iso_service.isochrones(geo_cdf, [300, 900, 1800], mode='walk')
isochrones_cdf.head()
  | source_id | data_range | lower_data_range | geometry | range_label
1 | 1 | 300 | 0 | MULTIPOLYGON (((-73.95911 40.67183, -73.95917 ... | 5 min.
2 | 2 | 900 | 300 | POLYGON ((-73.95934 40.68011, -73.95839 40.680... | 15 min.
3 | 3 | 1800 | 900 | POLYGON ((-73.95949 40.69066, -73.95753 40.692... | 30 min.
4 | 4 | 300 | 0 | MULTIPOLYGON (((-73.96177 40.58098, -73.96201 ... | 5 min.
5 | 5 | 900 | 300 | POLYGON ((-73.96229 40.57502, -73.96232 40.575... | 15 min.

The isolines helper

The most straightforward way of visualizing the resulting geometries is the isolines_layer helper. It uses the range_label column, added automatically by the service, to classify each polygon by category.
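The range_label values in the isochrones output above follow a simple pattern: the upper bound of each range in seconds, expressed in minutes. A sketch that reproduces that formatting (it only mimics the labels shown; it is not the service's code, and range_label_for is a hypothetical name):

```python
def range_label_for(data_range_seconds):
    """Format a range in seconds like the range_label column, e.g. 300 -> '5 min.'."""
    return f'{data_range_seconds // 60} min.'


print([range_label_for(r) for r in [300, 900, 1800]])
# ['5 min.', '15 min.', '30 min.']
```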

from cartoframes.viz.helpers import isolines_layer

isolines_layer(isochrones_cdf)

Isolines: Isodistances

The isoline services accept several options to manually change the resolution or the quality of the polygons. There’s more information about these settings in the Isolines Reference. For isodistances, the ranges are distances in meters.
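To relate the distance ranges used here (900, 1800, and 3600) to the walking times used for the isochrones, here is a back-of-the-envelope sketch. It assumes the ranges are in meters and a constant walking speed of 5 km/h; the speed and the helper name walking_minutes are assumptions for illustration, and the calculation ignores the street network entirely:

```python
WALKING_SPEED_KMH = 5.0  # assumed average walking speed


def walking_minutes(distance_m, speed_kmh=WALKING_SPEED_KMH):
    """Minutes needed to walk `distance_m` meters at a constant speed."""
    meters_per_minute = speed_kmh * 1000 / 60
    return distance_m / meters_per_minute


for distance in [900, 1800, 3600]:
    print(f'{distance} m ~ {walking_minutes(distance):.1f} min')
```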

_, isodistances_dry_metadata = iso_service.isodistances(
    geo_cdf,
    [900, 1800, 3600],
    mode='walk',
    resolution=16.0,
    quality=1,
    dry_run=True
)
print('available {0}, required {1}'.format(
    iso_service.available_quota(),
    isodistances_dry_metadata.get('required_quota'))
)
available 1407, required 30
isodistances_cdf, isodistances_metadata = iso_service.isodistances(
    geo_cdf,
    [900, 1800, 3600],
    mode='walk',
    mode_traffic='enabled',
    resolution=16.0,
    quality=2
)
isodistances_cdf.head()
  | source_id | data_range | lower_data_range | geometry | range_label
1 | 1 | 900 | 0 | MULTIPOLYGON (((-73.95911 40.67183, -73.95917 ... | 15 min.
2 | 2 | 1800 | 900 | POLYGON ((-73.95953 40.67636, -73.95895 40.675... | 30 min.
3 | 3 | 3600 | 1800 | POLYGON ((-73.95968 40.68235, -73.95882 40.682... | 60 min.
4 | 4 | 900 | 0 | MULTIPOLYGON (((-73.96163 40.58063, -73.96221 ... | 15 min.
5 | 5 | 1800 | 900 | POLYGON ((-73.96180 40.57505, -73.96193 40.575... | 30 min.
from cartoframes.viz.helpers import isolines_layer

isolines_layer(isodistances_cdf)

All together

from cartoframes.viz import Map
from cartoframes.viz.helpers import size_continuous_layer

Map([
    isolines_layer(
        isochrones_cdf,
        title='Walking Time'
    ),
    size_continuous_layer(
        geo_cdf,
        'revenue',
        title='Revenue $',
        color='white',
        opacity='0.2',
        stroke_color='blue',
        size=[20, 80],
        popup=Popup({
            'hover': [{
                    'title': 'Address',
                    'value': '$address'
                }, {
                    'title': 'Precision',
                    'value': '$gc_status_rel'
                }, {
                    'title': 'Revenue',
                    'value': '$revenue'
                }]
        })
    )
])

We can see that the store at 228 Duffield St, Brooklyn, NY 11201 is very close to another store with higher revenue, so we could even consider closing it in favor of a store in a better location.

We could also try to find a location for a new store between stores that have lower revenue and are far apart from each other.

Now, let’s take three of these stores, calculate the centroid of the triangle they form, and use it as a possible location for a new store:

from shapely import geometry

# Three of the geocoded stores, selected by position in geo_cdf
# (the geometry column holds the geocoded points)
selected_stores = [
    geo_cdf.iloc[6].geometry,
    geo_cdf.iloc[9].geometry,
    geo_cdf.iloc[1].geometry
]

# Create a polygon (a triangle) using the three store points
polygon = geometry.Polygon([[p.x, p.y] for p in selected_stores])
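For a triangle, the area centroid that shapely returns from polygon.centroid is simply the average of the three vertices. A minimal pure-Python sketch of the general area-weighted polygon centroid (shoelace formula), using a hypothetical triangle with planar coordinates, shows that equivalence:

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Uses the shoelace formula; the polygon is closed implicitly.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError('degenerate polygon (collinear points)')
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical triangle with planar coordinates
triangle = [(0.0, 0.0), (6.0, 0.0), (0.0, 3.0)]
print(polygon_centroid(triangle))  # (2.0, 1.0)

# For a triangle, this equals the plain average of the vertices
print(tuple(sum(c) / 3 for c in zip(*triangle)))  # (2.0, 1.0)
```

Treating longitude and latitude as planar x and y coordinates, as the geo_cdf code does, is an approximation that is reasonable over an area the size of Brooklyn. If the three chosen stores were collinear, the triangle would have no area and no meaningful centroid.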
from cartoframes import CartoDataFrame

new_store_cdf = CartoDataFrame(
    [['New Store', polygon.centroid.y, polygon.centroid.x]],
    columns=['name', 'lat', 'lon'])

new_store_cdf.set_geometry_from_xy('lon', 'lat', inplace=True)

isochrones_new_cdf, isochrones_new_metadata = iso_service.isochrones(new_store_cdf, [300, 900, 1800], mode='walk')
from cartoframes.viz import Layer

Map([
    isolines_layer(
        isochrones_cdf,
        title='Walking Time - Current',
        opacity='0.2'
    ),
    isolines_layer(
        isochrones_new_cdf,
        title='Walking Time - New',
    ),
    size_continuous_layer(
        geo_cdf,
        'revenue',
        title='Revenue $',
        color='white',
        opacity='0.2',
        stroke_color='blue',
        size=[20, 80],
        popup=Popup({
            'hover': [{
                    'title': 'Address',
                    'value': '$address'
                }, {
                    'title': 'Precision',
                    'value': '$gc_status_rel'
                }, {
                    'title': 'Revenue',
                    'value': '$revenue'
                }]
        })
    )
])

Conclusion

In this example, we’ve explained how to use the Location Data Services to perform trade area analysis easily with CARTOframes’ built-in functionality, without leaving the notebook.

As a result, we’ve calculated a possible location for a new store, and we can check how the isoline areas of interest can influence our decision.

Keep in mind that finding optimal locations for new stores is not an easy task and requires more analysis, but this is a great first step!