and

# Calculating Routes at Scale using SQL on BigQuery

Computing distance metrics is a challenging problem in spatial analysis. When studying urban travel, calculating distances between two locations as the crow flies is the most straightforward method but this approach often introduces gross errors. If we are after a more exact result, it is necessary to consider these distances as the shortest path taking into account the available transportation network.

At CARTO we have been working hard on making scalable spatial analytics possible through the implementation of spatial functions and procedures natively within Google BigQuery as part of our Spatial Extension. These functions can be further enhanced by using our curated data offering through the Data Observatory, also accessible through BigQuery.

For this scenario, we decided to take on the challenge of solving routing problems at scale natively in BigQuery, the results of which we are unveiling today with the release of a new routing module inside the CARTO Spatial Extension for BigQuery.

As with any spatial analysis challenge, the first step is to source relevant data. For routing, there are mainly two sources of information required. First we need the road network information over which we will compute distance metrics, the so-called shortest paths. OpenStreetMap data is a fantastic option for this, as it’s freely available in BigQuery. In particular, we are going to use the bigquery-public-data:geo_openstreetmap.planet_ways table where public roads are stored.

Second we need a set of point location pairs to use as start and end points to compute distances. For that, we decided to use the San Francisco Bike Share Trips dataset, which contains almost 2 million Bay Area Bike Share Trips since 2013 and is also publicly available in BigQuery under the name bigquery-public-data:san_francisco_bikeshare, thanks to the San Francisco Open Data Project.

Now that we have sourced our data, the next step is to transform the public road information offered by OpenStreetMap into a network object composed of nodes and edges. The aim of this object is to encode the road network information in a way that will enable us to efficiently perform shortest path computations. For that, we will make use of the GENERATE_NETWORK function:

DECLARE county_geom GEOGRAPHY;

SET county_geom = (
SELECT county_geom from bigquery-public-data.geo_us_boundaries.counties
WHERE county_name LIKE '%San Francisco%');

SELECT
*
FROM
UNNEST((
SELECT
bqcarto.routing.GENERATE_NETWORK(ARRAY_AGG(STRUCT(geometry, 1.)))
FROM
bigquery-public-data.geo_openstreetmap.planet_ways,
UNNEST(all_tags) tag
WHERE
ST_INTERSECTS(geometry,
county_geom)
AND key = 'highway'


The result of this query is publicly available and can be found in cartobq.docs.san-francisco-network.

From the input San Francisco’s roads linestrings, the GENERATE_NETWORK function creates a collection of nodes and edges with an assigned weight representing the Euclidean distance between nodes. Moreover, as we can see in the map below, it drastically simplifies the topology whilst ensuring the total path’s lengths are preserved.

San Francisco’s roads linestrings on the left; the resulting network on the right.

Now, let’s put our network to work by computing the shortest paths for all bike share trips completed on the 25th of April 2018, using the FIND_SHORTEST_PATH_FROM_NETWORK function.

WITH
agg_network AS (
SELECT
FROM
cartobq.docs.san-francisco-network link ),
shortest_paths AS (
SELECT
bqcarto.routing.FIND_SHORTEST_PATH_FROM_NETWORK(network,
start_station_geom,
end_station_geom) as paths, trip_id FROM bigquery-public-data.san_francisco_bikeshare.bikeshare_trips, agg_network
WHERE
DATE(start_date)="2018-04-25"
AND start_station_id=59 AND end_station_id != 59
)
SELECT paths.geom, paths.weight, trip_id FROM shortest_paths


Using the DISTANCE_MAP_FROM_NETWORK function we can also use the same network to compute the distance to each node of the network from bike station 59, which was the starting point of all the paths we computed in the previous step.

DECLARE start_geom GEOGRAPHY;
SET start_geom = (
SELECT station_geom FROM  bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info
WHERE station_id = 59);
SELECT *
FROM
UNNEST((
SELECT
bqcarto.routing.DISTANCE_MAP_FROM_NETWORK(ARRAY_AGG(link),
start_geom)
FROM
cartobq.docs.san-francisco-network link));


This solution not only allows us to compute distances, but also to identify the most common routes and transited roads based on the assumption that users always choose the shortest path. Leveraging the processing power of BigQuery, let’s discover the most used street segments for more than a million bicycle trips.

-- Temporary function to split our routes into segments
-- so we can later analyze their usage independently
CREATE TEMP FUNCTION
pairwise(myarray ARRAY<STRUCT<idx INT64,
geom GEOGRAPHY>>) AS ((
SELECT
ARRAY_AGG(STRUCT(x AS a,
myarray[SAFE_OFFSET(off-1)] AS b,
ST_MAKELINE(x.geom,
myarray[SAFE_OFFSET(off-1)].geom) AS geom))
FROM
UNNEST(myarray) AS x
WITH OFFSET off
WHERE myarray[SAFE_OFFSET(off-1)] IS NOT NULL ));

-- Get network to compute the shortest paths
WITH agg_network AS (
FROM cartobq.docs.san-francisco-network link ),

-- Get bikeshare trips, always considering the starting
-- station the one with lower ID
bikeshare_trips AS (
SELECT
IF
(start_station_id>end_station_id, start_station_id,
end_station_id) start_station_id,
IF
(start_station_id>end_station_id, end_station_id,
start_station_id) end_station_id,
ANY_VALUE(
IF
(start_station_id>end_station_id,
start_station_geom,
end_station_geom)) start_station_geom,
ANY_VALUE(
IF
(start_station_id>end_station_id,
end_station_geom,
start_station_geom)) end_station_geom,
COUNT(*) count
FROM bigquery-public-data.san_francisco_bikeshare.bikeshare_trips
INNER JOIN
bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info start_station
ON
start_station_id = start_station.station_id
INNER JOIN
bigquery-public-data.san_francisco_bikeshare.bikeshare_regions start_region
ON start_region.region_id = start_station.region_id
INNER JOIN
bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info end_station
ON end_station_id = end_station.station_id
INNER JOIN
bigquery-public-data.san_francisco_bikeshare.bikeshare_regions end_region
ON end_region.region_id = end_station.region_id
WHERE
start_station_id<>end_station_id
AND start_region.name="San Francisco"
AND end_region.name="San Francisco"
GROUP BY start_station_id, end_station_id),

-- Compute shortest paths
routes AS (
SELECT bqcarto.routing.FIND_SHORTEST_PATH_FROM_NETWORK(network,
start_station_geom,
end_station_geom) compacted_route,
count, start_station_id, end_station_id
FROM
agg_network, bikeshare_trips)

-- Get usage frequency for each segment of the resulting routes
SELECT
ANY_VALUE(segments.geom) geo,
SUM(count) count,
FROM routes route,
UNNEST(pairwise(compacted_route.path)) segments
GROUP BY
CONCAT(segments.a.idx,segments.b.idx)


This is just a taste of the possibilities that route computation using the Spatial Extension for BigQuery can offer, with many more use cases available. For instance, the GENERATE_NETWORK function supports the assignment of custom weights to each node of the network. In the example provided in this post we have set uniform weights throughout the network to keep it simple, but this feature is critical for the routing in real-world use cases where the maximum speed allowed in each type of road matters. We are also working on adding directionality to the edges of the network, so that not all can be traversed bidirectionally as it is the case in the current implementation. We will keep you posted on the progress!

We’d love for you to give this a try and give us suggestions on how to utilize this Spatial Extension module further. To get access, even if you are not a CARTO customer, please fill in this form and we will get back to you to set up access. All functions used in this blogpost are documented here.

The authors of this post would like to acknowledge Allen Day, Developer Advocate, Google Cloud for their work making the OpenStreetMap data available in BigQuery.
 This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401.

Product Manager at CARTO

Software Engineer at CARTO

• ## Databricks support for H3 in collaboration with CARTO

Over the past few months we have been working with Databricks to add built-in support for H3, and this added functionality was released recently. Native support for H3 mean...

• ## What's New in CARTO - Q2 2022

As we approach the midpoint of the year, it is time for us to share with you some of the exciting new enhancements we have developed for the most recent release of the CART...

• ## CARTO and Google Cloud Announce Climate Insights for Infrastructure

The ever-growing threat of climate change on the built environment cannot be ignored. Intensifying wildfires, storms, floods and sea level rise not only impact the health a...