Calculating Routes at Scale using SQL on BigQuery

Summary

We are pleased to announce a new routing module as part of the Spatial Extension for Google BigQuery to enable route optimization at scale. Learn more & sign up!

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.
Calculating Routes at Scale using SQL on BigQuery

Computing distance metrics is a challenging problem in spatial analysis. When studying urban travel  calculating distances between two locations as the crow flies is the most straightforward method but this approach often introduces gross errors. If we are after a more exact result  it is necessary to consider these distances as the shortest path taking into account the available transportation network.

At CARTO we have been working hard on making scalable spatial analytics possible through the implementation of spatial functions and procedures natively within Google BigQuery as part of our Spatial Extension. These functions can be further enhanced by using our curated data offering through the  Data Observatory  also accessible through  BigQuery.

For this scenario  we decided to take on the challenge of solving routing problems at scale natively in BigQuery  the results of which we are unveiling today with the release of a new routing module inside the CARTO Spatial Extension for BigQuery.

 

   

     

map from Carto

   

   

     Want to learn more about our logistics solutions?              See examples          

 

As with any spatial analysis challenge  the first step is to source relevant data. For routing  there are mainly two sources of information required. First we need the road network information over which we will compute distance metrics   the so-called shortest paths. OpenStreetMap data is a fantastic option for this  as it’s freely available in BigQuery. In particular  we are going to use the bigquery-public-data:geo_openstreetmap.planet_ways table where public roads are stored.

Second we need a set of point location pairs to use as start and end points to compute distances. For that  we decided to use the San Francisco Bike Share Trips dataset  which contains almost 2 million Bay Area Bike Share Trips since 2013 and is also publicly available in BigQuery under the name bigquery-public-data:san_francisco_bikeshare  thanks to the San Francisco Open Data Project.

Now that we have sourced our data  the next step is to transform the public road information offered by OpenStreetMap into a network object composed of nodes and edges. The aim of this object is to encode the road network information in a way that will enable us to efficiently perform shortest path computations. For that  we will make use of the GENERATE_NETWORK function:

DECLARE county_geom GEOGRAPHY;

SET county_geom = (
 SELECT county_geom from `bigquery-public-data.geo_us_boundaries.counties`
 WHERE county_name LIKE '%San Francisco%');

SELECT
 *
FROM
 UNNEST((
   SELECT
     `bqcarto.routing.GENERATE_NETWORK`(ARRAY_AGG(STRUCT(geometry  1.)))
   FROM
     `bigquery-public-data.geo_openstreetmap.planet_ways` 
     UNNEST(all_tags) tag
   WHERE
       ST_INTERSECTS(geometry 
       county_geom)
     AND key = 'highway'
     AND value IN UNNEST(['trunk' 'primary' 'secondary' 'tertiary' 'unclassified' 'residential' 'living_street' 'road' 'pedestrian' 'path' 'cycleway']) ));

The result of this query is publicly available and can be found in cartobq.docs.san-francisco-network.

Screenshot of San Francisco Street Network


From the input San Francisco’s roads linestrings  the GENERATE_NETWORK function creates a collection of nodes and edges with an assigned weight representing the Euclidean distance between nodes. Moreover  as we can see in the map below  it drastically simplifies the topology whilst ensuring the total path’s lengths are preserved.

San Francisco’s roads linestrings on the left; the resulting network on the right.

Now  let’s put our network to work by computing the shortest paths for all bike share trips completed on the 25th of April 2018  using the FIND_SHORTEST_PATH_FROM_NETWORK function.

Screenshot of shortest path result


WITH
 agg_network AS (
 SELECT
   ARRAY_AGG(link) as network
 FROM
   `cartobq.docs.san-francisco-network` link ) 
shortest_paths AS (
SELECT
 `bqcarto.routing.FIND_SHORTEST_PATH_FROM_NETWORK`(network 
   start_station_geom 
   end_station_geom) as paths  trip_id FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`  agg_network
   WHERE
 DATE(start_date)="2018-04-25"
 AND start_station_id=59 AND end_station_id != 59
   )
SELECT paths.geom  paths.weight  trip_id FROM shortest_paths

Using the DISTANCE_MAP_FROM_NETWORK function we can also use the same network to compute the distance to each node of the network from bike station 59  which was the starting point of all the paths we computed in the previous step.

DECLARE start_geom GEOGRAPHY;
SET start_geom = (
SELECT station_geom FROM  `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE station_id = 59);
 SELECT *
FROM
 UNNEST((
   SELECT
     `bqcarto.routing.DISTANCE_MAP_FROM_NETWORK`(ARRAY_AGG(link) 
     start_geom)
   FROM
     `cartobq.docs.san-francisco-network` link));

This solution not only allows us to compute distances  but also to identify the most common routes and transited roads based on the assumption that users always choose the shortest path. Leveraging the processing power of BigQuery  let’s discover the most used street segments for more than a million bicycle trips.

-- Temporary function to split our routes into segments
-- so we can later analyze their usage independently
CREATE TEMP FUNCTION
 pairwise(myarray ARRAY<STRUCT<idx INT64 
   geom GEOGRAPHY>>) AS ((
   SELECT
     ARRAY_AGG(STRUCT(x AS a 
         myarray[SAFE_OFFSET(off-1)] AS b 
         ST_MAKELINE(x.geom 
           myarray[SAFE_OFFSET(off-1)].geom) AS geom))
   FROM
     UNNEST(myarray) AS x
   WITH OFFSET off
   WHERE myarray[SAFE_OFFSET(off-1)] IS NOT NULL ));

-- Get network to compute the shortest paths
WITH agg_network AS (
 SELECT ARRAY_AGG(link) network
 FROM `cartobq.docs.san-francisco-network` link ) 

-- Get bikeshare trips  always considering the starting
-- station the one with lower ID
 bikeshare_trips AS (
 SELECT
 IF
   (start_station_id>end_station_id  start_station_id 
   end_station_id) start_station_id 
 IF
   (start_station_id>end_station_id  end_station_id 
   start_station_id) end_station_id 
   ANY_VALUE(
   IF
     (start_station_id>end_station_id 
       start_station_geom 
       end_station_geom)) start_station_geom 
   ANY_VALUE(
   IF
     (start_station_id>end_station_id 
       end_station_geom 
       start_station_geom)) end_station_geom 
   COUNT(*) count
 FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info` start_station
 ON
   start_station_id = start_station.station_id
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions` start_region
 ON start_region.region_id = start_station.region_id
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info` end_station
 ON end_station_id = end_station.station_id
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions` end_region
 ON end_region.region_id = end_station.region_id
 WHERE
   start_station_id<>end_station_id
   AND start_region.name="San Francisco"
   AND end_region.name="San Francisco"
 GROUP BY start_station_id  end_station_id) 

-- Compute shortest paths
 routes AS (
 SELECT `bqcarto.routing.FIND_SHORTEST_PATH_FROM_NETWORK`(network 
     start_station_geom 
     end_station_geom) compacted_route 
   count  start_station_id  end_station_id
 FROM
   agg_network  bikeshare_trips)

-- Get usage frequency for each segment of the resulting routes
SELECT
 ANY_VALUE(segments.geom) geo 
 SUM(count) count 
FROM routes route 
 UNNEST(pairwise(compacted_route.path)) segments
GROUP BY
 CONCAT(segments.a.idx segments.b.idx)

This is just a taste of the possibilities that route computation using the Spatial Extension for BigQuery can offer  with many more use cases available. For instance  the GENERATE_NETWORK function supports the assignment of custom weights to each node of the network. In the example provided in this post we have set uniform weights throughout the network to keep it simple  but this feature is critical for the routing in real-world use cases where the maximum speed allowed in each type of road matters. We are also working on adding directionality to the edges of the network  so that not all can be traversed bidirectionally as it is the case in the current implementation. We will keep you posted on the progress!

We’d love for you to give this a try and give us suggestions on how to utilize this Spatial Extension module further. To get access  even if you are not a CARTO customer  please fill in this form and we will get back to you to set up access. All functions used in this blogpost are documented here.

 

   

     

map from Carto

   

   

     Want to try this for yourself?              Get access to the Spatial Extension for BigQuery          

 

The authors of this post would like to acknowledge Allen Day  Developer Advocate  Google Cloud for their work making the OpenStreetMap data available in BigQuery.

EU Flag This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401.