Margara Tejera François Baptiste

and

Calculating Routes at Scale using SQL on BigQuery

Computing distance metrics is a challenging problem in spatial analysis. When studying urban travel, calculating distances between two locations as the crow flies is the most straightforward method but this approach often introduces gross errors. If we are after a more exact result, it is necessary to consider these distances as the shortest path taking into account the available transportation network.

At CARTO we have been working hard on making scalable spatial analytics possible through the implementation of spatial functions and procedures natively within Google BigQuery as part of our Spatial Extension. These functions can be further enhanced by using our curated data offering through the Data Observatory, also accessible through BigQuery.

For this scenario, we decided to take on the challenge of solving routing problems at scale natively in BigQuery, the results of which we are unveiling today with the release of a new routing module inside the CARTO Spatial Extension for BigQuery.

map from Carto

Want to learn more about our logistics solutions?

See examples

As with any spatial analysis challenge, the first step is to source relevant data. For routing, there are mainly two sources of information required. First we need the road network information over which we will compute distance metrics, the so-called shortest paths. OpenStreetMap data is a fantastic option for this, as it’s freely available in BigQuery. In particular, we are going to use the bigquery-public-data:geo_openstreetmap.planet_ways table where public roads are stored.

Second we need a set of point location pairs to use as start and end points to compute distances. For that, we decided to use the San Francisco Bike Share Trips dataset, which contains almost 2 million Bay Area Bike Share Trips since 2013 and is also publicly available in BigQuery under the name bigquery-public-data:san_francisco_bikeshare, thanks to the San Francisco Open Data Project.

Now that we have sourced our data, the next step is to transform the public road information offered by OpenStreetMap into a network object composed of nodes and edges. The aim of this object is to encode the road network information in a way that will enable us to efficiently perform shortest path computations. For that, we will make use of the GENERATE_NETWORK function:

DECLARE county_geom GEOGRAPHY;
 
SET county_geom = (
 SELECT county_geom from `bigquery-public-data.geo_us_boundaries.counties`
 WHERE county_name LIKE '%San Francisco%');
 
SELECT
 *
FROM
 UNNEST((
   SELECT
     `bqcarto.routing.GENERATE_NETWORK`(ARRAY_AGG(STRUCT(geometry, 1.)))
   FROM
     `bigquery-public-data.geo_openstreetmap.planet_ways`,
     UNNEST(all_tags) tag
   WHERE
       ST_INTERSECTS(geometry,
       county_geom)
     AND key = 'highway'
     AND value IN UNNEST(['trunk','primary','secondary','tertiary','unclassified','residential','living_street','road','pedestrian','path','cycleway']) ));

The result of this query is publicly available and can be found in cartobq.docs.san-francisco-network.

Screenshot of San Francisco Street Network


From the input San Francisco’s roads linestrings, the GENERATE_NETWORK function creates a collection of nodes and edges with an assigned weight representing the Euclidean distance between nodes. Moreover, as we can see in the map below, it drastically simplifies the topology whilst ensuring the total path’s lengths are preserved.

San Francisco’s roads linestrings on the left; the resulting network on the right.


Now, let’s put our network to work by computing the shortest paths for all bike share trips completed on the 25th of April 2018, using the FIND_SHORTEST_PATH_FROM_NETWORK function.

Screenshot of shortest path result


WITH
 agg_network AS (
 SELECT
   ARRAY_AGG(link) as network
 FROM
   `cartobq.docs.san-francisco-network` link ),
shortest_paths AS (
SELECT
 `bqcarto.routing.FIND_SHORTEST_PATH_FROM_NETWORK`(network,
   start_station_geom,
   end_station_geom) as paths, trip_id FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`, agg_network
   WHERE
 DATE(start_date)="2018-04-25"
 AND start_station_id=59 AND end_station_id != 59
   )
SELECT paths.geom, paths.weight, trip_id FROM shortest_paths

Using the DISTANCE_MAP_FROM_NETWORK function we can also use the same network to compute the distance to each node of the network from bike station 59, which was the starting point of all the paths we computed in the previous step.

DECLARE start_geom GEOGRAPHY;
SET start_geom = (
SELECT station_geom FROM  `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE station_id = 59);
 SELECT *
FROM
 UNNEST((
   SELECT
     `bqcarto.routing.DISTANCE_MAP_FROM_NETWORK`(ARRAY_AGG(link),
     start_geom)
   FROM
     `cartobq.docs.san-francisco-network` link));

This solution not only allows us to compute distances, but also to identify the most common routes and transited roads based on the assumption that users always choose the shortest path. Leveraging the processing power of BigQuery, let’s discover the most used street segments for more than a million bicycle trips.

-- Temporary function to split our routes into segments
-- so we can later analyze their usage independently
CREATE TEMP FUNCTION
 pairwise(myarray ARRAY<STRUCT<idx INT64,
   geom GEOGRAPHY>>) AS ((
   SELECT
     ARRAY_AGG(STRUCT(x AS a,
         myarray[SAFE_OFFSET(off-1)] AS b,
         ST_MAKELINE(x.geom,
           myarray[SAFE_OFFSET(off-1)].geom) AS geom))
   FROM
     UNNEST(myarray) AS x
   WITH OFFSET off
   WHERE myarray[SAFE_OFFSET(off-1)] IS NOT NULL ));
 
-- Get network to compute the shortest paths
WITH agg_network AS (
 SELECT ARRAY_AGG(link) network
 FROM `cartobq.docs.san-francisco-network` link ),
 
-- Get bikeshare trips, always considering the starting
-- station the one with lower ID
 bikeshare_trips AS (
 SELECT
 IF
   (start_station_id>end_station_id, start_station_id,
   end_station_id) start_station_id,
 IF
   (start_station_id>end_station_id, end_station_id,
   start_station_id) end_station_id,
   ANY_VALUE(
   IF
     (start_station_id>end_station_id,
       start_station_geom,
       end_station_geom)) start_station_geom,
   ANY_VALUE(
   IF
     (start_station_id>end_station_id,
       end_station_geom,
       start_station_geom)) end_station_geom,
   COUNT(*) count
 FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info` start_station
 ON
   start_station_id = start_station.station_id
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions` start_region
 ON start_region.region_id = start_station.region_id
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info` end_station
 ON end_station_id = end_station.station_id
 INNER JOIN
   `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions` end_region
 ON end_region.region_id = end_station.region_id
 WHERE
   start_station_id<>end_station_id
   AND start_region.name="San Francisco"
   AND end_region.name="San Francisco"
 GROUP BY start_station_id, end_station_id),
 
-- Compute shortest paths
 routes AS (
 SELECT `bqcarto.routing.FIND_SHORTEST_PATH_FROM_NETWORK`(network,
     start_station_geom,
     end_station_geom) compacted_route,
   count, start_station_id, end_station_id
 FROM
   agg_network, bikeshare_trips)
 
-- Get usage frequency for each segment of the resulting routes
SELECT
 ANY_VALUE(segments.geom) geo,
 SUM(count) count,
FROM routes route,
 UNNEST(pairwise(compacted_route.path)) segments
GROUP BY
 CONCAT(segments.a.idx,segments.b.idx)

This is just a taste of the possibilities that route computation using the Spatial Extension for BigQuery can offer, with many more use cases available. For instance, the GENERATE_NETWORK function supports the assignment of custom weights to each node of the network. In the example provided in this post we have set uniform weights throughout the network to keep it simple, but this feature is critical for the routing in real-world use cases where the maximum speed allowed in each type of road matters. We are also working on adding directionality to the edges of the network, so that not all can be traversed bidirectionally as it is the case in the current implementation. We will keep you posted on the progress!

We’d love for you to give this a try and give us suggestions on how to utilize this Spatial Extension module further. To get access, even if you are not a CARTO customer, please fill in this form and we will get back to you to set up access. All functions used in this blogpost are documented here.

The authors of this post would like to acknowledge Allen Day, Developer Advocate, Google Cloud for their work making the OpenStreetMap data available in BigQuery.
EU Flag This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401.
About the author
Margara Tejera

Product Manager at CARTO

More posts from Margara Tejera
About the author
François Baptiste

Software Engineer at CARTO

More posts from François Baptiste

Related Posts

Ready to optimize your territories with Location Intelligence?

Close circle icon

Contact us

Please fill out the below form and we'll be in touch real soon.