Calculating Routes at Scale using SQL on BigQuery

Computing distance metrics is a challenging problem in spatial analysis. When studying urban travel calculating distances between two locations as the crow flies is the most straightforward method but this approach often introduces gross errors. If we are after a more exact result it is necessary to consider these distances as the shortest path taking into account the available transportation network.
At CARTO we have been working hard on making scalable spatial analytics possible through the implementation of spatial functions and procedures natively within Google BigQuery as part of our Spatial Extension. These functions can be further enhanced by using our curated data offering through the Data Observatory also accessible through BigQuery.
For this scenario we decided to take on the challenge of solving routing problems at scale natively in BigQuery the results of which we are unveiling today with the release of a new routing module inside the CARTO Spatial Extension for BigQuery.

Want to learn more about our logistics solutions? See examples
As with any spatial analysis challenge the first step is to source relevant data. For routing there are mainly two sources of information required. First we need the road network information over which we will compute distance metrics the so-called shortest paths. OpenStreetMap data is a fantastic option for this as it’s freely available in BigQuery. In particular we are going to use the bigquery-public-data:geo_openstreetmap.planet_ways table where public roads are stored.
Second we need a set of point location pairs to use as start and end points to compute distances. For that we decided to use the San Francisco Bike Share Trips dataset which contains almost 2 million Bay Area Bike Share Trips since 2013 and is also publicly available in BigQuery under the name bigquery-public-data:san_francisco_bikeshare thanks to the San Francisco Open Data Project.
Now that we have sourced our data the next step is to transform the public road information offered by OpenStreetMap into a network object composed of nodes and edges. The aim of this object is to encode the road network information in a way that will enable us to efficiently perform shortest path computations. For that we will make use of the GENERATE_NETWORK function:
The result of this query is publicly available and can be found in cartobq.docs.san-francisco-network.

From the input San Francisco’s roads linestrings the GENERATE_NETWORK function creates a collection of nodes and edges with an assigned weight representing the Euclidean distance between nodes. Moreover as we can see in the map below it drastically simplifies the topology whilst ensuring the total path’s lengths are preserved.
San Francisco’s roads linestrings on the left; the resulting network on the right.
Now let’s put our network to work by computing the shortest paths for all bike share trips completed on the 25th of April 2018 using the FIND_SHORTEST_PATH_FROM_NETWORK function.

Using the DISTANCE_MAP_FROM_NETWORK function we can also use the same network to compute the distance to each node of the network from bike station 59 which was the starting point of all the paths we computed in the previous step.
This solution not only allows us to compute distances but also to identify the most common routes and transited roads based on the assumption that users always choose the shortest path. Leveraging the processing power of BigQuery let’s discover the most used street segments for more than a million bicycle trips.
This is just a taste of the possibilities that route computation using the Spatial Extension for BigQuery can offer with many more use cases available. For instance the GENERATE_NETWORK function supports the assignment of custom weights to each node of the network. In the example provided in this post we have set uniform weights throughout the network to keep it simple but this feature is critical for the routing in real-world use cases where the maximum speed allowed in each type of road matters. We are also working on adding directionality to the edges of the network so that not all can be traversed bidirectionally as it is the case in the current implementation. We will keep you posted on the progress!
We’d love for you to give this a try and give us suggestions on how to utilize this Spatial Extension module further. To get access even if you are not a CARTO customer please fill in this form and we will get back to you to set up access. All functions used in this blogpost are documented here.

Want to try this for yourself? Get access to the Spatial Extension for BigQuery
The authors of this post would like to acknowledge Allen Day Developer Advocate Google Cloud for their work making the OpenStreetMap data available in BigQuery.