Javier de la Torre Margara Tejera

and

Enhancing Geospatial in BigQuery with CARTO Spatial Extension

At CARTO, we firmly believe that the future of geospatial analysis lies in the realm of next generation cloud data warehouses. Traditional data sources (such as census data or survey data) are rapidly being replaced by vast volumes of ever-changing, real-time data streams from multiple sources, and that is pushing the requirements of computing engines to perform spatial analytics at a scale never seen before.

But it is not only a matter of scalability. Platforms like Google BigQuery, provide significant cost savings with their compute-storage separation, and support for SQL. With SQL as the lingua franca for analytics, we believe it will democratize access to many analytical capabilities in the future, and in the GIS world, this trend is clear and unstoppable. This began almost 20 years ago with PostGIS, and is now migrating over to cloud native platforms.

Graphic showing trend migrating from PostGIS to geospatial enabled cloud platforms


BigQuery was the first of these cloud data warehouses to add spatial support to their engine back in 2018, opening the door to move more and more spatial workflows there. However it still lacks some of the analytical capabilities required for many geospatial use cases. Based on the work from The Urban Analytics Lab we can see a comparison of the number of geospatial functionalities, compared to PostGIS.

Comparing available geospatial functions across platforms compared to PostGIS


Building upon the Urban Analytics Lab’s initial study, we have collated a more complete and up-to-date view of the available geospatial functions across platforms. The results of this study can be found here.

In 2019, we introduced a preliminary set of spatial functions for BigQuery through the Open Source project jslibs. But we always knew this was just the beginning. So many more functions and capabilities are needed to make a data warehouse a complete and foundational Spatial Analytics engine. We have already added capabilities to create tilesets in BigQuery, and we are now excited to announce the expansion of those capabilities.

Introducing the CARTO Spatial Extension for BigQuery

Imagine you could run this simple command:

SELECT ST_GETPOPULATIONDENSITY(ST_GEOGPOINT(-75.0, 40.7))

To compute the population density anywhere in the world. Or combine two datasets simply by doing a JOIN:

SELECT table1.var, table2.var, table1.h3id
FROM table1
JOIN table2 ON table1.h3id=table2.h3id;

With our Spatial Extension for BigQuery this is possible and straight out of your BigQuery console. The Spatial Extension is packed with functions and procedures to perform geospatial analysis through expressive and simple SQL, all natively within the data warehouse environment.

Want to try out the CARTO Spatial Extension for BigQuery?

Get access

A multilayered approach with an Open Core

Diagram showing the components of the CARTO Spatial Extension


The core layer of the Spatial Extension, which enhances the built-in GIS capabilities of the cloud data warehouses, is open source and freely available on GitHub. With this core, we are making a set of functions available to the general public which we believe are the foundation for the next generation of spatial analytics, providing support for key spatial indices such as H3 and quadkeys. For more detailed information, check out our website.


As an open source core, we actively welcome contributions and love pull requests, so if you would like to contribute specific functions or help us to debug them, we will see you on GitHub!

Advanced analytics and visualization

The Spatial Extension is already equipped with our BigQuery Tiler, a unique solution to visualize massive spatial datasets hosted in BigQuery. We are now working to develop out-of-the-box advanced spatial analysis and tools such as the identification of hotspots, the computation of twin areas or performing a spatial cross validation. You can see a sample of what’s possible when we combine the power of cloud-native visualization and analytics in the making of our vaccination progress application.

Powered by our Data Observatory

Modern spatial analytics is all about connectivity to data sources, which is why we continue to grow the largest spatial data catalog out there: the CARTO Data Observatory. Take the example of a function to determine a catchment area. You can do this simply by using a buffer around a point or with a function that uses a human mobility dataset “under the hood”.

By leveraging our huge catalog of curated data, the Spatial Extension not only streamlines your analysis workflows, but also opens up a limitless array of analytical possibilities, from enriching your data with a simple SQL command, to creating complex machine learning models.

Available today for BigQuery

Today we are announcing the availability of more than 50 spatial functions organised in ten different modules, seven of them Open Source and free to use for anyone with a BigQuery account. These functions enhance BigQuery GIS capabilities with geometry constructors, transformations and measurements, support for H3 and Quadkey, and the Tiler, among others. A complete list of the functions can be viewed here. And best of all, these functions can be used directly with your BigQuery hosted data without the need for painful ETL processes.

Diagram showing the available spatial functions


Thanks to BigQuery’s flexible permissions system, we are able to make the core modules of the Spatial Extension available as a public database. Therefore, these functions are freely available to query from your BigQuery console, without the need for any installation. To test this, simply type the following SQL statement on your BigQuery console:

bqcarto.h3.VERSION()

** Please use bqcartoeu instead of bqcarto, if your account is located in the European multi region.

If you would like to get access to the complete suite of modules, please sign up here. We will be adding new functions every week!

Unlock the power of spatial indices

Spatial indices enable scalable analytics by partitioning the space at different resolution levels. An immediate application is the compact representation of an arbitrarily complex geometry, such as a census tract in the US with a set of indices:

Screenshot showing scalable analytics by partitioning the space at different resolution levels


WITH data as (
	SELECT bqcarto.h3.COMPACT(
  	bqcarto.h3.ST_ASH3_POLYFILL(geom,11)) as geo 
  	FROM `carto-do-public-data.carto.geography_usa_censustract_2019`
  	WHERE geoid='36061009900')

SELECT bqcarto.h3.ST_BOUNDARY(h3) as geom
FROM data,UNNEST(geo) as h3

Using these simplified representations we can transform traditionally expensive operations, like spatial intersections, into computationally efficient array operations. See here an example and the full script to reproduce it:


-- Declare variables to hold quadkey indices
DECLARE buffer_idxs ARRAY<INT64>;
DECLARE tract_idxs ARRAY<INT64>;
 
-- Quadint representation of buffer
SET buffer_idxs = (
SELECT bqcarto.quadkey.ST_ASQUADINT_POLYFILL(geom, 20) as qks
FROM ( SELECT bqcarto.transformations.ST_BUFFER(ST_GEOGPOINT(-73.9551819,
         40.7455594), 0.4,'kilometers',20) AS geom)   
);
 
-- Quadint representation of census tract
SET tract_idxs = (
SELECT bqcarto.quadkey.ST_ASQUADINT_POLYFILL(geom,20) as qks
   FROM `carto-do-public-data.carto.geography_usa_censustract_2019`
   WHERE geoid='36081000700'
);
 
-- Intersection
SELECT qk_b
FROM UNNEST(buffer_idxs) AS qk_b WHERE qk_b IN
(SELECT qk_t FROM UNNEST(tract_idxs) AS qk_t)

Finally, you can see how easy it is to create a heatmap representations, in this example, Starbucks locations in the US using an H3 grid. If you require an analysis at a finer scale, you simply need to choose a higher resolution when calling these functions.

WITH
  data AS (
  SELECT
    bqcarto.h3.ST_ASH3(geog, 4) AS h3id,
    COUNT(*) AS agg_total
  FROM `carto-docs.examples.starbucks-locations-usa`
  GROUP BY h3id
  )
SELECT
  h3id, 
  agg_total,
  bqcarto.h3.ST_BOUNDARY(h3id) AS geom
FROM
  data

For a comprehensive list of the functions and procedures available, as well as a set of guides and examples to help you get started, please refer to our Documentation Center.

Our vision is to enable Spatial Analytics in SQL

Building on our success with the BigQuery Tiler, today’s announcement further demonstrates our commitment to cloud native spatial analytics. We firmly believe that the future of geospatial analysis and application development is in the cloud and that SQL will rapidly become the lingua franca for advanced geospatial analytics. These are dynamic, exciting times for Location Intelligence. We will be announcing further cloud native developments in the coming weeks, so stay tuned for regular updates from us!

Want to find out more?

Check out our website
About the author
Javier de la Torre

Javier de la Torre is founder and Chief Strategy Officer of CARTO. One of the pioneers of location intelligence, Javier founded the company with a vision to democratize data analysis and visualization. Under his leadership, CARTO has grown from a groundbreaking idea into one of the fastest growing geospatial companies in the world.
In 2007, he founded Vizzuality, a renowned geospatial company dedicated to bridging the gap between science and policy making by the better use of data.

More posts from Javier de la Torre
About the author
Margara Tejera

Product Manager at CARTO

More posts from Margara Tejera

Related Posts

Ready to optimize your territories with Location Intelligence?

Close circle icon

Contact us

Please fill out the below form and we'll be in touch real soon.