Javier de la Torre

Databricks support for H3 in collaboration with CARTO

Over the past few months we have been working with Databricks to add built-in support for H3, and this functionality was recently released. Native support means less friction when using H3 Spatial Indexes in Databricks, and it also brings major speed improvements thanks to Photon acceleration.

We have been using these new H3 capabilities in CARTO for some time now and we are pretty excited about the impressive speed improvements.

This is a significant development that supports our mission to unlock massive-scale spatial analytics natively in the Lakehouse platform.

H3 is a powerful global hierarchical grid system - sometimes referred to as a Spatial Index - which enables the processing and analysis of truly big geospatial data. By leveraging the power of H3, Databricks users can now perform faster and more efficient workflows, as well as unlock totally new types of spatial analysis.

The announcement from Databricks - which you can read more about here - comes as part of their milestone 11.2 Runtime release and offers huge advancements for users who are processing and analyzing geospatial data.

CARTO has collaborated closely with Databricks to bring these latest advances to our cloud native Location Intelligence platform. Our expertise and knowledge of the spatial problems that Databricks customers are solving helped to inform the H3 roadmap. In particular, we worked with Databricks to define a list of geospatial functions that customers need in order to get the most value from H3, which you can read about later in this post!

To see H3 in action on Databricks using CARTO, check out our presentation at the recent DATA & AI Summit.

What is H3?

H3 is a global hierarchical grid system which was developed to efficiently manage and analyze large geospatial datasets. The concept of a global hierarchical grid is simple: it consists of multiple resolutions which “map” directly to each other. H3 has 16 resolutions (0 to 15), with average cell areas ranging from roughly 4 million km² at the coarsest resolution to 0.895 m² at the finest. Each “parent” hexagonal cell contains 7 “child” hexagonal cells at the next finer resolution.
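
As a rough illustration of this parent/child structure, the sketch below uses the native Databricks H3 functions to look up a cell's resolution, its parent one level coarser and its children one level finer. The coordinate and the resolutions are arbitrary values chosen for the example, not recommendations.

```sql
-- Sketch only: the coordinate and resolutions are illustrative values.
WITH origin AS (
  -- H3 cell at resolution 9 containing the point (longitude, latitude)
  SELECT h3_longlatash3(-3.7038, 40.4168, 9) AS cell
)
SELECT
  h3_resolution(cell)                        AS resolution,
  h3_toparent(cell, 8)                       AS parent_cell,        -- one level coarser
  h3_tochildren(cell, 10)                    AS child_cells,        -- array, one level finer
  h3_ischildof(cell, h3_toparent(cell, 8))   AS is_child_of_parent
FROM origin;
```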

Globe with H3

What are the advantages of using H3?

There are a number of advantages to using H3 for your spatial analytics, including:

  • Efficiency and performance: an H3 cell is stored as a compact index (a string or integer) rather than a complex geometry. This makes the data smaller to store and faster to analyze.
  • Scalability: harness the true, highly distributed power of Spatial Data Warehouses. Unlike with geometries, query costs do not increase exponentially with bigger, more complex areas.
  • Flexibility: cross-analyze data from multiple geographies when aggregated to one grid system, often referred to as a “Support Geography.”
  • Collaborate: easily share and transfer data with other H3 users, since everyone refers to the same indexing system.
  • Analyze: converting data into a continuous grid system enables new forms of analysis.

Want to learn more about Spatial Indexes? Check out our blog post to find out why hexagons are such a powerful tool for Location Intelligence.

Using H3 in Databricks with CARTO

Thanks to our Spatial Extension for Databricks, CARTO users can connect directly to their Databricks cluster to access data and perform massive-scale data visualization and analytics. The latest enhancements to the Databricks platform bring added H3 functionality that allows dynamic aggregation natively within Databricks. In addition, our spatial data catalog opens up a wealth of H3-indexed datasets that can be used for highly efficient data enrichment workflows.

This Databricks release includes 28 native H3 functions. These include:

  • Functions to generate H3 cells and grids from geometries or well-known text (WKT), such as h3_polyfillash3(), which generates a H3 grid of a defined resolution to cover the extent of a polygon. Similarly, h3_longlatash3() can be used to generate a H3 cell at a defined coordinate.
  • The reverse of this: functions which create a geometry feature from H3. These include h3_boundaryaswkt() - which converts a H3 cell into a polygon - and h3_centeraswkb() - which converts a H3 cell into a point at its centroid. Having the ability to easily move between H3 and geometry types is very useful for running spatial filters and joins.
  • Distance-based functions which are far cheaper to compute than analyzing distance based on geometry functions such as ST_DISTANCE(). For instance, h3_kring() returns all H3 cells within a defined grid distance of an origin cell, and h3_distance() will return the grid distance between two cells.
  • Functions to move between resolutions, such as h3_tochildren() and h3_toparent().

…And many more! Check out the full list here.
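
To give a feel for how the conversion and distance functions fit together, here is a minimal sketch that can be run in a Databricks SQL editor on a runtime with H3 support. The coordinates, the resolution and the column aliases are made up for illustration, and h3_centeraswkt() is used in place of h3_centeraswkb() purely so that the output is readable.

```sql
-- Sketch only: coordinates, resolution and aliases are illustrative.
WITH cells AS (
  SELECT
    h3_longlatash3(-3.7038, 40.4168, 9) AS origin_cell,
    h3_longlatash3(-3.7000, 40.4200, 9) AS other_cell
)
SELECT
  h3_boundaryaswkt(origin_cell)          AS boundary_wkt,    -- cell outline as a WKT polygon
  h3_centeraswkt(origin_cell)            AS center_wkt,      -- cell centroid as a WKT point
  h3_kring(origin_cell, 1)               AS cells_within_1,  -- array of cells within grid distance 1
  h3_distance(origin_cell, other_cell)   AS grid_distance    -- grid distance between the two cells
FROM cells;
```

Note that h3_distance() compares cells of the same resolution, which is why both cells here are generated at resolution 9.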

Data explorer

Getting started with H3 in CARTO: An example

The example SQL code below illustrates how a point table can easily be aggregated to a H3 grid. This code can be executed in your Databricks console or directly in CARTO Builder. Once executed, this query can be visualized as a dynamic H3 layer (using the H3 field), which renders faster than a conventional geometry table and is rendered dynamically; the more you zoom in, the more detailed the rendering.

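WITH
-- define inputs
points AS (SELECT geom, value FROM POINTS),
study_area AS (SELECT carto.ST_MAKEPOLYGON(carto.ST_GEOMFROMWKT('LINESTRING(long lat, long lat, long lat, long lat)')) AS geom),
-- h3_polyfillash3() takes a resolution (9 is only an example) and returns an array of cells, so explode it into one row per cell;
-- it expects WKT, WKB or GeoJSON input, so convert geom first if it is stored as another type
h3grid AS (SELECT explode(h3_polyfillash3(geom, 9)) AS h3 FROM study_area)

-- aggregate the point variables to h3 cells, based on the geospatial relationship "contains"
SELECT h3grid.h3, COUNT(points.geom) AS count, SUM(points.value) AS value_total
FROM h3grid
-- likewise, convert the GeoJSON boundary to a geometry if carto.ST_CONTAINS requires one
LEFT JOIN points ON carto.ST_CONTAINS(h3_boundaryasgeojson(h3grid.h3), points.geom)
GROUP BY h3grid.h3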
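
If your point table stores plain coordinates rather than geometries, a simpler variant is to assign each point directly to its H3 cell and group by that cell, which avoids the spatial join altogether. The sketch below is an alternative to the query above rather than a replacement for it: it assumes a hypothetical table named points with longitude, latitude and value columns, and resolution 9 is just an example. Unlike the grid-based query, it only returns cells that contain at least one point.

```sql
-- Sketch only: `points`, `longitude`, `latitude` and `value` are hypothetical names.
SELECT
  h3_longlatash3(longitude, latitude, 9) AS h3,
  COUNT(*)                               AS count,
  SUM(value)                             AS value_total
FROM points
GROUP BY h3_longlatash3(longitude, latitude, 9);
```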

In addition to the new H3 functions available in Databricks, users can also leverage functions from CARTO’s Analytics Toolbox to undertake complex geospatial analysis. Check out our guide to installing our Analytics Toolbox for Databricks here and start unleashing the power of big spatial data!

Our roadmap includes even more enhancements to support CARTO users running advanced spatial analytics in the Databricks Lakehouse platform. Stay tuned for more exciting announcements in the coming weeks!

And if you would like to test drive the CARTO Location Intelligence platform in full, why not sign up for our free 14-day trial!

map from Carto

About the author
Javier de la Torre

Javier de la Torre is founder and Chief Strategy Officer of CARTO. One of the pioneers of location intelligence, Javier founded the company with a vision to democratize data analysis and visualization. Under his leadership, CARTO has grown from a groundbreaking idea into one of the fastest growing geospatial companies in the world.
In 2007, he founded Vizzuality, a renowned geospatial company dedicated to bridging the gap between science and policy making by the better use of data.
