Enhancing geospatial analytics with CARTO & Databricks

Summary

Unlock advanced spatial analysis in Databricks with CARTO’s Analytics Toolbox. Visualize & analyze large geospatial data inside your data warehouse.


At CARTO, we want to break the GIS data silo. Rather than having geospatial data sit in a different database, running on different software and being used by a completely separate team, we believe it should be integrated. This will save time, drive efficiencies and de-risk decision making. Data analysts and spatial data scientists should be able to speak the same language, use the same tools and gain insights from the same data.  

That means cloud data warehouses, like Databricks.

CARTO is the leading cloud-native Location Intelligence platform. Everything that you get with CARTO - our advanced analytics functions, our dynamic visualization suite and our modern developer toolkit - happens inside your data warehouse. We’re excited to share what that means for Databricks users, and give you a preview of the exciting developments coming to your data warehouse!

So, if you’re ready to extend your geospatial capabilities and start using location data to make decisions at scale - keep reading!

A screenshot of a CARTO Builder map showing a 3D map and widgets around the side
Turning location data into insights with CARTO

Databricks: the state of spatial

Historically, users wishing to perform Location Intelligence in Databricks had two main options.

Firstly, they could use single-node libraries such as GDAL and GeoPandas. These could only run on the driver or a single-node cluster, were relatively slow, and offered limited scalability.

Secondly, users could leverage the Apache Spark (TM) framework - a more scalable option that distributes data across clusters. This was often combined with Mosaic or Apache Sedona for more extensive geospatial capabilities.

CARTO’s Analytics Toolbox simplifies all of this with a unified approach to spatial analysis.

CARTO’s Analytics Toolbox for Databricks

We’re excited to share that we will be releasing our Analytics Toolbox for Databricks in summer 2024! This will provide users with advanced spatial analysis capabilities, allowing them to index large geospatial datasets for efficient ingestion and visualization.

The Analytics Toolbox will run natively on top of customers' existing Databricks clusters - meaning all spatial processing will happen directly inside the data warehouse. Not only will this enable more efficient analytics within your existing tech stack, it will also help ensure data governance standards and single-source-of-truth compliance.

A schematic diagram of CARTO's Analytics Toolbox for Databricks
Running CARTO natively with Databricks

CARTO’s Analytics Toolbox for Databricks will allow users to draw on Spatial SQL from different origins, always automatically prioritizing the most performant option. Users will be able to access functions from:

  • Databricks native Spatial SQL (currently in private preview)
  • Apache Sedona
  • Apache Spark (TM) UDFs
  • CARTO Analytics Toolbox
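As an illustration of that prioritization, here is a minimal Python sketch of a fallback dispatcher that picks the first backend offering a given function. The backend registry, function names, and dispatch logic are assumptions for the example only, not CARTO's actual implementation.

```python
# Illustrative sketch (not CARTO's actual code): resolve a spatial function
# against an ordered list of backends, most performant first, falling back
# when a backend does not provide the function.

# Hypothetical registry: backend name -> {function name: callable}.
# Order mirrors the priority list above (dicts preserve insertion order).
BACKENDS = {
    "databricks_native": {},  # e.g. native Spatial SQL, still in preview
    "sedona": {"st_area": lambda geom: geom["area"]},
    "spark_udf": {"st_area": lambda geom: geom["area"],
                  "st_buffer": lambda geom, d: {**geom, "buffer": d}},
    "carto_toolbox": {"st_buffer": lambda geom, d: {**geom, "buffer": d}},
}

def resolve(function_name):
    """Return (backend_name, callable) for the first backend that offers it."""
    for backend, functions in BACKENDS.items():
        if function_name in functions:
            return backend, functions[function_name]
    raise KeyError(f"no backend provides {function_name!r}")

backend, st_area = resolve("st_area")
print(backend)  # sedona: the native backend lacks it, so we fall back
```

The same lookup could just as well consult runtime feature flags (e.g. whether the native Spatial SQL preview is enabled on the cluster) instead of a static registry.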

There are many exciting new developments coming to the Analytics Toolbox for Databricks users - keep reading for a sneak preview!

At Databricks, we’re constantly pushing the boundaries of data and AI. Our collaboration with CARTO represents a significant step forward in merging the worlds of data science and spatial analysis, opening up exciting new possibilities for innovation across industries.

Michael Johns, Geospatial Specialist Leader at Databricks

What else is coming?

In addition to the Analytics Toolbox - due to launch in June 2024 - here’s a taster of what Databricks users can look forward to!

  • Rasquet: raster parquet, or 'Rasquet', is a new standard for raster data in the cloud. Data is stored as multi-dimensional arrays, making it easier to handle large volumes of raster data. This simplifies working with big datasets and allows users to process and analyze spatial data at scale, in the cloud. By integrating seamlessly with non-geospatial data and supporting projections and encodings, Rasquet is designed for interoperability, enhancing the efficiency of spatial data analysis.
  • Workflows: CARTO Workflows - coming to Databricks users in September 2024 - is our no-code tool for building multi-step analyses in a visual, drag-and-drop environment. You can schedule workflows to reflect real-time data updates, or trigger them via API to integrate with wider organizational processes. We’re excited to bring the Workflows solution to Databricks users to unlock all of the benefits of no-code analytics, from saving time to enhanced collaboration.
  • Dynamic Tiling: this development will allow you to generate map tiles on demand by querying data directly from Databricks, eliminating the need for pre-generated tilesets and complex ETL processes. This approach optimizes performance, reduces inefficiencies and costs, and ensures real-time, scalable map rendering for large datasets.
  • AI assistant: get ready for CARTO AI Agents! Coming soon to CARTO Builder, utilize the latest LLM models to interact with spatial data and visualizations to uncover insights and meaning. Enhance efficiency by automating tasks such as map manipulation and direct analysis. Keep your eyes peeled for more news on this!
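To make the "multi-dimensional arrays" idea behind Rasquet concrete, here is a small Python sketch of block-chunking a raster so each block can be stored and fetched independently (for example, as one row of a Parquet-style columnar file). The block layout is an illustrative assumption, not the actual Rasquet specification.

```python
# Illustrative sketch (assumed layout, not the Rasquet spec): split a 2D
# raster into fixed-size square blocks; each block becomes an independently
# addressable unit, which is what makes large rasters tractable at scale.

def chunk_raster(raster, block_size):
    """Yield ((block_row, block_col), block) pairs for a 2D list raster."""
    rows, cols = len(raster), len(raster[0])
    for br in range(0, rows, block_size):
        for bc in range(0, cols, block_size):
            block = [row[bc:bc + block_size]
                     for row in raster[br:br + block_size]]
            yield (br // block_size, bc // block_size), block

# A 4x4 raster split into 2x2 blocks -> four independent blocks.
raster = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = dict(chunk_raster(raster, 2))
print(sorted(blocks))     # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(blocks[(1, 1)])     # [[10, 11], [14, 15]]
```

A reader of a block-addressed format only touches the blocks intersecting the area of interest, rather than scanning the whole raster.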
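The dynamic tiling item above rests on standard "slippy map" tile math: each requested tile's (x, y, zoom) address determines the bounding box a tiler queries on demand. Here is that standard Web Mercator formula in Python - it is the widely used XYZ tiling convention, not CARTO-specific code.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Map a WGS84 lon/lat to Web Mercator XYZ tile coordinates.

    At zoom z the world is a 2^z x 2^z grid of tiles; a dynamic tiler
    uses the (x, y, z) of each requested tile to build the bounding-box
    query it runs against the warehouse on demand.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y, zoom

# Null Island (0, 0) at zoom 1 falls in the tile just past the grid center:
print(lonlat_to_tile(0.0, 0.0, 1))  # (1, 1, 1)
```

Inverting the formula gives the tile's bounding box, which becomes the spatial filter of the on-demand query - no pre-generated tileset required.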

The Data + AI Summit: What we’re excited for

The Databricks Data + AI Summit - held 10-13 June 2024 - was a great opportunity for us to learn, and to share some of what Databricks and CARTO users can look forward to! Here's what we're most excited about:

  • Databricks announced Spatial SQL! One of the biggest highlights of the summit, this announcement shows how important geospatial capabilities are to Databricks users - and how eager they are to leverage this technology.
  • The acquisition of Tabular: in acquiring Tabular, Databricks is gearing up to offer even greater geospatial capabilities, with users particularly interested in the acquisition's implications for Iceberg & Delta. We're excited to see what this means for the spatial community, particularly as we continue to play a big part in developing cloud-native open geospatial formats such as GeoParquet for Parquet, and GeoArrow for Apache Arrow.
  • Spark 4.0 is coming: organizations are preparing for the launch of Apache Spark 4.0, which includes polymorphic Python UDTFs, string collation support, the new VARIANT data type, a streaming state store data source, structured logging and more. While other geospatial platforms may face compatibility issues with Spark 4.0, CARTO is Spark 4.0-ready, meaning users will be able to hit the ground running at launch!
A collage of photographs of the CARTO booth at the Data+AI Summit
The CARTO team at the Data+AI Summit 2024

Wish you were there? Our team hosted three talks at the summit - you can watch them all below 👇

Start your journey

Hear a full round-up of our vision for spatial in Databricks in this webinar with our founder & CSO Javier de la Torre & Platform APIs Lead Milos Colic. Can’t wait? You can also schedule a demo with one of our geospatial experts to learn more about what these developments mean for you.