A spatial join combines records from two datasets based on the geographic relationship between their locations rather than on a matching column value. Where a traditional SQL join uses an equality between two keys, a spatial join uses a geometric predicate such as “contains,” “intersects,” “is within X meters of,” or “is the nearest to.”
What is a Spatial Join?
Spatial joins answer questions that involve geography, for example:
- Which sales transactions happened inside each city polygon?
- Which customers live within a 10-minute drive of each store?
- Which weather stations report measurements covering each parcel of land?
The join is performed by evaluating a geometric predicate between every candidate pair of geometries and returning the matched rows along with attributes from both datasets.
Common Spatial Join Predicates
- Point-in-polygon: assigning each point to the polygon that contains it (e.g. customers to ZIP codes)
- Polygon-polygon intersection: finding all overlapping shapes (e.g. parcels intersecting a flood zone)
- Distance-based: joining records within a given radius of each other (e.g. POIs within 500 meters of a panel location)
- Nearest neighbor: joining each record to its closest match in another dataset (e.g. each customer to the nearest store)
Why Spatial Joins are Hard at Scale
A naive spatial join compares every geometry on the left side with every geometry on the right side, producing a Cartesian product of geometric tests. Because each test requires non-trivial floating-point math, spatial joins quickly become expensive on tables with millions of rows. Traditional GIS tools handle this with R-tree or quadtree indexes; cloud data warehouses use a combination of bounding-box filtering and spatial indexes.
Spatial Joins in Cloud Data Warehouses
Modern cloud data warehouses include native spatial functions (ST_CONTAINS, ST_INTERSECTS, ST_DWITHIN, etc.) that make spatial joins expressible directly in Spatial SQL. Performance can be further improved by:
- Transform geometries to H3 or Quadbin: converting geometries to H3 or Quadbin cells reduces a spatial join to an integer equality join, often improving performance by orders of magnitude.
- Bounding-box filtering: discarding obvious non-matches early using simple coordinate comparisons before invoking expensive geometric predicates.
- Partitioning by geographic region: co-locating data in the same partitions to minimize shuffle.
- Warehouse-specific optimizations: each platform offers its own way to structure tables for fast spatial access. In BigQuery, cluster the table by the geometry column (or by the spatial index column for H3/Quadbin sources). In Snowflake, order simple-feature tables with
ST_GEOHASH(geom), activate the Search Optimization Service on the GEOGRAPHY column (Enterprise Edition), or cluster by the spatial index column. In Oracle, create a spatial index on the SDO_GEOMETRY column (or a standard index on the H3/Quadbin column) and keep table statistics current. In Databricks, useZORDER BYon the H3 column. In Redshift, set aSORTKEYon the spatial index column and project geometries to EPSG:4326. See the CARTO performance considerations docs for full guidance.
Spatial Joins in CARTO
The CARTO Analytics Toolbox provides accelerated spatial join functions in BigQuery, Snowflake, Databricks, Redshift, and Oracle. Analysts can run point-in-polygon, distance, and nearest-neighbor joins directly against tables in their warehouse, optionally using H3 or Quadbin to scale to billion-row datasets without moving data.
CARTO Workflows exposes spatial joins as drag-and-drop components, making them accessible to analysts who prefer not to write SQL.
CARTO for Agents goes a step further than no-code. Through CARTO Agent Skills, analysts can perform spatial joins conversationally: describe a business question in plain language, and the agent recommends the right join approach, executes it, and suggests follow-on analysis. This removes the SQL barrier and also guides users toward the right spatial methodology for the question they are asking.



