Spatial interpolation: which technique is best & how to run it

Summary

Fix missing or coarse data with Spatial Interpolation. Compare IDW and Kriging methods & follow along with tutorials!

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.
Spatial interpolation: which technique is best & how to run it

Data is rarely perfect.

A common imperfection in data is that values are missing or incomplete. This could be caused by an error in data collection, an uneven or coarse data resolution due to a limited set of sampled data, or a desire to create a more detailed, continuous data visualization. 

To mitigate these imperfections, Interpolation is needed. 

In this blog post, we’ll be explaining what Spatial Interpolation is and the differences between the main GIS interpolation methods - as well as tutorials on how to use them to smooth out the imperfections in your data. Ready?

What is Spatial Interpolation?

Interpolation is a mathematical technique used to estimate values between known data points. It involves constructing a function or curve that passes through the given data points and can be used to predict missing values at intermediate locations.

Spatial Interpolation is the same general function, except the predicted values are influenced by the geographic proximity - or distance - to known values. 

Two of the most popular spatial interpolation methods are Inverse Distance Weighting (IDW) and Kriging. Both approaches offer unique advantages and have distinct applications. Let’s check them out!

Inverse Distance Weighted

Inverse Distance Weighting is a simple yet powerful spatial interpolation technique. The fundamental concept behind IDW is that points close to each other in space tend to have similar attribute values. This makes it a true manifestation of Tobler’s First Law of Geography: “everything is related to everything else, but near things are more related than distant things."

IDW calculates the values of unknown locations by assigning weights based on the distance between the unknown point and its neighboring data points. The closer a point is to the unknown location, the higher its weight in determining the interpolated value.

Advantages of IDW

  • Simplicity: IDW is relatively straightforward to implement and understand, making it an excellent choice for quick analyses or initial exploratory spatial interpolation.
  • Fast Computation: The computational overhead of IDW is generally lower compared to other interpolation techniques, allowing for faster processing, particularly with large datasets.
  • Smooth Surfaces: IDW produces smooth surfaces with gradual transitions between points, making it suitable for applications where gradual change is expected, such as terrain modeling or pollution analysis.

Limitations of IDW

  • Oversensitivity to Outliers: IDW treats all neighboring points equally, which can lead to excessive influence by outliers.
  • Lack of Structural Information: IDW does not consider the spatial correlation structure of the data, resulting in potentially unrealistic interpolations in areas where the underlying attribute values exhibit complex spatial patterns.

Kriging

(Ordinary) Kriging is a geostatistical interpolation technique that takes into account both the distance and spatial correlation of data points, incorporating both local variations and broader trends.

Advantages of Kriging

  • Spatially Efficient: Suitable for datasets with complex spatial structures or irregularly spaced observations.
  • Accounting for Uncertainty: Kriging provides estimates of uncertainty for the interpolated values, which is valuable in assessing the reliability of your results.
  • Optimized Estimation: Kriging minimizes the interpolation error by adjusting weights based on the spatial correlation structure derived from an empirical variogram model, resulting in less precise (more uncertain) estimates in areas with limited data.

Limitations of Kriging

  • Computationally Demanding: Kriging requires estimating variogram models and solving large systems of equations, which can be computationally intensive, especially with large datasets.
  • Model Selection Challenges: Choosing an appropriate variogram model is crucial for accurate Kriging results. Very sparse sampling and/or selecting an inadequate model can lead to poor interpolation outcomes.
  • Interpretation Complexity: Kriging relies on some statistical assumptions, namely stationarity and isotropy. When these are not met the method is not guaranteed to provide best linear unbiased predictions , making it a less appropriate method for less experienced spatial data scientists.

IDW vs Kriging - which should I use?

The choice between IDW and Kriging depends on the characteristics of your data, the spatial patterns you wish to capture, your experience with spatial data science, and the objectives of your analysis. Here are some considerations to guide your decision:

  • IDW is best for simple, fast and computationally efficient interpolation, particularly for small to medium-sized datasets and when a smooth surface is desired. For example, IDW is well suited to interpolating terrain values as you would expect similar values to always be located close together. 
  • Kriging is more suitable for datasets with complex spatial patterns, where you require accurate estimation and quantification of uncertainty. A good use case for Kriging may be interpolating traffic density, as you could expect significant - and quite sudden - local variation here.

So how do I get started?

In this section of the blog, we’ll be using the statistics module from CARTO’s Analytics Toolbox to run some example interpolation. To follow these steps, sign up for a free 14-day trial account!

The data

To illustrate the difference between these two examples, we’re going to undertake an experiment!

We’ll be using the USA Spatial Features dataset, which you can access via our Spatial Data Catalog here. This dataset includes variables relating to demographics, the economy, climate and environment - in this instance, we’re only interested in elevation in Coconino County, Arizona (see below, or open in full screen here).

We’re going to remove a random 20% of the 57,915 features in this dataset - then use our two interpolation methods to compare the results! This dataset leverages H3 - a type of spatial data known as a “Spatial Index.” These data are geolocated by a short reference ID rather than a long geometry string, making them small to store and quick to analyze! Learn more about Spatial Indexes in our free ebook!

Breaking our data!

How good does it feel to do the reverse of what you normally would and actually break your data, on purpose!? The SQL below randomly selects 80% of the Spatial Features in our study area (Coconino County, which has a geoid of “04005”). You will need to replace the “XXXXXXXX” with your personal Data Observatory connection, and make sure you’re subscribed to both the state and Spatial Features datasets (they’re both free!).

   
WITH
--Select an area of interest
aoi AS (SELECT geom FROM `carto-data.ac_XXXXXXXX.sub_carto_geography_usa_county_2019` WHERE geoid ='04005')


--Filter Spatial Features to the AOI, randomly selecting only 80% of features
SELECT
 geo.geoid AS h3, geo.elevation, RAND() AS random
FROM `carto-data.ac_XXXXXXXX.sub_carto_derived_spatialfeatures_usa_h3res8_v1_yearly_v2` geo, aoi
WHERE ST_INTERSECTS(aoi.geom, `carto-un`.carto.H3_CENTER(geo.geoid))
ORDER BY random
LIMIT 46332
   

With the results of this committed to a table, we can now reverse this selection to find our “missing” features. 

   
WITH
--Select an area of interest
aoi AS (SELECT geom FROM `carto-data.ac_XXXXXXXX.sub_carto_geography_usa_county_2019` WHERE geoid ='04005')


--Filter Spatial Features to the AOI, randomly selecting only 80% of features
SELECT
 geo.geoid AS h3, geo.elevation
FROM `carto-data.ac_XXXXXXXX.sub_carto_derived_spatialfeatures_usa_h3res8_v1_yearly_v2` geo, aoi
WHERE ST_INTERSECTS(aoi.geom, `carto-un`.carto.H3_CENTER(geo.geoid))


--Select missing features
AND geoid NOT IN (SELECT h3 FROM yourproject.yourdataset.AZ_spatialfeaturesh3_sample)
   

This leaves us with the below - the blue cells are our “sample” features, and the pink cells are our “missing” features to be interpolated… So let's interpolate!

A map screenshot showing the sampling technique with roughly 80% blue hexagons and 20% pink hexagons.

Running IDW

IDW can be run with a short bit of SQL, further information on which can be found in our documentation here. It requires two point input arrays (see we use H3_CENTER to convert the H3 cells to centroids), as well as the following parameters

  • The maximum distance between points to interpolate
  • The number of points to use to calculate the interpolation
  • The distance power
   
SELECT
     point, value
   FROM
     UNNEST(`carto-un`.carto.IDW(
            ARRAY(SELECT AS STRUCT `carto-un`.carto.H3_CENTER(h3), elevation FROM `yourproject.yourdataset.AZ_spatialfeaturesh3_sample`),
            ARRAY(SELECT `carto-un`.carto.H3_CENTER(h3) FROM `yourproject.yourdataset.AZ_spatialfeaturesh3_missing`),
            1.0E5, -- max. distance
            10, -- no. neighbors
            2-- power of distance)) WITH OFFSET pos
   ORDER BY pos
   

This process ran in 7 min 38 seconds - not bad for such a complex process with 57,915 input features!

Running Kriging

Our Analytics Toolbox allows for a few different kriging options - here we use Ordinary Kriging, a full explanation of which can be found in our documentation here

Like IDW, this function requires both sample and input points as arrays, as well as maximum distance and number of points used to compute. In addition, kriging requires variogram parameters (the fourth input) and model type. 

There are two options here; exponential and spherical. Exponential models assume that spatial correlation decreases continuously with increasing distance, but never reaches zero. On the other hand, the spherical model assumes that the spatial correlation levels off or reaches a maximum at a certain distance, making it more suitable when there is a clear limit to the spatial influence of data points.

   
SELECT
 `carto-un`.carto.ORDINARY_KRIGING(
            ARRAY(SELECT AS STRUCT `carto-un`.carto.H3_CENTER(h3), elevation FROM `cartodb-gcp-marketing-team.Social2022.AZ_spatialfeaturesh3_sample`),
            ARRAY(SELECT `carto-un`.carto.H3_CENTER(h3) FROM `cartodb-gcp-marketing-team.Social2022.AZ_spatialfeaturesh3_missing`),
            1.0E5, -- max. distance
            [0.1,1E8,0.1], -- variogram parameters
            10, -- no. neighbors
            'exponential'-- no. model type)
   

This process took only slightly longer to compute, at 7 min 44 sec. Let’s look at the results!

Interpolation results

We can see the results of the interpolation in the map above (open in full screen here), with IDW on the top and kriging below. The distribution of these are shown on the bar charts below.

A histogram showing the distribution of IDW error, with most results around 0.
A histogram showing the distribution of kriging error, with most results around 0.

Looks pretty good right? Let’s inspect the RMSE.

IDW Kriging
0.510031 0.398007

Therefore, in our example, we can conclude that kriging has produced a more accurate result. However, that’s not the whole picture.

Examine the spatial distribution of error on the maps above (open in full screen here - note the resolution of the input cells has been made coarser for legibility). The map above shows error for IDW and the map below shows the error for Kriging. Pink areas indicate an under prediction and green indicates an over prediction of elevation. 

Do you notice a pattern? Greater error can be found in an area stretching from the north west to north east? Geography enthusiasts may be able to guess what is causing this…

The Golden Rule of Interpolation: Know your geography

A photograph of the Grand Canyon
Photo by Omer Nezih Gerek at Unplash.

That’s right - this area of particularly high error represents the Grand Canyon!

If we switch the basemap in CARTO Builder to aerial (see below), we can see how the size of the H3 cells compares to the intricate terrain of the canyon.

A zoomed in map of the Grand Canyon with error hexagons overlaid. The hexagons are much wider than the valleys.t

We can see there’s a real disparity here; the H3 cells are far too coarse to accurately predict such hyper-local terrain variation. A more detailed terrain model - perhaps with a smaller neighbor search area - would be likely to create far more accurate results in this area.

So the golden rule of interpolation? Know your geography. Know the patterns you would expect and adapt your data inputs and analysis accordingly. 

Ready to start interpolating your own data? Sign up for a free 14-day trial to get started today!