One of the things that makes managing geospatial data challenging is the huge range of scales it covers: areas as large as a continent or as small as a manhole cover.
The data in the database covers a similarly wide range, from single points to polygons described by thousands of vertices. And size matters! A large object takes more time to retrieve from storage, and more time to run calculations on.
The Natural Earth countries file is a good example of that variation. Load the data into CARTO and inspect the object sizes using SQL:
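Something like the following works, assuming the countries file imported as a table named `countries` with CARTO's usual `the_geom` geometry column (adjust the names to match your own import):

```sql
-- How big is each country geometry, in vertices and in bytes?
SELECT name,
       ST_NPoints(the_geom) AS vertices,
       ST_MemSize(the_geom) AS bytes
FROM countries
ORDER BY ST_MemSize(the_geom) DESC;
```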
Over half (149) of the countries in the table are larger than the database page size (8KB), which means they take extra time to retrieve.
We can see the overhead involved in working with large data by forcing a large retrieval and computation.
Load the Natural Earth populated places into CARTO as well, and then run a full spatial join between the two tables:
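A simple point-in-polygon join, counting places per country, is enough to show the problem. Here is a sketch, assuming the places imported as a table named `populated_places`:

```sql
-- Full spatial join: count populated places inside each country
SELECT c.name, count(*) AS place_count
FROM countries c
JOIN populated_places p
  ON ST_Intersects(c.the_geom, p.the_geom)
GROUP BY c.name;
```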
Even though the places table (7,322 rows) and countries table (255 rows) are quite small, the computation is still slow (about 30 seconds on my computer).
The large objects cause a number of inefficiencies: they take longer to read from storage, they take longer to run containment calculations against, and their large bounding boxes make the spatial index filter less selective.
How can we speed things up? Make the large objects smaller using ST_Subdivide()!
First, generate a new, sub-divided countries table:
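One way to write it, again assuming the table and column names from above, with 255 as the maximum number of vertices allowed in each output piece:

```sql
-- Break each country into pieces of at most 255 vertices each
CREATE TABLE countries_subdivided AS
SELECT name, ST_Subdivide(the_geom, 255) AS the_geom
FROM countries;
```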
Remember to register the table with CARTO, so that the editor interface can pick it up:
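In a CARTO database that usually means running the "cartodbfy" function on the new table; the exact call can vary with your account setup, so treat this as a sketch:

```sql
-- Add the cartodb_id column and triggers the CARTO editor expects
SELECT CDB_CartodbfyTable('countries_subdivided');
```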
Now we have the same data, but no object has more than 255 vertices (about 4KB of storage)!
Run the spatial join torture test again, and see the change!
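The query is the same as before, just pointed at the subdivided table:

```sql
-- Same spatial join, now against the small subdivided polygons
SELECT c.name, count(*) AS place_count
FROM countries_subdivided c
JOIN populated_places p
  ON ST_Intersects(c.the_geom, p.the_geom)
GROUP BY c.name;
```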
On my computer, the return time is about 0.5 seconds, or 60 times faster, even though the countries table now has 8,633 rows. The subdivision has accomplished two things: each polygon is smaller, so the containment calculations run faster; and the many smaller bounding boxes make the spatial index filter far more selective.
Subdividing big things can make map drawing faster too, but beware: once your polygons are subdivided you'll have to turn off the polygon outlines to avoid showing the funny square boundaries in your rendered map.
Happy mapping and querying!