Patching Plain PostgreSQL for Parallel PostGIS Plans

Sep 20, 2024

4 mins read

CARTO users want to see maps drawn quickly and drawing maps quickly means rendering every geometry in a table to an output format. By deploying the latest version of PostgreSQL to our infrastructure we are taking advantage of improved parallel query support to make our maps and widgets go faster.

This would be the shortest blog post of all time except for one inconvenient fact: stock PostgreSQL 10 with stock PostGIS 2.4 doesn't parallelize spatial queries very well at all.

In particular for the kinds of routine calculations that CARTO runs parallel processing doesn't kick in:

Parallel scans still require higher function costs to come into action even in reasonable cases.
Parallel joins on spatial conditions still seem to have poor planning requiring a good deal of manual poking to get parallel plans.

To get the kind of parallel performance we wanted from PostgreSQL 10 + PostGIS 2.4 we had to tweak both packages a little bit.

PostgreSQL required two tweaks to the planner to generate more agressive parallel plans with PostGIS functions in them
PostGIS required a simple yet extensive tweak to add higher costs over the whole collection of PostGIS functions

PostgreSQL Planner Pain

For users of PostGIS one of the big surprises (and disappointments) of stock parallel support in PostgreSQL 10 is that the planner only looks at the contents of the WHERE clause when determining if a scan should be parallelized.

{% highlight sql %}-- This MAY be parallelized (if there enough rows)SELECT * FROM mytableWHERE ST_Area(the_geom_webmercator) > 10000{% endhighlight %}

On the other hand a query with even a very costly spatial function (like a buffer) on the SELECT line (what PostgreSQL hackers call the "target list") won't get parallelized no matter how many rows are in play.

{% highlight sql %}-- This WON'T be parallelizedSELECT ST_Buffer(the_geom_webmercator 10) AS the_geom_webmercator FROM mytable{% endhighlight %}

Fortunately for us at CARTO this behaviour was pointed out on the pgsql-hackers email list only a couple months before we needed a fix.

If I have a slow function which is evaluated in a simple seq scan I do not get parallel execution even though it would be massively useful. Unless force_parallel_mode=ON then I get a dummy parallel plan with one worker.
-- Jeff Janes pgsql-hackers 2017-07-11

The ensuing discussion led to a revised patch from Amit Kapila of EnterpriseDB that would generate parallel plans for queries that included costly functions in the target list.

We took Amit's patch and applied it to our PostgreSQL 10 fork so we can deploy a stable PostgreSQL branch with improved parallelism behaviour for our use cases.

While Amit has continued to revise his patch over the succeeding months it has not been committed to the main PostgreSQL development tree yet so there's no guarantee that PostgreSQL parallel behaviour for costly functions on the target list will improve in version 11.

The High(er) Costs of PostGIS

As a general proposition the costs of PostGIS spatial functions are much much higher than the costs of ordinary functions in PostgreSQL. Even though all PostGIS functions are written in C they still have to perform vastly more calculations to return a result than most standard PostgreSQL functions: computational geometry is complicated!

Prior to parallelism the declared COST of a function would rarely effect the query plan generated by PostgreSQL -- no matter what plan was chosen the function result had to be calculated.

With parallelism though the decision about whether or not to set up a parallel plan is a function of number of rows to be processed times function cost. A higher function cost can flip a plan over from single-worker to parallel.

PostGIS has not globally applied costs to functions in the past because the planner had no use for it and it's a lot of work -- there's hundreds of functions in PostGIS. Why spend all that effort to add costs that would never actually affect a plan?

However with Amit's patch we had a use for costs so we added explicit costs to a number of high-cost spatial functions.

{% highlight sql %}CREATE OR REPLACE FUNCTION ST_Centroid(geography use_spheroid boolean DEFAULT true) RETURNS geography AS 'MODULE_PATHNAME' 'geography_centroid' LANGUAGE 'c' IMMUTABLE STRICT _PARALLEL COST 100;{% endhighlight %}

What Still Broken?

At this point we had:

added a PostgreSQL planner patch to honor function costs more effectively and
adjusted PostGIS function costs appropriately to capture the high cost of spatial functions.

So we should be done! But we weren't.

A lot of very heavily used PostGIS functions are actually SQL wrappers that cover up more complex combinations of other functions. For example:

{% highlight sql %}CREATE OR REPLACE FUNCTION ST_Intersects(geom1 geometry geom2 geometry) RETURNS boolean AS 'SELECT $1 && $2 AND _ST_Intersects($1 $2)' LANGUAGE 'sql' PARALLEL SAFE;{% endhighlight %}

PostGIS expects that functions of this sort will be "inlined". That PostgreSQL will when finding one of these functions replace the function with contents of the function definition. So:

{% highlight sql %}SELECT * FROM a B WHERE ST_Intersects(a.geom b.geom){% endhighlight %}

Would be silently inlined as:

{% highlight sql %}SELECT * FROM a B WHERE a.geom && b.geom AND _ST_Intersects(a.geom b.geom){% endhighlight %}

The trouble is that deep in the PostgreSQL code base is logic that determines if a given function can be inlined and one of the things that determines whether inlining is allowed is if the functions being inlined are not too costly. And we just increased the costs of many of our functions.

The PostgreSQL inlining behaviour seems to be justified for users who use SQL wrapper functions as a way to force the caching of expensive calculations underneath the wrapper. That is never our use case at CARTO so we don't mind losing that behaviour in order to get back "sensible" inlining with our properly costed functions.

Again we applied a very small patch to our PostgreSQL 10 stable branch.

The problem of losing inlining behaviour was discussed on the pgsql-hackers list in November 2017 but no patch was offered and our patch is not suitable for general application since it changes behaviours that some users might want to retain.