PostGIS Performance Profiling


More and more CARTO users are asking to render more and more data onto maps, and they want to see the results on their screen quickly. In order to get a whole map drawn in less than a second, the individual tiles visible on a map have to arrive even faster!

The first view of a map we give to a user usually includes the entire extent of their data, so if their data is 1M points, then we have to render 1M points to show them their data. Similarly, when initially applying styling and filters, users tend to look at the entire extent of their data. We want the first impression to be fast, so optimizing the "lots of features on screen at once" use case makes sense.

I generated some test data to simulate the "lots of features" case: 1M random points, 100K random 2-vertex lines, and 10K 130-point polygons. In order to keep things simple, I generated them all in the same 10k by 10k planar box. Here's what a small part of the top corner looks like rendered in QGIS.

[Figure: QGIS render of random data]

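For reference, test points of this shape can be generated with a query along the following lines. This is a minimal sketch rather than the exact script used here; it assumes the pt_table_1000000 name and the SRID 26910 that appear in the profiling queries below.

{% highlight sql %}
-- Minimal sketch: 1M random points inside a 10,000 x 10,000 box.
-- Illustrative only; not the exact script used to build the test tables.
CREATE TABLE pt_table_1000000 AS
SELECT ST_SetSRID(
         ST_MakePoint(random() * 10000, random() * 10000),
         26910) AS geom
FROM generate_series(1, 1000000);

-- A spatial index so the && bounding-box filters in the queries below can use it.
CREATE INDEX ON pt_table_1000000 USING GIST (geom);
{% endhighlight %}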

The tile life-cycle can be broken up into a few distinct phases:

  • Database query execution. Takes raw data, filters it, and transforms it for transport to the renderer.
  • Server-side rendering. Takes features, applies styling rules, draws the image, and compresses it into the transport format.
  • Network transit time.
  • Client-side rendering. Places the information from the server on the screen.

I've been concentrating on reducing the amount of our time budget we spend at the database level. That has meant examining the performance of two key cases, and looking for optimizations:

  • Sending large numbers of points, lines, or polygons to the Mapnik image renderer.
  • Converting large numbers of points, lines, or polygons to the MVT vector format.

Mapnik Rendering

In order to strip out data transmission overhead and only look at the cost of filters and data transformations, I used this SQL to approximate the Mapnik load, and profiled the load using the OSX Instruments tool:

{% highlight sql %}
SELECT sum(length(output))
FROM (
  SELECT
    ST_AsTWKB(
      ST_Simplify(
        ST_RemoveRepeatedPoints(geom, 15),
        15),
      1) AS output
  FROM pt_table_1000000
  WHERE geom &&
    ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910)
) a;
{% endhighlight %}

One thing to note right away about this construction is that it chains together three function calls to get the final result. For a simple data type, like a number, this would be cheap, but for a more complex type like geometry there is a cost: at the start of each function call, the database has to "deserialize" the storage format into memory; and at the end of each call, that memory structure has to be serialized back into the storage format.

The functions themselves do a fair amount of copying data around: both the ST_Simplify() and ST_RemoveRepeatedPoints() functions generate complete net-new geometries, even when they don't actually alter the inputs. When the inputs are points, which by definition cannot be simplified in any way, all that copying is for nothing. This is also true for simple lines or polygons.

Finally, the implementation of ST_AsTWKB() ("tiny well-known binary") includes a good deal of memory management of structures on the heap, using variable-sized byte buffers to keep the code clean and small while generating outputs.

The net effect of all this book-keeping is that the relatively simple act of converting a database point into a TWKB point via this function chain involves dozens of memory allocations and deallocations.

How did we allow this to happen? Well, PostgreSQL uses a "hierarchical memory manager", which allows developers to safely and efficiently allocate memory that the database will clean up when it's no longer needed. This simultaneously removes most memory leak issues, and improves performance massively over using operating system level memory allocation. So PostGIS developers mostly ignore the potential overhead of memory management, because the database makes it safe and efficient.

So… why the problem? The problem is one of scale: the number of allocations relative to the amount of processing done on each object. For a large polygon, where non-trivial work is required to actually remove points and simplify, the overhead of allocations is hardly noticeable. For points, where almost all the work is in the allocation book-keeping, the overhead is significant.

Having measured the problem, I spent some effort restructuring ST_AsTWKB() to minimize allocations for small objects (points!), and reworking the ST_Simplify() and ST_RemoveRepeatedPoints() functions so they generate outputs with less copying.

The less invasive changes were added to PostGIS 2.4 before release. The larger changes went into PostGIS trunk (the future PostGIS 2.5), and also into our CARTO fork of 2.4, where we merge particular features we want back into the 2.4 series. I compared the "2.4rc1" release, which pre-dated my work, to the CARTO fork.

{:.tb__content}

| Mapnik Query | # features | 2.4rc1 (ms) | 2.4rc1 (μs/feature) | 2.4cdb (ms) | 2.4cdb (μs/feature) | Improvement |
|--------------|-----------:|------------:|--------------------:|------------:|--------------------:|------------:|
| Polygons     | 5145       | 211         | 41.0                | 195         | 37.9                | 8%          |
| Lines        | 49846      | 218         | 4.4                 | 179         | 3.6                 | 18%         |
| Points       | 489446     | 1055        | 2.2                 | 651         | 1.3                 | 38%         |

For further improvements, the 2.4cdb branch includes a CDB_AsSimplifiedTWKB() function that wraps the three remove/simplify/TWKB calls into a single call. Doing so strips out the extra function call overhead and serialize/deserialize costs, but requires Mapnik to request features using the custom function.
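I haven't reproduced the exact signature here, but assuming it takes roughly the same tolerance and precision parameters as the chained call above, a Mapnik-style query against the wrapper would look something like this sketch (check the 2.4cdb branch for the real definition):

{% highlight sql %}
-- Hypothetical single-call form of the remove/simplify/TWKB chain.
-- The parameter list is an assumption based on the chained query above,
-- not the documented signature of the function in the 2.4cdb branch.
SELECT sum(length(output))
FROM (
  SELECT CDB_AsSimplifiedTWKB(geom, 15, 1) AS output
  FROM pt_table_1000000
  WHERE geom &&
    ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910)
) a;
{% endhighlight %}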

Since this approach is very specific to just one use case (Mapnik-based rendering), it's been kept out of the general release of PostGIS. The correct way to lower function chaining overhead is probably to implement PostgreSQL "expanded object headers" support for functions that are commonly chained. That way we avoid an explosion of new functions that do nothing but chain together existing functions.

MVT Rendering

As we add vector rendering to the client side of the CARTO stack, we are simultaneously transitioning Mapnik out of the request chain: why ask Mapnik to "render" vector tiles for us when we can now request them directly from the database via ST_AsMVT()?

Just as the Mapnik query does, the MVT process applies several simplifications to the raw geometry before encoding it into the final transfer format. For MVT, however, the transformations are hidden within the ST_AsMVTGeom() function.

Here is the test SQL used to generate the load for profiling:

{% highlight sql %}
SELECT length(ST_AsMVT(a))
FROM (
  SELECT ST_AsMVTGeom(p.geom, b.geom, 4096, 0, true)
  FROM pt_table_1000000 p,
  (
    SELECT ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910) AS geom
  ) AS b
  WHERE p.geom && b.geom
  AND p.geom IS NOT NULL
) a;
{% endhighlight %}

Within the ST_AsMVTGeom() function, the geometries are translated into tile coordinates, clipped to the tile bounds, and snapped to the grid resolution. Polygons are checked for validity, and their rings are consistently oriented.
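As a toy illustration of the coordinate translation (separate from the profiling query above), running a single point through ST_AsMVTGeom() with the same bounds returns a geometry in tile grid coordinates rather than map units:

{% highlight sql %}
-- A point at the centre of the 1500-8500 envelope, translated into the
-- 4096 x 4096 tile grid; the output coordinates are grid cells, not map units.
SELECT ST_AsText(
  ST_AsMVTGeom(
    ST_SetSRID(ST_MakePoint(5000, 5000), 26910),  -- input geometry
    ST_MakeEnvelope(1500, 1500, 8500, 8500),       -- tile bounds
    4096,                                          -- tile extent
    0,                                             -- buffer
    true));                                        -- clip to bounds
{% endhighlight %}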

Some of this processing results in unanticipated copying, as in the previous tests. There are also large overheads in running validity checks and repairs on polygons. To speed up MVT generation in our custom branch, I made the following changes:

  • Apply repeated point removal and simplification prior to translation and snapping.
  • Avoid copying in all the transformations.
  • Remove polygon validity tests and repairs.

The last change is the most controversial. There is no doubt that simplification and then snapping into the MVT grid can result in invalid output polygons. However, the computational cost, applied over all cases, is very high, and the number of cases where invalidity actually occurs is very low. On balance, we want to see how frequently we encounter rendering issues due to invalidity in the real world before committing to the cost of validity enforcement on all polygon geometry.

The performance changes, for polygons in particular, are substantial:

{:.tb__content}

| MVT Query | # features | 2.4rc1 (ms) | 2.4rc1 (μs/feature) | 2.4cdb (ms) | 2.4cdb (μs/feature) | Improvement |
|-----------|-----------:|------------:|--------------------:|------------:|--------------------:|------------:|
| Polygons  | 5145       | 5299        | 1029.9              | 687         | 133.5               | 87%         |
| Lines     | 49846      | 350         | 7.0                 | 290         | 5.8                 | 17%         |
| Points    | 489446     | 2574        | 5.3                 | 1864        | 3.8                 | 28%         |

There are still potentially some gains to be made in reviewing the actual MVT format encoding, as thus far all the attention has been on the ST_AsMVTGeom() transformation step.

Next Steps

At some point, there will be no more blood to squeeze out of the PostGIS performance stone, but for now a few places still hold some promise:

  • Frequently used function chains can be optimized by enabling "expanded object headers" for their inputs and outputs.
  • The actual production of MVT encoding may have some optimizations available.

The biggest win will probably come from changing some assumptions about how to pass large sets of geometry through the system: for points in particular, aggregating during the query phase and encoding only a limited subset of features can radically reduce the amount of data to be transmitted and rendered.
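As a rough illustration of what that could look like for the point table used in these tests, the raw points can be collapsed onto a coarse grid, with a count per cell, before MVT encoding. The 100-unit grid size and the column names here are assumptions for the sketch, not a tuned production query:

{% highlight sql %}
-- Sketch: aggregate points onto a coarse grid before encoding, so the tile
-- carries one feature per occupied cell (plus a count) instead of one
-- feature per raw point.
SELECT length(ST_AsMVT(a))
FROM (
  SELECT
    g.n,
    ST_AsMVTGeom(
      g.cell,
      ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910),
      4096, 0, true) AS geom
  FROM (
    SELECT ST_SnapToGrid(geom, 100, 100) AS cell, count(*) AS n
    FROM pt_table_1000000
    WHERE geom &&
      ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910)
    GROUP BY 1
  ) g
) a;
{% endhighlight %}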