PostGIS Performance Profiling
More and more CARTO users are asking to render more and more data onto maps, and they want to see the results on their screen quickly. To get a whole map drawn in less than a second, the individual tiles visible on the map have to arrive even faster!
The first view of a map we give to a user usually includes the entire extent of their data, so if their data is 1M points, we have to render 1M points just to show it to them. Similarly, when initially applying styling and filters, users tend to look at the entire extent of their data. We want that first impression to be fast, so optimizing the "lots of features on screen at once" use case makes sense.
I generated some test data to simulate the "lots of features" case: 1M random points, 100K random 2-vertex lines, and 10K 130-point polygons. To keep things simple, I generated them all in the same 10k-by-10k planar box. Here's what a small part of the top corner looks like rendered in QGIS.
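For reference, the point table can be built with a query along these lines (a sketch only; the table name pt_table_1000000 matches the queries below, but the exact generation SQL and SRID are assumptions):

{% highlight sql %}
-- Hypothetical generation of the 1M-point test table: random points
-- uniformly distributed in a 10,000 x 10,000 planar box.
CREATE TABLE pt_table_1000000 AS
SELECT id,
       ST_SetSRID(
         ST_MakePoint(random() * 10000, random() * 10000),
         26910) AS geom
FROM generate_series(1, 1000000) AS id;

-- Spatial index so the bounding-box filters below can use an index scan.
CREATE INDEX pt_table_1000000_geom_x ON pt_table_1000000 USING GIST (geom);
{% endhighlight %}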
The tile life-cycle can be broken up into a few distinct phases:
- Database query execution. Takes the raw data, filters it, and transforms it for transport to the renderer.
- Server-side rendering. Takes features, applies styling rules, draws the image, and compresses it into a transport format.
- Network transit time.
- Client-side rendering. Places the information from the server on the screen.
I've been concentrating on reducing the amount of our time budget we spend at the database level. That has meant examining the performance of two key cases and looking for optimizations:
- Sending large numbers of points, lines, or polygons to the Mapnik image renderer.
- Converting large numbers of points, lines, or polygons to the MVT vector format.
Mapnik Rendering
In order to strip out data transmission overhead and only look at the cost of filters and data transformations, I used this SQL to approximate the Mapnik load and profiled it using the OS X Instruments tool:
{% highlight sql %}
SELECT sum(length(output))
FROM (
  SELECT ST_AsTWKB(
           ST_Simplify(
             ST_RemoveRepeatedPoints(geom, 15),
             15),
           1) AS output
  FROM pt_table_1000000
  WHERE geom && ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910)
) a;
{% endhighlight %}
One thing to note right away about this construction is that it chains together three function calls to get the final result. For a simple data type like a number this would be cheap, but for a more complex type like geometry there is a cost: at the start of each function call the database has to "deserialize" the storage format into memory, and at the end of each call that memory structure has to be serialized back into the storage format.
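One rough way to see that overhead (an illustrative measurement, not the Instruments-based profiling used here) is to time the full chain against a bare ST_AsTWKB() call over the same rows:

{% highlight sql %}
-- Chained calls: each function deserializes and re-serializes the geometry.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT ST_AsTWKB(ST_Simplify(ST_RemoveRepeatedPoints(geom, 15), 15), 1)
FROM pt_table_1000000;

-- Single call: one deserialize/serialize round trip per row.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT ST_AsTWKB(geom, 1)
FROM pt_table_1000000;
{% endhighlight %}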
The functions themselves also do a fair amount of copying data around: both the ST_Simplify() and ST_RemoveRepeatedPoints() functions generate complete net-new geometries even when they don't actually alter the inputs. When the inputs are points, which by definition cannot be simplified in any way, all that copying is for nothing. The same is true for simple lines or polygons.
Finally, the implementation of ST_AsTWKB() ("tiny well-known binary") includes a good deal of memory management of structures on the heap, using variable-sized byte buffers to keep the code clean and small while generating outputs.
The net effect of all this book-keeping is that the relatively simple act of converting a database point into a TWKB point via this function chain involves dozens of memory allocations and deallocations.
How did we allow this to happen? Well, PostgreSQL uses a "hierarchical memory manager", which allows developers to safely and efficiently allocate memory that the database will clean up when it's no longer needed. This simultaneously removes most memory-leak issues and massively improves performance over using operating-system-level memory allocation. So PostGIS developers have mostly ignored the potential overhead of memory management, because the database makes it safe and efficient.
So… why the problem? The problem is specific to the scale of the number of allocations relative to the processing done on the objects. For a large polygon, where non-trivial work is required to actually remove points and simplify, the overhead of allocations is hardly noticeable. For points, where almost all the work is in the allocation book-keeping, the overhead is significant.
Having measured the problem, I spent some effort restructuring ST_AsTWKB() to minimize allocations for small objects (points!), and changing ST_Simplify() and ST_RemoveRepeatedPoints() to let them generate outputs with less copying.
The less invasive changes were added to PostGIS 2.4 before release. The larger changes went into PostGIS trunk (the future PostGIS 2.5) and also into our CARTO fork of 2.4, where we merge particular features we want back into the 2.4 series. I compared the "2.4rc1" release, which pre-dated my work, to the CARTO fork.
For further improvements, the 2.4cdb branch includes a CDB_AsSimplifiedTWKB() function that wraps the three remove/simplify/twkb calls into a single call. Doing so strips out the extra function-call overhead and serialize/deserialize costs, but requires Mapnik to request features using the custom function.
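In outline, the Mapnik query can then collapse the chain into one call, something like the following sketch (the argument order and names here are assumptions, not the documented signature of the CARTO fork's function):

{% highlight sql %}
-- Hypothetical single-call form of the earlier chained query:
-- one function call means one deserialize/serialize round trip per row.
SELECT sum(length(output))
FROM (
  SELECT CDB_AsSimplifiedTWKB(geom, 15, 1) AS output  -- geom, tolerance, precision (assumed)
  FROM pt_table_1000000
  WHERE geom && ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910)
) a;
{% endhighlight %}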
Since this approach is very specific to just one use case (Mapnik-based rendering), it's been kept out of the general release of PostGIS. The correct way to lower function-chaining overhead is probably to implement PostgreSQL "expanded object headers" support for functions that are commonly chained. That way we avoid an explosion of new functions that do nothing but chain together existing functions.
MVT Rendering
As we add vector rendering to the client side of the CARTO stack, we are simultaneously transitioning Mapnik out of the request chain: why ask Mapnik to "render" vector tiles for us when we can now request them directly from the database via ST_AsMVT()?
Just as the Mapnik query does, the MVT process applies several simplifications to the raw geometry before encoding it into the final transfer format. For MVT, however, the transformations are hidden within the ST_AsMVTGeom() function.
Here is the test SQL used to generate the load for profiling:
{% highlight sql %}
SELECT length(ST_AsMVT(a))
FROM (
  SELECT ST_AsMVTGeom(p.geom, b.geom, 4096, 0, true)
  FROM pt_table_1000000 p,
       ( SELECT ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910) AS geom ) AS b
  WHERE p.geom && b.geom
    AND p.geom IS NOT NULL
) a;
{% endhighlight %}
Within the ST_AsMVTGeom() function, the geometries are translated into tile coordinates, clipped to the tile bounds, and snapped to the grid resolution. Polygons are checked for validity, and their rings are consistently oriented.
Some of this processing results in unanticipated copying, as in the previous tests. There are also large overheads in running validity checks and repairs on polygons. To speed up MVT generation in our custom branch, I made the following changes:
- Apply repeated point removal and simplification prior to translation and snapping.
- Avoid copying in all the transformations.
- Remove polygon validity tests and repairs.
The last change is the most controversial. There is no doubt that simplification followed by snapping into the MVT grid can result in invalid output polygons. However, the computational cost, applied over all cases, is very high, and the number of cases where invalidity actually occurs is very low. On balance, we want to see how frequently we encounter rendering issues due to invalidity in the real world before committing to the cost of validity enforcement on all polygon geometry.
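One rough way to estimate how often this occurs for a given table is to approximate the simplify-then-snap step with generic functions and count the invalid results (a sketch only; the table name poly_table_10000 is hypothetical, and ST_Simplify/ST_SnapToGrid only stand in for ST_AsMVTGeom()'s internal transformations):

{% highlight sql %}
-- Count polygons that become invalid after a simplify-then-snap pass.
SELECT count(*) FILTER (WHERE NOT ST_IsValid(snapped)) AS invalid,
       count(*) AS total
FROM (
  SELECT ST_SnapToGrid(ST_Simplify(geom, 15), 1) AS snapped
  FROM poly_table_10000
) s;
{% endhighlight %}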
The performance changes for polygons in particular are substantial.
There are still potentially some gains to be made in reviewing the actual MVT format encoding, as thus far all the attention has been on the ST_AsMVTGeom() transformation step.
Next Steps
At some point there will be no more blood to squeeze out of the PostGIS performance stone, but for now a few places still hold some promise:
- Frequently used function chains can be optimized by enabling "expanded object headers" for their inputs and outputs.
- The actual production of MVT encoding may have some optimizations available.
The biggest win will probably come from changing some assumptions about how to pass large sets of geometry through the system: for points in particular, aggregating during the query phase and only encoding a limited subset of features can radically reduce the amount of data to be transmitted and rendered.
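As an illustration of what query-phase aggregation could look like (a sketch only, not part of the current CARTO stack), points can be snapped to a coarse grid and collapsed into one weighted feature per cell before encoding:

{% highlight sql %}
-- Hypothetical point aggregation: collapse the 1M points into one feature
-- per 50-unit grid cell, carrying a count, before handing them to ST_AsMVT().
SELECT length(ST_AsMVT(a))
FROM (
  SELECT count(*) AS weight,
         ST_AsMVTGeom(
           ST_SnapToGrid(geom, 50),
           ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910),
           4096, 0, true) AS geom
  FROM pt_table_1000000
  WHERE geom && ST_SetSRID(ST_MakeEnvelope(1500, 1500, 8500, 8500), 26910)
  GROUP BY ST_SnapToGrid(geom, 50)
) a;
{% endhighlight %}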