It has rained a lot since we last wrote about MVT encoder performance. Taking advantage of the recent Postgis 2.5 release, we decided to update our measurements and explain why we made the switch and started generating vector tiles in the database.
A few weeks ago we decided to change our main tiler (Windshaft) to use Postgis’ St_AsMVT instead of node-mapnik as the default technology to generate MVT. Some of the reasons behind this decision were:
Our main concern when preparing the transition was to make the process as seamless as possible, which meant having the tiles generated by St_AsMVT match the ones from Mapnik as closely as reasonably possible.
Aside from fixing multiple bugs in Postgis 2.4.6 and 2.5, we introduced various performance improvements, such as a PARALLEL implementation. A tile-by-tile comparison even led us to find and address issues in our Mapnik stack.
Nevertheless, there are still some differences between the two renderers:
simplify_distance distance parameter to
Mapnik adds an id to each feature, generating it sequentially within each tile. Postgis doesn’t add it, since it’s optional per the MVT spec. In the next major release of Postgis (3.0) you will be able to assign it from a table column.
As we mentioned before, one of the reasons that moved us to generate vector tiles with Postgis was performance, and the graph at the top of the post shows the impact it had on some of our platform benchmarks in production.
We set up a simple framework to test performance, not only to compare the Mapnik and Postgis encoders but also to compare different iterations of St_AsMVT.
To be able to compare distinct geometry types, we loaded several public datasets into the database using CARTO’s Import API:
We used simple SELECT queries, since any complexity added there would affect both encoders in the same way.
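To make the setup concrete, here is a minimal Python sketch of how a plain SELECT can be wrapped into a tile query using Postgis’ ST_AsMVT(row, name, extent, geom_name) and ST_AsMVTGeom(geom, bounds, extent, buffer, clip_geom). It is not Windshaft’s actual query: the helper names `tile_bounds_3857` and `build_mvt_query` are hypothetical, and it assumes the source query exposes a Web Mercator (EPSG:3857) geometry column whose name is passed as `geom_col`.

```python
import math
from typing import List, Tuple

# Half the width of the Web Mercator (EPSG:3857) world square, in metres.
ORIGIN_SHIFT = math.pi * 6378137.0


def tile_bounds_3857(z: int, x: int, y: int) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) of XYZ tile z/x/y in EPSG:3857 metres."""
    tile_size = 2 * ORIGIN_SHIFT / (2 ** z)  # width of one tile at this zoom
    xmin = -ORIGIN_SHIFT + x * tile_size
    ymax = ORIGIN_SHIFT - y * tile_size      # XYZ rows are counted from the top
    return xmin, ymax - tile_size, xmin + tile_size, ymax


def build_mvt_query(user_sql: str, columns: List[str], z: int, x: int, y: int,
                    geom_col: str = "the_geom_webmercator", layer: str = "layer",
                    extent: int = 4096, buffer: int = 256) -> str:
    """Wrap a plain SELECT in ST_AsMVT / ST_AsMVTGeom for one tile.

    Illustrative only: no identifier quoting or SQL-injection protection, and the
    && bbox filter ignores the buffer margin for brevity.
    """
    xmin, ymin, xmax, ymax = tile_bounds_3857(z, x, y)
    bbox = f"ST_MakeEnvelope({xmin!r}, {ymin!r}, {xmax!r}, {ymax!r}, 3857)"
    # Every extra column becomes an encoded MVT property, so pass only what is needed.
    props = "".join(f", {c}" for c in columns)
    return (
        f"SELECT ST_AsMVT(q, '{layer}', {extent}, 'geom') FROM ("
        f"SELECT ST_AsMVTGeom({geom_col}, {bbox}, {extent}, {buffer}, true) AS geom{props} "
        f"FROM ({user_sql}) AS src WHERE {geom_col} && {bbox}"
        f") AS q"
    )
```

Keeping the inner query a simple SELECT, as in the tests, means any cost difference comes from the encoding step rather than from the query itself.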
We launched HTTP requests against the tiler, running each distinct request at least 50 times and for at least 30 seconds, and discarded the first 5 iterations to reduce the impact of hard drive cache misses.
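The measurement rule above can be sketched as a small timing loop. This is a minimal Python sketch, not the framework we actually used: `benchmark` and `summarize` are hypothetical names, and it assumes the 5 warm-up iterations count toward the 50 runs.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(request: Callable[[], object], min_runs: int = 50,
              min_seconds: float = 30.0, warmup: int = 5) -> List[float]:
    """Time `request` repeatedly and return per-call durations in seconds.

    Keeps calling until at least `min_runs` calls AND `min_seconds` of wall time
    have elapsed, then drops the first `warmup` timings (cold caches inflate them).
    """
    if warmup >= min_runs:
        raise ValueError("warmup must be smaller than min_runs")
    timings: List[float] = []
    start = time.perf_counter()
    while len(timings) < min_runs or time.perf_counter() - start < min_seconds:
        t0 = time.perf_counter()
        request()
        timings.append(time.perf_counter() - t0)
    return timings[warmup:]


def summarize(timings: List[float]) -> Dict[str, float]:
    """Median and mean of the kept timings, in seconds."""
    return {"median": statistics.median(timings), "mean": statistics.mean(timings)}
```

In practice `request` would be a callable that fetches one tile from the tiler, for example through `urllib.request.urlopen` on a hypothetical tile URL; reporting the median keeps occasional slow outliers from dominating the comparison.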
The requests cover different numbers of geometries per tile and, in some cases, different numbers of columns, so that we could see the impact of attribute encoding. Since we observed a clear difference when geometries are discarded because of their size, we also tested different zoom levels.
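Why zoom level matters for discarded geometries can be seen from the size of one MVT coordinate unit. The following Python sketch assumes the common 4096 extent and EPSG:3857 tiles; `tile_unit_size` and `likely_dropped` are hypothetical helpers, and the "smaller than one unit in both dimensions" rule is only a rough heuristic for when snapping to the tile grid collapses a polygon, not the exact logic of either encoder.

```python
import math

# Width of the Web Mercator (EPSG:3857) world square in metres: 2 * pi * R.
WORLD_WIDTH_3857 = 2 * math.pi * 6378137.0


def tile_unit_size(z: int, extent: int = 4096) -> float:
    """Metres covered by one integer MVT coordinate unit at zoom z."""
    return WORLD_WIDTH_3857 / (2 ** z) / extent


def likely_dropped(width_m: float, height_m: float, z: int, extent: int = 4096) -> bool:
    """Rough heuristic: a polygon whose bounding box is under one tile unit in
    both dimensions usually snaps to a zero-area shape and gets discarded."""
    unit = tile_unit_size(z, extent)
    return width_m < unit and height_m < unit
```

One unit covers roughly 9.8 km at zoom 0 but only about 0.6 m at zoom 14, so a 1 km wide polygon is a candidate for removal at low zooms and survives comfortably at high ones.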
The specs of the machine are:
Postgis 3.0.0dev r16981, GEOS
Here are some of the most representative graphs derived from our tests:
St_AsMVTGeom went spent in
[0/0/0], it can be up to 20x faster.
One of the important aspects to consider when generating vector tiles is that both their size and the time it takes to generate them are closely tied to the number of properties associated with each geometry. As seen above, the same point tile generated with St_AsMVT takes 9x longer when going from 5 properties to 42. Consider optimizing your requests by simply removing unnecessary columns from the SQL queries.
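Part of that cost comes from how MVT stores attributes: each layer keeps a `keys` table and a `values` table, and every feature carries `tags`, a flat list of alternating key and value indices into those tables. The Python sketch below builds those structures (without the protobuf serialization) to show that the work grows with features × properties; `encode_layer_attributes` is a hypothetical helper, not St_AsMVT’s actual C implementation, and it only handles hashable scalar values.

```python
from typing import Any, Dict, List, Tuple


def encode_layer_attributes(
    features: List[Dict[str, Any]],
) -> Tuple[List[str], List[Any], List[List[int]]]:
    """Build MVT-style keys/values tables and per-feature tag lists."""
    keys: List[str] = []
    values: List[Any] = []
    key_index: Dict[str, int] = {}
    value_index: Dict[Tuple[type, Any], int] = {}
    tags_per_feature: List[List[int]] = []
    for props in features:
        tags: List[int] = []
        for key, value in props.items():  # one lookup pair per property, per feature
            if value is None:
                continue  # MVT has no null value type, so this sketch omits the property
            if key not in key_index:
                key_index[key] = len(keys)
                keys.append(key)
            vkey = (type(value), value)  # keeps True, 1 and 1.0 as distinct values
            if vkey not in value_index:
                value_index[vkey] = len(values)
                values.append(value)
            tags.extend((key_index[key], value_index[vkey]))
        tags_per_feature.append(tags)
    return keys, values, tags_per_feature
```

Repeated values are stored once and referenced by index, but every property of every feature still requires a dictionary lookup and two tag entries, which is why dropping unused columns pays off directly.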
A second element to consider is that, since points aren’t simplified automatically by zoom level, the lower the zoom, the more points end up in each tile. Point aggregation is a good way to work around this and improve performance, because it avoids encoding many points that are going to be rendered in the same place anyway.
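A simple form of point aggregation snaps points to a grid and emits one feature per occupied cell with a count. This Python sketch shows the idea; in SQL the same thing can be expressed by grouping on ST_SnapToGrid of the geometry. The helper name `aggregate_points` is hypothetical, and the choice of `cell_size` (for instance a few tile units at the current zoom) is an assumption, not CARTO’s aggregation implementation.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def aggregate_points(points: Iterable[Tuple[float, float]],
                     cell_size: float) -> List[Tuple[float, float, int]]:
    """Collapse points into one (mean_x, mean_y, count) per occupied grid cell."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Per cell: [sum of x, sum of y, number of points]
    cells: DefaultDict[Tuple[int, int], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        acc = cells[key]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    result: List[Tuple[float, float, int]] = []
    for key in sorted(cells):  # deterministic output order
        sx, sy, n = cells[key]
        result.append((sx / n, sy / n, int(n)))
    return result
```

The encoder then only sees one feature per cell instead of every point in it, and the count can be kept as a property for styling.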
Another idea we considered in the past was disabling polygon validation, but we discarded it because a single invalid polygon can pollute an entire visualization. It would be interesting to analyze why Postgis validation (based on GEOS) is so much slower than Mapnik’s (based on Boost); addressing this would benefit multiple SQL functions that use it, like
As a final note, beware that version 3 of the Mapbox Vector Tile Specification is currently under development. It will bring new feature types and improvements to reduce tile size, but it will require adapting encoders and decoders, which could impact performance.