MVT generation: Mapnik vs PostGIS

Summary

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.
MVT generation: Mapnik vs PostGIS

With the last release of PostGIS  a new set of functions were added to generate Mapbox vector tiles (MVT). The most important of these functions is ST_AsMVT which allows users to create a full tile from a SQL query. Björn Harrtell completed the majority of the project  with partial sponsorship from CARTO.

You might be wondering  "What's an MVT"? A map is usually created with a set of small images  called tiles  that are rendered together to create a mosaic. MVT's  instead of using small images  are created using the data that is stored in those tiles. That allows for rendering of maps client-side with WebGL (BTW  we are hiring someone who loves rendering stuff on the client or any other render technology.

At CARTO  we use Mapnik to render those tiles using a combination of Postgres & PostGIS. We wondered if rendering those tiles straight from Postgres/PostGIS would be faster.

The benchmark

First of all  I created a benchmark  took it as a baseline  and then compared PostGIS'sST_AsMVT against Mapnik's MVT renderer.

The benchmark was to generate a tile with a lot ofpoints  and then check how long it takes to render.

So  let's get our hands dirty and create a tile with 100K points:

CREATE TABLE test_100k_points AS
  SELECT (ST_Dump(points)).geom AS the_geom FROM ST_GeneratePoints(cartodb.CDB_XYZ_Extent(0  0  16)  100000) as points;

SELECT CDB_CartodbfyTable('test_100k_points');

Mapnik

Let's now request the tile with 100K points using our Maps API:

$ curl -v -o /dev/null \
  "http://development.localhost.lan:8181/api/v1/map/${layergroup_id}/1/16/0/0.mvt"
#...
< X-Tiler-Profiler: {"setDBAuth":2 "res":221 "getTileOrGrid":221 "render-mvt":221 "total":223}
#...

Out of 223 ms  what portion of that time is spent in DBquerying features versus encoding them into the MVT?

Let's instrument mapnik in order to help answer that question.

First  we have to locate the interesting pieces of code:

  • here's the datasource that queries the database
  • and here's the processor  which encodes vector tiles

Using systemtap we can hook into those functions to getan idea of how much time is spent retrieving features versus encodingthem into an MVT. Here's a simple script to do that:

# File mapnik.stp
# based on https://sourceware.org/systemtap/examples/profiling/fntimes.stp

function time() { return gettimeofday_us() }

function print_time(t)
{
    printf("function %s took %d us\n"  ppfunc()  t)
}

probe process("node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("*get_resultset*").return
{
    print_time(time() - @entry(time()));
}

probe process("node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik.node").function("*update_tile*").return
{
    print_time(time() - @entry(time()));
}

then we can request the same tile  and gather execution times:

$ sudo stap mapnik.stp
function _ZNK18postgis_datasource13get_resultsetERSt10shared_ptrI10ConnectionERKSsRKS0_IN6mapnik4PoolIS1_17ConnectionCreatorEEES0_INS6_17IProcessorContextEE took 93423 us
function _ZN6mapnik16vector_tile_impl9processor11update_tileERNS0_4tileEdii took 216795 us

This tells us that the update_tile function takes about 217 ms  ofwhich 93 ms are spent retrieving features from the database.

If we execute the query used to retrieve features directly from the database weget the following numbers:

EXPLAIN ANALYZE
  SELECT
    ST_AsBinary(ST_Simplify(ST_SnapToGrid("the_geom_webmercator" 0.0597164) 0.119433))
      AS geom "cartodb_id"
    FROM (SELECT * FROM test_100k_points) as cdbq
    WHERE "the_geom_webmercator" &&
      ST_SetSRID('BOX3D(-20037508.3
      20036743.97250642 -20036743.97250639 20037508.3)'::box3d  3857);
                                                                                                                                                             QUERY PLAN                                                                                                                                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on test_100k_points  (cost=0.00..5482.66 rows=99922 width=40) (actual time=0.015..63.357 rows=100000 loops=1)
   Filter: (the_geom_webmercator && '01030000A0110F00000100000005000000CDCCCC44F81B73C1E4628F7FC81B73410000000000000000CDCCCC44F81B73C1CDCCCC44F81B73410000000000000000DC628F7FC81B73C1CDCCCC44F81B73410000000000000000DC628F7FC81B73C1E4628F7FC81B73410000000000000000CDCCCC44F81B73C1E4628F7FC81B73410000000000000000'::geometry)
 Planning time: 0.735 ms
 Execution time: 66.432 ms
(4 rows)

that is significantly faster than the  execution times reported inthe database log:

2017-08-09 18:18:05.420 CEST [7662] LOG:  duration: 98.150 ms  execute <unnamed>: SELECT ST_AsBinary(ST_Simplify(ST_SnapToGrid("the_geom_webmercator" 0.0597164)  0.119433)) AS geom "cartodb_id" FROM (SELECT * FROM test_100k_points) as cdbq WHERE "the_geom_webmercator" && ST_SetSRID('BOX3D(-20037508.3 20036896.84656299 -20036896.84656296 20037508.3)'::box3d  3857)

This is due to the mapnik postgis input plugin iteratingon the result set from postgres WKB format  and transforming it to themapnik internal format.

Don't freak out at those magic numbers in the ST_Simplify call  they are calculated using the tile zoom to simplify the geometry to a resolution enough to be rendered correctly but without sending too many vertices to the rendering process.

So  roughly speaking  out of those 223 ms:

  • 93 ms are spent retrieving data and transforming it
  • 124 ms are spent on the actual encoding of the tile
  • the rest (around 6 ms) is overhead because of authorization  datatransfer  etc.

ST_AsMVT

What if we made a similar query to spit the MVT straight from the database?

EXPLAIN ANALYZE
SELECT ST_AsMVT('test'  4096  'geom'  q)
  FROM (SELECT ST_Simplify(ST_SnapToGrid("the_geom_webmercator" 0.0597164)  0.119433) AS geom "cartodb_id" FROM (SELECT * FROM test_100k_points) as cdbq WHERE "the_geom_webmercator" && ST_SetSRID('BOX3D(-20037508.3 20036743.97250642 -20036743.97250639 20037508.3)'::box3d  3857)) as q;
                                                                                                                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------
 Aggregate  (cost=3234.42..3234.43 rows=1 width=32) (actual time=185.818..185.818 rows=1 loops=1)
   ->  Seq Scan on test_100k_points  (cost=0.00..2485.00 rows=99922 width=40) (actual time=0.039..26.918 rows=100000 loops=1)
         Filter: (the_geom_webmercator && '01030000A0110F00000100000005000000CDCCCC44F81B73C1E4628F7FC81B73410000000000000000CDCCCC44F81B73C1CDCCCC44F81B73410000000000000000DC628F7FC81B73C1CDCCCC44F81B73410000000000000000DC628F7FC81B73C1E4628F7FC81B73410000000000000000CDCCCC44F81B73C1E4628F7FC81B73410000000000000000'
::geometry)
 Planning time: 0.865 ms
 Execution time: 187.031 ms
(5 rows)

Note we removed the ST_AsBinary. It is not required by ST_AsMVT as it works with PostGIS native geometry types (saving some time as well).

We may also consider removing the ST_Simplify and ST_SnapToGrid as they aim to facilitate raster rendering and it's done by the MVT encoding mechanism. On the other hand  the equivalent query also requires a transformation from webmercator to the coordinate space of an MVT  which is achieved through ST_AsMVTGeom. The final query looks like this:

EXPLAIN ANALYZE
  SELECT ST_AsMVT('dbe9fc29-ef99-4a4f-afc2-0f652dc9985b'  4096 
  'geom'  q)
  FROM (SELECT cartodb_id  ST_AsMVTGeom(qbounds.the_geom_webmercator 
    ST_MakeBox2D(ST_Point(-20037508.5  20037508.5) 
    ST_Point(-20036897.00376892  20036897.00376892))  4096  0 false)
    AS geom
        FROM (SELECT * FROM test_100k_points WHERE
            "the_geom_webmercator" && ST_MakeEnvelope(-20037508.5 
            20037508.5  -20036897.00376892  20036897.00376892  3857))
    as qbounds
  ) AS q;
                                                                                                                        QUERY PLAN                                                                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2983.48..2983.49 rows=1 width=32) (actual time=328.254..328.255 rows=1 loops=1)
   ->  Seq Scan on test_100k_points  (cost=0.00..2485.00 rows=99696 width=40) (actual time=0.010..29.480 rows=100000 loops=1)
         Filter: (the_geom_webmercator && '0103000020110F0000010000000500000000000048F81B73C100000048F81B734100000048F81B73C100700F10D21B734100700F10D21B73C100700F10D21B734100700F10D21B73C100000048F81B734100000048F81B73C100000048F81B7341'::geometry)
 Planning time: 0.825 ms
 Execution time: 329.123 ms
(5 rows)

The execution time is significantly worse than in the previous case. Why is that?

In order to answer that question  we need to look into how ST_AsMVT and ST_AsMVTGeom are implemented:

  • ST_AsMVT is actually anaggregate.Here isan interesting discussion about that design choice for theinterface. As mentioned in that thread there are some advantages tohaving the data first before encoding. However  nothingprevents you from concatenating MVT's (e.g: from different layers)into a single tile.
  • ST_AsMVTGeom is meant to transform an arbitrary geometry into thecoordinate space of an MVT. It is kind of a requirement ofST_AsMVT to have the input geometries in such coordinatespace. Basically  it performs an affine transformation from onecoordinate space to another  plus some additional checks.

There's an excellent tutorial on how to profile postgreshere.

Using those tools we can profile our query and get an overviewof where the time is spent. Take a look at this kcachegrind screenshot:

     

Kcachegrind screenshot

 


By looking at that graph and the code  you can figure out that a lot ofthe time is spent copying  checking  and passing geometriesaround. Here's a PR thatoptimizes some of those paths for the points case.

Here are the results after applying those optimizations:

EXPLAIN ANALYZE
  SELECT ST_AsMVT('dbe9fc29-ef99-4a4f-afc2-0f652dc9985b'  4096 
  'geom'  q)
  FROM (SELECT cartodb_id  ST_AsMVTGeom(qbounds.the_geom_webmercator 
    ST_MakeBox2D(ST_Point(-20037508.5  20037508.5) 
    ST_Point(-20036897.00376892  20036897.00376892))  4096  0 false)
    AS geom
        FROM (SELECT * FROM test_100k_points WHERE
            "the_geom_webmercator" && ST_MakeEnvelope(-20037508.5 
            20037508.5  -20036897.00376892  20036897.00376892  3857))
    as qbounds
  ) AS q;
                                                                                                                        QUERY PLAN                                                                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2983.48..2983.49 rows=1 width=32) (actual time=197.396..197.396 rows=1 loops=1)
   ->  Seq Scan on test_100k_points  (cost=0.00..2485.00 rows=99696 width=40) (actual time=0.011..29.509 rows=100000 loops=1)
         Filter: (the_geom_webmercator && '0103000020110F0000010000000500000000000048F81B73C100000048F81B734100000048F81B73C100700F10D21B734100700F10D21B73C100700F10D21B734100700F10D21B73C100000048F81B734100000048F81B73C100000048F81B7341'::geometry)
 Planning time: 0.887 ms
 Execution time: 198.373 ms
(5 rows)

To sum up: around 200 ms for data retrieval and encoding of anMVT. That is about 10% faster than mapnik.

Putting all the pieces together

You can find an implementation of a Windshaft renderer (the main piece of ourMaps API) based on ST_AsMVT in the 1000x branch.

Let's run our little benchmark again using this new renderer:

$ curl -v -o /dev/null "http://development.localhost.lan:8181/api/v1/map/${layergroup_id}/1/16/0/0.mvt"
#...
< X-Tiler-Profiler: {"setDBAuth":2 "req2params":1 "res":272 "getTileOrGrid":272 "render-mvt":271 "query":271 "total":275}
#...

Ouch! The bulk of the job is meant to be done in the database (takingabout 200 ms). I expected it to have some overhead on the node sideof things but not that much. How come?

We can alsoprofile the node serverto get some insights about where the time is being spent:

     

Tiler profile

 


What you can see is some initial setup time  then the querybeing executed in the database and outside node (in grey  taking about210 ms in this very instance) and then node copying  parsing andprocessing the results of the query.

It is worth noting that  despite the node server is able to performnon-blocking IO  it has to wait for the query results and cannotbegin streaming the encoded tile until the query returns.

Actually  if we just request a tile with 10k features instead the time goes down to about 25 ms on my machine  which is on par withmapnik  and that  after all  makes perfect sense.

Conclusions

Most of the above has been centered around executionspeed. This was my main focus at the time of writing the code andexploring the possibilities.

It is true that generating the MVT's straight in the database ispotentially faster as it can avoid data transfer  conversion betweeninternal representations of the features to encode  and soon. It also opens up the possibility of offloading the encoding operation(which is CPU-intensive) from the tiler to the database. Depending onyour set up  this may be (or may be not) a good idea.

We also found out that MVT's are good for rendering client side  butnot so good when you want to render tiles with a lot of features.

However  it is not all about speed. More importantly  having MVT's inthe database as a building block lets you build the SQL on top which gives you a lot of flexibility.

I hope you enjoyed reading this post as much as I enjoyed writingit. If that is the case  you might also be interested injoining us. Thanks!