Measuring tile rendering processes to improve performance

One of the most important parts of our infrastructure is how we generate tiles. This process consists of two parts: gathering data and rendering tiles. As part of our mission to maintain the best possible product for our clients, we set out to improve the performance of our tiles. To do so, we needed to better understand and measure the tile rendering process. We determined that data gathering had a much wider improvement margin than the actual tile rendering, due to its time and resource consumption.

Put simply, the tile rendering process has two main steps:

  1. Gather the required data from the database, filtered by the tile's bounding box.
  2. Send this data to the renderer and generate the tile, which can be a raster tile (PNG), a vector tile (MVT), a grid, and so on.

Depending on the size of the table, how the data is queried, and other minor factors, calls to the database could take anywhere from 10 milliseconds to 20 seconds. We knew we were rendering the same tiles multiple times, but we didn't know what this implied for rendering performance. In any case, we didn't want to keep executing the same query without a good reason to do so.

Our first task was to understand how many times we were gathering the same data to draw the same tile. The only reliable source for this information at the time was our database logs, so we needed to process our PostgreSQL logs and extract details such as the query execution time, the XYZ coordinates of the requested tile, the database, the affected tables, the username, and so on.

There are great log analyzers for PostgreSQL, such as pgBadger. However, these analyzers did not allow us to run operations such as transforming the tile bounding box to XYZ format, so we ended up creating a custom parser to better serve our needs.

Processing the PostgreSQL logs was a bigger challenge than we had originally assessed. While each log line has a common, configurable prefix, everything else could be custom, and we discovered many consistency issues. We faced a series of challenges with our approach:

  • Some queries were split across several lines and needed to be merged (see the sketch after this list)
  • There was a ton of information we did not need, which slowed down our processing
  • There were inconsistencies in line formatting
  • Prepared statements have three parts (parse, bind, and execute), each with its own execution time that we needed to aggregate
  • We had huge amounts of log data to process
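
To give a sense of the approach, here is a minimal sketch of the line-merging step in Python. It assumes the syslog-style prefix shown in the example later in this post; the regexes and record layout are illustrative, not our production parser, and depend entirely on the log_line_prefix configuration:

import re

PREFIX = re.compile(
    r'^\w{3} [ \d]\d \d\d:\d\d:\d\d \S+ postgres\[(?P<pid>\d+)\]: '
    r'\[(?P<session>\d+)-(?P<part>\d+)\] (?P<rest>.*)$'
)
FIRST = re.compile(
    r'^(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) \w+ (?P<user>\S+) (?P<db>\S+) '
    r'\S+ LOG:  duration: (?P<ms>[\d.]+) ms  execute [^:]*: (?P<sql>.*)$'
)

def parse_log(lines):
    """Merge statements that are split across several log lines."""
    pending = {}  # keyed by (pid, statement id)
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # a line format we do not handle
        key = (m.group('pid'), m.group('session'))
        if m.group('part') == '1':
            f = FIRST.match(m.group('rest'))
            if f:  # keep only lines that report an executed query
                rec = f.groupdict()
                rec['sql'] = [rec['sql']]
                pending[key] = rec
        elif key in pending:
            # '#011' is the syslog escape for a tab character
            pending[key]['sql'].append(
                m.group('rest').replace('#011', ' ').strip())
    for rec in pending.values():
        rec['sql'] = ' '.join(rec['sql'])
        yield rec

A real version also has to aggregate the separate parse, bind, and execute durations that prepared statements emit, which is where much of the complexity came from.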

Since we are working with geospatial databases, we needed to extract the XYZ value from the bounding box of each query, originally stored as coordinates in the EPSG:3857 projection:

ST_SetSRID('BOX3D(-2504688.542848656 -20037508.3,20037508.3 2504688.542848656)'::box3d, 3857)

In addition to this, we had to take into account the metatiling and buffer-size values we were using for each query, to help us make better assumptions about optimization strategies.
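
The conversion itself can be sketched in a few lines of Python. This version assumes the bounding box covers exactly one unbuffered tile; with metatiling or a buffer-size, the box has to be shrunk back to the real tile extent before the zoom level is derived:

import math

WORLD = 20037508.342789244  # half the EPSG:3857 world extent, in meters

def bbox_to_xyz(xmin, ymin, xmax, ymax):
    """Derive the XYZ tile for a bbox that covers exactly one tile."""
    world_size = 2 * WORLD
    # the zoom level whose tile width best matches the bbox width
    z = round(math.log2(world_size / (xmax - xmin)))
    tile = world_size / 2 ** z
    x = int((xmin + WORLD) // tile)
    y = int((WORLD - ymax) // tile)  # tile rows are counted from the top
    return x, y, z

# the tile at z=2, x=2, y=0 spans [0, 10018754.17] in x
# and [10018754.17, 20037508.34] in y:
print(bbox_to_xyz(0, 10018754.17, 10018754.17, 20037508.34))  # (2, 0, 2)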

Here is an example of a query used to generate a tile:

Mar 27 11:44:44 host postgres[13993]: [2254-1] 2017-03-27 11:44:44 GMT publicuser databasename [unknown] LOG:  duration: 2465.723 ms  execute <unnamed>: SELECT ST_AsTWKB(ST_Simplify(ST_RemoveRepeatedPoints("the_geom_webmercator",10000),10000,true),-4) AS geom FROM (select
Mar 27 11:44:44 host postgres[13993]: [2254-2] #011  c.the_geom,
Mar 27 11:44:44 host postgres[13993]: [2254-3] #011  st_transform(st_makevalid(c.the_geom_webmercator), 3786) as the_geom_webmercator,
Mar 27 11:44:44 host postgres[13993]: [2254-4] #011  s.iso,
Mar 27 11:44:44 host postgres[13993]: [2254-5] #011  s.score
Mar 27 11:44:44 host postgres[13993]: [2254-6] #011FROM scores as s JOIN countries as c on c.iso = s.iso
Mar 27 11:44:44 host postgres[13993]: [2254-7] #011WHERE filter='foo'
Mar 27 11:44:44 host postgres[13993]: [2254-8] #011) as cdbq WHERE "the_geom_webmercator" && ST_SetSRID('BOX3D(-2504688.542848656 -20037508.3,20037508.3 2504688.542848656)'::box3d, 3857)

and this is the result in CSV format after we process the log:

host,database,user,tables,timestamp,duration,bbox,x,y,z,query,update,basemaps
host,databasename,publicuser,"{countries,scores}",2017-03-27 11:44:45,2313.126,"-2504688.542848656,-2504688.542848656,20037508.3,20037508.3",2,0,2,"SELECT ST_AsTWKB(ST_Simplify(ST_RemoveRepeatedPoints(""the_geom_webmercator"",10000),10000,true),-4) AS geom FROM (select c.the_geom, st_transform(st_makevalid(c.the_geom_webmercator), 3786) as the_geom_webmercator, s.iso, s.score FROM scores as s JOIN countries as c on c.iso = s.iso WHERE filter='foo' ) as cdbq WHERE ""the_geom_webmercator"" && ST_MakeEnvelope(-2504688.542848656,-2504688.542848656,20037508.3,20037508.3,3857)",False,False

As you can see, this gives us much more user-friendly data to work with.

Once we had processed all the logs, we loaded the resulting data into the database, which then allowed us to extract the most useful information. Now we know query runtimes and how many times we are generating the same tile, which helps guide us in improving the process.
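
With the records loaded into a table (here a hypothetical tile_queries table whose columns follow the CSV header above), a single aggregation is enough to surface the tiles we generate most often and what they cost us, along the lines of:

# Hypothetical table and column names, following the CSV header above.
MOST_REDRAWN_TILES = """
SELECT database, tables, z, x, y,
       count(*)      AS times_rendered,
       sum(duration) AS total_ms,
       avg(duration) AS avg_ms
  FROM tile_queries
 GROUP BY database, tables, z, x, y
HAVING count(*) > 1
 ORDER BY total_ms DESC
 LIMIT 20;
"""

The HAVING count(*) > 1 clause keeps only the tiles that were rendered more than once, which are exactly the candidates for caching.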

Do you want to know more about how we are improving our tile generation process? More CARTO blog posts are coming to our Inside blog.

Did you like it? Join us