One of the most important parts of our infrastructure is how we generate tiles. This process consists of two parts: gathering data and rendering tiles. As part of our mission to maintain the best possible product for our clients, we set out to improve the performance of our tiles. To do so, we needed to better understand and measure the tile rendering process. We determined that data gathering had a much wider margin for improvement than the actual tile rendering, given how much time and how many resources it consumed.
Put simply, the tile rendering process has two main steps:
- Gather the required data from the database, filtered by the tile bounding box.
- Send this data to the renderer and generate the tile, which can be a raster tile (PNG), a vector tile (MVT), a grid, and so on.
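The two steps above can be sketched as follows. This is an illustrative toy, not CARTO's actual code: the in-memory "table", the bounding-box filter, and the stub renderer are all assumptions standing in for the real database query and renderer.

```python
# Toy stand-in for a database table of point features.
TABLE = [
    {"name": "a", "x": 10.0, "y": 10.0},
    {"name": "b", "x": 99.0, "y": 99.0},
]

def gather_data(table, bbox):
    # Step 1: fetch only the rows that fall inside the tile bounding box.
    xmin, ymin, xmax, ymax = bbox
    return [r for r in table
            if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]

def render_tile(rows, fmt="mvt"):
    # Step 2: hand the filtered rows to a renderer (here, a stub that
    # just reports the requested format and feature count).
    return f"{fmt}:{len(rows)} features"

tile = render_tile(gather_data(TABLE, (0.0, 0.0, 50.0, 50.0)))
```

In production the first step is a SQL query against PostgreSQL, which is why its cost shows up in the database logs discussed below.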
Depending on the size of the table, how the data is queried, and other minor factors, calls to the database could take anywhere from 10 milliseconds to 20 seconds. We knew we were rendering the same tiles multiple times, but we didn't know what this implied for rendering performance. In any case, we didn't want to keep executing the same query without a good reason to do so.
Our first task was to understand how many times we were drawing the same tile by gathering the necessary data. The only reliable source of this data at the time was our database logs, so we needed to process our PostgreSQL logs and extract information such as the query execution time, the XYZ coordinates of the requested tile, the database, the affected tables, and the username.
There are great log analyzers for PostgreSQL, such as pgBadger. However, these analyzers did not allow us to run operations such as transforming the tile bounding box to XYZ format, so we ended up creating a custom parser to better serve our needs.
Processing the PostgreSQL logs turned out to be a bigger challenge than we had originally assessed. While each log line starts with a common, configurable prefix, everything else can be custom, and we discovered many consistency issues. We faced a series of challenges with our approach:
- Some queries were split across multiple lines and needed to be merged.
- There was a lot of information we did not need, which slowed down our processing.
- There were inconsistencies in line formatting.
- Prepared statements have three parts (parse, bind, and execute), each with its own execution time, which we needed to aggregate.
- We had huge amounts of log data to process.
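The first challenge, merging split queries, can be handled by relying on the configurable log line prefix: a line that does not start with the prefix must be a continuation of the previous statement. Here is a minimal sketch of that idea; the prefix regex and the sample timestamp format are assumptions, and would need to match your actual `log_line_prefix` setting.

```python
import re

# Assumed prefix shape: "2019-01-01 10:00:00 UTC [12345] ..." -- adjust
# this pattern to whatever log_line_prefix your postgresql.conf defines.
PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+ \[\d+\]")

def merge_statements(lines):
    """Join continuation lines onto the prefixed line they belong to."""
    merged = []
    for line in lines:
        if PREFIX.match(line) or not merged:
            # A new log entry starts here.
            merged.append(line.rstrip("\n"))
        else:
            # No prefix: this line continues the previous statement.
            merged[-1] += " " + line.strip()
    return merged
```

Once every statement occupies a single logical line, extracting fields like the duration becomes a straightforward regex match.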
Since we are working with geospatial databases, we needed to extract the XYZ value from the bounding box of each query, originally stored as coordinates in the EPSG:3857 (Web Mercator) projection.
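The conversion can be sketched like this, under the simplifying assumption that the bounding box covers exactly one tile (i.e. ignoring buffers and metatiling, which are discussed next):

```python
import math

# Half-extent of the Web Mercator (EPSG:3857) world, in metres.
HALF_EXTENT = 20037508.342789244

def bbox3857_to_xyz(xmin, ymin, xmax, ymax):
    """Convert a one-tile bounding box in EPSG:3857 to (z, x, y).

    Illustrative sketch: assumes the bbox spans exactly one tile,
    with no buffer applied.
    """
    world = 2 * HALF_EXTENT
    # Each zoom level halves the tile width, so the width fixes z.
    z = round(math.log2(world / (xmax - xmin)))
    tile_size = world / (1 << z)
    x = int((xmin + HALF_EXTENT) / tile_size)
    # XYZ tiles count y from the top (north), so flip the axis.
    y = int((HALF_EXTENT - ymax) / tile_size)
    return z, x, y
```

For example, a bbox covering the whole world maps to tile (0, 0, 0), and the north-west quadrant maps to (1, 0, 0).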
In addition to this, we had to take into account the metatiling and buffer-size values we were using for each query, to help us make better assumptions about optimization strategies.
Here is an example of a query used to generate a tile:
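A query of the following shape is representative; the table and column names here are hypothetical, not taken from our actual schema:

```sql
SELECT the_geom_webmercator
  FROM my_points  -- hypothetical table name
 WHERE the_geom_webmercator && ST_MakeEnvelope(
         -20037508.34, 0, 0, 20037508.34,  -- tile bbox in EPSG:3857
         3857
       );
```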
and this is the result in CSV format after we process the log:
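The parsed output carries the fields listed earlier (execution time, XYZ coordinates, database, table, username); the values in this row are illustrative, not real log data:

```csv
duration_ms,z,x,y,database,table,username
12.3,1,0,0,mydb,my_points,tiler
```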
As you can see, this gives us much more user-friendly data to work with.
Once we had processed all the logs, we loaded the resulting data into the database, which allowed us to extract the most useful information. We now know query runtimes and how many times we generate the same tile, which will guide us in improving the process.
Do you want to know more about how we improve our tile generation process? More CARTO blog posts are coming to our Inside blog.
Did you like this post? Join us!