Geospatial processing with Clickhouse, take 2


In an earlier post we talked about the potential of Clickhouse as a database for a GIS. Today we want to talk about some of the design decisions we made and some of the iterations we did to improve the performance of the system.

Map tiles

In CARTO, as in any other web map service, we use a Web Mercator projection to generate our maps. This projection takes several shortcuts to reduce calculations and greatly increase speed, and it leaves us with 2 properties that are important to us throughout this post:

  • There are a limited number of zoom levels. That is, you cannot select an arbitrary resolution outside the ones provided by the system. For example, CARTO Builder is currently set up to allow zooms from 1 to 18, which goes from around 78 km per pixel to around 0.5 ground meters per pixel (see the sketch right after this list).
  • The map is divided into cache-friendly tiles, so for a certain zoom level you only get a small number of tiles. When you want to show a map, the browser fetches all the pre-established tiles that overlap the screen. The most common tile size is 256x256.
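
Those per-pixel figures come straight from the projection. A quick way to check them is the sketch below, which assumes 256-pixel tiles, an Earth circumference of ~40075017 m and measures at the equator (at higher latitudes the ground size per pixel shrinks with the cosine of the latitude):

SELECT
    number AS zoom_level,
    40075016.686 / (256 * pow(2, number)) AS meters_per_pixel
FROM numbers(1, 18)

-- zoom_level 1  -> ~78271 m per pixel (~78 km)
-- zoom_level 18 -> ~0.6 m per pixel at the equator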

The quadtree keys, or quadkeys, system was designed with these two concepts in mind, and it also divides the map by zoom levels and tiles. At level 1, it divides the map into 4 tiles (0 to 3, or 00 to 11 in binary) and, at each subsequent zoom level, it divides every sub-tile into 4 parts. This can easily be encoded in a 64-bit integer by using 2 bits per level.
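
To make the encoding concrete, here is a minimal sketch (not CARTO's OSS library) that builds the quadkey of the tile we will use later in this post (x=2101, y=3047 at zoom 13) by interleaving the bits of x and y, two bits per level, left-aligned into the 62 usable bits of a UInt64:

WITH
    13 AS z,
    toUInt64(2101) AS x,
    toUInt64(3047) AS y
SELECT
    arraySum(arrayMap(i -> bitShiftLeft(
        bitShiftLeft(bitAnd(bitShiftRight(y, i), 1), 1) + bitAnd(bitShiftRight(x, i), 1),
        62 - (2 * (z - i))
    ), range(z))) AS quadkey,
    hex(quadkey) AS quadkey_hex

-- quadkey = 912774627363651584, quadkey_hex = 0CAAD3B000000000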

Simple count

Let's suppose you have a ~400 million point Clickhouse database on a single server (Intel E5-2670, 32 GB RAM) and another single server to run CARTO Builder on top (specs not important for our tests), and you want to show some of the data on a map. When you load a map, several requests for tiles are created in this fashion:

http://{$BUILDER_DOMAIN}/api/v1/map/{$RESOURCE_ID}:0/1/13/2101/3047.png?api_key={$API_KEY}

This is essentially asking for the tile (2101, 3047) at zoom level 13. Once translated to Web Mercator coordinates, it means that it is asking for the tile between (-9759479.771451304, 5126784.361143341) and (-9754587.801641053, 5131676.330953592).
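
Those bounds can be derived directly from the tile coordinates. A small sketch (not the actual Builder code), assuming the standard Web Mercator world extent of ±20037508.34 m and tile rows counted from the top:

WITH
    13 AS z,
    2101 AS x,
    3047 AS y
SELECT
    20037508.342789244 AS half_world,
    (2 * half_world) / pow(2, z) AS tile_size,
    (x * tile_size) - half_world AS x_min,
    half_world - ((y + 1) * tile_size) AS y_min,
    ((x + 1) * tile_size) - half_world AS x_max,
    half_world - (y * tile_size) AS y_max

-- x_min, y_min, x_max and y_max match the coordinates above, up to floating point rounding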

In a straightforward way, this is translated into a query like this one:

SELECT count(1)
FROM table
WHERE ((wm_x >= -9759479.771451304)
  AND (wm_x <= -9754587.801641053))
  AND ((wm_y >= 5126784.361143341)
  AND (wm_y <= 5131676.330953592))

--count(1)--
|  2100094 |
------------

1 rows in set. Elapsed: 0.976 sec. Processed 387.18 million rows, 2.19 GB (396.88 million rows/s., 2.25 GB/s.)

Making use of the common properties between quadkeys and the projection, we know that this tile is at zoom level 13, so if we translate the coordinates we see that it covers the range 0x0CAAD3B000000000 to 0x0CAAD3BFFFFFFFFF, which you can check with an AND operator:

SELECT count(1)
FROM table
WHERE bitAnd(quadkey, 4611685949707911168) = 912774627363651584

--count(1)--
|  2100094 |
------------

1 rows in set. Elapsed: 0.854 sec. Processed 387.18 million rows, 3.10 GB (453.39 million rows/s., 3.63 GB/s.)

And now, since we decided to add the quadkey to the index but it only works with simple operations (comparisons), we'll translate the coordinates to a quadkey range (you can use our OSS library to do it):

SELECT count(1)
FROM table
WHERE (quadkey >= 912774627363651584) AND (quadkey <= 912774696083128319)

--count(1)--
|  2100094 |
------------

1 rows in set. Elapsed: 0.212 sec. Processed 2.13 million rows, 17.01 MB (10.04 million rows/s., 80.29 MB/s.)

This is a 5x improvement, mainly because we have reduced the number of rows read from 387M to just 2M and the index is always in memory. I'll explain later what index we use and why, but you can find a nice explanation of how these indexes work in this Medium post.
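
For reference, none of the constants above need to be looked up: they can be derived from the tile quadkey and the zoom level. A small sketch of that derivation, assuming the 62-bit layout described earlier (2 bits per level, level 1 in the highest usable bits):

WITH
    13 AS z,
    toUInt64(912774627363651584) AS tile_quadkey
SELECT
    bitShiftLeft(bitShiftLeft(toUInt64(1), 2 * z) - 1, 62 - (2 * z)) AS level_mask,
    tile_quadkey AS quadkey_min,
    (tile_quadkey + bitShiftLeft(toUInt64(1), 62 - (2 * z))) - 1 AS quadkey_max

-- level_mask  = 4611685949707911168 (0x3FFFFFC000000000)
-- quadkey_min = 912774627363651584  (0x0CAAD3B000000000)
-- quadkey_max = 912774696083128319  (0x0CAAD3BFFFFFFFFF)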

Actual tiles

These are pretty good times, but we don't need the count of the points in the requested tile. Instead, we need the count of matches for each pixel (256x256 = 65536 pixels), so we will aggregate them directly in Clickhouse to make use of its parallelization.

To aggregate the points into 65k groups we use the properties of quadkeys again: since we want 2^16 groups and we know each zoom level uses 2 bits, we can use the mask for our level + 8 (13 + 8 in this example) to aggregate them. In this case, the mask for level 13 + 8 is 0x3FFFFFFFFFF00000 (or 4611686018426339328):

SELECT
    bitAnd(quadkey, 4611686018426339328) AS quadkey_g,
    count(1)
FROM table
WHERE (quadkey >= 912774627363651584) AND (quadkey <= 912774696083128319)
GROUP BY quadkey_g

-----------quadkey_g---count(1)--
| 912774691671769088 |        5 |
| 912774695507460096 |       16 |
...

51561 rows in set. Elapsed: 0.797 sec. Processed 2.13 million rows, 17.01 MB (2.67 million rows/s., 21.33 MB/s.)

Note that 0x0CAAD3BFFFF00000 would also be a valid mask in this case and that, since we only encode 31 levels of zoom, the first 2 bits of any quadkey are always 0.
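
For completeness, the GROUP BY mask can be derived in the same way as the level mask in the earlier sketch, just with 8 extra levels:

WITH 13 AS z
SELECT bitShiftLeft(bitShiftLeft(toUInt64(1), 2 * (z + 8)) - 1, 62 - (2 * (z + 8))) AS pixel_mask

-- pixel_mask = 4611686018426339328 (0x3FFFFFFFFFF00000)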

Sampling

For big datasets, there is another nice Clickhouse feature that came in handy: sampling. You can SAMPLE either an approximate fraction of the data (e.g. SAMPLE 0.5 for 50%) or a maximum number of rows read from the database (e.g. SAMPLE 1000000 for 1M rows), thus reducing the reads from disk.
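
For example, a quick approximate count over roughly 10% of the data could look like this (the result is extrapolated by multiplying by 10; note that SAMPLE only works on tables created with a sampling expression, which is discussed next):

SELECT count(1) * 10 AS approx_count
FROM table
SAMPLE 1 / 10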

SAMPLE is highly related to the database index, as the sampling expression is part of the primary key, and that leaves us with 2 options: use a sampling function that makes sure close points are also close in the index (for example using some form of bitAnd(quadkey, appropriate_zoom_level)), or use a sampling expression that makes sure that, when we want to take a sample of the data, it is heterogeneous so we can quickly extrapolate results to the whole table.

In our case we initially went for grouping the nearest points together:

ENGINE = MergeTree(date_index, bitAnd(quadkey, 65535), (quadkey, date_index, bitAnd(quadkey, 65535)), 8192)

But after some testing, and considering that we would generate each tile only once (that is what caches are there for) while widgets like histograms or categories are recalculated with every screen movement, we chose to disperse nearby points so sampling worked better and allowed us to read fewer rows:

ENGINE = MergeTree(date_index, intHash64(quadkey), (quadkey, date_index, intHash64(quadkey)), 8192)

This change alone made tile calculations around 2-5% slower, but histogram queries became ~30% faster.
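
For readers on a more recent Clickhouse version, an equivalent of that engine definition in the current syntax could look roughly like this (the column names come from the queries in this post, but the types shown are only illustrative):

CREATE TABLE table
(
    date_index Date,
    quadkey UInt64,
    wm_x Float64,
    wm_y Float64,
    acc Nullable(Float64),
    browser Nullable(String),
    id String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(date_index)
ORDER BY (quadkey, date_index, intHash64(quadkey))
SAMPLE BY intHash64(quadkey)
SETTINGS index_granularity = 8192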

Our way to make up for that loss in tile calculation was to introduce a SAMPLE based on the zoom level, but instead of using it to read a percentage of the database we limited the number of rows read to 256 per possible pixel at that zoom level. For example, at zoom level 3 we have 2^3 tiles, each of which has 65k pixels, so: 256 rows/pixel * 2^16 pixels/tile * 2^3 tiles = 2^(8+16+3) = 134217728 rows:

SELECT
    bitAnd(quadkey, 4611684918915760128) AS quadkey_g,
    count(1)
FROM table
SAMPLE 134217728
WHERE (quadkey >= 864691128455135232) AND (quadkey <= 936748722493063167)
GROUP BY quadkey_g

28 rows in set. Elapsed: 1.515 sec. Processed 291.40 million rows, 2.33 GB (192.33 million rows/s., 1.54 GB/s.)

This is always dependent on your data and sampling function, but in this specific query we reduced the number of rows read by ~25%, which meant an around 10% faster response with an output good enough for tile data.

Widgets

Once we managed to have fast enough tiles, it became clear that we needed to speed up the queries related to the widgets we show next to the map, as they were not only slower than the tiles but sometimes even took minutes to complete.

Histograms

A histogram is just a division of a certain column into equal parts, counting the number of rows associated with each bucket. To do that you need to know the min and max values (to know the bucket width) and then generate the buckets.

To calculate the min and max values we decided to use quantiles to get rid of possible odd values in the data, and since we needed some extra information we used a query like this:

SELECT
    quantiles(0.01, 0.99)(assumeNotNull(acc)) AS min_max_val,
    avg(acc) AS avg_val,
    count(1) * 10 AS total_rows,
    countIf(isNull(acc)) AS nulls_count
FROM table

--min_max_val---avg_val---total_rows---nulls_count--
| [0,1449.36] |  78.31  | 3871776840 |           0 |
----------------------------------------------------

1 rows in set. Elapsed: 3.024 sec. Processed 387.18 million rows, 3.48 GB (128.05 million rows/s., 1.15 GB/s.)

The first thought to optimise this query is to use sampling to cut the rows read, but it is simpler and faster if you store the returned values you really need (min and max) and calculate the others (total and average) from the following query. This way, except for the first time, we don't have to run this query and we can simply reuse the min_max_val we have stored.

Once you have these values  you can request the actual buckets:

SELECT
    greatest(1, least(floor((10 * (acc - 0)) / (1449.3600000000006 - 0)) + 1, 10)) - 1 AS bin,
    count(*) * 10 AS freq
FROM table
WHERE (((wm_x >= -9776066.60659)
  AND (wm_x <= -9733567.61886))
  AND ((wm_y >= 5097050.35714)
  AND (wm_y <= 5132517.13826)))
  AND isNotNull(acc)
GROUP BY bin
ORDER BY bin ASC

10 rows in set. Elapsed: 2.384 sec. Processed 387.18 million rows, 3.86 GB (162.41 million rows/s., 1.62 GB/s.)

With a 10% sampling the query is ~25% faster (1.716 sec, 283.64M rows), but we can be even faster if we do some of the math beforehand and, using the fact that quadkeys are both sampled and part of the index keys, we add a prefilter with the quadkeys that intersect with our coordinates (OR conditions).
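
The exact query we used is not reproduced here, but a sketch of the combined version looks like this, with query parameters standing in for the quadkey ranges that cover the requested bounding box (computed the same way as in the aggregation queries of the next section):

SELECT
    greatest(1, least(floor((10 * (acc - 0)) / (1449.3600000000006 - 0)) + 1, 10)) - 1 AS bin,
    count(*) * 10 AS freq   -- extrapolate the 1/10 sample
FROM table
SAMPLE 1 / 10
WHERE (((wm_x >= -9776066.60659)
  AND (wm_x <= -9733567.61886))
  AND ((wm_y >= 5097050.35714)
  AND (wm_y <= 5132517.13826)))
  AND (((quadkey >= {qk_min_1:UInt64}) AND (quadkey <= {qk_max_1:UInt64}))
  OR ((quadkey >= {qk_min_2:UInt64}) AND (quadkey <= {qk_max_2:UInt64})))
  AND isNotNull(acc)
GROUP BY bin
ORDER BY bin ASC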

So, before all the changes we were at 3.0 + 2.3 seconds and now we are merely above 1 second, which is ~80% less time per query (except for the first one, as we don't have the needed max and min values yet), or 400% more queries in the same time.

Aggregations

One of the useful widgets we have is the option to aggregate some column from the data to see what the main categories are and how many points are included. Here we initially followed this line of work:

  1. Get the number of total rows and null rows for that column.

SELECT
    count(1) AS total_rows,
    countIf(isNull(browser)) AS nulls_count
FROM table
WHERE (((wm_x >= -9848911.09454)
  AND (wm_x <= -9678915.14364))
  AND ((wm_y >= 5070068.08616)
  AND (wm_y <= 5159499.40925)))
  AND (((quadkey >= 912752980728479744)
  AND (quadkey <= 912823349472657407))
  OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
  OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))

--total_rows---nulls_count--
|  381047705 |           0 |
----------------------------

1 rows in set. Elapsed: 5.168 sec. Processed 387.18 million rows, 11.48 GB (74.92 million rows/s., 2.22 GB/s.)

  2. Get the count of the most frequent and the most infrequent categories.

SELECT
    max(value) AS max_val,
    min(value) AS min_val
FROM
(
    SELECT
        browser AS category,
        count(1) AS value
    FROM table
    WHERE (((wm_x >= -9848911.09454)
      AND (wm_x <= -9678915.14364))
      AND ((wm_y >= 5070068.08616)
      AND (wm_y <= 5159499.40925)))
      AND (((quadkey >= 912752980728479744)
      AND (quadkey <= 912823349472657407))
      OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
      OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983))) AND isNotNull(browser)
    GROUP BY category
    ORDER BY value DESC
)

----max_val-----min_val--
| 208418569 | 172629136 |
-------------------------

1 rows in set. Elapsed: 7.792 sec. Processed 387.18 million rows, 11.62 GB (49.69 million rows/s., 1.49 GB/s.)

  3. Get all the categories with their counts.

SELECT
    browser AS category,
    count(1) AS value
FROM table
WHERE (((wm_x >= -9848911.09454)
  AND (wm_x <= -9678915.14364))
  AND ((wm_y >= 5070068.08616)
  AND (wm_y <= 5159499.40925)))
  AND (((quadkey >= 912752980728479744)
  AND (quadkey <= 912823349472657407))
  OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
  OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))
  AND isNotNull(browser)
GROUP BY category
ORDER BY value DESC

--category-------value--
| cat1     | 208418569 |
| cat2     | 172629136 |
------------------------


2 rows in set. Elapsed: 7.837 sec. Processed 387.18 million rows, 11.62 GB (49.40 million rows/s., 1.48 GB/s.)

This procedure is not very efficient, but with small datasets (in the thousands of points) it is not that important, as you end up hitting caches and the calculations are fast enough. As you might guess, the bigger the database, the bigger the problem, so now we have a 1000x bigger problem. Easily enough, all three queries can be simplified to just the last one: use the first row as the max and the last row as the min value, and add everything up for the full count. From 5.1 + 7.8 + 7.8 to just 7.8 seconds: 60% less time.
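
To illustrate the simplification, everything the first two queries returned can be derived from the output of the third one. A sketch of that derivation expressed in SQL (in practice it can simply be done with the rows that are already returned):

SELECT
    max(value) AS max_val,
    min(value) AS min_val,
    sum(value) AS total_rows
FROM
(
    SELECT
        browser AS category,
        count(1) AS value
    FROM table
    WHERE (((wm_x >= -9848911.09454)
      AND (wm_x <= -9678915.14364))
      AND ((wm_y >= 5070068.08616)
      AND (wm_y <= 5159499.40925)))
      AND (((quadkey >= 912752980728479744)
      AND (quadkey <= 912823349472657407))
      OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
      OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))
      AND isNotNull(browser)
    GROUP BY category
)

-- nulls_count would still need its own countIf(isNull(browser)) if the column can contain NULLs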

Although this was an important issue, it wasn't the most noticeable one, as, even after the reduction, there were moments where the connection with the database would slow to a crawl. Here is why: suppose you have a table with 400 million rows and you decide to count the different values of a column that are present in an average of 500 rows each; that gives us 800 000 different categories. Now, let's suppose that to send this output you use JSON and that each generated output row is about 50 bytes (some kind of id plus the count, both in UTF-8), so the full output is 40 MB of data that needs to be transmitted, out of which you are going to discard 99%, as you only show the top 6.

While reading the values from the column and choosing the top categories is unavoidable, we can make sure we return just what we really need with WITH TOTALS or by LIMITing the query to a reasonable number of categories:

SELECT
    id AS category,
    count(1) * 2. AS value
FROM table
SAMPLE 5 / 10
WHERE (((wm_x >= -9848911.09454)
  AND (wm_x <= -9678915.14364))
  AND ((wm_y >= 5070068.08616)
  AND (wm_y <= 5159499.40925)))
  AND (((quadkey >= 912752980728479744)
  AND (quadkey <= 912823349472657407))
  OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
  OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))
GROUP BY category
  WITH TOTALS
ORDER BY value DESC
LIMIT 6

--category---------------------------------value--
| 00000000-0000-0000-0000-000000000000 | 2989142 |
| 00000000-0000-0000-0000-000000000001 | 2441722 |
| 00000000-0000-0000-0000-000000000002 | 2258138 |
| 00000000-0000-0000-0000-000000000003 | 1326528 |
| 00000000-0000-0000-0000-000000000004 | 1209742 |
| 00000000-0000-0000-0000-000000000005 | 1138014 |
--------------------------------------------------

Totals:
--category-------value--
| \N       | 333010618 |
------------------------

6 rows in set. Elapsed: 12.023 sec. Processed 307.31 million rows, 18.74 GB (25.56 million rows/s., 1.56 GB/s.)

In these kinds of cases we are now over 90% faster, since we went from over two and a half minutes to 12 seconds. This is still high, but considering we are using a standalone server for these tests, we're happy with the initial results.

If you liked this post you might be interested in joining us.