In an earlier post, we talked about the potential of Clickhouse as a database for a GIS. Today we want to talk about some of the design decisions we made and some of the iterations we did to improve the performance of the system.
In CARTO, as any other map web service, we use a Web Mercator projection to generate our maps. This projection takes several shortcuts and limitations to reduce calculations and increase speed greatly and leaves us with 2 interesting properties important to us throughout this post:
The quadtree keys, or quadkeys, system was designed with these two concepts in mind and it also divides by zoom levels and tiles. At level 1, it divides the map in 4 tiles (0 to 4, or 00 to 11 in binary) and, at each subsequent zoom layer, it divides every sub-tile in 4 parts. This can easily be encoded in a 64 bit integer by using 2 bits per level.
Let’s suppose you have a ~400 million points Clickhouse database in a single server (Intel E5-2670, 32G RAM) and another single server to run CARTO Builder on top (specs not important for our tests) and you want to show some of the data in the map. When you load a map, several requests for tiles are created in this fashion:
http://{$BUILDER_DOMAIN}/api/v1/map/{$RESOURCE_ID}:0/1/13/2101/3047.png?api_key={$API_KEY}
This is essentially asking for the tile (2101 / 3047) at zoom level 13. Once translated to webmercator coordinates it means that it is asking for the tile between (-9759479.771451304, 5126784.361143341)
and (-9754587.801641053, 5131676.330953592)
.
In a straightforward way, this is translated in a query like this one:
SELECT count(1)
FROM table
WHERE ((wm_x >= -9759479.771451304)
AND (wm_x <= -9754587.801641053))
AND ((wm_y >= 5126784.361143341)
AND (wm_y <= 5131676.330953592))
--count(1)--
| 2100094 |
------------
1 rows in set. Elapsed: 0.976 sec. Processed 387.18 million rows, 2.19 GB (396.88 million rows/s., 2.25 GB/s.)
Making use of the common properties between quadkey and the projection, we now that this tile is taking at zoom level 13 so if we translate the coordinates we see that it is formed by (0x0CAAD3B000000000 to 0x0CAAD3BFFFFFFFFF), which you can check with an AND operator:
SELECT count(1)
FROM table
WHERE bitAnd(quadkey, 4611685949707911168) = 912774627363651584
--count(1)--
| 2100094 |
------------
1 rows in set. Elapsed: 0.854 sec. Processed 387.18 million rows, 3.10 GB (453.39 million rows/s., 3.63 GB/s.)
And now, as we decided to add the quadkey to the index but it only works with simple operations (comparisons), we’ll translate the coordinates to quadkeys (you can use our OSS library to do it):
SELECT count(1)
FROM table
WHERE (quadkey >= 912774627363651584) AND (quadkey <= 912774696083128319)
--count(1)--
| 2100094 |
------------
1 rows in set. Elapsed: 0.212 sec. Processed 2.13 million rows, 17.01 MB (10.04 million rows/s., 80.29 MB/s.)
This is a 5x improvement, mainly because we have reduced the number of read rows from 387M to just 2M and the index is always in memory. I’ll explain later what index we use and why, but you can find a nice explanation of how they work in this Medium post.
These are pretty good times, but we don’t need the count of the points in the requested tile. Instead, we need the count of matches for each pixel (256x256 = 65536 pixels) so we will aggregate them directly in Clickhouse to make use of its parallelization.
To aggregate the points into 65k we use the properties of quadkeys again: Since we want 2^16 points and we know each zoom level uses 2 bits, we can use the mask for our level + 8 (13 + 8 in this example) to aggregate them. In this case, the mask for level 13 + 8 is 0x3FFFFFFFFFF00000 or 4611686018426339328):
SELECT
bitAnd(quadkey, 4611686018426339328) AS quadkey_g,
count(1)
FROM table
WHERE (quadkey >= 912774627363651584) AND (quadkey <= 912774696083128319)
GROUP BY quadkey_g
-----------quadkey_g---count(1)--
| 912774691671769088 | 5 |
| 912774695507460096 | 16 |
...
51561 rows in set. Elapsed: 0.797 sec. Processed 2.13 million rows, 17.01 MB (2.67 million rows/s., 21.33 MB/s.)
Note that 0x0CAAD3BFFFF00000 would also be a valid mask in this case and that, since we only encode 31 levels of zoom, the first 2 bits of any quadkey are always 0.
For big datasets, there is another nice Clickhouse feature that came in handy: sampling. You can SAMPLE for an approximate part of the data (e.g. SAMPLE 0.5 for 50%) for the number of rows read from the database (e.g SAMPLE 1000000 for 1M rows), thus reducing the reads from disk.
SAMPLE is highly related to the database index, as the sampling expression is part of the primary key and that leave us with 2 options: Use a sampling function that makes sure close points are also close in the index (for example using some form of bitAnd(quadkey, appropiate_zoom_level)
) or use a sampling expression to make sure that, when we wanted to take a sample of the data, it is heterogeneous so we can quickly extrapolate results for the whole table.
In our case we initially went for grouping the nearest points together:
ENGINE = MergeTree(date_index, bitAnd(quadkey, 65535), (quadkey, date_index, bitAnd(quadkey, 65535)), 8192)
But after some testing, and considering that we would generate each tile only once (that is what caches are there for) and widgets like histograms or categories are recalculated with every screen movement, we chose to disperse near points so sampling worked better and allowed us to read less rows:
ENGINE = MergeTree(date_index, intHash64(quadkey), (quadkey, date_index, intHash64(quadkey)), 8192)
This change alone made tile calculations around 2-5% slower, but histogram queries became ~30% faster.
Our way to make up for that loss in tile calculation was to introduce a SAMPLE based on the zoom level, but instead of using it to read a percentage of the database we limited the amount of rows read to 256 per possible pixel at that zoom level. For example, at zoom level 3 we have 2^3 tiles, each of those have 65k pixels, so: 256 rows/pixel * 2^16 pixel/tile * 2^3 tiles = 2^(8+16+3) = 134217728 rows:
SELECT
bitAnd(quadkey, 4611684918915760128) AS quadkey_g,
count(1)
FROM table
SAMPLE 134217728
WHERE (quadkey >= 864691128455135232) AND (quadkey <= 936748722493063167)
GROUP BY quadkey_g
28 rows in set. Elapsed: 1.515 sec. Processed 291.40 million rows, 2.33 GB (192.33 million rows/s., 1.54 GB/s.)
This is always dependent of your data and sampling function, but in this specific query we reduced the number of read rows by ~25% which meant around 10% faster response with an output good enough for tile data.
Once we managed to have fast enough tiles, it became clear that we needed to speed up those queries related to the widgets. We have to show information next to the map as they where, not only slower than tiles, but sometimes even taking minutes to complete.
A histogram is just a division in equal parts of a certain column and counting the number of rows associated to each bucket. To do that you need to know the min and max values (to know the bucket width) and then generate the buckets.
To calculate the min and max values we decided to use quantiles to get rid of possible odd values in the data, and since we needed some extra information we used a query like this:
SELECT
quantiles(0.01, 0.99)(assumeNotNull(acc)) AS min_max_val,
avg(acc) AS avg_val,
count(1) * 10 AS total_rows,
countIf(isNull(acc)) AS nulls_count
FROM table
--min_max_val---avg_val---total_rows---nulls_count--
| [0,1449.36] | 78.31 | 3871776840 | 0 |
----------------------------------------------------
1 rows in set. Elapsed: 3.024 sec. Processed 387.18 million rows, 3.48 GB (128.05 million rows/s., 1.15 GB/s.)
The first thought to optimise this query is to use sampling to cut the read rows, but it is simpler and faster if you store the return values you really need (min and max) and calculate the others (total and average) from the following query. This way, except for the first time, we don’t have to make this query and we can simply reuse the min_max_val we have stored.
Once you have these values, you can request the actual buckets:
SELECT
greatest(1, least(floor((10 * (acc - 0)) / (1449.3600000000006 - 0)) + 1, 10)) - 1 AS bin,
count(*) * 10 AS freq
FROM table
WHERE (((wm_x >= -9776066.60659)
AND (wm_x <= -9733567.61886))
AND ((wm_y >= 5097050.35714)
AND (wm_y <= 5132517.13826)))
AND isNotNull(acc)
GROUP BY bin
ORDER BY bin ASC
10 rows in set. Elapsed: 2.384 sec. Processed 387.18 million rows, 3.86 GB (162.41 million rows/s., 1.62 GB/s.)
With a 10% sampling the query is ~25% faster (1.716 sec, 283.64M rows) but we can be even faster if we do some of the math beforehand and, using the fact that quadkeys are sampled and in the index keys, we can add a prefilter by the quadkeys that intersect with our coordinates (OR conditions):
SELECT
floor((acc * 0.0070512910914) - 0.) AS bin,
count(1) * 10 AS freq
FROM table
SAMPLE 1 / 10
WHERE
(((wm_x >= -9776066.60659) AND (wm_x <= -9733567.61886)) AND
((wm_y >= 5097050.35714) AND (wm_y <= 5132517.13826))) AND
(
((quadkey >= 912772771937779712) AND (quadkey <= 912774970961035263)) OR
((quadkey >= 912777169984290816) AND (quadkey <= 912778269495918591)) OR
((quadkey >= 912779369007546368) AND (quadkey <= 912784866565685247)) OR
((quadkey >= 912785966077313024) AND (quadkey <= 912787065588940799))
) AND (isNotNull(acc) AND ((acc >= 0) AND (acc < 1418.18)))
GROUP BY bin
ORDER BY bin ASC
10 rows in set. Elapsed: 1.088 sec. Processed 121.23 million rows, 1.90 GB (111.40 million rows/s., 1.74 GB/s.)
So, before all the changes we were at 3.0 + 2.3 and now we are just a merely above 1 second, which is ~80% less time per query (except the first one, as we don’t have the needed max and min values yet) or 400% more queries in the same time.
One of the useful widgets we have is the option to aggregate some column from the data to see what the main categories are and how many points are included. Here we initially followed this line of work:
SELECT
count(1) AS total_rows,
countIf(isNull(browser)) AS nulls_count
FROM table
WHERE (((wm_x >= -9848911.09454)
AND (wm_x <= -9678915.14364))
AND ((wm_y >= 5070068.08616)
AND (wm_y <= 5159499.40925)))
AND (((quadkey >= 912752980728479744)
AND (quadkey <= 912823349472657407))
OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))
--total_rows---nulls_count--
| 381047705 | 0 |
----------------------------
1 rows in set. Elapsed: 5.168 sec. Processed 387.18 million rows, 11.48 GB (74.92 million rows/s., 2.22 GB/s.)
SELECT
max(value) AS max_val,
min(value) AS min_val
FROM
(
SELECT
browser AS category,
count(1) AS value,
FROM table
WHERE (((wm_x >= -9848911.09454)
AND (wm_x <= -9678915.14364))
AND ((wm_y >= 5070068.08616)
AND (wm_y <= 5159499.40925)))
AND (((quadkey >= 912752980728479744)
AND (quadkey <= 912823349472657407))
OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983))) AND isNotNull(browser)
GROUP BY category
ORDER BY value DESC
)
----max_val-----min_val--
| 208418569 | 172629136 |
-------------------------
1 rows in set. Elapsed: 7.792 sec. Processed 387.18 million rows, 11.62 GB (49.69 million rows/s., 1.49 GB/s.)
SELECT
browser AS category,
count(1) AS value
FROM table
WHERE (((wm_x >= -9848911.09454)
AND (wm_x <= -9678915.14364))
AND ((wm_y >= 5070068.08616)
AND (wm_y <= 5159499.40925)))
AND (((quadkey >= 912752980728479744)
AND (quadkey <= 912823349472657407))
OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))
AND isNotNull(browser)
GROUP BY category
ORDER BY value DESC
--category-------value--
| cat1 | 208418569 |
| cat2 | 172629136 |
------------------------
2 rows in set. Elapsed: 7.837 sec. Processed 387.18 million rows, 11.62 GB (49.40 million rows/s., 1.48 GB/s.)
This procedure is not very efficient, but with small datasets (in the thousands of points) it is not that important as you end up hitting caches and the calculations are fast enough. As you might guess, the bigger the database, the bigger the problem, so now we have a 1000x bigger problem. Easy enough, all three queries can be simplified to just the last one, use the first row as the max and the last row as the min value and adding everything for the full count. From 5.1 + 7.8 + 7.8 to just 7.8 seconds: 60% less time.
Although this was an important issue, it wasn’t the most noticeable one as, even after reduction, there were some moments where the connection with the database would slow to a crawl. Here is way: Suppose you have a table with 400 million rows and you decide to count the different values of a column that are present in an average of 500 rows each; that gives us 800,000 different categories. Now, let’s suppose that to send this output you use JSON and that, each generated output row is about 50 Bytes (some kind of id plus the count, both in UTF-8), so the full output is 40 MB of data that needs to be transmited, out of which you are going to discard 99% of it as you only show the top 6.
While reading the values from the column and choosing the top categories is not avoidable, we can make sure we return just what we really need with WITH TOTALS
or LIMIT
the query to a reasonable amount of categories:
SELECT
id AS category,
count(1) * 2. AS value
FROM table
SAMPLE 5 / 10
WHERE (((wm_x >= -9848911.09454)
AND (wm_x <= -9678915.14364))
AND ((wm_y >= 5070068.08616)
AND (wm_y <= 5159499.40925)))
AND (((quadkey >= 912752980728479744)
AND (quadkey <= 912823349472657407))
OR ((quadkey >= 912964086961012736) AND (quadkey <= 912981679147057151))
OR ((quadkey >= 912999271333101568) AND (quadkey <= 913016863519145983)))
GROUP BY category
WITH TOTALS
ORDER BY value DESC
LIMIT 6
--category---------------------------------value--
| 00000000-0000-0000-0000-000000000000 | 2989142 |
| 00000000-0000-0000-0000-000000000001 | 2441722 |
| 00000000-0000-0000-0000-000000000002 | 2258138 |
| 00000000-0000-0000-0000-000000000003 | 1326528 |
| 00000000-0000-0000-0000-000000000004 | 1209742 |
| 00000000-0000-0000-0000-000000000005 | 1138014 |
--------------------------------------------------
Totals:
--category-------value--
| \N | 333010618 |
-----------┴------------
6 rows in set. Elapsed: 12.023 sec. Processed 307.31 million rows, 18.74 GB (25.56 million rows/s., 1.56 GB/s.)
In this kind of cases we are now over 90% faster, since we went from over 2 minutes and a half to 12 seconds, which is still high but considering we are using a standalone server for these tests, we’re happy with the initial results.
If you liked this post you might be interested in joining us.
Here at CARTO we are constantly updating our platform so that our customers gain in system performance, reliability and security.
Core TechOne of my favorite parts about working at CARTO is getting to test out the new features that our product team is constantly shipping. A month or so ago, it was our Kepler c...
Core TechOur primary goal at CARTO is to be the world’s leading Location Intelligence platform, empowering our clients with the best data and the best spatial analysis. We frequentl...
Core TechPlease fill out the below form and we'll be in touch real soon.