Manually Batching your SQL Jobs
Are your SQL UPDATE
queries producing time out errors, or taking a very long time to finish? You can improve performance by executing your query in batches! For the purpose of this article, we will use geocoding as an example, but this method applies to all SQL UPDATE
queries.
Using the SQL Pane from a dataset
Let’s say you have a dataset of 500,000 points, and they all need to be geocoded. The SQL for that, typically, would look something like this:
UPDATE {table}
SET the_geom = cdb_geocode_street_point({streetAddress}, {city}, {state}, {country})
The problem with the query above is that it may quickly hit your account query time limit. Instead of running this in a single shot you can query your database in batches using a WHERE
clause. We can use the modulo (remainder) operator to break the dataset into chunks, then run the UPDATE
query sequentially on each chunk. If we wanted to break our dataset into 5 chunks, the queries would look something like this:
UPDATE {table}
SET the_geom = cdb_geocode_street_point({streetAddress}, {city}, {state}, {country})
WHERE the_geom IS NULL AND cartodb_id % 5 = 0;
UPDATE {table}
SET the_geom = cdb_geocode_street_point({streetAddress}, {city}, {state}, {country})
WHERE the_geom IS NULL AND cartodb_id % 5 = 1;
UPDATE {table}
SET the_geom = cdb_geocode_street_point({streetAddress}, {city}, {state}, {country})
WHERE the_geom IS NULL AND cartodb_id % 5 = 2;
UPDATE {table}
SET the_geom = cdb_geocode_street_point({streetAddress}, {city}, {state}, {country})
WHERE the_geom IS NULL AND cartodb_id % 5 = 3;
UPDATE {table}
SET the_geom = cdb_geocode_street_point({streetAddress}, {city}, {state}, {country})
WHERE the_geom IS NULL AND cartodb_id % 5 = 4;
In this example, we included WHERE the_geom IS NULL
in order to skip over any geometries that may have already been geocoded in the dataset.
If you do this from a SQL pane in your account, you will need to send these queries one at a time to avoid encountering a SQL API timeout.
Using the Batch SQL API
You can couple this manual batching with CARTO’s Batch SQL API in order to further automate this process. For easy access to this, check out this tool, built in a python notebook using our CARTOframes python package and our python SDK. To use it, select File > Open in Playground Mode
from the menu.
In order to use any CARTO APIs, you will need an Enterprise, Engine enabled account. You can read more on CARTO API key access here.