At CARTO we challenge ourselves to use our platform as our users do. For us, this serves several purposes:

  • Because we care a lot about our users experience, this way we understand better the pains and gains of using our platform.
  • We still keep learning a lot: from SQL to React, going through WebGL to projections, spatial algorithms and mapping in general.

Since use cases at CARTO go from very simple visualizations to complex geospatial solutions, one of the challenges I set for myself was to solve the graph coloring problem with CARTO.

Graph coloring is a technique to assign colors to the vertices of a graph such that no two adjacent vertices share the same color.

But what does graph coloring have to do with maps?

Map coloring is an application of graph coloring so each two adjacent polygons (countries, provinces, etc.) are assigned different colors.

Map coloring helps to better understand maps and solve other kind of problems like mobile radio frequency assignment or other scheduling problems.

Having said that, graph coloring is a very interesting topic covered by several theorems and algorithms, like the 4-color theorem which basically states that any map can be colored using 4 colors.

Coloring the world map in 4 colors

Let’s put ourselves in the boots of a CARTO user that wants to draw the world map in 4 colors. We don’t have much programming skills, but we know a bit of SQL and spatial concepts.

For this case we’ll work with the world_borders layer that can be imported into any CARTO account from our Data Library.

Modelling a graph in PostGIS

The graph coloring problem needs two mathematical artifacts to be solved: a model and an algorithm.

So we have to model a PostGIS table into a graph. This may sound like complex stuff but it can be actually solved in less than 10 lines of SQL by creating an adjacency list:

SELECT DISTINCT a.name,
                array_agg(b.name) over (PARTITION BY a.cartodb_id) AS adjacent,
                count(b.*) over (PARTITION BY a.cartodb_id) AS valence,
                0 AS color
FROM world_borders a,
     world_borders b
WHERE ST_INTERSECTS(a.the_geom, b.the_geom)
  AND a.cartodb_id != b.cartodb_id
ORDER BY valence DESC,
         a.name ASC

With this query we obtain the world map adjacency list having for each country, the list of adjacent countries and its valence (the number of adjacent countries).

Learning point: Note the use of PostgreSQL Window functions to aggregate and count the adjacent countries in a single column. Window functions are a really handy resource to have in your SQL tool box.

We can generalize this query by wrapping it as a PostgreSQL function so that it can be executed for any of our datasets and store the results in a table:

CREATE OR REPLACE FUNCTION adjacency_list(table_name regclass, user_name text) RETURNS void AS $$
BEGIN
    EXECUTE format('DROP TABLE IF EXISTS %s_adjacency_list;
                    CREATE TABLE %s_adjacency_list AS
                    SELECT DISTINCT a.cartodb_id,
                    array_agg(b.cartodb_id) over (PARTITION BY a.cartodb_id) AS adjacent,
                    count(b.*) over (PARTITION BY a.cartodb_id) AS valence,
                    0 AS color
                    FROM %s a,
                         %s b
                    WHERE ST_INTERSECTS(a.the_geom, b.the_geom)
                    ORDER BY valence DESC,
                             a.cartodb_id ASC;
                    SELECT CDB_CartoDBFyTable(''' || user_name || ''',''' || table_name || '_adjacency_list'');', table_name, table_name, table_name, table_name);
END
$$ LANGUAGE plpgsql;

Learning point: By wrapping a query into a function we are able to re-use it and even provide of a geospatial framework to the users in our CARTO organization.

Note as well the use of the EXECUTE and format functions to avoid missusing of the function or SQL injection issues.

The Welsh-Powell algorithm: a greedy coloring approach

Let’s start by implementing the most simpler algorithm for graph coloring, the Welsh-Powell one. This algorithm is as follows:

  • Find the adjacency list and valence for each vertex (in this case for each country)
  • List the vertices in order of descending valence
  • Color the first vertex in the list with color 1
  • Go down the list and color every vertex not connected to the colored vertices above the same color. Then cross out all colored vertices in the list.
  • Repeat on the uncolored vertices with a new color, always working in descending order of valence until all the vertices have been colored.

Let’s see how we can implement the Welsh-Powell algorithm as a PostGIS function.

CREATE OR REPLACE FUNCTION greedy(table_name regclass, user_name text) RETURNS void AS $$
    DECLARE
        item RECORD;
        current_color int = 1;
        cc int;
        guard int;
    BEGIN
        PERFORM adjacency_list(table_name, user_name);
        EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0', table_name) INTO guard;
        WHILE guard > 0 LOOP
            EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = (SELECT cartodb_id FROM %s_adjacency_list WHERE color = 0 ORDER BY cartodb_id ASC LIMIT 1);', table_name, table_name) USING current_color;
            FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE color = 0', table_name) LOOP
                EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = $1 AND $2 = ANY(adjacent)', table_name) INTO cc USING current_color, item.cartodb_id;
                IF cc = 0 THEN
                    EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2;', table_name) USING current_color, item.cartodb_id;
                END IF;
            END loop;
            current_color = current_color + 1;
            EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0', table_name) into guard;
        END loop;
    END;
$$ LANGUAGE plpgsql;

A deep view on the Welsh-Powell algorithm implementation

In this case we have two different sections in our function. First we DECLARE temporary variables needed to store results and second we have the actual algorithm implementation between a BEGIN and END clause.

DECLARE
    item RECORD;
    current_color int = 1;
    cc int;
    guard int;

Since we need to model our dataset as an adjacency list we start by calling our adjacency_list function:

PERFORM adjacency_list(table_name, user_name);

Then we need to know the number of rows in the dataset and start an iterative algorithm:

EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0', table_name) INTO guard;
        WHILE guard > 0 LOOP

Let’s color the first vertex in the list with color 1

EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = (SELECT cartodb_id FROM %s_adjacency_list WHERE color = 0 ORDER BY cartodb_id ASC LIMIT 1);', table_name, table_name) USING current_color;

Go down the list and color every vertex not connected to the colored vertices above the same color

FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE color = 0', table_name) LOOP
    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = $1 AND $2 = ANY(adjacent)', table_name) INTO cc USING current_color, item.cartodb_id;
    IF cc = 0 THEN
        EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2;', table_name) USING current_color, item.cartodb_id;
    END IF;
END loop;

Finally, repeat on the uncolored vertices with a new color, always working in descending order of valence until all the vertices have been colored.

current_color = current_color + 1;
EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0', table_name) into guard;

Now we have two functions that can be stored in our CARTO account by running them into the SQL console in our BUILDER dashboard, but how do we run this map coloring algorithm?

Best option here is using our batch SQL API, this allows us to run any SQL that could take several seconds or minutes safely. In this case we just have to do a SELECT to our greedy function passing the table_name and our user_name. We can do this directly from a terminal:

curl -X POST -H "Content-Type: application/json" -d "{
  \"query\": \"SELECT greedy('world_borders', 'aromeu')\"
}" https://aromeu.carto.com/api/v2/sql/job?api_key={api_key}

The resulting table world_borders_adjacency_list contains a color assigned for each cartodb_id, now we just can join this table with the original world_borders table and apply a category thematic to visualize the result (plus a bit of CartoCSS magic):


In this case we have colored every adjacent country with a different color, in a total of 5 colors, but can we do it better?

Note: for the sake of simplicity in the examples above and below, we are not working with multipolygons; so that, countries composed of several polygons are assigned different colors to each polygon.

The Kempe’s graph coloring algorithm

In 1879, Alfred B. Kempe tried to prove the 4-color theorem and while years later it was demonstrated that it didn’t solved the problem for all cases, the algorithm he designed still can be used to color the world map using just 4 colors.

The Kempe’s graph color algorithm is as follows:

  • Convert the map to a graph (in this case an adjacency list)
  • Choose a vertex (polygon) with less than five neighbors and remove it from the graph. This may cause some vertices that previously had five or more neighbors to now have less than five.
  • Choose another vertex from the updated graph with less than five neighbors and remove it.
  • Continue until you’ve removed all the vertices from the graph.
  • Add the nodes back the graph in reverse order from which you removed them.
  • Color the added node with a color that is not used by any of its current neighbors.
  • Continue until you’ve colored in the entire graph.

This is a little bit more complex algorithm that still can be solved using a pure PostgreSQL function:

CREATE OR REPLACE FUNCTION kempe(table_name regclass, user_name text) RETURNS void AS $$
DECLARE
    item RECORD;
    cc int;
    count int = 1;
    d int;
    temp_cartodb_id int;
    guard int;
BEGIN
    PERFORM adjacency_list(table_name, user_name);
    EXECUTE format('DROP TABLE IF EXISTS %s_temp', table_name);
    EXECUTE format('CREATE TABLE %s_temp (cartodb_id integer, ord integer);', table_name);
    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE valence < 5 AND valence > 0', table_name) INTO guard;
    WHILE guard > 0 LOOP
        EXECUTE format('SELECT cartodb_id FROM %s_adjacency_list WHERE valence < 5 AND valence > 0 ORDER BY valence DESC LIMIT 1', table_name) into temp_cartodb_id;
        EXECUTE format('INSERT INTO %s_temp (cartodb_id, ord) VALUES ($1, $2)', table_name) USING temp_cartodb_id, count;
        count = count + 1;
        EXECUTE format('UPDATE %s_adjacency_list SET valence = 0 WHERE cartodb_id = $1', table_name) USING temp_cartodb_id;
        FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE $1 = ANY(adjacent)', table_name) USING temp_cartodb_id LOOP
            EXECUTE format('UPDATE %s_adjacency_list SET valence = valence - 1 WHERE cartodb_id = $1', table_name) USING item.cartodb_id;
        END LOOP;
    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE valence < 5 AND valence > 0', table_name) INTO guard;
    END LOOP;

    EXECUTE format('UPDATE %s_adjacency_list set valence = array_length(adjacent, 1)', table_name);
    EXECUTE format('UPDATE %s_adjacency_list set color = 0', table_name);

    FOR item IN EXECUTE format('SELECT * FROM %s_temp ORDER BY ord DESC', table_name) LOOP
        EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color != 0 AND $1 = ANY(adjacent)', table_name) INTO cc USING item.cartodb_id;
            IF cc = 0 THEN
                EXECUTE format('UPDATE %s_adjacency_list SET color = 1 WHERE cartodb_id = $1', table_name) USING item.cartodb_id;
            ELSE
                EXECUTE format('select (array_agg(elements))[1]
                                from (
                                  select unnest(array[1, 2, 3, 4, 5])
                                  except
                                  SELECT unnest(ARRAY_AGG(color)) FROM %s_adjacency_list WHERE color != 0 AND $1 = ANY(adjacent)
                                ) t (elements)', table_name) INTO d USING item.cartodb_id;
                EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2', table_name) USING d, item.cartodb_id;
            END IF;
    END LOOP;
    EXECUTE format('DROP TABLE IF EXISTS %s_temp', table_name);
END;
$$ LANGUAGE plpgsql;

Again we can create the function from the SQL console, execute it for the world_borders dataset using the batch SQL API and then map it with BUILDER. Let’s see the result:


In this case we have colored the world map in 4 colors, challenge accomplished!

Learning point: We have not only learned how to solve the graph coloring problem with CARTO but we have ended up creating the basis for a geospatial framework by creating PostgreSQL functions into our CARTO account.

More map coloring

So, let’s finish by applying these map coloring functions that now are part of our own geospatial framework inside CARTO to some other of our datasets.

The 4 color theorem applied to the US states dataset


A greedy approach to map color the US counties dataset


Another greedy example with the Spain municipalities


Note that the map coloring algorithm implementations presented in this blog post are totally naive and don’t pretend to be exact or to be used under a production environment, they just pretend to showcase a user workflow to solve a geospatial problem with CARTO.

For reference, all these functions are available here. Feel free to add any comment or improve them.

If you like the kind of stuff we are involved in you may want to join us :)