A SQL approach to graph coloring applied to maps

Summary

How to solve the graph coloring problem implementing several map coloring algorithms in PostGIS

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.
A SQL approach to graph coloring applied to maps

At CARTO we challenge ourselves to use our platform as our users do. For us  this serves several purposes:

  • Because we care a lot about our users experience  this way we understand better the pains and gains of using our platform.
  • We still keep learning a lot: from SQL to React  going through WebGL to projections  spatial algorithms and mapping in general.

Since use cases at CARTO go from very simple visualizations to complex geospatial solutions  one of the challenges I set for myself was to solve the graph coloring problem with CARTO.

 Graph coloring is a technique to assign colors to the vertices of a graph such that no two adjacent vertices share the same color.

But what does graph coloring have to do with maps?

 Map coloring is an application of graph coloring so each two adjacent polygons (countries  provinces  etc.) are assigned different colors.

Map coloring helps to better understand maps and solve other kind of problems like mobile radio frequency assignment or other scheduling problems.

Having said that  graph coloring is a very interesting topic covered by several theorems and algorithms  like the 4-color theorem which basically states that any map can be colored using 4 colors.

Coloring the world map in 4 colors

 Let's put ourselves in the boots of a CARTO user that wants to draw the world map in 4 colors. We don't have much programming skills  but we know a bit of SQL and spatial concepts.

For this case we'll work with the world_borders layer that can be imported into any CARTO account from our Data Library.

Modelling a graph in PostGIS

The graph coloring problem needs two mathematical artifacts to be solved: a model and an algorithm.

So we have to model a PostGIS table into a graph. This may sound like complex stuff but it can be actually solved in less than 10 lines of SQL by creating an adjacency list:

{% highlight sql %}SELECT DISTINCT a.name                 array_agg(b.name) over (PARTITION BY a.cartodb_id) AS adjacent                 count(b.*) over (PARTITION BY a.cartodb_id) AS valence                 0 AS colorFROM world_borders a      world_borders bWHERE ST_INTERSECTS(a.the_geom  b.the_geom)  AND a.cartodb_id != b.cartodb_idORDER BY valence DESC          a.name ASC{% endhighlight %}

With this query we obtain the world map adjacency list having for each country  the list of adjacent countries and its valence (the number of adjacent countries).

Learning point: Note the use of PostgreSQL Window functions to aggregate and count the adjacent countries in a single column. Window functions are a really handy resource to have in your SQL tool box.

We can generalize this query by wrapping it as a PostgreSQL function so that it can be executed for any of our datasets and store the results in a table:

{% highlight sql %}CREATE OR REPLACE FUNCTION adjacency_list(table_name regclass  user_name text) RETURNS void AS $$BEGIN    EXECUTE format('DROP TABLE IF EXISTS %s_adjacency_list;                    CREATE TABLE %s_adjacency_list AS                    SELECT DISTINCT a.cartodb_id                     array_agg(b.cartodb_id) over (PARTITION BY a.cartodb_id) AS adjacent                     count(b.*) over (PARTITION BY a.cartodb_id) AS valence                     0 AS color                    FROM %s a                          %s b                    WHERE ST_INTERSECTS(a.the_geom  b.the_geom)                    ORDER BY valence DESC                              a.cartodb_id ASC;                    SELECT CDB_CartoDBFyTable(''' || user_name || ''' ''' || table_name || '_adjacency_list'');'  table_name  table_name  table_name  table_name);END$$ LANGUAGE plpgsql;{% endhighlight %}

Learning point: By wrapping a query into a function we are able to re-use it and even provide of a geospatial framework to the users in our CARTO organization.

Note as well the use of the EXECUTE and format functions to avoid missusing of the function or SQL injection issues.

The Welsh-Powell algorithm: a greedy coloring approach

Let's start by implementing the most simpler algorithm for graph coloring  the Welsh-Powell one. This algorithm is as follows:

  • Find the adjacency list and valence for each vertex (in this case for each country)
  • List the vertices in order of descending valence
  • Color the first vertex in the list with color 1
  • Go down the list and color every vertex not connected to the colored vertices above the same color. Then cross out all colored vertices in the list.
  • Repeat on the uncolored vertices with a new color  always working in descending order of valence until all the vertices have been colored.

Let's see how we can implement the Welsh-Powell algorithm as a PostGIS function.

{% highlight sql %}CREATE OR REPLACE FUNCTION greedy(table_name regclass  user_name text) RETURNS void AS $$    DECLARE        item RECORD;        current_color int = 1;        cc int;        guard int;    BEGIN        PERFORM adjacency_list(table_name  user_name);        EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0'  table_name) INTO guard;        WHILE guard > 0 LOOP            EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = (SELECT cartodb_id FROM %s_adjacency_list WHERE color = 0 ORDER BY cartodb_id ASC LIMIT 1);'  table_name  table_name) USING current_color;            FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE color = 0'  table_name) LOOP                EXECUTE format('SELECT COUNT() FROM %s_adjacency_list WHERE color = $1 AND $2 = ANY(adjacent)'  table_name) INTO cc USING current_color  item.cartodb_id;                IF cc = 0 THEN                    EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2;'  table_name) USING current_color  item.cartodb_id;                END IF;            END loop;            current_color = current_color + 1;            EXECUTE format('SELECT COUNT() FROM %s_adjacency_list WHERE color = 0'  table_name) into guard;        END loop;    END;$$ LANGUAGE plpgsql;{% endhighlight %}

A deep view on the Welsh-Powell algorithm implementation

In this case we have two different sections in our function. First we DECLARE temporary variables needed to store results and second we have the actual algorithm implementation between a BEGIN and END clause.

{% highlight sql %}DECLARE    item RECORD;    current_color int = 1;    cc int;    guard int;{% endhighlight %}

Since we need to model our dataset as an adjacency list we start by calling our adjacency_list function:

{% highlight sql %}PERFORM adjacency_list(table_name  user_name);{% endhighlight %}

Then we need to know the number of rows in the dataset and start an iterative algorithm:

{% highlight sql %}EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0'  table_name) INTO guard;        WHILE guard > 0 LOOP{% endhighlight %}

Let's color the first vertex in the list with color 1

{% highlight sql %}EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = (SELECT cartodb_id FROM %s_adjacency_list WHERE color = 0 ORDER BY cartodb_id ASC LIMIT 1);'  table_name  table_name) USING current_color;{% endhighlight %}

Go down the list and color every vertex not connected to the colored vertices above the same color

{% highlight sql %}FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE color = 0'  table_name) LOOP    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = $1 AND $2 = ANY(adjacent)'  table_name) INTO cc USING current_color  item.cartodb_id;    IF cc = 0 THEN        EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2;'  table_name) USING current_color  item.cartodb_id;    END IF;END loop;{% endhighlight %}

Finally  repeat on the uncolored vertices with a new color  always working in descending order of valence until all the vertices have been colored.

{% highlight sql %}current_color = current_color + 1;EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0'  table_name) into guard;{% endhighlight %}

Now we have two functions that can be stored in our CARTO account by running them into the SQL console in our BUILDER dashboard  but how do we run this map coloring algorithm?

Best option here is using our batch SQL API  this allows us to run any SQL that could take several seconds or minutes safely. In this case we just have to do a SELECT to our greedy function passing the table_name and our user_name. We can do this directly from a terminal:

curl -X POST -H "Content-Type: application/json" -d "{
  \"query\": \"SELECT greedy('world_borders'  'aromeu')\"
}" https://aromeu.carto.com/api/v2/sql/job?api_key={api_key}

The resulting table world_borders_adjacency_list contains a color assigned for each cartodb_id  now we just can join this table with the original world_borders table and apply a category thematic to visualize the result (plus a bit of CartoCSS magic):


In this case we have colored every adjacent country with a different color  in a total of 5 colors  but can we do it better?

Note: for the sake of simplicity in the examples above and below  we are not working with multipolygons; so that  countries composed of several polygons are assigned different colors to each polygon.

The Kempe's graph coloring algorithm

In 1879  Alfred B. Kempe tried to prove the 4-color theorem and while years later it was demonstrated that it didn't solved the problem for all cases  the algorithm he designed still can be used to color the world map using just 4 colors.

The Kempe's graph color algorithm is as follows:

  • Convert the map to a graph (in this case an adjacency list)
  • Choose a vertex (polygon) with less than five neighbors and remove it from the graph. This may cause some vertices that previously had five or more neighbors to now have less than five.
  • Choose another vertex from the updated graph with less than five neighbors and remove it.
  • Continue until you've removed all the vertices from the graph.
  • Add the nodes back the graph in reverse order from which you removed them.
  • Color the added node with a color that is not used by any of its current neighbors.
  • Continue until you've colored in the entire graph.

This is a little bit more complex algorithm that still can be solved using a pure PostgreSQL function:

{% highlight sql %}CREATE OR REPLACE FUNCTION kempe(table_name regclass  user_name text) RETURNS void AS $$DECLARE    item RECORD;    cc int;    count int = 1;    d int;    temp_cartodb_id int;    guard int;BEGIN    PERFORM adjacency_list(table_name  user_name);    EXECUTE format('DROP TABLE IF EXISTS %s_temp'  table_name);    EXECUTE format('CREATE TABLE %s_temp (cartodb_id integer  ord integer);'  table_name);    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE valence < 5 AND valence > 0'  table_name) INTO guard;    WHILE guard > 0 LOOP        EXECUTE format('SELECT cartodb_id FROM %s_adjacency_list WHERE valence < 5 AND valence > 0 ORDER BY valence DESC LIMIT 1'  table_name) into temp_cartodb_id;        EXECUTE format('INSERT INTO %s_temp (cartodb_id  ord) VALUES ($1  $2)'  table_name) USING temp_cartodb_id  count;        count = count + 1;        EXECUTE format('UPDATE %s_adjacency_list SET valence = 0 WHERE cartodb_id = $1'  table_name) USING temp_cartodb_id;        FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE $1 = ANY(adjacent)'  table_name) USING temp_cartodb_id LOOP            EXECUTE format('UPDATE %s_adjacency_list SET valence = valence - 1 WHERE cartodb_id = $1'  table_name) USING item.cartodb_id;        END LOOP;    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE valence < 5 AND valence > 0'  table_name) INTO guard;    END LOOP;

EXECUTE format('UPDATE %s_adjacency_list set valence = array_length(adjacent  1)'  table_name);
EXECUTE format('UPDATE %s_adjacency_list set color = 0'  table_name);

FOR item IN EXECUTE format('SELECT * FROM %s_temp ORDER BY ord DESC'  table_name) LOOP
    EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color != 0 AND $1 = ANY(adjacent)'  table_name) INTO cc USING item.cartodb_id;
        IF cc = 0 THEN
            EXECUTE format('UPDATE %s_adjacency_list SET color = 1 WHERE cartodb_id = $1'  table_name) USING item.cartodb_id;
        ELSE
            EXECUTE format('select (array_agg(elements))[1]
                            from (
                              select unnest(array[1  2  3  4  5])
                              except
                              SELECT unnest(ARRAY_AGG(color)) FROM %s_adjacency_list WHERE color != 0 AND $1 = ANY(adjacent)
                            ) t (elements)'  table_name) INTO d USING item.cartodb_id;
            EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2'  table_name) USING d  item.cartodb_id;
        END IF;
END LOOP;
EXECUTE format('DROP TABLE IF EXISTS %s_temp'  table_name);

END;$$ LANGUAGE plpgsql;{% endhighlight %}

Again we can create the function from the SQL console  execute it for the world_borders dataset using the batch SQL API and then map it with BUILDER. Let's see the result:


In this case we have colored the world map in 4 colors  challenge accomplished!

Learning point: We have not only learned how to solve the graph coloring problem with CARTO but we have ended up creating the basis for a geospatial framework by creating PostgreSQL functions into our CARTO account.

More map coloring

So  let's finish by applying these map coloring functions that now are part of our own geospatial framework inside CARTO to some other of our datasets.

The 4 color theorem applied to the US states dataset


A greedy approach to map color the US counties dataset


Another greedy example with the Spain municipalities


Note that the map coloring algorithm implementations presented in this blog post are totally naive and don't pretend to be exact or to be used under a production environment  they just pretend to showcase a user workflow to solve a geospatial problem with CARTO.

For reference  all these functions are available here. Feel free to add any comment or improve them.

If you like the kind of stuff we are involved in you may want to join us :)