A SQL approach to graph coloring applied to maps
At CARTO we challenge ourselves to use our platform as our users do. For us this serves several purposes:
- Because we care a lot about our users experience this way we understand better the pains and gains of using our platform.
- We still keep learning a lot: from SQL to React going through WebGL to projections spatial algorithms and mapping in general.
Since use cases at CARTO go from very simple visualizations to complex geospatial solutions one of the challenges I set for myself was to solve the graph coloring problem with CARTO.
Graph coloring is a technique to assign colors to the vertices of a graph such that no two adjacent vertices share the same color.
But what does graph coloring have to do with maps?
Map coloring is an application of graph coloring so each two adjacent polygons (countries provinces etc.) are assigned different colors.
Map coloring helps to better understand maps and solve other kind of problems like mobile radio frequency assignment or other scheduling problems.
Having said that graph coloring is a very interesting topic covered by several theorems and algorithms like the 4-color theorem which basically states that any map can be colored using 4 colors.
Coloring the world map in 4 colors
Let's put ourselves in the boots of a CARTO user that wants to draw the world map in 4 colors. We don't have much programming skills but we know a bit of SQL and spatial concepts.
For this case we'll work with the world_borders layer that can be imported into any CARTO account from our Data Library.
Modelling a graph in PostGIS
The graph coloring problem needs two mathematical artifacts to be solved: a model and an algorithm.
So we have to model a PostGIS table into a graph. This may sound like complex stuff but it can be actually solved in less than 10 lines of SQL by creating an adjacency list:
{% highlight sql %}SELECT DISTINCT a.name array_agg(b.name) over (PARTITION BY a.cartodb_id) AS adjacent count(b.*) over (PARTITION BY a.cartodb_id) AS valence 0 AS colorFROM world_borders a world_borders bWHERE ST_INTERSECTS(a.the_geom b.the_geom) AND a.cartodb_id != b.cartodb_idORDER BY valence DESC a.name ASC{% endhighlight %}
With this query we obtain the world map adjacency list having for each country the list of adjacent countries and its valence (the number of adjacent countries).
Learning point: Note the use of PostgreSQL Window functions to aggregate and count the adjacent countries in a single column. Window functions are a really handy resource to have in your SQL tool box.
We can generalize this query by wrapping it as a PostgreSQL function so that it can be executed for any of our datasets and store the results in a table:
{% highlight sql %}CREATE OR REPLACE FUNCTION adjacency_list(table_name regclass user_name text) RETURNS void AS $$BEGIN EXECUTE format('DROP TABLE IF EXISTS %s_adjacency_list; CREATE TABLE %s_adjacency_list AS SELECT DISTINCT a.cartodb_id array_agg(b.cartodb_id) over (PARTITION BY a.cartodb_id) AS adjacent count(b.*) over (PARTITION BY a.cartodb_id) AS valence 0 AS color FROM %s a %s b WHERE ST_INTERSECTS(a.the_geom b.the_geom) ORDER BY valence DESC a.cartodb_id ASC; SELECT CDB_CartoDBFyTable(''' || user_name || ''' ''' || table_name || '_adjacency_list'');' table_name table_name table_name table_name);END$$ LANGUAGE plpgsql;{% endhighlight %}
Learning point: By wrapping a query into a function we are able to re-use it and even provide of a geospatial framework to the users in our CARTO organization.
Note as well the use of the EXECUTE and format functions to avoid missusing of the function or SQL injection issues.
The Welsh-Powell algorithm: a greedy coloring approach
Let's start by implementing the most simpler algorithm for graph coloring the Welsh-Powell one. This algorithm is as follows:
- Find the adjacency list and valence for each vertex (in this case for each country)
- List the vertices in order of descending valence
- Color the first vertex in the list with color 1
- Go down the list and color every vertex not connected to the colored vertices above the same color. Then cross out all colored vertices in the list.
- Repeat on the uncolored vertices with a new color always working in descending order of valence until all the vertices have been colored.
Let's see how we can implement the Welsh-Powell algorithm as a PostGIS function.
{% highlight sql %}CREATE OR REPLACE FUNCTION greedy(table_name regclass user_name text) RETURNS void AS $$ DECLARE item RECORD; current_color int = 1; cc int; guard int; BEGIN PERFORM adjacency_list(table_name user_name); EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0' table_name) INTO guard; WHILE guard > 0 LOOP EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = (SELECT cartodb_id FROM %s_adjacency_list WHERE color = 0 ORDER BY cartodb_id ASC LIMIT 1);' table_name table_name) USING current_color; FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE color = 0' table_name) LOOP EXECUTE format('SELECT COUNT() FROM %s_adjacency_list WHERE color = $1 AND $2 = ANY(adjacent)' table_name) INTO cc USING current_color item.cartodb_id; IF cc = 0 THEN EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2;' table_name) USING current_color item.cartodb_id; END IF; END loop; current_color = current_color + 1; EXECUTE format('SELECT COUNT() FROM %s_adjacency_list WHERE color = 0' table_name) into guard; END loop; END;$$ LANGUAGE plpgsql;{% endhighlight %}
A deep view on the Welsh-Powell algorithm implementation
In this case we have two different sections in our function. First we DECLARE temporary variables needed to store results and second we have the actual algorithm implementation between a BEGIN and END clause.
{% highlight sql %}DECLARE item RECORD; current_color int = 1; cc int; guard int;{% endhighlight %}
Since we need to model our dataset as an adjacency list we start by calling our adjacency_list function:
{% highlight sql %}PERFORM adjacency_list(table_name user_name);{% endhighlight %}
Then we need to know the number of rows in the dataset and start an iterative algorithm:
{% highlight sql %}EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0' table_name) INTO guard; WHILE guard > 0 LOOP{% endhighlight %}
Let's color the first vertex in the list with color 1
{% highlight sql %}EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = (SELECT cartodb_id FROM %s_adjacency_list WHERE color = 0 ORDER BY cartodb_id ASC LIMIT 1);' table_name table_name) USING current_color;{% endhighlight %}
Go down the list and color every vertex not connected to the colored vertices above the same color
{% highlight sql %}FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE color = 0' table_name) LOOP EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = $1 AND $2 = ANY(adjacent)' table_name) INTO cc USING current_color item.cartodb_id; IF cc = 0 THEN EXECUTE format('UPDATE %s_adjacency_list SET color = $1 WHERE cartodb_id = $2;' table_name) USING current_color item.cartodb_id; END IF;END loop;{% endhighlight %}
Finally repeat on the uncolored vertices with a new color always working in descending order of valence until all the vertices have been colored.
{% highlight sql %}current_color = current_color + 1;EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE color = 0' table_name) into guard;{% endhighlight %}
Now we have two functions that can be stored in our CARTO account by running them into the SQL console in our BUILDER dashboard but how do we run this map coloring algorithm?
Best option here is using our batch SQL API this allows us to run any SQL that could take several seconds or minutes safely. In this case we just have to do a SELECT to our greedy function passing the table_name and our user_name. We can do this directly from a terminal:
The resulting table world_borders_adjacency_list contains a color assigned for each cartodb_id now we just can join this table with the original world_borders table and apply a category thematic to visualize the result (plus a bit of CartoCSS magic):
In this case we have colored every adjacent country with a different color in a total of 5 colors but can we do it better?
Note: for the sake of simplicity in the examples above and below we are not working with multipolygons; so that countries composed of several polygons are assigned different colors to each polygon.
The Kempe's graph coloring algorithm
In 1879 Alfred B. Kempe tried to prove the 4-color theorem and while years later it was demonstrated that it didn't solved the problem for all cases the algorithm he designed still can be used to color the world map using just 4 colors.
The Kempe's graph color algorithm is as follows:
- Convert the map to a graph (in this case an adjacency list)
- Choose a vertex (polygon) with less than five neighbors and remove it from the graph. This may cause some vertices that previously had five or more neighbors to now have less than five.
- Choose another vertex from the updated graph with less than five neighbors and remove it.
- Continue until you've removed all the vertices from the graph.
- Add the nodes back the graph in reverse order from which you removed them.
- Color the added node with a color that is not used by any of its current neighbors.
- Continue until you've colored in the entire graph.
This is a little bit more complex algorithm that still can be solved using a pure PostgreSQL function:
{% highlight sql %}CREATE OR REPLACE FUNCTION kempe(table_name regclass user_name text) RETURNS void AS $$DECLARE item RECORD; cc int; count int = 1; d int; temp_cartodb_id int; guard int;BEGIN PERFORM adjacency_list(table_name user_name); EXECUTE format('DROP TABLE IF EXISTS %s_temp' table_name); EXECUTE format('CREATE TABLE %s_temp (cartodb_id integer ord integer);' table_name); EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE valence < 5 AND valence > 0' table_name) INTO guard; WHILE guard > 0 LOOP EXECUTE format('SELECT cartodb_id FROM %s_adjacency_list WHERE valence < 5 AND valence > 0 ORDER BY valence DESC LIMIT 1' table_name) into temp_cartodb_id; EXECUTE format('INSERT INTO %s_temp (cartodb_id ord) VALUES ($1 $2)' table_name) USING temp_cartodb_id count; count = count + 1; EXECUTE format('UPDATE %s_adjacency_list SET valence = 0 WHERE cartodb_id = $1' table_name) USING temp_cartodb_id; FOR item IN EXECUTE format('SELECT * FROM %s_adjacency_list WHERE $1 = ANY(adjacent)' table_name) USING temp_cartodb_id LOOP EXECUTE format('UPDATE %s_adjacency_list SET valence = valence - 1 WHERE cartodb_id = $1' table_name) USING item.cartodb_id; END LOOP; EXECUTE format('SELECT COUNT(*) FROM %s_adjacency_list WHERE valence < 5 AND valence > 0' table_name) INTO guard; END LOOP;
END;$$ LANGUAGE plpgsql;{% endhighlight %}
Again we can create the function from the SQL console execute it for the world_borders dataset using the batch SQL API and then map it with BUILDER. Let's see the result:
In this case we have colored the world map in 4 colors challenge accomplished!
Learning point: We have not only learned how to solve the graph coloring problem with CARTO but we have ended up creating the basis for a geospatial framework by creating PostgreSQL functions into our CARTO account.
More map coloring
So let's finish by applying these map coloring functions that now are part of our own geospatial framework inside CARTO to some other of our datasets.
The 4 color theorem applied to the US states dataset
A greedy approach to map color the US counties dataset
Another greedy example with the Spain municipalities
Note that the map coloring algorithm implementations presented in this blog post are totally naive and don't pretend to be exact or to be used under a production environment they just pretend to showcase a user workflow to solve a geospatial problem with CARTO.
For reference all these functions are available here. Feel free to add any comment or improve them.
If you like the kind of stuff we are involved in you may want to join us :)