The NYPD recently made waves with an open data release, sharing fairly detailed crime data about the “seven major felonies” for the first 3 quarters of 2015. The dataset has a row for every incident, including location (as lat/lon), date/time, etc. However, as with many crime datasets, the point locations have been adjusted to mid-block or nearest intersection so that you can’t determine exactly where the reported crime occurred. As you might expect, there are many locations in NYC where multiple crimes occurred on the same block/intersection, so we end up with many rows that have the exact same point coordinates. This is a mapper’s conundrum, as point markers will simply overlap each other, only allowing you to see the one point when there are many beneath it.
Some Workarounds to the ‘Overlapping Points Problem’
We’ll use a simple dummy dataset that has 10 points in one location, and 5 points in another location. When rendered by CartoDB, we see 2 points when there are actually 15:
A heatmap overlay “heats up” an area on the map based on the number of points, or based on some aggregation of a numeric attribute. This obscures the point markers themselves, but at least the data are all visually accounted for.
Clustering replaces many markers with a single marker, either with a label or other visual representation of the represented points. When you zoom in closer, the algorithm determines whether there is still overlap. If there is none, it breaks up the cluster. Of course, if the points have the exact same coordinates, the cluster will never split up. Go ahead zoom all the way in… the clusters will remain:
Spidering, or spiderifying is a visualization technique where overlapping points can be offset around their mapped location, and visually linked to it with a line drawn on the map. This is sort of an “exploded view”, when there are multiple markers in the same area, allowing the user to interact with each one, but indicating that the marker has been shifted away from the location it represents.
This functionality is built into the leaflet.MarkerCluster plugin, which makes pretty animated clusters. Here’s an example using the same dummy data (click each cluster to see the spiderify animations):
Crime Wafers, Mentos, Chip Stacks - A SQL Hack in CartoDB
While tinkering with the NYC Felonies data shortly after its release, I was struggling with the overlapping points problem, and tried to implement a CartoDB-based approach to spidering based on this example by our very own Andrew Hill.
Because of the density of the felonies data, the results of trying this technique were a huge mess. Even after reigning in the spirals, the clusters were still too close together and the map was unusable.
I eventually figured out a way to modify the example so that instead of spreading out the points in a spiral pattern, I could simply offset each point along the y axis. The result was a felony map that shows “stacks” of markers at each location, allowing both visual styling and interactivity with every point in the dataset. Take a look:
So what’s the secret sauce? The example below shows the same dummy data from before rendered using this method. I’ll walk through through each step of the complex SQL query, explaining what each part does.
This step groups the points by their geometry, so points with identical locations will be grouped. It stashes the cartodb_ids for each group into an array using array_agg(). The groups are then sorted by their y coordinate (ST_Y is used to generate a new column with the y value), so that groups that are further south appear first. This will ensure that the more southern “stacks” are rendered in the foreground.
Now we’ve got one row per location, each with an array of unique ids from the source data associated with that location.
With this step, we’re going to un-group the data. generate_series() assigns a number for each row within its group, which we’ll use later to offset the geometries. The cartodb_ids are pulled out of the array we stashed them in, so now we’ve got a row for each felony in the original dataset, with it’s original geometry, its cartodb_id, and its number within its previous location group.
The last step generates the result set that is actually used in the map. ST_Translate() is used to create new geometries for each point based on the “location group” numbering we added in step 3. The x value of the new geometry remains the same as the original, but the y value is calculated as
ST_Y(f.the_geom_webmercator) + f.p*12. (12 is the y-offset in web mercator units, you can modify this to tighen or spread out the points) The first point in the location group will be 12 web mercator units north of the original point, the second one will be 24, and so on.
A little CartoCSS makes the markers into ovals instead of circles for a 3D-ish look of stacked chips.
This approach with a hard-coded offset will only work at a single zoom level. If you zoom in on the map above, you’ll see the points appear to spread out. This is due to the fact that 12 web mercator units takes up double the distance on your screen each time you zoom in. To overcome this in the felony map, we must dynamically define the offset based on the zoom level.
For each zoom level above zero, the y-offset is cut in half. The offset is then used with a SQL template in cartodb.js to re-render the map whenever the user zooms, and the “stacks” appear the same no matter what zoom level you’re on Check out the full code for the felonies map here.
Next time you find yourself facing the overlapping points problem, give this method a try and stack your chips!