Calculate Clusters of Points

This guide describes how to find natural groupings of points based on their proximity to one another. The analysis assigns each point to a group, so that the point lies closer to the center of its own group than to the center of any other group.

This analysis produces a new column, cluster_no (cluster number). Each row of your dataset is assigned a value from 0 to n-1, where n is the number of clusters chosen in Builder.

k-means is a statistical clustering technique that aims to find k means (in this case, mean latitude and longitude) for the n data points in a dataset.

In its determination of closeness, this method uses "as the crow flies" distances, instead of using an underlying transit network.
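
To make the idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in Python. It is an illustration only, not the code Builder runs: the function name `kmeans`, its parameters, the random choice of starting centers, and the use of plain Euclidean distance on coordinate pairs are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative Lloyd's algorithm on (x, y) pairs.

    Returns (cluster_no for each point, list of k centers).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (straight-line distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its member points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the grouping has converged
        labels, centers = new_labels, new_centers
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cluster_no, centers = kmeans(points, k=2)
```

Each entry of `cluster_no` is a value from 0 to k-1, matching the range described for the cluster_no column above. Which group receives which number depends on the starting centers, so the numbers identify groups but carry no order.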

Example

This example uses the Calculate clusters of points analysis to group nearby points into clusters, as a first step toward determining store locations.

  1. Import the template .carto file from the "Download resources" package of this guide and create the map. Builder opens with Customer Clusters as the first and only map layer.

    Click on "Download resources" from this guide to download the zip file to your local machine. Extract the zip file to view the .carto file(s) used for this guide.
  2. Select the Customer Clusters map layer.

  3. Click the ANALYSIS tab.

  4. Apply the Calculate clusters of points analysis, entering 6 as the # OF CLUSTERS.

    Number of clusters for analysis grouping

The results of this analysis show distinct regions in the city of Portland. The centers of these regions, which can be found by applying the Find centroid of geometries analysis to the workflow, represent locations that, by distance, are the optimal position for a store to service all of the points classified by that cluster number.
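
As a rough illustration of what a cluster center is, the sketch below averages the coordinates of the points that share a cluster number. The function name `cluster_centroids` and the sample data are hypothetical, and this is not the implementation of the Find centroid of geometries analysis. Note that a mean minimizes the sum of squared straight-line distances to the points, which is one specific sense of "optimal".

```python
from collections import defaultdict


def cluster_centroids(points, cluster_nos):
    """Return {cluster_no: (mean_x, mean_y)} for points labelled by cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x total, y total, point count
    for (x, y), c in zip(points, cluster_nos):
        acc = sums[c]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


# Hypothetical sample: two points in cluster 0, one point in cluster 1.
centroids = cluster_centroids([(0, 0), (2, 0), (10, 10)], [0, 0, 1])
```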

Cartography Tip

To better visualize the results of the analysis, let's style the layer by the cluster_no value, and change the classification method of grouping data.

  1. From the Customer Clusters map layer, click the STYLE tab.

  2. Click COLOR to open the color properties for the map layer. The SOLID tab opens by default.

  3. Click BY VALUE and select cluster_no. A default color scheme is applied.

    Default color scheme

  4. Change the classification method to apply color properties using categories, as described in the following steps.

    • Select the context menu next to the default classification method, Quantiles. This enables you to change the classification for the selected column.
    • Select Category.

    Change the classification method

    Since this method classifies each point by the cluster number that it falls into, a qualitative color scheme is applied, where each output point is assigned a unique color based on the cluster_no attribute.

    Style your clusters of points by cluster number
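
The difference from a numeric classification can be sketched as a simple lookup: each distinct cluster_no gets its own color, with no ordering implied between the numbers. The function name `category_colors` and the color names are placeholders chosen for this example, not Builder's palette or its internal logic.

```python
def category_colors(cluster_nos, palette):
    """Map each distinct cluster number to one palette color (qualitative scheme)."""
    categories = sorted(set(cluster_nos))
    if len(categories) > len(palette):
        raise ValueError("palette has fewer colors than categories")
    lookup = dict(zip(categories, palette))  # one fixed color per category
    return [lookup[c] for c in cluster_nos]


# Placeholder palette; every point in the same cluster receives the same color.
colors = category_colors([2, 0, 1, 0], ["red", "green", "blue"])
```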

Advanced Styling and Filtering

Apply custom CartoCSS to enhance the styling even more, and add a Category widget to filter data from your dashboard.

  1. Switch the slider button, located at the bottom of the STYLE tab, from VALUES to CARTOCSS and apply the following custom styling.

    #layer {
      marker-width: 7;
      marker-fill: ramp([cluster_no], cartocolor(Pastel), category(6));
      marker-line-width: 1;
      marker-line-color: #555;
      marker-line-opacity: 1;
      marker-allow-overlap: true;
    }
    

    CartoCSS view

  2. Add the cluster_no column as a widget.

    • Click the DATA tab.
    • Click the checkbox next to Add as a widget for the cluster_no column.
    • Click EDIT next to the selected column from the DATA tab.

    Edit selected widget from DATA tab of map layer

  3. Edit the widget details.

    • Change the widget TYPE to CATEGORY.
    • Change the OPERATION to MAX by the cluster_no column.
    • Rename the widget to Cluster Number.

Try filtering your dashboard by selected cluster numbers to visualize selected categories of data.

Filtered category widget

External Resources