Find Centroid of Geometries


A centroid is a geometric center calculated from the geometries in a map layer. You can choose whether centroids are calculated from all geometries, from groups of geometries, or from each individual geometry.

This guide describes how to apply the Find centroid of geometries analysis to find unweighted or weighted centers for groups of geometries.

  • Specifically, we will use bike share usage data aggregated by station.
  • Our goal is to find the weighted geographic centroid of each service cluster, so that we can optimize where to assign the vans that re-balance bike docks when they are running low or completely full.

This multi-step process locates the service areas (by applying the Calculate clusters of points analysis), then calculates the usage-weighted center of each identified service area (by applying the Find centroid of geometries analysis).

For this analysis, a well-formed, coherent research question helps to identify the variables to be used for weights, categories, and aggregation. If working with individual polygons, categorizing by cartodb_id or the_geom allows you to collapse the polygons to their individual centroids.

Clustering Points

Let's explore the Citi Bike data for New York City from June 2016. This dataset contains a column called count, which holds the number of trips that users ended at each station during the month of June. The end_station_name column contains the name of the bike station where those trips ended.

Suppose the New York City Department of Transportation has a budget to position seven vans around the bike share network to reallocate excess bikes to empty stations. This example finds the optimal seven points, given our basic assumptions, according to the dataset.

  1. Import the template .carto file packaged from "Download resources" of this guide and create the map. Builder opens with Citi Bike data as the first and only map layer.

    Click on "Download resources" from this guide to download the zip file to your local machine. Extract the zip file to view the .carto file(s) used for this guide.
    Since our research question involves positioning seven vans, we want to form seven clusters of station points. This analysis, which uses k-means clustering behind the scenes, groups the points so that each one is as close as possible to the center of its own cluster, which also tends to keep the clusters well separated.
  2. From the LAYERS pane, click the Citi Bike map layer.

  3. Click the ANALYSIS tab and apply the Calculate clusters of points option. The source layer must be the layer for which you need the clusters.

  4. Enter 7 as the number of clusters.

  5. Click APPLY.

    Number of clusters for Calculate clusters of points

  6. Switch to the Data View of the Citi Bike map layer. This step assigns a cluster_no to each station, from one to seven.

    The Data View and Map View appear as icons on your map visualization when a map layer is selected. Click them to switch between viewing your connected dataset as a table and viewing it on the map.

    Data View shows cluster_no analysis results
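
To make the clustering step more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not Builder's implementation: the function name kmeans, its parameters, and the sample coordinates are assumptions, points are treated as planar (x, y) pairs, and cluster labels start at 0.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means on planar (x, y) tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the clustering has converged
        labels, centers = new_labels, new_centers
    return labels, centers

# Hypothetical station coordinates forming two obvious groups.
stations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(stations, k=2)
```

In the guide's scenario, k would be 7 and each label would play the role of cluster_no. K-means can settle on different groupings depending on its starting centers, so results on real data may vary between runs.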

Centroid of Clusters

Now that the cluster assignment of each station has been calculated (the result being the cluster_no column of the map layer), locate a weighted center for these clusters to optimally position the Citi Bike re-balance van.

By definition, a centroid is the geometric center of a geometry. When a centroid is weighted, the "center" is pulled towards areas with higher weights. For example, if one side of a city has a higher population, its population-weighted center would be closer to the side with more people. The analysis uses a chosen data field as the weight for each point and calculates the weighted arithmetic mean of the point positions.
  1. From the Citi Bike layer ANALYSIS tab, click + under Your workflow to add a second analysis to the chain.
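
As a sketch of the arithmetic described above, assume n points with planar coordinates (x_i, y_i) and non-negative weights w_i (the symbols are introduced here for illustration and do not come from Builder). The weighted center is then the weighted mean of each coordinate:

```latex
\bar{x} = \frac{\sum_{i=1}^{n} w_i \, x_i}{\sum_{i=1}^{n} w_i},
\qquad
\bar{y} = \frac{\sum_{i=1}^{n} w_i \, y_i}{\sum_{i=1}^{n} w_i}
```

When every w_i is equal, this reduces to the ordinary (unweighted) mean of the points.
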

  2. Select the Find centroid of geometries analysis.

    • Enable the CATEGORIZE option and choose cluster_no.
    • Enable the WEIGHTED BY option and choose count.
    • Enable the AGGREGATE option from the Value aggregation section, and choose SUM on the count column. The count column holds the number of user visits for each bike station, so the sum is the total for each cluster.
    • APPLY the analysis.

The analysis confirmation dialog displays which columns from your dataset were updated. The result is seven points, one weighted center per cluster, where the vans can be positioned to service the most frequented stations.

Centroid analysis option
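
The combination of CATEGORIZE, WEIGHTED BY, and SUM can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the code Builder runs: the function weighted_centroids, the lon and lat keys, and the sample rows are hypothetical, and coordinates are averaged as if they were planar.

```python
from collections import defaultdict

def weighted_centroids(rows, category="cluster_no", weight="count"):
    """Return {category: (lon, lat, total_weight)} for each group of rows."""
    # Per group: [sum of w * lon, sum of w * lat, sum of w]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for row in rows:
        w = row[weight]
        acc = sums[row[category]]
        acc[0] += w * row["lon"]
        acc[1] += w * row["lat"]
        acc[2] += w
    return {
        key: (wx / total, wy / total, total)
        for key, (wx, wy, total) in sums.items()
        if total > 0  # skip groups with no weight to avoid dividing by zero
    }

# Hypothetical stations: two in cluster 1, one in cluster 2.
stations = [
    {"cluster_no": 1, "lon": -74.00, "lat": 40.70, "count": 100},
    {"cluster_no": 1, "lon": -73.98, "lat": 40.72, "count": 300},
    {"cluster_no": 2, "lon": -73.95, "lat": 40.78, "count": 50},
]
centroids = weighted_centroids(stations)
```

In this sample, the center of cluster 1 lands three quarters of the way towards the busier station, and the third value in each tuple corresponds to the summed count for the cluster.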

Are you having trouble visualizing the results? Change the basemap style to DARK MATTER (LITE) and notice how the seven points are much more visible.

Styling Results

To apply even more style enhancements, you can style these centroids BY VALUE and choose one of the updated columns from your dataset. This helps display where the weighted centers are located, in relation to the most frequently used bike stations.

To visualize and style analysis results as a separate layer, there is a shortcut to create a new layer directly from the LAYERS pane in Builder.
  • Click and drag A1 Clusters from the Citi Bike map layer and drop it below the Citi Bike layer.

    Create new layer from analysis

A new map layer is created, B Citi Bike, displaying all of the centroids in the cluster of stations. Style this layer BY VALUE using the cluster_no column to visualize how the regions are defined geographically, according to the most frequented stations.

Download the final .carto file from the "Download resources" of this guide, and explore the advanced cartography applied. The Connect with lines analysis was also applied and used to create a new layer that clearly links the stations to be serviced (clusters) with the re-balance van that would service them (centroids).

External Resources

If you are interested in using the underlying functions in the SQL view of Builder, you can find centroids of polygon geometries (or clusters) with the PostGIS ST_Centroid function. View the PostGIS ST_Centroid documentation for details.
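
For intuition about what a polygon centroid is, the sketch below applies the standard area-weighted (shoelace) formula to a simple polygon without holes. It is not the PostGIS implementation; the function name polygon_centroid and the sample rectangle are assumptions made for illustration.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y)."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if the caller left it open
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)

# A 4 x 2 rectangle; its centroid is its visual center.
center = polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)])
```

The formula works for either vertex order because the signed area appears in both the numerator and the denominator.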