Like many people who love trees and work in the geospatial field, I was fascinated (and disheartened) by a recent New York Times article, “Since When Have Trees Existed Only for Rich Americans?” This op-ed and data visualization explores how city trees are far more likely to be found in wealthier neighborhoods than in poorer ones. The authors show that this pattern holds across many U.S. cities and describe how a lack of trees is linked to higher temperatures and more violent crime. It’s disheartening to realize that something as simple as a tree on the side of the street is a luxury good.
This scale of spatial analysis wasn’t always so easy. Only recently have we had access to the libraries, frameworks, and platforms that let us query across multiple large datasets to better understand the built environment. Naturally, I wanted to try this out and show how it could be done using Google Cloud BigQuery and CARTO’s BigQuery Spatial Extension.
I built a demo application that gives a tree score (based on the number of trees and tree width) to each of the approximately 6,000 census block groups in New York City. This tutorial explores the SQL queries I used to do this and will hopefully inspire others to create custom index scores with similar datasets.
With two long SQL queries, I was able to generate a custom tree score for each census block group in New York City. Thanks to BigQuery’s speed, this was calculated in roughly 5-6 seconds. The datasets I worked with were publicly available in BigQuery:
The first query has several Common Table Expressions (CTEs), but it’s fairly easy to follow. Here are the CTEs:
I now have a table with rich ACS and tree information for each block group in the city. Here’s the query with many inline comments:
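To make the shape of that result concrete outside of BigQuery, here is a minimal Python sketch of the same idea: assign each tree to the block group that contains it, then keep a tree count and an average tree width alongside an ACS attribute. Everything in it is an assumption for illustration only: the function names, the field names (`geo_id`, `median_income`, `width`), and the toy coordinates are invented, and a simple ray-casting test stands in for the spatial containment join that the SQL query would perform.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (a list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def summarize_trees(block_groups, trees):
    """Return one row per block group with its ACS field, tree count, and mean tree width."""
    widths = defaultdict(list)
    for tree in trees:
        for bg in block_groups:
            if point_in_polygon(tree["x"], tree["y"], bg["polygon"]):
                widths[bg["geo_id"]].append(tree["width"])
                break  # a tree is assigned to at most one block group
    rows = []
    for bg in block_groups:
        w = widths[bg["geo_id"]]
        rows.append({
            "geo_id": bg["geo_id"],
            "median_income": bg["median_income"],
            "tree_count": len(w),
            "avg_width": sum(w) / len(w) if w else 0.0,
        })
    return rows


# Invented toy data: two square "block groups" and four "trees".
block_groups = [
    {"geo_id": "bg_a", "median_income": 52000,
     "polygon": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"geo_id": "bg_b", "median_income": 98000,
     "polygon": [(2, 0), (4, 0), (4, 2), (2, 2)]},
]
trees = [
    {"x": 0.5, "y": 0.5, "width": 10},
    {"x": 1.5, "y": 1.0, "width": 20},
    {"x": 3.0, "y": 1.0, "width": 6},
    {"x": 9.0, "y": 9.0, "width": 30},  # falls outside both block groups
]

summary = summarize_trees(block_groups, trees)
for row in summary:
    print(row)
```

The nested loop is fine for a toy example but would be far too slow for hundreds of thousands of trees, which is exactly the kind of work a spatially indexed join in a data warehouse handles well.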
I’ll now show how I created a query that gives each block group a custom ‘tree score’. This tree score gives us an overall view of tree quantity and size within an area and allows for comparison with other areas. We can also see whether tree scores correlate with things like median income or other ACS sociodemographic variables, similar to what the New York Times piece did.
This query also has several CTEs, but I’ll break it all down and add many comments:
The weighting in the second step is an important one. Index scores usually combine multiple indicators and datasets, and the creator of the score chooses how to weight each variable. I decided to weight the quantity of trees a bit more than tree width, but you might decide to do this differently. It would also be interesting to include overall tree health or diversity of trees as part of the score. Whatever the choice, the creator of an index score should strive to be transparent about the weighting.
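As an illustration of how such a weighting might work, here is a minimal Python sketch of a two-indicator index score. It is not the query used for the demo: the 0.6 and 0.4 weights, the min-max normalization, the 0 to 100 scale, and all field and function names are assumptions chosen only to show tree quantity counting somewhat more than tree width.

```python
def min_max(values):
    """Rescale values to the 0-1 range; all zeros if every value is identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tree_scores(rows, count_weight=0.6, width_weight=0.4):
    """Combine normalized tree count and average width into a 0-100 score.

    The default weights are hypothetical; they only encode "quantity matters
    a bit more than width".
    """
    counts = min_max([r["tree_count"] for r in rows])
    widths = min_max([r["avg_width"] for r in rows])
    return {
        r["geo_id"]: round(100 * (count_weight * c + width_weight * w), 1)
        for r, c, w in zip(rows, counts, widths)
    }


# Invented toy rows, one per block group.
rows = [
    {"geo_id": "bg_a", "tree_count": 10, "avg_width": 8.0},
    {"geo_id": "bg_b", "tree_count": 50, "avg_width": 12.0},
    {"geo_id": "bg_c", "tree_count": 30, "avg_width": 16.0},
]

scores = tree_scores(rows)
print(scores)
```

Keeping the weights as named parameters makes the choice explicit and easy to change, which helps with the transparency mentioned above. This sketch uses raw tree counts; depending on the goal, normalizing by block group area or population first may be more appropriate.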
Here’s the query with comments:
I used our CARTO for React framework to build a simple dashboard to showcase the results of my tree index score. I included both the census block groups and all of the approximately 600,000 trees from the New York City tree census. It’s amazing to see how trees cluster in little pockets around the city.
The dashboard is available here: https://nyctreescore.carto.io/indexscore
One of the core arguments of the New York Times piece mentioned above was that trees tend to be located in wealthier neighborhoods. I used the Seaborn plotting library in Python (with CARTOframes) to determine whether there was a correlation between the tree index score and median income, as well as between the tree score and different racial demographics. Here’s what it looks like for median income:
There was a weak positive correlation (0.20) between median income and the tree score I calculated.
There was a weak negative correlation (-0.14) between the Black population percentage and the tree score.
There was a weak positive correlation (0.19) between the white population percentage and the tree score.
There was a very weak negative correlation (-0.07) between the Hispanic population percentage and the tree score.
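For readers who want to check this kind of number themselves, here is a minimal Python sketch of a correlation coefficient computed by hand. It assumes the figures above are Pearson correlation coefficients, which the analysis does not state explicitly, and the input values are invented toy numbers rather than the New York City data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and each variable's own spread.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


# Invented toy values: median income (thousands of dollars) and tree score.
median_income = [40, 55, 70, 85, 100]
tree_score = [30, 20, 55, 35, 60]

r = pearson_r(median_income, tree_score)
print(f"r = {r:.2f}")
```

Values near 0 indicate little linear relationship, while values near 1 or -1 indicate a strong one, which is why coefficients such as 0.20 or -0.14 are described as weak.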
While the positive correlation between median income and abundance of trees is not ideal, it’s not as dramatic as I might have expected. It’s also reassuring to see that there are no strong correlations between major racial demographics and trees. In many ways, it’s fair to say that New York City has a mostly egalitarian distribution of trees.
This post was originally published on eSpatially New York by Sam Wear and has been reproduced with permission.