Google’s Open Building dataset includes 516 million buildings covering 64% of the African continent. Learn how to analyze this dataset using CARTO's Spatial Analytics in BigQuery.
Google recently released the Open Buildings dataset, including 516 million building footprints covering 64% of the African continent. Each building is described as a polygon with a confidence score indicating the degree of confidence that the building identified is indeed a building. There are many potential use cases for the data including our recent visualization and blog post assessing the damage from a volcano on La Palma in the Spanish Canary Islands.
To analyze the entire continent, we can leverage BigQuery to store the information in one table and then use standard SQL to ask the data questions. In addition, to explore the entire continent at once we can use CARTO’s BigQuery Tiler (part of CARTO's Spatial Analytics in BigQuery) to visualize all 516 million buildings on one map efficiently.
Loading the data into BigQuery
Open Buildings has hosted all of the building polygons in a series of compressed CSV files in Google Cloud Storage, totaling 46 GB. Instead of downloading the files locally and pushing them back up into BigQuery, we can use the bq command-line tool to create a table in BigQuery directly from the files in cloud storage.
Each of the compressed CSV files are stored in the path: gs://open-buildings-data/v1/polygons_s2_level_4_gzip/
We can use the following command to take each of the compressed CSV files (the *.csv.gz denotes all files ending in that extension) and create a single table in BigQuery. It will automatically determine a schema to use and create a series of jobs to build a table.
When previewing the table in BigQuery, the schema should look like this:
Transforming the data
We now have the data of the buildings in a single BigQuery table. However, there are some refinements we can make to make geospatial analysis faster and easier.
- Select only the columns we need (in this case, the confidence score and the geometry).
- Create a column containing a GEOGRAPHY type, which is required for using geospatial functions in BigQuery and CARTO's Spatial Analytics in BigQuery (the raw table geometry column is type string from the import of the CSV files).
- Cluster the table by the GEOGRAPHY column, which will improve certain geospatial query performance.
Creating the tileset
We can create a tileset using CARTO's Spatial Analytics in BigQuery with a single SQL command. The first argument is the source table (the clean table we just created). The second argument is the desired output tileset table name. Last is an optional argument which could be null causing the default parameters to be used. However setting zoom_min to 10 and zoom_max to 20 will reduce the size and improve performance. Since we are viewing individual buildings we do not need to visualize the data at a zoom level lower than 10.
Creating the visualization
Using CARTO Builder, we can add the tileset, apply some styling based on confidence score and switch to a Google base map to see how the polygons line up versus Google Maps road and satellite imagery layers.
Click "Add source from…" and select the tileset created earlier.
Click the base map menu option and switch the base map to Google Maps. Optionally we can switch to Satellite if we wish to compare the building polygons to actual imagery.
Go into the "Layer settings", click the three vertical dots next to Fill Color, select the confidence column in the "Color Based On" field and switch the steps to 3 by clicking on the color ramp.
At the end of the steps outlined above we can see a map with all of the building polygons shown like the screenshot above. We can move around to most areas of Africa and see each building allowing for rapid exploration.
A live map can be found here.