Identifying high-impact health facility locations with CARTO's BigQuery Tiler & open-sourced geospatial Big Data
Recently, we announced our BigQuery Tiler, which allows geospatial Big Data to be visualized in a matter of minutes. Since then, we've been amazed by the maps created by the community and by those who have signed up for the beta, including our new partners, Thinking Machines.
In light of the pandemic, it is very important to have a quick and scalable way of identifying vulnerable populations for pinpointed interventions. One example is making sure that the majority of the population has access to health facilities they can go to should they need them. Now more than ever, government and health groups need to be able to quickly identify where to allocate resources for as much impact as possible.
Luckily, through the excellent work of the scientific and open-source geospatial community, we have global datasets that help inform this decision making. These cover many indicators, such as population, health facilities, and mobility, all of which help to quickly pinpoint these areas, even down to house-level granularity.
However, with large datasets comes an even larger problem: computing power. When you have to process gigabytes of information to access these datasets, every single processing step becomes a blocker, and something as simple as viewing data on a map becomes a huge endeavor. Huge processing tasks also bring huge resource requirements for renting compute capacity. Thus, for groups without the technical know-how around geospatial processing options, or without the resources, it becomes difficult or costly to do very granular countrywide geospatial analysis.
Understanding this problem, CARTO developed their BigQuery Tiler, a quick and easy tool to process, visualize, and then analyze large spatial datasets straight from BigQuery. Using this technology together with Thinking Machines' datasets and geospatial processing expertise, we created a demo of how we can quickly identify healthcare gaps at scale.
Going back to the earlier problem: how do we identify high-impact locations for the construction of new health facilities? We use two very popular datasets as a proof of concept:
- Facebook's High-Resolution Settlement Layer, which provides population estimates at up to 30m x 30m granularity.
- Health Facilities in OpenStreetMap: crowdsourced information on the locations of specific points of interest all over the world.
Our goal is to identify high concentrations of settlements that do not have access to health facilities within a certain distance. For the purpose of this blog post, we focus on the Philippines, Malaysia, and Vietnam, a total of almost 1 million square kilometers in area. The population layer alone has around 19.6M rows, and the health facilities dataset has a bit over a million points of interest in total.
Using BigQuery Tiler, we're able to load both datasets onto a map in almost no time at all, without having to worry about any ETL loading times or cost!
Basically, BigQuery Tiler allows us to partition our very large datasets in BigQuery into vector tiles, which makes loading and visualizing them much more manageable for our web maps. What this means is that we can easily view the population and health facility data of an entire country without having to worry about the dataset size or scale.
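To give a feel for what "partitioning into tiles" means, here is a minimal Python sketch of the standard Web Mercator XYZ tiling scheme used by most web maps. It is an illustration only and not a description of how BigQuery Tiler works internally; the function names and the sample coordinates are made up for this example.

```python
import math
from collections import defaultdict


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a point.

    Standard Web Mercator tiling: 2**zoom tiles per axis, y grows southward.
    Latitudes should stay within roughly +/-85.0511 degrees.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge still land in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def group_by_tile(points, zoom):
    """Bucket (lon, lat) points by tile so each tile can be loaded on its own."""
    tiles = defaultdict(list)
    for lon, lat in points:
        tiles[lonlat_to_tile(lon, lat, zoom)].append((lon, lat))
    return dict(tiles)


# Hypothetical sample points (approximate city coordinates, illustration only).
sample = [(121.0, 14.6), (121.01, 14.61), (105.8, 21.0)]
tiles = group_by_tile(sample, zoom=8)
```

Because a map viewport only ever needs the handful of tiles it currently shows, the browser never has to download the full 19.6M-row layer at once.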
Furthermore, once the data is on the map, we can also easily build analysis layers on top. For example, we can filter out settlements that already have access to health facilities. This allows users to focus on areas that are not within a certain distance of a health facility, a distance that the user can easily choose.
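The filtering idea can be sketched in a few lines of Python. This is a brute-force illustration under our own assumptions, not the query behind the demo: the record layout (`lat`, `lon` keys), the function names, and the sample points are all hypothetical, and distances use the haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def underserved(settlements, facilities, max_km):
    """Keep only settlements with no facility within max_km (O(n * m) scan)."""
    return [
        s for s in settlements
        if all(haversine_km(s["lat"], s["lon"], f["lat"], f["lon"]) > max_km
               for f in facilities)
    ]


# Hypothetical sample data, illustration only.
settlements = [
    {"id": "a", "lat": 14.60, "lon": 121.00},  # about 1 km from the facility
    {"id": "b", "lat": 15.60, "lon": 121.00},  # over 100 km away
]
facilities = [{"lat": 14.60, "lon": 121.01}]
gaps = underserved(settlements, facilities, max_km=5)
```

A pairwise scan like this would not be practical for millions of rows. At that scale you would push the distance test into the data warehouse (BigQuery offers `ST_DWITHIN` for this kind of check) or use a spatial index.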
We can even quantify the vulnerable population within an area by using our drawing tools to select custom areas of interest and easily summarize the data based on that.
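As a rough picture of what such a summary involves, the sketch below sums population for settlement points that fall inside a drawn polygon, using a simple ray-casting test. It is a simplified stand-in and not the map widget's actual implementation: it treats longitude and latitude as planar coordinates (reasonable only for small areas), and the field names and sample values are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def population_in_area(settlements, polygon):
    """Sum the population of settlement points that fall inside the polygon."""
    return sum(
        s["population"] for s in settlements
        if point_in_polygon(s["lon"], s["lat"], polygon)
    )


# Hypothetical drawn area and settlement points, illustration only.
area = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
points = [
    {"lon": 1.0, "lat": 1.0, "population": 10},  # inside the square
    {"lon": 3.0, "lat": 1.0, "population": 5},   # outside the square
]
total = population_in_area(points, area)
```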
Another benefit is that we can generalize the same methodology across many different use cases and datasets. For example, at Thinking Machines, we use machine learning and AI to extract wealth information from satellite images at scale. We're able to combine our extracted wealth information with building infrastructure to allow our telecommunications partners to identify ideal locations for cell sites based on their target wealth profiles and potential customer volume.
And now, with BigQuery Tiler, we're able to collect that information without having to worry about the scale and compute required to visualize and process the data.
Want to try something similar?