Google BigQuery Visualization: Mapping Big Spatial Data

Summary

The GDELT Project describe how to use Google BigQuery to visualize their dataset of online news coverage focusing on COVID-19

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.
Google BigQuery Visualization: Mapping Big Spatial Data

Supported by Google Jigsaw  the GDELT Project monitors the world's broadcast  print  and web news from nearly every corner of every country in over 100 languages and identifies the people  locations  organizations  themes  sources  emotions  counts  quotes  images and events driving our global society every second of every day  creating a free open platform for computing on the entire world.

A visualization showing the connections between newsmakers

A GDELT Project visualization highlighting the 25 000 newsmakers mentioned most frequently and the connections among them.

Recently they launched the GDELT Global Geographic Graph  the underlying dataset powering the GDELT GEO 2.0 API  covering more than 1.7 billion location mentions from worldwide English language online news coverage back to April 4  2017  with full details of each mention  including a 600-character contextual snippet of its context and usage.

       The GDELT Project is a realtime network diagram and database of global human society for open research    

This post  guest written by GDELT  looks into how the integration with Google BigQuery can be leveraged to map this huge dataset  focusing on news coverage in the context of COVID-19 at different points during the pandemic.

--

The Global Geographic Graph now spans more than 1.7 billion location mentions in worldwide English language news coverage back to 2017  making it possible to map the geography of the global news narrative.

Creating a map from the Global Geographic Graph traditionally involved running a BigQuery SQL query to compile the results  downloading them as a CSV file and then importing that file into CARTO to create the actual map. However  CARTO actually has a built-in BigQuery connector  meaning you can simply paste your SQL query directly into CARTO  which runs the query on BigQuery and downloads the results into your map completely seamlessly and can even rerun the query on a fixed internal to keep the map constantly synced!

Let's say we want to create a map of locations mentioned in English language news coverage in the context of COVID-19 during the month of May. Just log into your CARTO account and select "New Dataset" and click on "BigQuery" as your data source and then click on the "Connect" button. You'll get the dialog below:

A screenshot the CARTO Google BigQuery connector


The "Billing Project ID" dropdown will automatically populate with the list of your Google Cloud projects (this is the project against whose quota the BigQuery query will be billed). You then paste your query in the SQL Query box and pick a name for the new CARTO dataset and then click "Connect to BQ"! That's quite literally all there is to it! CARTO will then send your query to BigQuery to be executed and within a few moments the results will be streamed back to CARTO and populate your new dataset!

Here's the actual query we're going to run (we use "GeoType  > 1" to include country-level matches):

SELECT ST_GeogPoint(Lon  Lat) as the_geom  count(1) Cnt FROM `gdelt-bq.gdeltv2.ggg` where (ContextualText like '%COVID%' and ContextualText like '%coronavirus%')and GeoType>1 and DATE(DateTime) >= "2020-05-01" AND DATE(DateTime) <= "2020-05-31" group by Lat Lon order by Cnt desc limit 10000

The "ST_GeogPoint(Lon  Lat) as the_geom" at the beginning of the query takes our latitude and longitude coordinates and converts them into a Geography "Point" type.

Alternatively  since the Global Geographic Graph actually already has a GEOGRAPHY column called "the_geom" you could also use:

SELECT ANY_VALUE(the_geom) as the_geom  count(1) Cnt FROM `gdelt-bq.gdeltv2.ggg` where (ContextualText like '%COVID%' and ContextualText like '%coronavirus%') and GeoType>1 and DATE(DateTime) >= "2020-05-01" AND DATE(DateTime) <= "2020-05-31" group by TO_JSON_STRING(the_geom) order by Cnt desc limit 10000

It is important to remember that grouping by a geography column like we're doing here is extremely bad practice for observed datasets due to the imprecise nature of most observed/sampled geographic data. When collecting GPS points  a stationary object can appear to constantly move in a small radius around its actual location due to GPS jitter  while sampling imprecision and error can yield clusters of points all slightly different from each other.

Simply snapping to a grid doesn't solve this issue  since regardless of the grid resolution  a cluster of points will be split across the rounded coordinates. In the case of the Global Geographic Graph  all locations are drawn from a set of centroid-based gazeteers in which every reference to Paris  France will always yield precisely the same coordinate and thus here we can safely group by our coordinates  but for other kinds of data we strongly recommend against it. With observed GIS to perform this kind of "Top N" grouping you would need to cluster your points first.

Within a few moments the data appears as a standard CARTO table and you can instantly map it:

A screenshot showing a map of online news coverage


But what if we want to make an interactive clickable map that displays the top 5 articles mentioning each location?

The query below selects the top 5 articles per location and concatenates their titles and URLs into a block of HTML and selects one of their social media images as the header image.

select ST_GeogPoint(Lon  Lat) as the_geom  ARRAY_TO_STRING(ARRAY_AGG(Art ORDER BY cnt DESC LIMIT 5)  '<BR><BR>') Articles  max(SharingImage) SharingImage  APPROX_TOP_COUNT(Loc  1)[OFFSET(0)].value LocationName  Count(1) NumMentions from (
select CONCAT('<a href="'  min(URL)  '" target="blank">' max(Title) '</a>') Art  Lat  Lon  max(SharingImage) SharingImage  APPROX_TOP_COUNT(Location  1)[OFFSET(0)].value Loc  count(1) cnt FROM `gdelt-bq.gdeltv2.ggg` WHERE (ContextualText like '%COVID%' and ContextualText like '%coronavirus%') and GeoType>1 and DATE(DateTime) >= "2020-05-01" AND DATE(DateTime) <= "2020-05-31" group by LOWER(Title)  Lat  Lon
) group by Lat  Lon order by NumMentions desc limit 10000

Once again the data is imported into a CARTO dataset within moments. Following the same instructions as we used in April (but this time skipping the part about saving the data out of BigQuery as a CSV and loading back into CARTO since we don't have to do that anymore!)  we get the interactive map below.

--

The GDELT Project also created a video documenting this process which can be viewed below.

 

As has been demonstrated with our connector  you only need to enter a SQL query from BigQuery to connect your data & make it available on the CARTO platform—allowing you to easily visualize  perform spatial analyses & build applications.

Want to know more?

Click to discover all the features of the BigQuery connector

This article was originally published on GDELT's blog.