Analysis Guides

Retrieving Data Observatory Boundaries with SQL

For Enterprise account plans, the Data Observatory provides access to a searchable catalog of advanced location data, such as census block, population segments, boundaries and so on. A set of SQL functions allow you to augment your own data and broaden your analysis by discovering boundaries and measures of data from this catalog.

  • In CARTO Builder, the Enrich from Data Observatory analysis provides access to this feature.

    The Enrich from Data Observatory analysis is only available for Enterprise accounts. Request a demo if you are interested in enabling this service for your account.

  • Since CARTO includes a SQL view, you can also run a custom SQL query to apply a specific method of data enrichment to your data. A set of Data Observatory functions (prefaced with “OBS” for Observatory), allow you to retrieve boundaries and measures data through a SQL request.

This guide describes how to apply a SQL query with OBS functions in order to create a dataset. Specifically, we will retrieve boundaries of total populations for counties in the United States.

SQL Query in a Dataset

SQL (Structured Query Language) is how applications request data from a database. The CARTO geospatial database is built on the PostgreSQL platform and supports advanced PostGIS capabilities. You can run a SQL query from a selected dataset from Your dataset dashboard, or directly from CARTO Builder.

For OBS functions, we will run a SQL query from an empty dataset in order to retrieve and visualize data from the Data Observatory.

Request County Boundaries

In order to get county boundaries, let’s run a SQL query to retrieve data from the Data Observatory. For this particular workflow, it is recommended to build a new dataset from scratch, instead of modifying an existing dataset.

  1. Navigate to Your datasets dashboard in CARTO.

  2. Click NEW DATASET.

  3. From the Connect dataset options, click CREATE EMPTY DATASET.

    An empty dataset containing the default CARTO columns and indexes is created.

  4. Click the slider button from METADATA to SQL.

    Type measurement name

    This enables you to switch between viewing your data by METADATA (table) or SQL (the SQL view).

  5. Copy and paste the following SQL query using the OBS_GetBoundariesByGeometry function:

     with area as (select ST_MakeEnvelope(-180, -90, 180, 90, 4326) as world)
      select the_geom, geom_refs from area, OBS_GetBoundariesByGeometry(area.world, 'us.census.tiger.county')
    

    ST_MakeEnvelope is a PostGIS function that defines a bounding box area, based on coordinates, to create a polygon. In this case, the coordinates include the whole world.</

    OBS_GetBoundariesByGeometry is the Data Observatory function that selects the county polygons and the FIPS code identifier of each county. The value of us.census.tiger.county specifies that US Census Tracts data of county boundaries is requested. You can change this value to any of the other boundaries available from the Data Observatory.

  6. Click APPLY to retrieve US county boundaries.

    The SQL query returns the boundary data geom_ref as a single polygon geometry for each row of data.

    Type measurement name

Create Dataset from Query

When a dataset has been modified with a SQL query, you can create a duplicate copy of the results in a new table. This is useful for creating new datasets based on SQL query results.

  1. From the dataset name menu options, click Create Dataset from query.

    Create dataset from query

    The name of the new dataset appears as _untitled_table_copy. For best practices, let’s rename the untitled table to something more suited for managing our data.

  2. Double-click on the table name and rename it to county_population.

    You can also use the Rename dataset option from the dataset name context menu.

    Rename dataset

Retrieve Population Data

Our new dataset already contains US County polygons. Let’s add a new column and apply a second OBS function to retrieve population measures from the Data Observatory.

  1. Click ADD COLUMN to create a new column.

    See Adding Rows and Columns with the Data View for details about editing datasets.

  2. Format a column for population data:

    • Double-click on the column name and type total_population.

    You can also select the Rename column option from the column name context menu. Other options include the ability to change the sort order of the column data, change the data type of the column, and add or delete a column.

    • From the column name context menu, click Change Data Type and change the column type to a Number.

    Add column and change data Type

  3. Get population measures from the Data Observatory.

    • Click the slider button from METADATA to SQL.

    • Copy and paste the following SQL query using the OBS_GetMeasureById function:

     UPDATE county_population
     SET total_population = OBS_GetMeasureById(geom_refs, 'us.census.acs.B01003001', 'us.census.tiger.county', '2011 - 2015')
    
    • Click APPLY to retrieve total population measures.

      The SQL query returns the measures in the total_population column.

      The request is using the geom_refs FIPS codes in the dataset to look up the 2011-2015 ACS 5-year population estimates per county. See OBS_GetMeasureById and OBS_GetUSCensusMeasure Names for details about ids that correspond to demographics.

  4. Click CREATE MAP to visualize county population data retrieved from the Data Observatory!

    County population map

External Resources