Questions  /  Working with Data  /  Import Data

Sugar for the Data Observatory

Leveraging CARTOframes to accelerate your time to value when working with demographic data

Demographic data is defined as statistical data about the characteristics of a population, such as age, gender, and income of the people within the population. When the census assembles data about people’s age, gender, and housing status, these are examples of assembling information about demographics.

Note: The United States census has been collected since the start of the US, as outlined in Article I, Section 2 of the US Constitution. You can read more about the US Census here

Demographic data is the lifeblood of a lot of spatial data science. It helps give real world context to locations that turns them into actual places, and helps us gleam deep insights.

Consider real estate shopping: 2 homes can have the exact same floor plan and price, but how does the buyer know which to choose? You probably already know the answer to that. You need to consider other factors, like if you have young kids, what school district does the property falls in? If you are a young person and want to surround yourself with other young people, what is the age distribution of the surrounding areas? You may want to consider income level of the area to get an idea of how expensive things in the area may be. These are all demographics that are collected by law, but how accessible are they to you right now?

The tool highlighted in this article, Sugar, is a jupyter notebook which leverages CARTOframes to access CARTO’s Data Observatory.

Note: Using CARTOframes requires an API key, so to use this tool, you will need a CARTO account that includes Engine.

Accessing the Tool

CARTO Sugar

This tool is currently hosted in Colaboratory, a free Jupyter notebook environment that requires no setup and runs entirely in the cloud, courtesy of Google. This has several advantages, like being able to access your notebooks anywhere, or send them to anyone.

When you approach the notebook, it will be in what is called Playground Mode, which is a read-only copy of the notebook. Any changes you make to the notebook will not be written anywhere except your temporary copy of the notebook. You can choose to download this notebook as a .py or .ipynb format, or to save a copy of this notebook to your own google drive or github accounts if you want to save your changes or outputs.

The notebook is set up to be as approachable as possible, subdivided into 4 sections:

  1. notebook setup
  2. CARTOframes documentation
  3. setting up datasets and dataframes for enrichment
  4. selecting metrics to use for enrichment

Note: dataset refers to a table in a CARTO account that has been processed, so it has the cartodb_id, the_geom and the_geom_webmercator columns.

Note: dataframe refers to a table made in pandas to be used in your notebook environment

Notebook Outline

├── 1 | notebook setup
│   ├── 1.1 | installs and imports
│   └── 1.2 | setup CARTOframes credentials
│   │   ├── 1.2.1 | primary user (MANDATORY)
│   │   └── 1.2.2 | secondary user (OPTIONAL)
├── 2 | check out the CARTOframes docs
├── 3 | setting up datasets and dataframes
│   ├── 3.1 | boundary creation (OPTION 1)
│   │   ├── 3.1.1 | draw a polygon to retrieve a bounding box
│   │   ├── 3.1.2 | create a new CARTO dataset and dataframe
│   │   └── 3.1.3 | trim using dataset (OPTIONAL)
│   └── 3.2 | use a preexisting CARTO dataset, and import as a dataframe (OPTION 2)
└── 4 | determine desired demographic measures
    ├── 4.1 | create dataframe to hold desired metrics
    │   ├── 4.1.1 | metric 1
    │   │   ├── step 1: search for metrics
    │   │   ├── step 2: select a subset of the dataframe using index numbers
    │   │   └── step 3: commit the subset to our master enrichment dataframe
    │   ├── 4.1.2 | metric 2
    │   ├── 4.1.3 | metric 3
    │   ├── 4.1.4 | metric 4
    │   ├── 4.1.5 | metric 5
    │   ├── 4.1.6 | metric 6
    │   ├── 4.1.7 | metric 7
    │   ├── 4.1.8 | metric 8
    │   └── 4.1.9 | metric 9
    ├── 4.2 | confirm all selected metrics
    └── 4.3 | augment dataframe and write back to CARTO dataset

Section 1: Notebook Setup

While Colaboratory is a hosted environment that requires no setup, that does not mean we cannot customize. These first cells optimize the notebook for our viewing, imports necessary libraries, and even installs (using pip) CARTOframes. You will need to pass the notebook your CARTO username, and the master API key for your account. The master key is necessary because we will be creating, modifying, and deleting datasets in our account.

Section 2: Inline Documentation

This section was included primarily to save time from having to look at the CARTOframes documentation in a separate window. These cells demonstrate how to look at documentation for the methods we will be using in this notebook inline.

Section 3: Setting up datasets and dataframes

In this section, there are 2 paths to take.

Path 1: Boundary Creation

Why do you need this?

  • :scream: You do not have geometries to enrich
  • :eyes: You may have a reference dataset for the area you are interested in

The first cell in this function runs a code block that will open a javascript applet that uses CARTO VL, Airship, and our SQL API. The purpose of this cell is to generate a bounding box that we will use to define the geometries that we will create in our account.

bounding box

To use this:

  1. Pan and zoom to the area of the map you would like to use
  2. Click on the button with the square in the top right corner of the map
  3. On the map, begin to click to create a polygon around the area you would like to use
  4. To close the polygon, click on the first point you made again
  5. A bounding box will appear around your polygon and the value of the bounding box has appeared below the map (the value has also been copied to your clipboard)
  6. You will paste this value into the bbox variable of the next cell

On the right side, you will see the option to display a dataset. The applet code uses the credentials you used to set up your CARTOframes context to be able to display select a dataset from your account to use as reference. The textbox input you use will be completing this SQL statement:

SELECT the_geom_webmercator FROM

Just enter the name of any dataset in your account to pan and zoom to it, to be able to use it as a reference for when you draw your bounding box.

bounding box with reference

After you have your bounding box and have run the next cell, you will be able to choose what level of census geometry you would like. The most common options, and the options with the most associated metrics, are Census Tracts and Census Block Groups.

selecting geom_id

Copy the geom_id of the level of census geometry you chose, and give the dataset that will be created in your CARTO account a name. Once you execute the cell, the data_boundaries method will fetch the census level geometries you select in the area that the bounding box you made overlaps with and store the results in a dataframe. Next, the write method will write the dataframe to your CARTO account. Last, the map method will display a map of your dataset, as it is in your account.

selecting geom_id

You will notice that I got back an area larger than my reference polygon. If you would like to trim your dataset to the area of your reference geometry, you can use the next cell. Enter the name of the dataset in your account to trim to then execute the cell. The query method will use SQL to perform a spatial intersect of your new dataset and your reference dataset to create a new dataframe. The write method will overwrite your old dataset with this new one.

trimmed dataset

Path 2: Use a preexisting CARTO dataset

Why do you need this?

  • :ballot_box_with_check: You already have standard census geometries to enrich
  • :cyclone: You are using irregular geometries/datasets to enrich

To use this section, you will use the name of the dataset you have in your account, then choose the census level you would like to search for metrics at.

Section 4: Selecting demographics to enrich your dataset with

In this section, we will be selecting the metrics to enrich our dataset with. There are 3 subsections:

  1. creating our metrics dataframe
  2. confirming our selected metrics
  3. augmenting our dataset with the metrics

In our first section, 4.1, you will find 9 more subsections. These are all the same (with the exception of the first one), and you can either use 1 or all 9, it depends on your use case. When we expand our first metric, you will see these steps:

  1. search for metrics
  2. select the ones you want via index
  3. commit the subset to the master enrichment dataframe

creating our metrics dataframe

Step 1

You will search for your desired metrics here. The first metric also includes a tspan variable that you can use to narrow your results by a few popular timespans. Leave this set to None if you have want to see all timespans available. This timespan will translate to the rest of the metrics you search for.

You might encounter an error that looks something like this:

error message

In the last line, you will see: KeyError: "['suggested_name' 'numer_name' 'denom_name' 'normalization'\n 'numer_timespan' 'geom_id'] not in index". In this case, not in index means that no results were found. This could be due to a typo, or because for the census level you are on, that metric does not exist. You can browse the full data observatory catalog here.

Step 2

Type in the index number (the number in the beginning of each row on the left) for each metric you want to add to your master dataframe. If you have more than 1 index you are interested in, separate each index number with a comma. The order you list these numbers does not matter.

Step 3

This step appends the subset dataframe you just made to the master dataframe. Please note that if you run this twice, it will add the subset twice, so make sure that you have the subset dataframe as you like it in step 2 before doing this step. A print out of your master dataframe is generated after you commit your subset so that you can check as you can make sure you added the correct metrics as you go.

Confirming our selected metrics

This section will print your master dataframe to give one last check before you enrich your dataset.

Augmenting our dataset with the metrics

In this section, the how parameter will determine how your dataframe relates to your dataset.

If you are using a dataset that include a fips code (if you used section 3.1, you have this column in your dataset, called geom_refs) that is for the correct census level (if you made a census tract level dataframe, you need to use census tract fips codes(again, if you used section 3.1, these should be the correct value)) you can use the name of the fips code column to do a standard join.

If you do not have a fips code column, you will need to use the_geom. Using this option can be for irregular polygons, like an isochrone, or even if you have a census level polygon, like a census block, that you are using a block group level dataframe to enrich. For any case, when using the_geom CARTOframes will interpolate the results in an intelligent way. This is done using a spatial join.

:rocket: What’s next?

:ballot_box_with_check: Check out the webinar on Sugar for a more in depth demo

:ballot_box_with_check: Read our blog post about census oddities

:ballot_box_with_check: Check out the documentation for CARTOframes and our Python SDK