CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

CARTOframes v1.0.1 includes breaking changes from the beta versions and version 0.10. Check the migration guide to learn how to update your code.

Data discovery

Introduction

The Data Observatory is a spatial data repository that enables data scientists to augment their data and broaden their analysis. It offers a wide range of datasets from around the globe.

This guide is intended for those who want to start augmenting their own data using CARTOframes and wish to explore CARTO’s public Data Observatory catalog to find datasets that best fit their use cases and analyses.

Note: The catalog is public, so you don’t need a CARTO account to search for available datasets.

Find demographic and financial data for the US

In this guide we walk through the Data Observatory catalog looking for demographic and financial data in the US.

The catalog comprises thousands of curated spatial datasets, so the easiest way to find what you are looking for is to use a faceted search. A faceted (or hierarchical) search lets you narrow down results by applying multiple filters based on the faceted classification of the catalog datasets.

Datasets are organized in three main hierarchies:

  • Country
  • Category
  • Geography (or spatial resolution)

For our analysis we are looking for demographic and financial datasets in the US with a spatial resolution at the block group level.
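The idea behind a faceted search can be sketched in plain Python. This is only an illustration under stated assumptions: the records, their field names (country, category, geography) and the helpers facet and facet_values are hypothetical and are not part of CARTOframes, whose Catalog class exposes the same kind of narrowing through chained calls such as Catalog().country('usa').category('demographics').

```python
from typing import Dict, List

# Hypothetical, in-memory stand-ins for catalog entries (NOT the CARTOframes API).
CatalogEntry = Dict[str, str]

ENTRIES: List[CatalogEntry] = [
    {"slug": "dataset_a", "country": "usa", "category": "demographics", "geography": "blockgroup"},
    {"slug": "dataset_b", "country": "usa", "category": "demographics", "geography": "county"},
    {"slug": "dataset_c", "country": "usa", "category": "financial", "geography": "blockgroup"},
    {"slug": "dataset_d", "country": "esp", "category": "demographics", "geography": "blockgroup"},
]


def facet(entries: List[CatalogEntry], **filters: str) -> List[CatalogEntry]:
    """Keep only the entries whose fields match every given facet value."""
    return [e for e in entries if all(e.get(k) == v for k, v in filters.items())]


def facet_values(entries: List[CatalogEntry], field: str) -> List[str]:
    """List the distinct values still available for one facet."""
    return sorted({e[field] for e in entries})


# Each step narrows the result set left by the previous one.
usa = facet(ENTRIES, country="usa")
usa_demographics = facet(usa, category="demographics")
print(facet_values(usa, "category"))               # ['demographics', 'financial']
print(facet_values(usa_demographics, "geography"))  # ['blockgroup', 'county']
print([e["slug"] for e in facet(usa_demographics, geography="blockgroup")])  # ['dataset_a']
```

Listing the remaining values of the next facet after each filter is what makes the search easy to steer: you only ever choose among options that still have data behind them.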

We can start by discovering which geographies (or spatial resolutions) are available for demographic data in the US, by filtering the catalog by country and category and then listing the available geographies.

Let’s start by exploring the available categories of data for the US:

from cartoframes.data.observatory import Catalog
Catalog().country('usa').categories

[<Category.get('road_traffic')>,
 <Category.get('points_of_interest')>,
 <Category.get('human_mobility')>,
 <Category.get('financial')>,
 <Category.get('environmental')>,
 <Category.get('demographics')>]

For the US, the Data Observatory provides six categories of datasets. Let’s discover the spatial resolutions available for the demographics category (which, at first sight, should contain the population data we need).

from cartoframes.data.observatory import Catalog
geographies = Catalog().country('usa').category('demographics').geographies
geographies

[<Geography.get('ags_q17_4739be4f')>,
 <Geography.get('mbi_blockgroups_1ab060a')>,
 <Geography.get('mbi_counties_141b61cd')>,
 <Geography.get('mbi_county_subd_e8e6ea23')>,
 <Geography.get('mbi_pc_5_digit_4b1682a6')>,
 <Geography.get('usct_blockgroup_f45b6b49')>,
 <Geography.get('usct_censustract_bc698c5a')>,
 <Geography.get('usct_county_ec40c962')>,
 <Geography.get('usct_state_4c8090b5')>,
 <Geography.get('usct_zcta5_75071016')>]

Let’s keep only the geographies that contain information at the block group level. To do so, we convert the geographies to a pandas DataFrame and search for the string blockgroup in the geography ids:

df = geographies.to_dataframe()
df[df['id'].str.contains('blockgroup', case=False, na=False)]

id slug name description country_id provider_id provider_name lang geom_coverage geom_type update_frequency version is_public_data
1 carto-do.mbi.geography_usa_blockgroups_2019 mbi_blockgroups_1ab060a USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... usa mbi Michael Bauer International eng 01060000005A0100000103000000010000002900000013... MULTIPOLYGON None 2019 False
5 carto-do-public-data.usa_carto.geography_usa_b... usct_blockgroup_f45b6b49 Census Block Groups (2015) - shoreline clipped Shoreline clipped TIGER/Line boundaries. More ... usa usa_carto CARTO shoreline-clipped USA Tiger geographies eng 01060000005A0100000103000000010000002900000013... MULTIPOLYGON None 2015 True
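The pandas filter above is a case-insensitive substring match in which missing values never match. As a rough stdlib equivalent, assuming only the geography slugs printed earlier are at hand (the contains helper is hypothetical), the same test can be applied to those slugs:

```python
from typing import List, Optional

# Geography slugs copied from the demographics listing above.
GEOGRAPHY_SLUGS: List[Optional[str]] = [
    "ags_q17_4739be4f",
    "mbi_blockgroups_1ab060a",
    "mbi_counties_141b61cd",
    "mbi_county_subd_e8e6ea23",
    "mbi_pc_5_digit_4b1682a6",
    "usct_blockgroup_f45b6b49",
    "usct_censustract_bc698c5a",
    "usct_county_ec40c962",
    "usct_state_4c8090b5",
    "usct_zcta5_75071016",
]


def contains(text: Optional[str], needle: str) -> bool:
    """Case-insensitive substring test; missing values never match (like na=False)."""
    return text is not None and needle.lower() in text.lower()


blockgroup_slugs = [s for s in GEOGRAPHY_SLUGS if contains(s, "blockgroup")]
print(blockgroup_slugs)  # ['mbi_blockgroups_1ab060a', 'usct_blockgroup_f45b6b49']
```

Note that ags_q17_4739be4f is not matched: a name-based filter only finds geographies whose identifiers happen to contain the search term, so it can be worth checking the name and description columns as well.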

The filter returns two block group geographies from two different providers: Michael Bauer International (MBI) and CARTO (shoreline-clipped TIGER/Line boundaries, flagged as public data). For this example, we are going to look for demographic datasets for the MBI block groups geography mbi_blockgroups_1ab060a:

datasets = Catalog().country('usa').category('demographics').geography('mbi_blockgroups_1ab060a').datasets
datasets

[<Dataset.get('mbi_households__45067b14')>,
 <Dataset.get('mbi_population_341ee33b')>,
 <Dataset.get('mbi_purchasing__53ab279d')>,
 <Dataset.get('mbi_consumer_sp_54c4abc3')>,
 <Dataset.get('mbi_sociodemogr_b5516832')>,
 <Dataset.get('mbi_education_20063878')>,
 <Dataset.get('mbi_households__c943a740')>,
 <Dataset.get('mbi_retail_spen_c31f0ba0')>,
 <Dataset.get('mbi_consumer_pr_68d1265a')>]

Let’s continue with the data discovery. There are nine datasets in the US with demographic information at the MBI block group level:

datasets.to_dataframe()

id slug name description country_id geography_id geography_name geography_description category_id category_name provider_id provider_name data_source_id lang temporal_aggregation time_coverage update_frequency version is_public_data
0 carto-do.mbi.demographics_householdsbytype_usa... mbi_households__45067b14 Households By Type at Blockgroups (micro) leve... Data is country-specific. usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International households_by_type eng yearly [2019-01-01,2020-01-01) None 2019 False
1 carto-do.mbi.demographics_population_usa_block... mbi_population_341ee33b Population at Blockgroups (micro) level for USA Population figures are shown as projected aver... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International population eng yearly [2019-01-01,2020-01-01) None 2019 False
2 carto-do.mbi.demographics_purchasingpower_usa_... mbi_purchasing__53ab279d Purchasing Power at Blockgroups (micro) level ... Purchasing Power describes the disposable inco... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International purchasing_power eng yearly [2019-01-01,2020-01-01) None 2019 False
3 carto-do.mbi.demographics_consumerspending_usa... mbi_consumer_sp_54c4abc3 Consumer Spending at Blockgroups (micro) level... MBI Consumer Spending by product groups quanti... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International consumer_spending eng yearly [2019-01-01,2020-01-01) None 2019 False
4 carto-do.mbi.demographics_sociodemographics_us... mbi_sociodemogr_b5516832 Sociodemographics at Blockgroups (micro) level... MBI Sociodemographics includes:\n- Population\... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International sociodemographics eng yearly [2019-01-01,2020-01-01) None 2019 False
5 carto-do.mbi.demographics_education_usa_blockg... mbi_education_20063878 Education at Blockgroups (micro) level for USA Data is country-specific. usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International education eng yearly [2019-01-01,2020-01-01) None 2019 False
6 carto-do.mbi.demographics_householdsbyincomequ... mbi_households__c943a740 Households By Income Quintiles at Blockgroups ... On the national level the number of households... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International households_by_income_quintiles eng yearly [2019-01-01,2020-01-01) None 2019 False
7 carto-do.mbi.demographics_retailspending_usa_b... mbi_retail_spen_c31f0ba0 Retail Spending at Blockgroups (micro) level f... Retail Spending relates to the proportion of P... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International retail_spending eng yearly [2019-01-01,2020-01-01) None 2019 False
8 carto-do.mbi.demographics_consumerprofiles_usa... mbi_consumer_pr_68d1265a Consumer Profiles at Blockgroups (micro) level... The MB International Consumer Styles describe ... usa carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups MBI Digital Boundaries for USA at Blockgroups ... demographics Demographics mbi Michael Bauer International consumer_profiles eng yearly [2019-01-01,2020-01-01) None 2019 False

They cover different topics: households, population, purchasing power, consumer spending, education, retail spending, consumer profiles, etc.
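Picking a dataset by its data_source_id is a simple lookup. The sketch below works on (slug, data_source_id) pairs transcribed from the table above, so it runs without a catalog connection; with the real DataFrame, an equivalent pandas expression would be df.loc[df['data_source_id'] == 'sociodemographics', 'slug'], assuming df holds the output of datasets.to_dataframe(). The helper slugs_for_source is hypothetical.

```python
from typing import List, Tuple

# (slug, data_source_id) pairs transcribed from the datasets table above.
DATASETS: List[Tuple[str, str]] = [
    ("mbi_households__45067b14", "households_by_type"),
    ("mbi_population_341ee33b", "population"),
    ("mbi_purchasing__53ab279d", "purchasing_power"),
    ("mbi_consumer_sp_54c4abc3", "consumer_spending"),
    ("mbi_sociodemogr_b5516832", "sociodemographics"),
    ("mbi_education_20063878", "education"),
    ("mbi_households__c943a740", "households_by_income_quintiles"),
    ("mbi_retail_spen_c31f0ba0", "retail_spending"),
    ("mbi_consumer_pr_68d1265a", "consumer_profiles"),
]


def slugs_for_source(datasets: List[Tuple[str, str]], source_id: str) -> List[str]:
    """Return the slugs of all datasets with the given data_source_id."""
    return [slug for slug, source in datasets if source == source_id]


print(len(DATASETS))                                     # 9
print(slugs_for_source(DATASETS, "sociodemographics"))  # ['mbi_sociodemogr_b5516832']
print(slugs_for_source(DATASETS, "population"))         # ['mbi_population_341ee33b']
```

The same lookup also surfaces mbi_population_341ee33b, whose data_source_id is population, as another candidate for population figures.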

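At first sight, the dataset with data_source_id sociodemographics (mbi_sociodemogr_b5516832) looks likely to contain the population information we are looking for. Let’s get a better understanding of what a sociodemographics dataset contains by looking at its variables. The example below uses the AGS sociodemographics dataset, ags_sociodemogr_e92b1637, and the output shown corresponds to that dataset; Dataset.get accepts other dataset slugs, such as mbi_sociodemogr_b5516832, in the same way: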

from cartoframes.data.observatory import Dataset
dataset = Dataset.get('ags_sociodemogr_e92b1637')
variables = dataset.variables
variables

[<Variable.get('HINCYMED65_310bc888')> #'Median Household Income: Age 65-74 (2019A)',
 <Variable.get('HINCYMED55_1a269b4b')> #'Median Household Income: Age 55-64 (2019A)',
 <Variable.get('HINCYMED45_33daa0a')> #'Median Household Income: Age 45-54 (2019A)',
 <Variable.get('HINCYMED35_4c7c3ccd')> #'Median Household Income: Age 35-44 (2019A)',
 <Variable.get('HINCYMED25_55670d8c')> #'Median Household Income: Age 25-34 (2019A)',
 <Variable.get('HINCYMED24_22603d1a')> #'Median Household Income: Age < 25 (2019A)',
 <Variable.get('HINCYGT200_e552a738')> #'Household Income > $200000 (2019A)',
 <Variable.get('HINCY6075_1933e114')> #'Household Income $60000-$74999 (2019A)',
 <Variable.get('HINCY4550_f7ad7d79')> #'Household Income $45000-$49999 (2019A)',
 <Variable.get('HINCY4045_98177a5c')> #'Household Income $40000-$44999 (2019A)',
 <Variable.get('HINCY3540_73617481')> #'Household Income $35000-$39999 (2019A)',
 <Variable.get('HINCY2530_849c8523')> #'Household Income $25000-$29999 (2019A)',
 <Variable.get('HINCY2025_eb268206')> #'Household Income $20000-$24999 (2019A)',
 <Variable.get('HINCY1520_8f321b8c')> #'Household Income $15000-$19999 (2019A)',
 <Variable.get('HINCY12550_f5b5f848')> #'Household Income $125000-$149999 (2019A)',
 <Variable.get('HHSCYMCFCH_9bddf3b1')> #'Families married couple w children (2019A)',
 <Variable.get('HHSCYLPMCH_e844cd91')> #'Families male no wife w children (2019A)',
 <Variable.get('HHSCYLPFCH_e4112270')> #'Families female no husband children (2019A)',
 <Variable.get('HHDCYMEDAG_69c53f22')> #'Median Age of Householder (2019A)',
 <Variable.get('HHDCYFAM_85548592')> #'Family Households (2019A)',
 <Variable.get('HHDCYAVESZ_f4a95c6f')> #'Average Household Size (2019A)',
 <Variable.get('HHDCY_23e8e012')> #'Households (2019A)',
 <Variable.get('EDUCYSHSCH_5c444deb')> #'Pop 25+ 9th-12th grade no diploma (2019A)',
 <Variable.get('EDUCYLTGR9_cbcfcc89')> #'Pop 25+ less than 9th grade (2019A)',
 <Variable.get('EDUCYHSCH_b236c803')> #'Pop 25+ HS graduate (2019A)',
 <Variable.get('EDUCYGRAD_d0179ccb')> #'Pop 25+ graduate or prof school degree (2019A)',
 <Variable.get('EDUCYBACH_c2295f79')> #'Pop 25+ Bachelors degree (2019A)',
 <Variable.get('DWLCYVACNT_4d5e33e9')> #'Housing units vacant (2019A)',
 <Variable.get('DWLCYRENT_239f79ae')> #'Occupied units renter (2019A)',
 <Variable.get('DWLCYOWNED_a34794a5')> #'Occupied units owner (2019A)',
 <Variable.get('AGECYMED_b6eaafb4')> #'Median Age (2019A)',
 <Variable.get('AGECYGT85_b9d8a94d')> #'Population age 85+ (2019A)',
 <Variable.get('AGECYGT25_433741c7')> #'Population Age 25+ (2019A)',
 <Variable.get('AGECYGT15_681a1204')> #'Population Age 15+ (2019A)',
 <Variable.get('AGECY8084_b25d4aed')> #'Population age 80-84 (2019A)',
 <Variable.get('AGECY7579_15dcf822')> #'Population age 75-79 (2019A)',
 <Variable.get('AGECY7074_6da64674')> #'Population age 70-74 (2019A)',
 <Variable.get('AGECY6064_cc011050')> #'Population age 60-64 (2019A)',
 <Variable.get('AGECY5559_8de3522b')> #'Population age 55-59 (2019A)',
 <Variable.get('AGECY5054_f599ec7d')> #'Population age 50-54 (2019A)',
 <Variable.get('AGECY4549_2c44040f')> #'Population age 45-49 (2019A)',
 <Variable.get('AGECY4044_543eba59')> #'Population age 40-44 (2019A)',
 <Variable.get('AGECY3034_86a81427')> #'Population age 30-34 (2019A)',
 <Variable.get('AGECY2529_5f75fc55')> #'Population age 25-29 (2019A)',
 <Variable.get('AGECY1519_66ed0078')> #'Population age 15-19 (2019A)',
 <Variable.get('AGECY0509_c74a565c')> #'Population age 5-9 (2019A)',
 <Variable.get('AGECY0004_bf30e80a')> #'Population age 0-4 (2019A)',
 <Variable.get('EDUCYSCOLL_1e8c4828')> #'Pop 25+ college no diploma (2019A)',
 <Variable.get('MARCYMARR_26e07b7')> #'Now Married (2019A)',
 <Variable.get('AGECY2024_270f4203')> #'Population age 20-24 (2019A)',
 <Variable.get('AGECY1014_1e97be2e')> #'Population age 10-14 (2019A)',
 <Variable.get('AGECY3539_fed2aa71')> #'Population age 35-39 (2019A)',
 <Variable.get('EDUCYASSOC_fa1bcf13')> #'Pop 25+ Associate degree (2019A)',
 <Variable.get('HINCY1015_d2be7e2b')> #'Household Income $10000-$14999 (2019A)',
 <Variable.get('HINCYLT10_745f9119')> #'Household Income < $10000 (2019A)',
 <Variable.get('POPPY_946f4ed6')> #'Population (2024A)',
 <Variable.get('INCPYMEDHH_e8930404')> #'Median household income (2024A)',
 <Variable.get('AGEPYMED_91aa42e6')> #'Median Age (2024A)',
 <Variable.get('DWLPY_819e5af0')> #'Housing units (2024A)',
 <Variable.get('INCPYAVEHH_6e0d7b43')> #'Average household Income (2024A)',
 <Variable.get('INCPYPCAP_ec5fd8ca')> #'Per capita income (2024A)',
 <Variable.get('HHDPY_4207a180')> #'Households (2024A)',
 <Variable.get('VPHCYNONE_22cb7350')> #'Households: No Vehicle Available (2019A)',
 <Variable.get('VPHCYGT1_a052056d')> #'Households: Two or More Vehicles Available (2019A)',
 <Variable.get('VPHCY1_53dc760f')> #'Households: One Vehicle Available (2019A)',
 <Variable.get('UNECYRATE_b3dc32ba')> #'Unemployment Rate (2019A)',
 <Variable.get('SEXCYMAL_ca14d4b8')> #'Population male (2019A)',
 <Variable.get('SEXCYFEM_d52acecb')> #'Population female (2019A)',
 <Variable.get('RCHCYWHNHS_9206188d')> #'Non Hispanic White (2019A)',
 <Variable.get('RCHCYOTNHS_d8592ce9')> #'Non Hispanic Other Race (2019A)',
 <Variable.get('RCHCYMUNHS_1a2518ec')> #'Non Hispanic Multiple Race (2019A)',
 <Variable.get('RCHCYHANHS_dbe5754')> #'Non Hispanic Hawaiian/Pacific Islander (2019A)',
 <Variable.get('RCHCYBLNHS_b5649728')> #'Non Hispanic Black (2019A)',
 <Variable.get('RCHCYASNHS_fabeaa31')> #'Non Hispanic Asian (2019A)',
 <Variable.get('RCHCYAMNHS_4a788a9d')> #'Non Hispanic American Indian (2019A)',
 <Variable.get('POPCYGRPI_147af7a9')> #'Institutional Group Quarters Population (2019A)',
 <Variable.get('POPCYGRP_74c19673')> #'Population in Group Quarters (2019A)',
 <Variable.get('POPCY_f5800f44')> #'Population (2019A)',
 <Variable.get('MARCYWIDOW_7a2977e0')> #'Widowed (2019A)',
 <Variable.get('MARCYSEP_9024e7e5')> #'Separated (2019A)',
 <Variable.get('MARCYNEVER_c82856b0')> #'Never Married (2019A)',
 <Variable.get('MARCYDIVOR_32a11923')> #'Divorced (2019A)',
 <Variable.get('LNIEXSPAN_9a19f7f7')> #'SPANISH SPEAKING HOUSEHOLDS',
 <Variable.get('LNIEXISOL_d776b2f7')> #'LINGUISTICALLY ISOLATED HOUSEHOLDS (NON-ENGLISH SP...',
 <Variable.get('LBFCYUNEM_1e711de4')> #'Pop 16+ civilian unemployed (2019A)',
 <Variable.get('LBFCYNLF_c4c98350')> #'Pop 16+ not in labor force (2019A)',
 <Variable.get('INCCYMEDHH_bea58257')> #'Median household income (2019A)',
 <Variable.get('INCCYMEDFA_59fa177d')> #'Median family income (2019A)',
 <Variable.get('INCCYAVEHH_383bfd10')> #'Average household Income (2019A)',
 <Variable.get('HUSEXAPT_988f452f')> #'UNITS IN STRUCTURE: 20 OR MORE',
 <Variable.get('HUSEX1DET_3684405c')> #'UNITS IN STRUCTURE: 1 DETACHED',
 <Variable.get('HOOEXMED_c2d4b5b')> #'Median Value of Owner Occupied Housing Units',
 <Variable.get('HISCYHISP_f3b3a31e')> #'Population Hispanic (2019A)',
 <Variable.get('HINCYMED75_2810f9c9')> #'Median Household Income: Age 75+ (2019A)',
 <Variable.get('HINCY15020_21e894dd')> #'Household Income $150000-$199999 (2019A)',
 <Variable.get('BLOCKGROUP_16298bd5')> #'Geographic Identifier',
 <Variable.get('LBFCYLBF_59ce7ab0')> #'Population In Labor Force (2019A)',
 <Variable.get('LBFCYARM_8c06223a')> #'Pop 16+ in Armed Forces (2019A)',
 <Variable.get('DWLCY_e0711b62')> #'Housing units (2019A)',
 <Variable.get('LBFCYPOP16_53fa921c')> #'Population Age 16+ (2019A)',
 <Variable.get('LBFCYEMPL_c9c22a0')> #'Pop 16+ civilian employed (2019A)',
 <Variable.get('INCCYPCAP_691da8ff')> #'Per capita income (2019A)',
 <Variable.get('RNTEXMED_2e309f54')> #'Median Cash Rent',
 <Variable.get('HINCY3035_4a81d422')> #'Household Income $30000-$34999 (2019A)',
 <Variable.get('HINCY5060_62f78b34')> #'Household Income $50000-$59999 (2019A)',
 <Variable.get('HINCY10025_665c9060')> #'Household Income $100000-$124999 (2019A)',
 <Variable.get('HINCY75100_9d5c69c8')> #'Household Income $75000-$99999 (2019A)',
 <Variable.get('AGECY6569_b47bae06')> #'Population age 65-69 (2019A)']
vdf = variables.to_dataframe()
vdf

id slug name description column_name db_type dataset_id agg_method variable_group_id starred
0 carto-do.ags.demographics_sociodemographic_usa... HINCYMED65_310bc888 HINCYMED65 Median Household Income: Age 65-74 (2019A) HINCYMED65 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
1 carto-do.ags.demographics_sociodemographic_usa... HINCYMED55_1a269b4b HINCYMED55 Median Household Income: Age 55-64 (2019A) HINCYMED55 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
2 carto-do.ags.demographics_sociodemographic_usa... HINCYMED45_33daa0a HINCYMED45 Median Household Income: Age 45-54 (2019A) HINCYMED45 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
3 carto-do.ags.demographics_sociodemographic_usa... HINCYMED35_4c7c3ccd HINCYMED35 Median Household Income: Age 35-44 (2019A) HINCYMED35 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
4 carto-do.ags.demographics_sociodemographic_usa... HINCYMED25_55670d8c HINCYMED25 Median Household Income: Age 25-34 (2019A) HINCYMED25 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
... ... ... ... ... ... ... ... ... ... ...
103 carto-do.ags.demographics_sociodemographic_usa... HINCY3035_4a81d422 HINCY3035 Household Income $30000-$34999 (2019A) HINCY3035 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
104 carto-do.ags.demographics_sociodemographic_usa... HINCY5060_62f78b34 HINCY5060 Household Income $50000-$59999 (2019A) HINCY5060 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
105 carto-do.ags.demographics_sociodemographic_usa... HINCY10025_665c9060 HINCY10025 Household Income $100000-$124999 (2019A) HINCY10025 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
106 carto-do.ags.demographics_sociodemographic_usa... HINCY75100_9d5c69c8 HINCY75100 Household Income $75000-$99999 (2019A) HINCY75100 INTEGER carto-do.ags.demographics_sociodemographic_usa... AVG None False
107 carto-do.ags.demographics_sociodemographic_usa... AGECY6569_b47bae06 AGECY6569 Population age 65-69 (2019A) AGECY6569 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False

108 rows × 10 columns
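The agg_method column in the table above appears to record how each variable should be combined when several areas are merged: counts such as AGECY6569 are tagged SUM, while medians such as HINCYMED65 are tagged AVG. The sketch below applies that rule to two block groups, assuming this interpretation; the block group values are invented for illustration, and the aggregate helper is hypothetical, not a CARTOframes function.

```python
from statistics import mean
from typing import Callable, Dict, List

# agg_method values copied from the variables table above.
AGG_METHOD: Dict[str, str] = {"AGECY6569": "SUM", "HINCYMED65": "AVG"}

# Hypothetical values for two block groups (illustrative numbers only).
block_groups: List[Dict[str, float]] = [
    {"AGECY6569": 120, "HINCYMED65": 48000},
    {"AGECY6569": 80, "HINCYMED65": 52000},
]

# Map each agg_method label to the Python function that implements it.
AGGREGATORS: Dict[str, Callable] = {"SUM": sum, "AVG": mean}


def aggregate(rows: List[Dict[str, float]], agg_method: Dict[str, str]) -> Dict[str, float]:
    """Combine several rows into one, using each variable's aggregation method."""
    return {
        column: AGGREGATORS[method]([row[column] for row in rows])
        for column, method in agg_method.items()
    }


combined = aggregate(block_groups, AGG_METHOD)
print(combined)  # AGECY6569 -> 200 (sum), HINCYMED65 -> 50000 (mean)
```

Averaging medians, as AVG implies here, is only an approximation: the true median of the combined area cannot generally be recovered from the medians of its parts.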

We can see that several variables are related to population, so this looks like the dataset we are looking for. Filtering the variable descriptions for the substring pop lists them:

vdf[vdf['description'].str.contains('pop', case=False, na=False)]
id slug name description column_name db_type dataset_id agg_method variable_group_id starred
22 carto-do.ags.demographics_sociodemographic_usa... EDUCYSHSCH_5c444deb EDUCYSHSCH Pop 25+ 9th-12th grade no diploma (2019A) EDUCYSHSCH INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
23 carto-do.ags.demographics_sociodemographic_usa... EDUCYLTGR9_cbcfcc89 EDUCYLTGR9 Pop 25+ less than 9th grade (2019A) EDUCYLTGR9 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
24 carto-do.ags.demographics_sociodemographic_usa... EDUCYHSCH_b236c803 EDUCYHSCH Pop 25+ HS graduate (2019A) EDUCYHSCH INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
25 carto-do.ags.demographics_sociodemographic_usa... EDUCYGRAD_d0179ccb EDUCYGRAD Pop 25+ graduate or prof school degree (2019A) EDUCYGRAD INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
26 carto-do.ags.demographics_sociodemographic_usa... EDUCYBACH_c2295f79 EDUCYBACH Pop 25+ Bachelors degree (2019A) EDUCYBACH INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
31 carto-do.ags.demographics_sociodemographic_usa... AGECYGT85_b9d8a94d AGECYGT85 Population age 85+ (2019A) AGECYGT85 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
32 carto-do.ags.demographics_sociodemographic_usa... AGECYGT25_433741c7 AGECYGT25 Population Age 25+ (2019A) AGECYGT25 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
33 carto-do.ags.demographics_sociodemographic_usa... AGECYGT15_681a1204 AGECYGT15 Population Age 15+ (2019A) AGECYGT15 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
34 carto-do.ags.demographics_sociodemographic_usa... AGECY8084_b25d4aed AGECY8084 Population age 80-84 (2019A) AGECY8084 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
35 carto-do.ags.demographics_sociodemographic_usa... AGECY7579_15dcf822 AGECY7579 Population age 75-79 (2019A) AGECY7579 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
36 carto-do.ags.demographics_sociodemographic_usa... AGECY7074_6da64674 AGECY7074 Population age 70-74 (2019A) AGECY7074 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
37 carto-do.ags.demographics_sociodemographic_usa... AGECY6064_cc011050 AGECY6064 Population age 60-64 (2019A) AGECY6064 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
38 carto-do.ags.demographics_sociodemographic_usa... AGECY5559_8de3522b AGECY5559 Population age 55-59 (2019A) AGECY5559 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
39 carto-do.ags.demographics_sociodemographic_usa... AGECY5054_f599ec7d AGECY5054 Population age 50-54 (2019A) AGECY5054 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
40 carto-do.ags.demographics_sociodemographic_usa... AGECY4549_2c44040f AGECY4549 Population age 45-49 (2019A) AGECY4549 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
41 carto-do.ags.demographics_sociodemographic_usa... AGECY4044_543eba59 AGECY4044 Population age 40-44 (2019A) AGECY4044 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
42 carto-do.ags.demographics_sociodemographic_usa... AGECY3034_86a81427 AGECY3034 Population age 30-34 (2019A) AGECY3034 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
43 carto-do.ags.demographics_sociodemographic_usa... AGECY2529_5f75fc55 AGECY2529 Population age 25-29 (2019A) AGECY2529 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
44 carto-do.ags.demographics_sociodemographic_usa... AGECY1519_66ed0078 AGECY1519 Population age 15-19 (2019A) AGECY1519 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
45 carto-do.ags.demographics_sociodemographic_usa... AGECY0509_c74a565c AGECY0509 Population age 5-9 (2019A) AGECY0509 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
46 carto-do.ags.demographics_sociodemographic_usa... AGECY0004_bf30e80a AGECY0004 Population age 0-4 (2019A) AGECY0004 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
47 carto-do.ags.demographics_sociodemographic_usa... EDUCYSCOLL_1e8c4828 EDUCYSCOLL Pop 25+ college no diploma (2019A) EDUCYSCOLL INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
49 carto-do.ags.demographics_sociodemographic_usa... AGECY2024_270f4203 AGECY2024 Population age 20-24 (2019A) AGECY2024 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
50 carto-do.ags.demographics_sociodemographic_usa... AGECY1014_1e97be2e AGECY1014 Population age 10-14 (2019A) AGECY1014 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
51 carto-do.ags.demographics_sociodemographic_usa... AGECY3539_fed2aa71 AGECY3539 Population age 35-39 (2019A) AGECY3539 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
52 carto-do.ags.demographics_sociodemographic_usa... EDUCYASSOC_fa1bcf13 EDUCYASSOC Pop 25+ Associate degree (2019A) EDUCYASSOC INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
55 carto-do.ags.demographics_sociodemographic_usa... POPPY_946f4ed6 POPPY Population (2024A) POPPY FLOAT carto-do.ags.demographics_sociodemographic_usa... SUM None False
66 carto-do.ags.demographics_sociodemographic_usa... SEXCYMAL_ca14d4b8 SEXCYMAL Population male (2019A) SEXCYMAL INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
67 carto-do.ags.demographics_sociodemographic_usa... SEXCYFEM_d52acecb SEXCYFEM Population female (2019A) SEXCYFEM INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
75 carto-do.ags.demographics_sociodemographic_usa... POPCYGRPI_147af7a9 POPCYGRPI Institutional Group Quarters Population (2019A) POPCYGRPI INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
76 carto-do.ags.demographics_sociodemographic_usa... POPCYGRP_74c19673 POPCYGRP Population in Group Quarters (2019A) POPCYGRP INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
77 carto-do.ags.demographics_sociodemographic_usa... POPCY_f5800f44 POPCY Population (2019A) POPCY INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
84 carto-do.ags.demographics_sociodemographic_usa... LBFCYUNEM_1e711de4 LBFCYUNEM Pop 16+ civilian unemployed (2019A) LBFCYUNEM INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
85 carto-do.ags.demographics_sociodemographic_usa... LBFCYNLF_c4c98350 LBFCYNLF Pop 16+ not in labor force (2019A) LBFCYNLF INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
92 carto-do.ags.demographics_sociodemographic_usa... HISCYHISP_f3b3a31e HISCYHISP Population Hispanic (2019A) HISCYHISP INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
96 carto-do.ags.demographics_sociodemographic_usa... LBFCYLBF_59ce7ab0 LBFCYLBF Population In Labor Force (2019A) LBFCYLBF INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
97 carto-do.ags.demographics_sociodemographic_usa... LBFCYARM_8c06223a LBFCYARM Pop 16+ in Armed Forces (2019A) LBFCYARM INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
99 carto-do.ags.demographics_sociodemographic_usa... LBFCYPOP16_53fa921c LBFCYPOP16 Population Age 16+ (2019A) LBFCYPOP16 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
100 carto-do.ags.demographics_sociodemographic_usa... LBFCYEMPL_c9c22a0 LBFCYEMPL Pop 16+ civilian employed (2019A) LBFCYEMPL INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
107 carto-do.ags.demographics_sociodemographic_usa... AGECY6569_b47bae06 AGECY6569 Population age 65-69 (2019A) AGECY6569 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM None False
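Several of the matches above are breakdowns (by age, sex, education or labor force) rather than totals. A stricter pattern on the description isolates the total population variables. The sketch below runs on a handful of (column_name, description) pairs transcribed from the table; the regular expression and the total_population_by_year helper are assumptions for illustration:

```python
import re
from typing import Dict, List, Tuple

# A few (column_name, description) pairs transcribed from the filtered table above.
POP_VARIABLES: List[Tuple[str, str]] = [
    ("AGECYGT85", "Population age 85+ (2019A)"),
    ("POPPY", "Population (2024A)"),
    ("SEXCYMAL", "Population male (2019A)"),
    ("POPCYGRP", "Population in Group Quarters (2019A)"),
    ("POPCY", "Population (2019A)"),
    ("EDUCYBACH", "Pop 25+ Bachelors degree (2019A)"),
]

# Matches descriptions that are exactly "Population (<year>A)", i.e. totals
# rather than breakdowns by age, sex, education, etc.
TOTAL_POPULATION = re.compile(r"^Population \((\d{4})A\)$")


def total_population_by_year(variables: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each year found in a total-population description to its column name."""
    result: Dict[str, str] = {}
    for column, description in variables:
        match = TOTAL_POPULATION.match(description)
        if match:
            result[match.group(1)] = column
    return result


print(total_population_by_year(POP_VARIABLES))  # {'2024': 'POPPY', '2019': 'POPCY'}
```

Here it finds POPCY (2019) and POPPY (2024). Note that the table lists POPPY as FLOAT and POPCY as INTEGER.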

We can follow the very same process to discover financial datasets. Let’s see how it works by first listing the geographies available for the financial category in the US:

Catalog().country('usa').category('financial').geographies

[<Geography.get('mc_block_9ebc626c')>,
 <Geography.get('mc_blockgroup_c4b8da4c')>,
 <Geography.get('mc_county_31cde2d')>,
 <Geography.get('mc_state_cc31b9d1')>,
 <Geography.get('mc_tract_3704a85c')>,
 <Geography.get('mc_zipcode_263079e3')>]

We can clearly identify a geography at the blockgroup resolution, provided by Mastercard:

from cartoframes.data.observatory import Geography
Geography.get('mc_blockgroup_c4b8da4c').to_dict()

{'id': 'carto-do.mastercard.geography_usa_blockgroup_2019',
 'slug': 'mc_blockgroup_c4b8da4c',
 'name': 'USA Census Block Groups',
 'description': None,
 'country_id': 'usa',
 'provider_id': 'mastercard',
 'provider_name': 'Mastercard',
 'lang': 'eng',
 'geom_type': 'MULTIPOLYGON',
 'update_frequency': None,
 'version': '2019',
 'is_public_data': False}
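Since to_dict() returns a plain Python dictionary, fields such as provider_name, version and is_public_data can be checked with ordinary dictionary lookups. A minimal sketch, using a subset of the fields printed above; the describe_access helper is hypothetical:

```python
from typing import Any, Dict

# A subset of the Geography.get('mc_blockgroup_c4b8da4c').to_dict() output above.
mastercard_blockgroups: Dict[str, Any] = {
    "id": "carto-do.mastercard.geography_usa_blockgroup_2019",
    "slug": "mc_blockgroup_c4b8da4c",
    "provider_name": "Mastercard",
    "geom_type": "MULTIPOLYGON",
    "version": "2019",
    "is_public_data": False,
}


def describe_access(metadata: Dict[str, Any]) -> str:
    """Summarize a catalog entry's provider, version and public-data flag."""
    access = "public" if metadata.get("is_public_data") else "not public"
    return f"{metadata['slug']} ({metadata['provider_name']}, {metadata['version']}): {access}"


print(describe_access(mastercard_blockgroups))
# mc_blockgroup_c4b8da4c (Mastercard, 2019): not public
```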

Now we can list the available datasets provided by Mastercard for the US Census blockgroups spatial resolution:

Catalog().country('usa').category('financial').geography('mc_blockgroup_c4b8da4c').datasets.to_dataframe()

id slug name description country_id geography_id geography_name geography_description category_id category_name provider_id provider_name data_source_id lang temporal_aggregation time_coverage update_frequency version is_public_data
0 carto-do.mastercard.financial_mrli_usa_blockgr... mc_mrli_35402a9d MRLI Data for Census Block Groups MRLI scores validate, evaluate and benchmark t... usa carto-do.mastercard.geography_usa_blockgroup_2019 USA Census Block Groups None financial Financial mastercard Mastercard mrli eng monthly None monthly 2019 False

Finally, let’s inspect the variables available in the dataset:

from cartoframes.data.observatory import Dataset
Dataset.get('mc_mrli_35402a9d').variables

[<Variable.get('transactions_st_d22b3489')> #'Same as transactions_score, but only comparing ran...',
 <Variable.get('region_id_3c7d0d92')> #'Region identifier (construction varies depending o...',
 <Variable.get('category_8c84b3a7')> #'Industry/sector categories (Total Retail, Retail e...',
 <Variable.get('month_57cd6f80')> #'Name of the month the data refers to',
 <Variable.get('region_type_d875e9e7')> #'Administrative boundary type (block, block group, ...',
 <Variable.get('stability_state_8af6b92')> #'Same as stability_score, but only comparing rankin...',
 <Variable.get('sales_score_49d02f1e')> #'Rank based on the average monthly sales for the pr...',
 <Variable.get('stability_score_6756cb72')> #'Rank based on the change in merchants between the ...',
 <Variable.get('ticket_size_sta_3bfd5114')> #'Same as ticket_size_score, but only comparing rank...',
 <Variable.get('sales_metro_sco_e088134d')> #'Same as sales_score, but only comparing ranking wi...',
 <Variable.get('transactions_me_628f6065')> #'Same as transactions_score, but only comparing ran...',
 <Variable.get('growth_score_68b3f9ac')> #'Rank based on the percent change in sales between ...',
 <Variable.get('ticket_size_met_8b5905f8')> #'Same as ticket_size_score, but only comparing rank...',
 <Variable.get('ticket_size_sco_21f7820a')> #'Rank based on the average monthly sales for the pr...',
 <Variable.get('growth_state_sc_11870b1c')> #'Same as growth_score, but only comparing ranking w...',
 <Variable.get('stability_metro_b80b3f7e')> #'Same as stability_score, but only comparing rankin...',
 <Variable.get('growth_metro_sc_a1235ff0')> #'Same as growth_score, but only comparing ranking w...',
 <Variable.get('sales_state_sco_502c47a1')> #'Same as sales_score, but only comparing ranking wi...',
 <Variable.get('transactions_sc_ee976f1e')> #'Rank based on the average number of transactions f...']

Dataset and variables metadata

The Data Observatory catalog is not only a repository of curated spatial datasets; it also contains valuable information that helps you better understand the underlying data of every dataset, so you can make an informed decision about which data best fits your problem.

Some of the augmented metadata you can find for each dataset in the catalog:

  • head and tail methods to get a glimpse of the actual data. This helps you understand the available columns, data types, etc., so you can start modelling your problem right away.
  • geom_coverage to visualize the geographical coverage of the dataset on a map.
  • counts, fields_by_type and a full describe method with stats of the actual values in the dataset, such as average, stdev, quantiles, min, max and median for each of its variables.

You don’t need a subscription to a dataset to query its augmented metadata; it is publicly available to anyone exploring the Data Observatory catalog.

Let’s look at some of that information, starting with a glimpse of the first ten rows of the dataset’s actual data:

from cartoframes.data.observatory import Dataset
dataset = Dataset.get('ags_sociodemogr_e92b1637')
dataset.head()
DWLCY HHDCY POPCY VPHCY1 AGECYMED HHDCYFAM HOOEXMED HUSEXAPT LBFCYARM LBFCYLBF ... MARCYDIVOR MARCYNEVER MARCYWIDOW RCHCYAMNHS RCHCYASNHS RCHCYBLNHS RCHCYHANHS RCHCYMUNHS RCHCYOTNHS RCHCYWHNHS
0 5 5 6 0 64.00 1 63749 0 0 0 ... 0 0 0 0 0 0 0 0 0 6
1 2 2 5 1 36.50 2 124999 0 0 2 ... 0 1 0 0 0 3 0 0 0 2
2 0 0 0 0 0.00 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 21 11 22 4 64.00 6 74999 0 0 10 ... 4 13 2 0 0 22 0 0 0 0
4 0 0 959 0 18.91 0 0 0 0 378 ... 0 959 0 5 53 230 0 25 0 609
5 0 0 0 0 0.00 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0.00 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0.00 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0.00 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0.00 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

10 rows × 101 columns

Alternatively, you can get the last ten rows with dataset.tail().

An overview of the dataset’s geographical coverage:

dataset.geom_coverage()

Some stats about the dataset:

dataset.counts()
rows                    217182
cells                 22369746
null_cells                   0
null_cells_percent           0
dtype: int64
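As a rough illustration of what those four figures mean, the sketch below computes them for a small list of rows in plain Python. It is not how CARTOframes computes them (the service does that on its side): count_summary is a hypothetical helper, it assumes every row has the same columns, treats None as a null cell, and reports null_cells_percent on a 0 to 100 scale, which is an assumption because the real output above is simply 0. The POPCY and HHDCY values are a toy sample, not real data.

```python
from typing import Any, Dict, List, Optional


def count_summary(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Hypothetical helper mirroring the layout of the counts() output."""
    n_rows = len(rows)
    # Assumes every row has the same set of columns as the first one.
    n_cols = len(rows[0]) if rows else 0
    cells = n_rows * n_cols
    # A cell counts as null when its value is None.
    null_cells = sum(1 for row in rows for value in row.values() if value is None)
    null_pct = 100.0 * null_cells / cells if cells else 0.0
    return {
        "rows": n_rows,
        "cells": cells,
        "null_cells": null_cells,
        "null_cells_percent": null_pct,
    }


# Toy rows using two column names from the dataset; the values are made up.
sample_rows = [{"POPCY": 6, "HHDCY": 5}, {"POPCY": None, "HHDCY": 2}]
summary = count_summary(sample_rows)
print(summary)
```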
dataset.fields_by_type()
float       4
string      1
integer    96
dtype: int64
dataset.describe()
HINCYMED65 HINCYMED55 HINCYMED45 HINCYMED35 HINCYMED25 HINCYMED24 HINCYGT200 HINCY6075 HINCY4550 HINCY4045 ... DWLCY LBFCYPOP16 LBFCYEMPL INCCYPCAP RNTEXMED HINCY3035 HINCY5060 HINCY10025 HINCY75100 AGECY6569
avg 6.195559e+04 7.513449e+04 8.297294e+04 7.907689e+04 6.610137e+04 4.765168e+04 4.236225e+01 5.938193e+01 2.406235e+01 2.483668e+01 ... 6.420374e+02 1.218212e+03 7.402907e+02 3.451758e+04 9.315027e+02 2.416786e+01 4.542230e+01 4.876603e+01 8.272891e+01 8.051784e+01
max 3.500000e+05 3.500000e+05 3.500000e+05 3.500000e+05 3.500000e+05 3.500000e+05 4.812000e+03 3.081000e+03 9.530000e+02 1.293000e+03 ... 2.800700e+04 4.707100e+04 3.202300e+04 2.898428e+06 3.999000e+03 7.290000e+02 1.981000e+03 3.231000e+03 4.432000e+03 7.777000e+03
min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
sum 1.345564e+10 1.631786e+10 1.802023e+10 1.717408e+10 1.435603e+10 1.034909e+10 9.200319e+06 1.289669e+07 5.225909e+06 5.394080e+06 ... 1.394390e+08 2.645738e+08 1.607778e+08 7.496597e+09 2.023056e+08 5.248825e+06 9.864907e+06 1.059110e+07 1.796723e+07 1.748702e+07
range 3.500000e+05 3.500000e+05 3.500000e+05 3.500000e+05 3.500000e+05 3.500000e+05 4.812000e+03 3.081000e+03 9.530000e+02 1.293000e+03 ... 2.800700e+04 4.707100e+04 3.202300e+04 2.898428e+06 3.999000e+03 7.290000e+02 1.981000e+03 3.231000e+03 4.432000e+03 7.777000e+03
stdev 3.377453e+04 4.102797e+04 4.392996e+04 3.932575e+04 2.741347e+04 2.948443e+04 7.601699e+01 4.940854e+01 2.227745e+01 2.245616e+01 ... 4.051570e+02 8.107703e+02 5.421818e+02 2.302276e+04 4.772473e+02 2.167522e+01 3.882000e+01 4.946218e+01 7.159705e+01 5.888055e+01
q1 3.625000e+04 4.285700e+04 4.785700e+04 4.833300e+04 4.454500e+04 2.625000e+04 0.000000e+00 2.400000e+01 8.000000e+00 8.000000e+00 ... 3.740000e+02 6.930000e+02 3.920000e+02 1.910900e+04 5.520000e+02 7.000000e+00 1.700000e+01 1.500000e+01 3.400000e+01 4.300000e+01
q3 6.228300e+04 7.596200e+04 8.415200e+04 8.030300e+04 6.890600e+04 4.916700e+04 2.600000e+01 5.900000e+01 2.400000e+01 2.500000e+01 ... 6.230000e+02 1.172000e+03 7.150000e+02 3.351600e+04 9.250000e+02 2.400000e+01 4.500000e+01 4.600000e+01 8.000000e+01 7.800000e+01
median 4.937500e+04 5.916700e+04 6.571400e+04 6.375000e+04 5.700000e+04 3.750000e+04 8.000000e+00 4.000000e+01 1.500000e+01 1.600000e+01 ... 4.860000e+02 9.090000e+02 5.350000e+02 2.615000e+04 7.190000e+02 1.500000e+01 3.000000e+01 3.000000e+01 5.600000e+01 5.900000e+01
interquartile_range 2.603300e+04 3.310500e+04 3.629500e+04 3.197000e+04 2.436100e+04 2.291700e+04 2.600000e+01 3.500000e+01 1.600000e+01 1.700000e+01 ... 2.490000e+02 4.790000e+02 3.230000e+02 1.440700e+04 3.730000e+02 1.700000e+01 2.800000e+01 3.100000e+01 4.600000e+01 3.500000e+01

10 rows × 107 columns

Every Dataset instance in the catalog contains other useful metadata:

  • slug: A short ID
  • name and description: Free text attributes
  • country
  • geography: Every dataset is related to a Geography instance
  • category
  • provider
  • data source
  • lang
  • temporal aggregation
  • time coverage
  • update frequency
  • version
  • is_public_data: whether the dataset is public; if it is not, you need a license to use it for enrichment purposes
dataset.to_dict()
{'id': 'carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019',
 'slug': 'ags_sociodemogr_e92b1637',
 'name': 'Sociodemographic',
 'description': 'Census and ACS sociodemographic data estimated for the current year and data projected to five years. Projected fields are general aggregates (total population, total households, median age, avg income etc.)',
 'country_id': 'usa',
 'geography_id': 'carto-do-public-data.usa_carto.geography_usa_blockgroup_2015',
 'geography_name': 'Census Block Groups (2015) - shoreline clipped',
 'geography_description': 'Shoreline clipped TIGER/Line boundaries. More info: https://carto.com/blog/tiger-shoreline-clip/',
 'category_id': 'demographics',
 'category_name': 'Demographics',
 'provider_id': 'ags',
 'provider_name': 'Applied Geographic Solutions',
 'data_source_id': 'sociodemographic',
 'lang': 'eng',
 'temporal_aggregation': 'yearly',
 'time_coverage': '[2019-01-01,2020-01-01)',
 'update_frequency': None,
 'version': '2019',
 'is_public_data': False}
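The time_coverage value looks like PostgreSQL-style range notation: '[' marks an inclusive start and ')' an exclusive end, so '[2019-01-01,2020-01-01)' covers all of 2019. A minimal sketch for turning that string into dates, assuming every value follows the same '[start,end)' pattern seen in this guide (parse_time_coverage and covers are hypothetical helper names, and date.fromisoformat needs Python 3.7+):

```python
from datetime import date
from typing import Tuple


def parse_time_coverage(value: str) -> Tuple[date, date]:
    """Parse '[YYYY-MM-DD,YYYY-MM-DD)' into (start, end).

    Assumes '[' means start inclusive and ')' means end exclusive.
    """
    if not (value.startswith("[") and value.endswith(")")):
        raise ValueError(f"unexpected time_coverage format: {value!r}")
    start_text, end_text = value[1:-1].split(",")
    return date.fromisoformat(start_text.strip()), date.fromisoformat(end_text.strip())


def covers(value: str, day: date) -> bool:
    """Return True if `day` falls inside the half-open coverage range."""
    start, end = parse_time_coverage(value)
    return start <= day < end


start, end = parse_time_coverage("[2019-01-01,2020-01-01)")
print(start, end)  # 2019-01-01 2020-01-01
```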

There is also some interesting metadata for each variable in the dataset:

  • id
  • slug: A short ID
  • name and description
  • column_name: Actual column name in the table that contains the data
  • db_type: SQL type in the database
  • dataset_id
  • agg_method: Aggregation method used
  • temporal aggregation and time coverage

Variables are the most important asset in the catalog: when exploring datasets in the Data Observatory catalog, make sure you clearly understand which variables are available to enrich your own data.

For each Variable in each dataset, the Data Observatory provides (as it does for datasets) a set of methods and attributes to understand its underlying data.

Some of them are:

  • head and tail methods to get a glimpse of the actual data and start modelling your problem right away.
  • counts, quantiles and a full describe method with stats of the variable’s actual values, such as average, stdev, quantiles, min, max and median.
  • a histogram plot with the distribution of the variable’s values.

Let’s look at some of that augmented metadata for a population variable (POPPY) in the AGS sociodemographic dataset:

from cartoframes.data.observatory import Variable
variable = Variable.get('POPPY_946f4ed6')
variable
<Variable.get('POPPY_946f4ed6')> #'Population (2024A)'
variable.to_dict()
{'id': 'carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019.POPPY',
 'slug': 'POPPY_946f4ed6',
 'name': 'POPPY',
 'description': 'Population (2024A)',
 'column_name': 'POPPY',
 'db_type': 'FLOAT',
 'dataset_id': 'carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019',
 'agg_method': 'SUM',
 'variable_group_id': None,
 'starred': False}
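In the to_dict() output above, the variable id is the dataset_id followed by a dot and the column_name. Assuming that pattern holds for other variables too (only this one example shows it), a small hypothetical helper can recover both parts from an id, which is handy when mapping catalog variables to table columns:

```python
from typing import Tuple


def split_variable_id(variable_id: str) -> Tuple[str, str]:
    """Split '<dataset_id>.<column_name>' at the last dot (hypothetical helper)."""
    dataset_id, column_name = variable_id.rsplit(".", 1)
    return dataset_id, column_name


# Values copied from the variable.to_dict() output above.
variable_info = {
    "id": "carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019.POPPY",
    "dataset_id": "carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019",
    "column_name": "POPPY",
}
dataset_id, column_name = split_variable_id(variable_info["id"])
print(column_name)  # POPPY
```

Splitting at the last dot matters because the dataset id itself contains dots.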

There are also some utility methods to understand the underlying data of each variable:

variable.head()
0     0
1     0
2     8
3     0
4     0
5     0
6     4
7     0
8     2
9    59
dtype: int64
variable.counts()
all                 217182.000000
null                     0.000000
zero                   303.000000
extreme               9380.000000
distinct              6947.000000
outliers             27571.000000
null_percent             0.000000
zero_percent             0.139514
extreme_percent          0.043190
distinct_percent         3.198700
outliers_percent         0.126949
dtype: float64
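Recomputing the ratios from the counts above is a quick way to check how the *_percent rows are scaled. With the numbers shown, zero_percent and distinct_percent match value / all × 100, while extreme_percent and outliers_percent match value / all without the × 100 factor, so it may be worth checking the scale before relying on those fields. The snippet only uses the counts printed above:

```python
# Counts copied from the variable.counts() output above.
all_rows = 217182
counts = {"zero": 303, "extreme": 9380, "distinct": 6947, "outliers": 27571}

ratios = {}
for name, value in counts.items():
    fraction = value / all_rows
    ratios[name] = fraction
    # Print both scales so they can be compared with the *_percent rows.
    print(f"{name:9s} fraction={fraction:.6f} percent={100 * fraction:.6f}")
```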
variable.quantiles()
q1                      867
q3                     1490
median                 1149
interquartile_range     623
dtype: int64
variable.histogram()
[Figure: histogram of the POPPY variable values]
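The histogram() method draws the plot for you. As a rough sketch of what such a plot summarizes, the stdlib-only function below counts values into equal-width bins. The binning rule CARTOframes actually uses is not described in this guide, so equal_width_bins is a hypothetical illustration, not its implementation:

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Count how many values fall into each of n_bins equal-width bins."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin instead of one past the end.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# The ten POPPY values returned by variable.head() above.
head_values = [0, 0, 8, 0, 0, 0, 4, 0, 2, 59]
print(equal_width_bins(head_values, 3))
```

On a sample this small, a single large value (59) stretches the bins, which is the same skew the full histogram and the outlier counts hint at.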
variable.describe()
avg                    1.564793e+03
max                    7.127400e+04
min                    0.000000e+00
sum                    3.398448e+08
range                  7.127400e+04
stdev                  1.098193e+03
q1                     8.670000e+02
q3                     1.490000e+03
median                 1.149000e+03
interquartile_range    6.230000e+02
dtype: float64
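The describe() rows are related in simple ways: in the output above, range is max - min (71274 - 0) and interquartile_range is q3 - q1 (1490 - 867 = 623). The sketch below computes the same rows for a local list of values with Python’s statistics module (3.8+ for statistics.quantiles). describe_values is a hypothetical helper; the Data Observatory computes its stats on the full table, and its quantile method is not stated here, so results on a sample may differ slightly. statistics.quantiles defaults to the 'exclusive' method and statistics.stdev is the sample standard deviation:

```python
import statistics
from typing import Dict, List


def describe_values(values: List[float]) -> Dict[str, float]:
    """Summary rows similar to describe(); needs at least two values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "avg": statistics.mean(values),
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "median": statistics.median(values),
        "interquartile_range": q3 - q1,
    }


# The ten POPPY values returned by variable.head() above.
stats = describe_values([0, 0, 8, 0, 0, 0, 4, 0, 2, 59])
print(stats)
```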

Subscribe to a Dataset in the catalog

Once you have explored the catalog and found a dataset with the variables you need for your analysis at the right spatial resolution, check its is_public_data attribute to know whether you can use it directly from CARTOframes or first need to subscribe to a license.

Subscriptions to datasets allow you to use them from CARTOframes to enrich your own data or to download them. See the enrichment guides for more information about this.

Let’s check the dataset and geography from our previous example:

dataset = Dataset.get('ags_sociodemogr_e92b1637')
dataset.is_public_data
False
from cartoframes.data.observatory import Geography
geography = Geography.get(dataset.geography)
geography.is_public_data
True

The dataset is not public data (is_public_data is False), which means you need a subscription to use it to enrich your own data. The geography, on the other hand, is public (is_public_data is True).
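When you combine several datasets and geographies, it can help to collect their is_public_data flags and list which ones still need a license. A minimal sketch in plain Python, using the flags printed above for this example; needs_subscription is a hypothetical helper, and with real catalog objects you would fill the dictionary from each entity’s is_public_data attribute:

```python
from typing import Dict, List


def needs_subscription(flags: Dict[str, bool]) -> List[str]:
    """Return the ids whose is_public_data flag is False."""
    return [entity_id for entity_id, is_public in flags.items() if not is_public]


# is_public_data values shown above for the example dataset and geography.
flags = {
    "ags_sociodemogr_e92b1637": False,
    "carto-do-public-data.usa_carto.geography_usa_blockgroup_2015": True,
}
print(needs_subscription(flags))
```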

To subscribe to data in the Data Observatory catalog, you need a CARTO account with access to the Data Observatory:

from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')
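set_default_credentials('creds.json') reads your CARTO credentials from a JSON file. Assuming the file holds username and api_key keys (check the CARTOframes authentication docs for the exact fields your version expects), a minimal way to create and read back such a file is shown below. The helper names and the placeholder values are hypothetical; keep the real file out of version control, since it contains your API key:

```python
import json
import tempfile
from pathlib import Path


def write_credentials_file(path: Path, username: str, api_key: str) -> None:
    """Write a creds.json-style file (assumed keys: username, api_key)."""
    path.write_text(json.dumps({"username": username, "api_key": api_key}, indent=2))


def read_credentials_file(path: Path) -> dict:
    """Load the JSON file back into a dictionary."""
    return json.loads(path.read_text())


# Placeholder values: replace them with your own CARTO username and API key.
with tempfile.TemporaryDirectory() as tmp:
    creds_path = Path(tmp) / "creds.json"
    write_credentials_file(creds_path, "your_username", "your_api_key")
    loaded = read_credentials_file(creds_path)
print(sorted(loaded))
```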
dataset.subscribe()
HTML(value='\n        <h3>Subscription already purchased</h3>\n        The dataset <b>carto-do.ags.demographic…
geography.subscribe()
HTML(value='\n        <h3>Subscription already purchased</h3>\n        The geography <b>carto-do-public-data.u…

Licenses to data in the Data Observatory grant you the right to use the subscribed data for one year. Every dataset or geography you want to use to enrich your own data requires a valid license, unless it is public data.
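The exact start and end rules of a license are set by CARTO, not by this guide. As a hypothetical illustration of the one-year period only, the sketch below computes the same calendar day one year later (mapping 29 February to 28 February) and checks whether a given day falls inside that window; license_end and is_license_active are made-up names, and treating the end day as exclusive is an assumption:

```python
from datetime import date


def license_end(start: date) -> date:
    """Same calendar day one year after `start`; 29 Feb maps to 28 Feb."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        # Only 29 February can fail, because the following year is not a leap year.
        return start.replace(year=start.year + 1, day=28)


def is_license_active(start: date, today: date) -> bool:
    """Assumes the license covers [start, license_end(start))."""
    return start <= today < license_end(start)


print(license_end(date(2020, 2, 29)))  # 2021-02-28
```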

You can check the current status of your subscriptions directly from the catalog:

Catalog().subscriptions()
Datasets: [<Dataset.get('ags_businesscou_df363a87')>, <Dataset.get('ags_retailpoten_aaf25a8c')>, <Dataset.get('ags_sociodemogr_e92b1637')>, <Dataset.get('ags_crimerisk_e9cfa4d4')>]
Geographies: [<Geography.get('usct_blockgroup_f45b6b49')>, <Geography.get('ags_blockgroup_1c63771c')>]

About nested filters in the Catalog instance

Note that in the examples above every search creates a new instance of the Catalog class. Alternatively, when applying country, category and geography filters to a Catalog instance, you can reuse the same instance by resetting its filters with the catalog.clean_filters() method.

For example, if you’ve filtered the catalog this way:

catalog = Catalog()
catalog.country('usa').category('demographics').datasets
[<Dataset.get('acs_sociodemogr_b758e778')>,
 <Dataset.get('mbi_population_678f3375')>,
 <Dataset.get('acs_sociodemogr_97b40c3b')>,
 <Dataset.get('acs_sociodemogr_8074b548')>,
 <Dataset.get('acs_sociodemogr_f01e41c7')>,
 <Dataset.get('acs_sociodemogr_197de4f2')>,
 <Dataset.get('acs_sociodemogr_6917107d')>,
 <Dataset.get('acs_sociodemogr_6e7ad464')>,
 <Dataset.get('acs_sociodemogr_31d9e865')>,
 <Dataset.get('acs_sociodemogr_69f1cc12')>,
 <Dataset.get('acs_sociodemogr_d4b2cf03')>,
 <Dataset.get('ags_businesscou_df363a87')>,
 <Dataset.get('mbi_households__109a963')>,
 <Dataset.get('ags_retailpoten_aaf25a8c')>,
 <Dataset.get('acs_sociodemogr_beb27e5f')>,
 <Dataset.get('acs_sociodemogr_20d6ebfc')>,
 <Dataset.get('acs_sociodemogr_57d1db6a')>,
 <Dataset.get('acs_sociodemogr_ced88ad0')>,
 <Dataset.get('acs_sociodemogr_b9dfba46')>,
 <Dataset.get('acs_sociodemogr_2960a7d7')>,
 <Dataset.get('ags_consumerpro_9f337eb8')>,
 <Dataset.get('ags_consumerspe_895a369c')>,
 <Dataset.get('ags_sociodemogr_e92b1637')>,
 <Dataset.get('mbi_households__45067b14')>,
 <Dataset.get('ags_sociodemogr_e128078d')>,
 <Dataset.get('acs_sociodemogr_1b4fe990')>,
 <Dataset.get('acs_sociodemogr_5128f0b6')>,
 <Dataset.get('acs_sociodemogr_4a7136dd')>,
 <Dataset.get('acs_sociodemogr_162ffb')>,
 <Dataset.get('acs_sociodemogr_583e0b8c')>,
 <Dataset.get('acs_sociodemogr_125912aa')>,
 <Dataset.get('acs_sociodemogr_ccf039c0')>,
 <Dataset.get('acs_sociodemogr_869720e6')>,
 <Dataset.get('acs_sociodemogr_700c213c')>,
 <Dataset.get('acs_sociodemogr_e0b33cad')>,
 <Dataset.get('acs_sociodemogr_f77385de')>,
 <Dataset.get('acs_sociodemogr_f9a80dec')>,
 <Dataset.get('acs_sociodemogr_844c94b6')>,
 <Dataset.get('acs_sociodemogr_60e73728')>,
 <Dataset.get('acs_sociodemogr_7bbef143')>,
 <Dataset.get('acs_sociodemogr_2396d534')>,
 <Dataset.get('acs_sociodemogr_fd3ffe5e')>,
 <Dataset.get('acs_sociodemogr_9ed5d625')>,
 <Dataset.get('acs_sociodemogr_858c104e')>,
 <Dataset.get('acs_sociodemogr_cfeb0968')>,
 <Dataset.get('acs_sociodemogr_97c32d1f')>,
 <Dataset.get('acs_sociodemogr_dda43439')>,
 <Dataset.get('acs_sociodemogr_30d1f53')>,
 <Dataset.get('acs_sociodemogr_496a0675')>,
 <Dataset.get('acs_sociodemogr_11fe9c96')>,
 <Dataset.get('acs_sociodemogr_5b9985b0')>,
 <Dataset.get('acs_sociodemogr_40c043db')>,
 <Dataset.get('acs_sociodemogr_aa75afd')>,
 <Dataset.get('acs_sociodemogr_528f7e8a')>,
 <Dataset.get('acs_sociodemogr_18e867ac')>,
 <Dataset.get('acs_sociodemogr_c6414cc6')>,
 <Dataset.get('acs_sociodemogr_8c2655e0')>,
 <Dataset.get('acs_sociodemogr_a0c48b07')>,
 <Dataset.get('acs_sociodemogr_307b9696')>,
 <Dataset.get('acs_sociodemogr_477ca600')>,
 <Dataset.get('acs_sociodemogr_27bb2fe5')>,
 <Dataset.get('acs_sociodemogr_50bc1f73')>,
 <Dataset.get('acs_sociodemogr_c9b54ec9')>,
 <Dataset.get('acs_sociodemogr_2a802e0e')>,
 <Dataset.get('acs_sociodemogr_87197151')>,
 <Dataset.get('acs_sociodemogr_1e1020eb')>,
 <Dataset.get('acs_sociodemogr_9f1552dd')>,
 <Dataset.get('acs_sociodemogr_d5724bfb')>,
 <Dataset.get('acs_sociodemogr_8d5a6f8c')>,
 <Dataset.get('acs_sociodemogr_c73d76aa')>,
 <Dataset.get('mbi_retail_spen_e2c1988e')>,
 <Dataset.get('mbi_retail_spen_14142fb4')>,
 <Dataset.get('acs_sociodemogr_19945dc0')>,
 <Dataset.get('acs_sociodemogr_53f344e6')>,
 <Dataset.get('mbi_population_341ee33b')>,
 <Dataset.get('ags_crimerisk_e9cfa4d4')>,
 <Dataset.get('mbi_households__981be2e8')>,
 <Dataset.get('mbi_purchasing__53ab279d')>,
 <Dataset.get('mbi_purchasing__d7fd187')>,
 <Dataset.get('mbi_consumer_sp_54c4abc3')>,
 <Dataset.get('mbi_sociodemogr_b5516832')>,
 <Dataset.get('mbi_education_20063878')>,
 <Dataset.get('mbi_households__c943a740')>,
 <Dataset.get('mbi_households__d75b838')>,
 <Dataset.get('mbi_population_d3c82409')>,
 <Dataset.get('mbi_education_53d49ab0')>,
 <Dataset.get('mbi_education_5139bb8a')>,
 <Dataset.get('mbi_education_ecd69207')>,
 <Dataset.get('mbi_consumer_sp_b6a3b235')>,
 <Dataset.get('mbi_consumer_sp_9f31484d')>,
 <Dataset.get('mbi_households__1de12da2')>,
 <Dataset.get('mbi_households__b277b08f')>,
 <Dataset.get('mbi_consumer_pr_8e977645')>,
 <Dataset.get('mbi_retail_spen_ab162703')>,
 <Dataset.get('mbi_retail_spen_c31f0ba0')>,
 <Dataset.get('mbi_retail_cent_eab3bd00')>,
 <Dataset.get('mbi_retail_turn_705247a')>,
 <Dataset.get('mbi_purchasing__31cd621')>,
 <Dataset.get('mbi_purchasing__b27dd930')>,
 <Dataset.get('mbi_consumer_pr_31957ef2')>,
 <Dataset.get('mbi_consumer_pr_55b2234f')>,
 <Dataset.get('mbi_consumer_pr_68d1265a')>,
 <Dataset.get('mbi_population_d88d3bc2')>,
 <Dataset.get('mbi_retail_cent_55b1b5b7')>,
 <Dataset.get('mbi_sociodemogr_285eaf93')>,
 <Dataset.get('mbi_sociodemogr_bd619b07')>,
 <Dataset.get('mbi_retail_turn_b8072ccd')>,
 <Dataset.get('mbi_sociodemogr_975ca724')>,
 <Dataset.get('mbi_consumer_sp_9a1ba82')>,
 <Dataset.get('mbi_households__be0ba1d4')>]

If you now want to list the financial datasets instead, you can either:

  1. Create a new instance of the catalog: catalog = Catalog()
  2. Call catalog.clean_filters() on the existing instance.
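To see why reusing an instance calls for clean_filters(), here is a toy model of chained filters in plain Python. It is not how CARTOframes implements Catalog; ToyCatalog is hypothetical and only assumes, for illustration, that each filter call stores a value on the instance and returns the instance (which is what makes chaining possible), and that datasets applies every stored filter:

```python
from typing import Dict, List


class ToyCatalog:
    """Toy model of chained facet filters; not the CARTOframes implementation."""

    def __init__(self, datasets: List[Dict[str, str]]) -> None:
        self._datasets = datasets
        self.filters: Dict[str, str] = {}

    def country(self, value: str) -> "ToyCatalog":
        self.filters["country_id"] = value
        return self

    def category(self, value: str) -> "ToyCatalog":
        self.filters["category_id"] = value
        return self

    def clean_filters(self) -> "ToyCatalog":
        # Drop every stored filter so the instance can be reused from scratch.
        self.filters = {}
        return self

    @property
    def datasets(self) -> List[str]:
        # Keep only the datasets that match all stored filters.
        return [d["slug"] for d in self._datasets
                if all(d.get(key) == value for key, value in self.filters.items())]


# Two entries taken from this guide, reduced to three fields.
entries = [
    {"slug": "ags_sociodemogr_e92b1637", "country_id": "usa", "category_id": "demographics"},
    {"slug": "mc_mrli_35402a9d", "country_id": "usa", "category_id": "financial"},
]
catalog = ToyCatalog(entries)
demographics = catalog.country("usa").category("demographics").datasets
financial = catalog.clean_filters().country("usa").category("financial").datasets
print(demographics, financial)
```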

Also note that, although nesting filters over a Catalog instance is the recommended way to discover data, you don’t need to follow the complete hierarchy (country, category, geography) to list the available datasets.

For instance, you can list all the datasets in the US, or all the datasets in the demographics category, and continue exploring the catalog locally with pandas.

Let’s see an example of that, filtering public data in the demographics category worldwide:

df = Catalog().category('demographics').datasets.to_dataframe()
df[df['is_public_data'] == True]
id slug name description country_id geography_id geography_name geography_description category_id category_name provider_id provider_name data_source_id lang temporal_aggregation time_coverage update_frequency version is_public_data
4 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_b758e778 5-yr ACS data at Census Block Groups level (20... The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_b... Census Block Groups (2015) - shoreline clipped Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng 5yrs [2013-01-01,2018-01-01) None 20132017 True
61 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_97b40c3b 1-yr ACS data at States level (2009) The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_s... States (2015) - shoreline clipped Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng yearly [2009-01-01,2010-01-01) None 2009 True
64 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_8074b548 1-yr ACS data at States level (2011) The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_s... States (2015) - shoreline clipped Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng yearly [2011-01-01,2012-01-01) None 2011 True
65 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_f01e41c7 1-yr ACS data at States level (2014) The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_s... States (2015) - shoreline clipped Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng yearly [2014-01-01,2015-01-01) None 2014 True
68 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_197de4f2 1-yr ACS data at States level (2012) The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_s... States (2015) - shoreline clipped Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng yearly [2012-01-01,2013-01-01) None 2012 True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
194 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_d5724bfb 5-yr ACS data at 5-digit Zip Code Tabluation A... The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_z... 5-digit Zip Code Tabluation Areas (2015) - sho... Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng 5yrs [2009-01-01,2014-01-01) None 20092013 True
195 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_8d5a6f8c 5-yr ACS data at 5-digit Zip Code Tabluation A... The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_z... 5-digit Zip Code Tabluation Areas (2015) - sho... Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng 5yrs [2010-01-01,2015-01-01) None 20102014 True
196 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_c73d76aa 5-yr ACS data at 5-digit Zip Code Tabluation A... The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_z... 5-digit Zip Code Tabluation Areas (2015) - sho... Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng 5yrs [2011-01-01,2016-01-01) None 20112015 True
243 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_19945dc0 5-yr ACS data at 5-digit Zip Code Tabluation A... The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_z... 5-digit Zip Code Tabluation Areas (2015) - sho... Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng 5yrs [2012-01-01,2017-01-01) None 20122016 True
244 carto-do-public-data.usa_acs.demographics_soci... acs_sociodemogr_53f344e6 5-yr ACS data at 5-digit Zip Code Tabluation A... The American Community Survey (ACS) is an ongo... usa carto-do-public-data.usa_carto.geography_usa_z... 5-digit Zip Code Tabluation Areas (2015) - sho... Shoreline clipped TIGER/Line boundaries. More ... demographics Demographics usa_acs USA American Community Survey sociodemographics eng 5yrs [2013-01-01,2018-01-01) None 20132017 True

63 rows × 19 columns
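The same local exploration also works with plain Python: treat catalog rows as dictionaries, filter on is_public_data and group with collections.Counter (in pandas, the grouping step would be something like df[df['is_public_data']]['temporal_aggregation'].value_counts()). The rows below are a few entries copied from this guide, reduced to three columns:

```python
from collections import Counter

# A few catalog rows from this guide, keeping only three columns.
rows = [
    {"slug": "acs_sociodemogr_b758e778", "temporal_aggregation": "5yrs", "is_public_data": True},
    {"slug": "acs_sociodemogr_97b40c3b", "temporal_aggregation": "yearly", "is_public_data": True},
    {"slug": "acs_sociodemogr_8074b548", "temporal_aggregation": "yearly", "is_public_data": True},
    {"slug": "ags_sociodemogr_e92b1637", "temporal_aggregation": "yearly", "is_public_data": False},
]

# Keep only public datasets, then count them by temporal aggregation.
public = [row for row in rows if row["is_public_data"]]
by_aggregation = Counter(row["temporal_aggregation"] for row in public)
print(by_aggregation)
```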

Conclusion

In this guide you’ve seen how to explore the Data Observatory catalog to identify variables of datasets that you can use to enrich your own data.

You’ve learned how to:

  • Explore the catalog using nested hierarchical filters.
  • Describe the three main entities in the catalog: Geography, Dataset and their Variables.
  • Look at the data and stats taken from the actual repository, to make a more informed decision on which variables to choose.
  • Subscribe to the chosen dataset to get a license that grants the right to enrich your own data.

We also recommend checking out the resources below to learn more about the Data Observatory catalog: