CARTOframes

A Python package for integrating CARTO maps, analysis, and data services into data science workflows.

CARTOframes v1.0.1 includes breaking changes from the beta releases and version 0.10. Check the migration guide to learn how to update your code.

CARTOframes API Reference

Introduction

The CARTOframes API is organized into three parts: auth, data, and viz.

Authentication

It is possible to use CARTOframes without a CARTO account. However, being a CARTO user offers many advantages, such as access to data enrichment and the discovery of useful datasets. This module is responsible for connecting users to their CARTO accounts through the given credentials.

Manage Data

From discovering and enriching data to applying data analysis and geocoding methods, the CARTOframes API is built to manage data without leaving the context of your notebook.

Visualize Data

The viz API is designed to create useful, beautiful and straightforward visualizations. It is both predefined and flexible, giving advanced users the possibility of building specific visualizations, but also offering multiple built-in methods to work faster with a few lines of code.

Auth

The auth namespace contains the class that manages authentication: cartoframes.auth.Credentials. It also includes the utility functions cartoframes.auth.set_default_credentials() and cartoframes.auth.get_default_credentials().

class cartoframes.auth.Credentials(username=None, api_key='default_public', base_url=None, session=None)

Bases: object

The Credentials class is used for managing and storing user CARTO credentials. The arguments are listed in order of precedence: a Credentials instance comes first, api_key and base_url/username are taken next, and config_file (if given) is taken last. The config file is cartocreds.json by default. If no arguments are passed, an attempt is made to retrieve credentials from a previously saved session. One of these scenarios must be met to successfully instantiate a Credentials object.

Parameters
  • username (str, optional) – Username of CARTO account.

  • api_key (str, optional) – API key of user’s CARTO account. If the dataset is public, it can be set to ‘default_public’.

  • base_url (str, optional) – Base URL used for API calls. This is usually of the form https://johnsmith.carto.com/ for user johnsmith. On premises installations (and others) have a different URL pattern.

  • session (requests.Session, optional) – requests session. See requests documentation for more information.

Raises

ValueError – if no valid username or base_url is found.

Example

>>> creds = Credentials(username='johnsmith', api_key='abcdefg')
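For cloud-hosted accounts, passing either username or base_url is enough, because the base URL follows a predictable pattern. A minimal sketch of that relationship (illustrative only, not library code; johnsmith is a hypothetical account):

```python
# Illustrative sketch: how a cloud account's base URL relates to its
# username. On-premises installations use a different URL pattern.
def cloud_base_url(username):
    return "https://{}.carto.com/".format(username)

print(cloud_base_url("johnsmith"))  # https://johnsmith.carto.com/
```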
property api_key

Credentials api_key

property username

Credentials username

property base_url

Credentials base_url

property session

Credentials session

property user_id

Credentials user ID

classmethod from_file(config_file=None, session=None)

Retrieves credentials from a file. Defaults to the user config directory.

Parameters
  • config_file (str, optional) – Location where credentials are loaded from. If no argument is provided, it will be loaded from the default location.

  • session (requests.Session, optional) – requests session. See requests documentation for more information.

Returns

A Credentials instance.

Example

>>> creds = Credentials.from_file('creds.json')
classmethod from_credentials(credentials)

Retrieves credentials from another Credentials object.

Parameters

credentials (Credentials) – instance of Credentials.

Returns

A Credentials instance.

Raises

ValueError – if the credentials argument is not an instance of Credentials.

Example

>>> creds = Credentials.from_credentials(orig_creds)
save(config_file=None)

Saves current user credentials to user directory.

Parameters

config_file (str, optional) – Location where credentials are to be stored. If no argument is provided, they will be saved to the default location.

Example

>>> credentials = Credentials(username='johnsmith', api_key='abcdefg')
>>> credentials.save('creds.json')
User credentials for `johnsmith` were successfully saved to `creds.json`
classmethod delete(config_file=None)

Deletes the credentials file specified in config_file. If no file is specified, it deletes the default user credentials file (cartocreds.json).

Parameters

config_file (str) – Path to configuration file. Defaults to delete the user default location if None.

Tip

To see if there is a default user credential file stored, do the following:

>>> print(Credentials.from_file())
Credentials(username='johnsmith', api_key='abcdefg',
base_url='https://johnsmith.carto.com/')
get_do_credentials()

Returns the Data Observatory v2 credentials

cartoframes.auth.get_default_credentials()

Retrieve the default credentials if previously set with cartoframes.auth.set_default_credentials() in the current Python session.

Example

>>> set_default_credentials('creds.json')
>>> current_creds = get_default_credentials()
Returns

Default credentials previously set in current Python session. None will be returned if default credentials were not previously set.

Return type

cartoframes.auth.Credentials

cartoframes.auth.set_default_credentials(first=None, second=None, credentials=None, filepath=None, username=None, base_url=None, api_key=None, session=None)

Set default credentials for all operations that require authentication against a CARTO account.

Parameters
  • credentials (Credentials, optional) – A Credentials instance can be used in place of a username/base_url and api_key combination.

  • base_url (str, optional) – Base URL of CARTO user account. Cloud-based accounts should use the form https://{username}.carto.com (e.g., https://johnsmith.carto.com for user johnsmith) whether on a personal or multi-user account. On-premises installation users should ask their admin.

  • api_key (str, optional) – CARTO API key. Depending on the application, this can be a project API key or the account master API key.

  • username (str, optional) – CARTO user name of the account.

  • filepath (str, optional) – Location where credentials are stored as a JSON file.

  • session (requests.Session, optional) – requests session. See requests documentation for more information.

Note

The recommended way to authenticate in CARTOframes is to read user credentials from a JSON file that is structured like this:

{
    "username": "your user name",
    "api_key": "your api key",
    "base_url": "https://your_username.carto.com"
}

Note that the ``base_url`` will be different for on premises installations.

By using the cartoframes.auth.Credentials.save() method, this file will automatically be created for you in a default location depending on your operating system. A custom location can also be specified as an argument to the method.

This file can then be read in the following ways:

>>> set_default_credentials('./carto-project-credentials.json')
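The credentials file itself can be created with cartoframes.auth.Credentials.save(), or by hand with the standard library. A sketch (the username, API key, and file name below are placeholders):

```python
import json

# Placeholder values: replace with your own account details.
creds = {
    "username": "johnsmith",
    "api_key": "abcdefg",
    "base_url": "https://johnsmith.carto.com",
}

# Write the JSON file in the structure shown above.
with open("carto-project-credentials.json", "w") as f:
    json.dump(creds, f, indent=4)
```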

Example

Create Credentials from a username, api_key pair.

>>> set_default_credentials('johnsmith', 'your api key')

Create credentials from only a username (only works with public datasets and those marked public with link). If the API key is not provided, the public API key default_public is used. With this setting, only read-only operations can occur (e.g., no publishing of maps, reading data from the Data Observatory, or creating new hosted datasets).

>>> set_default_credentials('johnsmith')

From a pair base_url, api_key.

>>> set_default_credentials('https://johnsmith.carto.com', 'your api key')

From a base_url (for public datasets). The API key default_public is used by default.

>>> set_default_credentials('https://johnsmith.carto.com')

From a Credentials class.

>>> credentials = Credentials(
...     base_url='https://johnsmith.carto.com',
...     api_key='your api key')
>>> set_default_credentials(credentials)
cartoframes.auth.unset_default_credentials()

Unset the default credentials if previously set with cartoframes.auth.set_default_credentials() in the current Python session.

Example

>>> set_default_credentials('creds.json')
>>> unset_default_credentials()

I/O functions

cartoframes.read_carto(source, credentials=None, limit=None, retry_times=3, schema=None, index_col=None, decode_geom=True)

Read a table or a SQL query from the CARTO account.

Parameters
  • source (str) – table name or SQL query.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • limit (int, optional) – The number of rows to download. Default is to download all rows.

  • retry_times (int, optional) – Number of times to retry the download in case it fails. Default is 3.

  • schema (str, optional) – prefix of the table. By default, it gets the current_schema() using the credentials.

  • index_col (str, optional) – name of the column to be loaded as index. It can be used also to set the index name.

  • decode_geom (bool, optional) – convert the “the_geom” column into a valid geometry column.

Returns

geopandas.GeoDataFrame

Raises

ValueError – if the source is not a valid table_name or SQL query.

cartoframes.to_carto(dataframe, table_name, credentials=None, if_exists='fail', geom_col=None, index=False, index_label=None, cartodbfy=True, log_enabled=True)

Upload a DataFrame to CARTO.

Parameters
  • dataframe (pandas.DataFrame, geopandas.GeoDataFrame) – data to be uploaded.

  • table_name (str) – name of the table to upload the data to.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • if_exists (str, optional) – ‘fail’, ‘replace’, ‘append’. Default is ‘fail’.

  • geom_col (str, optional) – name of the geometry column of the dataframe.

  • index (bool, optional) – write the index in the table. Default is False.

  • index_label (str, optional) – name of the index column in the table. By default it uses the name of the index from the dataframe.

  • cartodbfy (bool, optional) – convert the table to CARTO format. Default is True. More info at https://carto.com/developers/sql-api/guides/creating-tables/#create-tables.

Raises

ValueError – if the dataframe or table name provided are wrong or the if_exists param is not valid.

cartoframes.has_table(table_name, credentials=None, schema=None)

Check if the table exists in the CARTO account.

Parameters
  • table_name (str) – name of the table.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • schema (str, optional) – prefix of the table. By default, it gets the current_schema() using the credentials.

Returns

True if the table exists, False otherwise.

Return type

bool

Raises

ValueError – if the table name is not a valid table name.

cartoframes.delete_table(table_name, credentials=None, log_enabled=True)

Delete the table from the CARTO account.

Parameters
  • table_name (str) – name of the table.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

Raises

ValueError – if the table name is not a valid table name.

cartoframes.rename_table(table_name, new_table_name, credentials=None, if_exists='fail', log_enabled=True)

Rename a table in the CARTO account.

Parameters
  • table_name (str) – name of the table.

  • new_table_name (str) – new name for the table.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • if_exists (str, optional) – ‘fail’, ‘replace’. Default is ‘fail’.

Raises

ValueError – if the table names provided are wrong or the if_exists param is not valid.

cartoframes.copy_table(table_name, new_table_name, credentials=None, if_exists='fail', log_enabled=True)

Copy a table into a new table in the CARTO account.

Parameters
  • table_name (str) – name of the original table.

  • new_table_name (str) – name for the new table.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • if_exists (str, optional) – ‘fail’, ‘replace’, ‘append’. Default is ‘fail’.

Raises

ValueError – if the table names provided are wrong or the if_exists param is not valid.

cartoframes.create_table_from_query(query, new_table_name, credentials=None, if_exists='fail', log_enabled=True)

Create a new table from an SQL query in the CARTO account.

Parameters
  • query (str) – SQL query

  • new_table_name (str) – name for the new table.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • if_exists (str, optional) – ‘fail’, ‘replace’, ‘append’. Default is ‘fail’.

Raises

ValueError – if the query or table name provided is wrong or the if_exists param is not valid.

cartoframes.describe_table(table_name, credentials=None, schema=None)

Describe the table in the CARTO account.

Parameters
  • table_name (str) – name of the table.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

  • schema (str, optional) – prefix of the table. By default, it gets the current_schema() using the credentials.

Returns

A dict with the privacy, num_rows and geom_type of the table.

Raises

ValueError – if the table name is not a valid table name.

cartoframes.update_privacy_table(table_name, privacy, credentials=None, log_enabled=True)

Update the table privacy in the CARTO account.

Parameters
  • table_name (str) – name of the table.

  • privacy (str) – privacy of the table: ‘private’, ‘public’, ‘link’.

  • credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).

Raises

ValueError – if the table name is wrong or the privacy name is not ‘private’, ‘public’, or ‘link’.

Data Observatory

With CARTOframes it is possible to enrich your data by using our Data Observatory Catalog through the enrichment methods.

Important: The new Data Observatory 2.0 is accessible to selected CARTO Enterprise users in a private beta. We’re still open to more beta testers, so if you’re interested, please get in touch.

class cartoframes.data.observatory.Catalog

Bases: object

This class represents the Data Observatory metadata Catalog.

The catalog contains metadata that helps to discover and understand the data available in the Data Observatory for Dataset.download and Enrichment purposes.

You can get more information about the Data Observatory catalog from the CARTO website and in your CARTO user account dashboard.

The Catalog has three main purposes:
  • Explore and discover the datasets available in the repository (both public and premium datasets).

  • Subscribe to some premium datasets and manage your datasets licenses.

  • Download data and use your licensed datasets and variables to enrich your own data by means of the Enrichment functions.

The Catalog is public and can be explored without a CARTO account. Once you discover a Dataset of interest and want to acquire a license to use it, you’ll need a CARTO account to subscribe to it, by means of the Dataset.subscribe or Geography.subscribe functions.

The Catalog is composed of three main entities:
  • Dataset: It is the main CatalogEntity. It contains metadata of the actual data you can use to Dataset.download or for Enrichment purposes.

  • Geography: Datasets in the Data Observatory are aggregated by different geographic boundaries. The Geography entity contains metadata to understand the boundaries of a Dataset. It’s used for enrichment and you can also Geography.download the underlying data.

  • Variable: Variables contain metadata about the columns available in each dataset for enrichment. Let’s say you explore a dataset with demographic data for the whole US at the Census tract level. The variables give you information about the actual columns you have available, such as: total_population, total_males, etc. On the other hand, you can use lists of Variable instances, Variable.id, or Variable.slug to enrich your own data.

Every Dataset is related to a Geography. For example, you can have demographics data at the Census tract, block group, or block level.

When subscribing to a premium dataset, you should subscribe to both the dataset (Dataset.subscribe) and its geography (Geography.subscribe) to be able to access both tables to enrich your own data.

The two main entities of the Catalog (Dataset and Geography) are related to other entities that are useful for hierarchical categorization and discovery of the available data in the Data Observatory:

  • Category: Groups datasets of the same topic, for example, demographics, financial, etc.

  • Country: Groups the datasets available by country.

  • Provider: Gives you information about the provider of the source data.

You can simply list all the grouping entities. Note that this is not the preferred way to discover the catalog metadata, since there can be thousands of entities in it:

>>> Category.get_all()
[<Category.get('demographics')>, ...]
>>> Country.get_all()
[<Country.get('usa')>, ...]
>>> Provider.get_all()
[<Provider.get('mrli')>, ...]

Or you can get them by ID:

>>> Category.get('demographics')
<Category.get('demographics')>
>>> Country.get('usa')
<Country.get('usa')>
>>> Provider.get('mrli')
<Provider.get('mrli')>

Examples

The preferred way to discover the available datasets in the Catalog is through nested filters:

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').datasets
[<Dataset.get('acs_sociodemogr_b758e778')>, ...]

You can include the geography as part of the nested filter like this:

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets

If a filter is already applied to a Catalog instance and you want to do a new hierarchical search, clear the previous filters with the Catalog().clear_filters() method:

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
>>> catalog.clear_filters()
>>> catalog.country('esp').category('demographics').datasets

Otherwise the filters accumulate and you’ll get unexpected results.

During the discovery process, it's useful to understand the metadata related to a given Geography or Dataset. A useful way to read or filter by metadata values is to convert the entities to a pandas DataFrame:

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets.to_dataframe()

For each dataset in the Catalog, you can explore its variables, get a summary of its stats, etc.

>>> dataset = Dataset.get('od_acs_13345497')
>>> dataset.variables
[<Variable.get('dwellings_2_uni_fb8f6cfb')> #'Two-family (two unit) dwellings', ...]

See the Catalog guides and examples in our public documentation website for more information.

property countries

Get all the countries with datasets available in the Catalog.

Returns

CatalogList

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

property categories

Get all the categories in the Catalog.

Returns

CatalogList

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

property datasets

Get all the datasets in the Catalog.

Returns

CatalogList

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

property geographies

Get all the geographies in the Catalog.

Returns

CatalogList

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

country(country_id)

Add a country filter to the current Catalog instance.

Parameters

country_id (str) – Id value of the country to be used for filtering the Catalog.

Returns

Catalog

category(category_id)

Add a category filter to the current Catalog instance.

Parameters

category_id (str) – Id value of the category to be used for filtering the Catalog.

Returns

Catalog

geography(geography_id)

Add a geography filter to the current Catalog instance.

Parameters

geography_id (str) – Id or slug value of the geography to be used for filtering the Catalog

Returns

Catalog

provider(provider_id)

Add a provider filter to the current Catalog instance

Parameters

provider_id (str) – Id value of the provider to be used for filtering the Catalog.

Returns

Catalog

clear_filters()

Remove the current filters from this Catalog instance.

subscriptions(credentials=None)

Get all the subscriptions in the Catalog. You’ll get all the Dataset or Geography instances you have previously subscribed to.

Parameters

credentials (Credentials, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials) will be used.

Returns

Subscriptions

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

datasets_filter(filter_dataset)

Get all the datasets in the Catalog that match the given filter.

Returns

Dataset

class cartoframes.data.observatory.Category(data)

Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity

This class represents a Category in the Catalog. Catalog datasets (Dataset class) are grouped by categories, so you can filter available datasets and geographies that belong (or are related) to a given Category.

Examples

List the available categories in the Catalog

>>> catalog = Catalog()
>>> categories = catalog.categories

Get a Category from the Catalog given its ID

>>> category = Category.get('demographics')
property datasets

Get the list of Dataset related to this category.

Returns

CatalogList – list of Dataset instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

Examples

Get all the Dataset instances available in the catalog for a Category instance

>>> category = Category.get('demographics')
>>> datasets = category.datasets

Same example as above but using nested filters:

>>> catalog = Catalog()
>>> datasets = catalog.category('demographics').datasets

You can perform other operations with a CatalogList:

>>> catalog = Catalog()
>>> datasets = catalog.category('demographics').datasets
>>> # convert the list of datasets into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = datasets.to_dataframe()
>>> # get a dataset by ID or slug
>>> dataset = Dataset.get(A_VALID_ID_OR_SLUG)
property geographies

Get the list of Geography related to this category.

Returns

CatalogList – list of Geography instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

Examples

Get all the Geography instances available in the catalog for a Category instance

>>> category = Category.get('demographics')
>>> geographies = category.geographies

Same example as above but using nested filters:

>>> catalog = Catalog()
>>> geographies = catalog.category('demographics').geographies

You can perform these other operations with a CatalogList:

>>> catalog = Catalog()
>>> geographies = catalog.category('demographics').geographies
>>> # convert the list of datasets into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = geographies.to_dataframe()
>>> # get a geography by ID or slug
>>> geography = Geography.get(A_VALID_ID_OR_SLUG)
property name

Name of this category instance.

class cartoframes.data.observatory.Country(data)

Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity

This class represents a Country in the Catalog. Catalog datasets (Dataset class) belong to a country, so you can filter available datasets and geographies that belong (or are related) to a given Country.

Examples

List the available countries in the Catalog

>>> catalog = Catalog()
>>> countries = catalog.countries

Get a Country from the Catalog given its ID

>>> # country ID is a lowercase ISO Alpha 3 Code
>>> country = Country.get('usa')
property datasets

Get the list of Dataset covering data for this country.

Returns

CatalogList – list of Dataset instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

Examples

Get all the Dataset instances available in the catalog for a Country instance

>>> country = Country.get('usa')
>>> datasets = country.datasets

Same example as above but using nested filters:

>>> catalog = Catalog()
>>> datasets = catalog.country('usa').datasets

You can perform these other operations with a CatalogList:

>>> datasets = catalog.country('usa').datasets
>>> # convert the list of datasets into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = datasets.to_dataframe()
>>> # get a dataset by ID or slug
>>> dataset = Dataset.get(A_VALID_ID_OR_SLUG)
property geographies

Get the list of Geography covering data for this country.

Returns

CatalogList – list of Geography instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no geographies are found.

Examples

Get all the Geography instances available in the catalog for a Country instance

>>> country = Country.get('usa')
>>> geographies = country.geographies

Same example as above but using nested filters:

>>> catalog = Catalog()
>>> geographies = catalog.country('usa').geographies

You can perform these other operations with a CatalogList:

>>> geographies = catalog.country('usa').geographies
>>> # convert the list of geographies into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = geographies.to_dataframe()
>>> # get a geography by ID or slug
>>> geography = Geography.get(A_VALID_ID_OR_SLUG)
property categories

Get the list of Category that are assigned to Dataset that cover data for this country.

Returns

CatalogList – list of Category instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

Examples

Get all the Category instances available in the catalog for a Country instance

>>> country = Country.get('usa')
>>> categories = country.categories

Same example as above but using nested filters:

>>> catalog = Catalog()
>>> categories = catalog.country('usa').categories
class cartoframes.data.observatory.Dataset(data)

Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity

A Dataset represents the metadata of a particular dataset in the catalog.

If you have Data Observatory enabled in your CARTO account you can:

  • Use any public dataset to enrich your data with the variables in it by means of the Enrichment functions.

  • Subscribe (Dataset.subscribe) to any premium dataset to get a license that grants you the right to enrich your data with the variables (Variable) in it.

See the enrichment guides for more information about datasets, variables and enrichment functions.

The metadata of a dataset allows you to understand the underlying data, from variables (the actual columns in the dataset, data types, etc.), to a description of the provider, source, country, geography available, etc.

See the attributes reference in this class to understand the metadata available for each dataset in the catalog.

Examples

There are many different ways to explore the available datasets in the catalog.

You can just list all the available datasets:

>>> catalog = Catalog()
>>> datasets = catalog.datasets

Since the catalog contains thousands of datasets, you can convert the list of datasets to a pandas DataFrame for further filtering:

>>> catalog = Catalog()
>>> dataframe = catalog.datasets.to_dataframe()

The catalog supports nested filters for hierarchical exploration. This way you can list the datasets available for different hierarchies: country, provider, category, geography, or a combination of them.

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
property variables

Get the list of Variable that corresponds to this dataset. Variables are used in the Enrichment functions to augment your local DataFrames with columns from a Dataset in the Data Observatory.

Returns

CatalogList – list of Variable instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog.

property variables_groups

Get the list of VariableGroup related to this dataset.

Returns

CatalogList – list of VariableGroup instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog.

property name

Name of this dataset.

property description

Description of this dataset.

property provider

Id of the Provider of this dataset.

property provider_name

Name of the Provider of this dataset.

property category

Get the Category ID assigned to this dataset.

property category_name

Name of the Category assigned to this dataset.

property data_source

Id of the data source of this dataset.

property country

ISO 3166-1 alpha-3 code of the Country of this dataset. More info in: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3.

property language

ISO 639-3 code of the language that corresponds to the data of this dataset. More info in: https://en.wikipedia.org/wiki/ISO_639-3.

property geography

Get the Geography ID associated with this dataset.

property geography_name

Get the name of the Geography associated with this dataset.

property geography_description

Description of the Geography associated with this dataset.

property temporal_aggregation

Time period over which data is aggregated in this dataset.

This is a free text field in this form: seconds, daily, hourly, monthly, yearly, etc.

property time_coverage

Time range that covers the data of this dataset.

Returns

List of str

Example: [2015-01-01,2016-01-01)
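Since the value is an ISO-style half-open interval encoded as text, it can be parsed with the standard library if you need the bounds as dates. A sketch using the example value above:

```python
from datetime import date

# Half-open interval: the start date is included, the end date is not.
coverage = "[2015-01-01,2016-01-01)"
start_text, end_text = coverage.strip("[)").split(",")
start = date.fromisoformat(start_text)
end = date.fromisoformat(end_text)
print(start, end)  # 2015-01-01 2016-01-01
```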

property update_frequency

Frequency in which the dataset is updated.

Returns

str

Example: monthly, yearly, etc.

property version

Internal version info of this dataset.

Returns

str

property is_public_data

Check whether the content of this dataset can be accessed with public credentials or whether it is a premium dataset that requires a subscription.

Returns

  • True if the dataset is public

  • False if the dataset is premium

    (it requires a subscription via Dataset.subscribe)

Return type

bool

property summary

JSON object with extra metadata that summarizes different properties of the dataset content.

head()

Returns a sample of the first 10 rows of the dataset data.

If a dataset has fewer than 10 rows (e.g., zip codes of small countries), this method will return None.

Returns

pandas.DataFrame

tail()

Returns a sample of the last 10 rows of the dataset data.

If a dataset has fewer than 10 rows (e.g., zip codes of small countries), this method will return None.

Returns

pandas.DataFrame

counts()

Returns a summary of different counts over the actual dataset data.

Returns

pandas.Series

Example

# rows:         number of rows in the dataset
# cells:        number of cells in the dataset (rows * columns)
# null_cells:   number of cells with null value in the dataset
# null_cells_percent:   percent of cells with null value in the dataset
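The same figures can be reproduced locally from any DataFrame, which helps to interpret the Series returned. A sketch with a toy pandas DataFrame (not the dataset's actual data):

```python
import pandas as pd

# Toy frame standing in for downloaded dataset data.
df = pd.DataFrame({"a": [1, 2, None], "b": [4.0, None, 6.0]})

counts = pd.Series({
    "rows": len(df),
    "cells": df.size,                               # rows * columns
    "null_cells": int(df.isnull().sum().sum()),
    "null_cells_percent": 100.0 * df.isnull().sum().sum() / df.size,
})
print(counts)
```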
fields_by_type()

Returns a summary of the number of columns per data type in the dataset.

Returns

pandas.Series

Example

# float        number of columns with type float in the dataset
# string       number of columns with type string in the dataset
# integer      number of columns with type integer in the dataset
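For a local DataFrame, a comparable per-type summary can be computed from the column dtypes. A sketch with toy data (dtype names follow pandas conventions, not the Data Observatory's type names):

```python
import pandas as pd

# Toy frame: one float column, one string column, two integer columns.
df = pd.DataFrame({"x": [1.5], "y": ["a"], "z": [1], "w": [2]})

# Count columns per data type.
fields = df.dtypes.astype(str).value_counts()
print(fields)
```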
geom_coverage()

Shows a map to visualize the geographical coverage of the dataset.

Returns

Map

describe()

Shows a summary of the actual stats of the variables (columns) of the dataset. Some of the stats provided per variable are: avg, max, min, sum, range, stdev, q1, q3, median, and interquartile_range.

Returns

pandas.DataFrame

Example

# avg                    average value
# max                    max value
# min                    min value
# sum                    sum of all values
# range                  difference between the max and min values
# stdev                  standard deviation
# q1                     first quartile
# q3                     third quartile
# median                 median value
# interquartile_range    difference between q3 and q1
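To make the individual stats concrete, here is how each one relates to a single numeric column, sketched locally with pandas (toy values, not from any real dataset):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])  # toy variable values

q1, q3 = s.quantile(0.25), s.quantile(0.75)
stats = {
    "avg": s.mean(),
    "max": s.max(),
    "min": s.min(),
    "sum": s.sum(),
    "range": s.max() - s.min(),          # max - min
    "stdev": s.std(),
    "q1": q1,
    "q3": q3,
    "median": s.median(),
    "interquartile_range": q3 - q1,      # q3 - q1
}
print(stats)
```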
classmethod get_all(filters=None, credentials=None)

Get all the Dataset instances that comply with the indicated filters (or all of them if no filters are passed). If credentials are given, only the datasets granted for those credentials are returned.

Parameters
  • credentials (Credentials, optional) – credentials of CARTO user account. If provided, only datasets granted for those credentials are returned.

  • filters (dict, optional) – Dict containing pairs of dataset properties and their values, to be used as filters to query the available datasets. If none is provided, no filters will be applied to the query.

Returns

CatalogList List of Dataset instances.

Raises
  • CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

  • DOError – if DO is not enabled.
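
Examples

Get all the datasets available for a country (the filter key shown here is illustrative; see the catalog guides for the supported filter properties):

>>> datasets = Dataset.get_all({'country_id': 'usa'})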

to_csv(file_path, credentials=None, limit=None)

Download dataset data as a local CSV file. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.

For premium datasets (those with is_public_data set to False), you need a subscription to the dataset. Check the subscription guides for more information.

Parameters
  • file_path (str) – the file path where the dataset will be saved (CSV).

  • credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

  • limit (int, optional) – number of rows to be downloaded.

Raises
  • DOError – if you do not have a valid license for the dataset being downloaded, DO is not enabled, or there is an issue downloading the data.

  • ValueError – if the credentials argument is not valid.
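
Examples

Download the first rows of a dataset to a local CSV file (the dataset ID is borrowed from the Variable examples in this reference; it assumes default credentials are set and the dataset is public or subscribed):

>>> Dataset.get('mbi_retail_turn_705247a').to_csv('dataset.csv', limit=100)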

to_dataframe(credentials=None, limit=None)

Download dataset data as a pandas.DataFrame. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.

For premium datasets (those with is_public_data set to False), you need a subscription to the dataset. Check the subscription guides for more information.

Parameters
  • credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

  • limit (int, optional) – number of rows to be downloaded.

Returns

pandas.DataFrame

Raises
  • DOError – if you do not have a valid license for the dataset being downloaded, DO is not enabled, or there is an issue downloading the data.

  • ValueError – if the credentials argument is not valid.
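
Examples

Download a sample of a dataset into a DataFrame (the dataset ID is borrowed from the Variable examples in this reference; it assumes default credentials are set and the dataset is public or subscribed):

>>> df = Dataset.get('mbi_retail_turn_705247a').to_dataframe(limit=100)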

subscribe(credentials=None)

Subscribe to a dataset. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.

Datasets with is_public_data set to True do not need a license (i.e., a subscription) to be used. Datasets with is_public_data set to False do need a license (i.e., a subscription) to be used. You’ll get a license to use this dataset depending on the estimated_delivery_days set for this specific dataset.

See subscription_info for more info.

Once you subscribe to a dataset, you can download its data with Dataset.to_csv or Dataset.to_dataframe and use the Enrichment functions. See the enrichment guides for more info.

You can check the status of your subscriptions by calling the subscriptions method in the Catalog with your CARTO Credentials.

Parameters

credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

Raises
  • CatalogError – if there’s a problem when connecting to the catalog.

  • DOError – if DO is not enabled.
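
Examples

Subscribe to a premium dataset (the dataset ID is borrowed from the Variable examples in this reference; '<username>' and '<api_key>' are placeholders for your own credentials):

>>> credentials = Credentials('<username>', '<api_key>')
>>> Dataset.get('mbi_retail_turn_705247a').subscribe(credentials)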

subscription_info(credentials=None)

Get the subscription information of a Dataset, which includes the license, Terms of Service, rights, price, and estimated time of delivery, among other metadata of interest during the Dataset.subscription process.

Parameters

credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

Returns

SubscriptionInfo SubscriptionInfo instance.

Raises
  • CatalogError – if there’s a problem when connecting to the catalog.

  • DOError – if DO is not enabled.
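
Examples

Check the price and estimated delivery time of a dataset before subscribing (the dataset ID is borrowed from the Variable examples in this reference):

>>> info = Dataset.get('mbi_retail_turn_705247a').subscription_info()
>>> info.subscription_list_price
>>> info.estimated_delivery_days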

class cartoframes.data.observatory.Geography(data)

Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity

A Geography represents the metadata of a particular geography dataset in the catalog.

If you have Data Observatory enabled in your CARTO account you can:

  • Use any public geography to enrich your data with the variables in it by means of the Enrichment functions.

  • Subscribe (Geography.subscribe) to any premium geography to get a license that grants you the right to enrich your data with the variables in it.

See the enrichment guides for more information about geographies, variables, and enrichment functions.

The metadata of a geography allows you to understand the underlying data, from variables (the actual columns in the geography, data types, etc.), to a description of the provider, source, country, geography available, etc.

See the attributes reference in this class to understand the metadata available for each geography in the catalog.

Examples

There are many different ways to explore the available geographies in the catalog.

You can just list all the available geographies:

>>> catalog = Catalog()
>>> geographies = catalog.geographies

Since the catalog contains thousands of geographies, you can convert the list of geographies to a pandas DataFrame for further filtering:

>>> catalog = Catalog()
>>> dataframe = catalog.geographies.to_dataframe()

The catalog supports nested filters for a hierarchical exploration. This way you could list the geographies available for different hierarchies: country, provider, category or a combination of them.

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geographies

Usually you use a geography ID as an intermediate filter to get a list of datasets with aggregate data for that geographical resolution:

>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
property datasets

Get the list of datasets related to this geography.

Returns

CatalogList List of Dataset instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

property name

Name of this geography.

property description

Description of this geography.

property country

Code (ISO 3166-1 alpha-3) of the country of this geography.

property language

Code (ISO 639-3) of the language that corresponds to the data of this geography.

property provider

Id of the Provider of this geography.

property provider_name

Name of the Provider of this geography.

property geom_coverage

Geographical coverage geometry encoded in WKB.

property geom_type

Info about the type of geometry of this geography.

property update_frequency

Frequency in which the geography data is updated.

Example: monthly, yearly, etc.

property version

Internal version info of this geography.

property is_public_data

Allows you to check whether the content of this geography can be accessed with public credentials or if it is a premium geography that needs a subscription.

Returns

  • True if the geography is public

  • False if the geography is premium

    (it requires a subscription via Geography.subscribe)

Return type

A boolean value

property summary

dict with extra metadata that summarizes different properties of the geography content.

classmethod get_all(filters=None, credentials=None)

Get all the Geography instances that comply with the indicated filters (or all of them if no filters are passed). If credentials are given, only the geographies granted for those credentials are returned.

Parameters
  • credentials (Credentials, optional) – credentials of CARTO user account. If provided, only geographies granted for those credentials are returned.

  • filters (dict, optional) – Dict containing pairs of geography properties and their values, to be used as filters to query the available geographies. If none is provided, no filters will be applied to the query.

Returns

CatalogList List of Geography instances.

Raises
  • CatalogError – if there’s a problem when connecting to the catalog or no geographies are found.

  • DOError – if DO is not enabled.

to_csv(file_path, credentials=None, limit=None)

Download geography data as a local CSV file. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.

For premium geographies (those with is_public_data set to False), you need a subscription to the geography. Check the subscription guides for more information.

Parameters
  • file_path (str) – the file path where the geography data will be saved (CSV).

  • credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

  • limit (int, optional) – number of rows to be downloaded.

Raises
  • DOError – if you do not have a valid license for the geography being downloaded, DO is not enabled, or there is an issue downloading the data.

  • ValueError – if the credentials argument is not valid.

to_dataframe(credentials=None, limit=None)

Download geography data as a pandas.DataFrame. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.

For premium geographies (those with is_public_data set to False), you need a subscription to the geography. Check the subscription guides for more information.

Parameters
  • credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

  • limit (int, optional) – number of rows to be downloaded.

Returns

pandas.DataFrame

Raises
  • DOError – if you do not have a valid license for the geography being downloaded, DO is not enabled, or there is an issue downloading the data.

  • ValueError – if the credentials argument is not valid.

subscribe(credentials=None)

Subscribe to a Geography. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.

Geographies with is_public_data set to True do not need a license (i.e., a subscription) to be used. Geographies with is_public_data set to False do need a license (i.e., a subscription) to be used. You’ll get a license to use this geography depending on the estimated_delivery_days set for this specific geography.

See subscription_info for more info.

Once you subscribe (Geography.subscribe) to a geography, you can download its data with Geography.to_csv or Geography.to_dataframe and use the enrichment functions. See the enrichment guides for more info.

You can check the status of your subscriptions by calling the subscriptions method in the Catalog with your CARTO credentials.

Parameters

credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

Raises
  • CatalogError – if there’s a problem when connecting to the catalog.

  • DOError – if DO is not enabled.

subscription_info(credentials=None)

Get the subscription information of a Geography, which includes the license, Terms of Service, rights, price, and estimated time of delivery, among other metadata of interest during the subscription process.

Parameters

credentials (Credentials, optional) – credentials of CARTO user account. If not provided, a default credentials (if set with set_default_credentials) will be used.

Returns

SubscriptionInfo SubscriptionInfo instance.

Raises
  • CatalogError – if there’s a problem when connecting to the catalog.

  • DOError – if DO is not enabled.

class cartoframes.data.observatory.Provider(data)

Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity

This class represents a Provider of datasets and geographies in the Catalog.

Examples

List the available providers in the Catalog in combination with nested filters (categories, countries, etc.)

>>> providers = Provider.get_all()

Get a Provider from the Catalog given its ID

>>> catalog = Catalog()
>>> provider = catalog.provider('mrli')
property datasets

Get the list of datasets related to this provider.

Returns

CatalogList List of Dataset instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

Examples

>>> provider = Provider.get('mrli')
>>> datasets = provider.datasets

Same example as above but using nested filters:

>>> catalog = Catalog()
>>> datasets = catalog.provider('mrli').datasets
property name

Name of this provider.

class cartoframes.data.observatory.Variable(data)

Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity

This class represents a Variable of datasets in the Catalog.

Variables contain column names, description, data type, aggregation method, and some other metadata that is useful to understand the underlying data inside a Dataset.

Examples

List the variables of a Dataset in combination with nested filters (categories, countries, etc.)

>>> dataset = Dataset.get('mbi_retail_turn_705247a')
>>> dataset.variables
[<Variable.get('RT_CI_95050c10')> #'Retail Turnover: index (country eq.100)', ...]
property datasets

Get the list of datasets related to this variable.

Returns

CatalogList List of Dataset instances.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.

property name

Name of this variable.

property description

Description of this variable.

property column_name

Column name of the actual table related to the variable in the Dataset.

property db_type

Type in the database.

Returns

str

Examples: INTEGER, STRING, FLOAT, GEOGRAPHY, JSON, BOOL, etc.

property dataset

ID of the Dataset to which this variable belongs.

property agg_method

Text representing a description of the aggregation method used to compute the values in this Variable.

property variable_group

If any, ID of the variable group to which this variable belongs.

property starred

Boolean indicating whether this variable is starred or not. Internal usage only.

property summary

JSON object with extra metadata that summarizes different properties of this variable.

property project_name
property schema_name
property dataset_name
describe()

Shows a summary of the stats of the variable (column) of the dataset. Some of the stats provided are: avg, max, min, sum, range, stdev, q1, q3, median, and interquartile_range.

Example

# avg                    average value
# max                    max value
# min                    min value
# sum                    sum of all values
# range                  difference between max and min values
# stdev                  standard deviation
# q1                     first quartile
# q3                     third quartile
# median                 median value
# interquartile_range    difference between q3 and q1
head()

Returns a sample of the first 10 values of the variable data.

If the dataset has fewer than 10 rows (e.g., zip codes of small countries), this method will not return any data.

tail()

Returns a sample of the last 10 values of the variable data.

If the dataset has fewer than 10 rows (e.g., zip codes of small countries), this method will not return any data.

counts()

Returns a summary of different counts over the actual variable values.

Example

# all               total number of values
# null              total number of null values
# zero              number of zero-valued entries
# extreme           number of values more than 3 × stdev outside the interquartile range
# distinct          number of distinct (unique) entries
# outliers          number of outliers (outside 1.5 × stdev of the interquartile range)
# zero_percent      percent of values that are zero
# distinct_percent  percent of values that are distinct
quantiles()

Returns the quantiles of the variable data.

top_values()

Returns information about the top values of the variable data.

histogram()

Plots a histogram with the variable data.
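
Example

A sketch of a typical variable exploration combining the methods above (the variable slug is borrowed from the Variable examples in this reference; which stats are available depends on the dataset):

>>> variable = Variable.get('RT_CI_95050c10')
>>> variable.describe()
>>> variable.quantiles()
>>> variable.top_values()
>>> variable.histogram()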

class cartoframes.data.observatory.Enrichment(credentials=None)

Bases: cartoframes.data.observatory.enrichment.enrichment_service.EnrichmentService

This is the main class to enrich your own data with data from the Data Observatory.

To be able to use the Enrichment functions you need a CARTO account with Data Observatory v2 enabled. Contact us at support@carto.com for more information.

Please see the Catalog discovery and subscription guides to understand how to explore the Data Observatory repository and subscribe to premium datasets to be used in your enrichment workflows.

Parameters

credentials (Credentials, optional) – credentials of a CARTO user account. If not provided, the default credentials (if set with set_default_credentials) will be used.

enrich_points(dataframe, variables, geom_col=None, filters={})

Enrich your points DataFrame with columns (Variable) from one or more Dataset in the Data Observatory, intersecting the points in the source DataFrame with the geographies in the Data Observatory.

Extra columns such as area and population will be provided in the resulting DataFrame for normalization purposes.

Parameters
  • dataframe (pandas.DataFrame, geopandas.GeoDataFrame) – a DataFrame instance to be enriched.

  • variables (Variable, list, str) – variable ID, slug or Variable instance or list of variable IDs, slugs or Variable instances taken from the Data Observatory Catalog. The maximum number of variables is 50.

  • geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.

  • filters (dict, optional) – dictionary to filter results by variable values. Each key is a variable ID and each value a SQL operator, for example: {variable1.id: "> 30"}. It works by appending the filter SQL operators to the WHERE clause of the resulting enrichment SQL with the AND operator (in the example: WHERE {variable1.column_name} > 30). To filter the same variable several times, use a list as the dict value: {variable1.id: ["> 30", "< 100"]}. The variables used to filter results must also be present in the variables list.

Returns

A geopandas.GeoDataFrame enriched with the variables passed as argument.

Raises

EnrichmentError – if there is an error in the enrichment process.

Note that if the points of the `dataframe` you provide are contained in more than one geometry in the enrichment dataset, the number of rows of the returned `GeoDataFrame` may differ from the number of rows of the `dataframe` argument.

Examples

Enrich a points DataFrame with Catalog classes:

>>> df = pandas.read_csv('path/to/local/csv')
>>> variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> gdf_enrich = Enrichment().enrich_points(df, variables, geom_col='the_geom')

Enrich a points dataframe with several Variables using their ids:

>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = all_variables[:2]
>>> gdf_enrich = Enrichment().enrich_points(df, variables, geom_col='the_geom')

Enrich a points dataframe with filters:

>>> df = pandas.read_csv('path/to/local/csv')
>>> variable = Catalog().country('usa').category('demographics').datasets[0].variables[0]
>>> filters = {variable.id: "= '2019-09-01'"}
>>> gdf_enrich = Enrichment().enrich_points(
...     df,
...     variables=[variable],
...     filters=filters,
...     geom_col='the_geom')
enrich_polygons(dataframe, variables, geom_col=None, filters={}, aggregation='default')

Enrich your polygons DataFrame with columns (Variable) from one or more Dataset in the Data Observatory by intersecting the polygons in the source DataFrame with geographies in the Data Observatory.

When a polygon intersects with multiple geographies, the proportional part of each intersection is used to interpolate the quantity of the intersected polygon value, and the results are aggregated. Most Variable instances have a Variable.agg_method property, which is used by default as the aggregation function, but you can override it using the aggregation parameter (or even skip the aggregation). If a variable does not have the agg_method property set and you do not override it (with the aggregation parameter), the variable column will be skipped in the enrichment.

Parameters
  • dataframe (pandas.DataFrame, geopandas.GeoDataFrame) – a DataFrame instance to be enriched.

  • variables (Variable, list, str) – variable ID, slug or Variable instance or list of variable IDs, slugs or Variable instances taken from the Data Observatory Catalog. The maximum number of variables is 50.

  • geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.

  • filters (dict, optional) – dictionary to filter results by variable values. Each key is a variable ID and each value a SQL operator, for example: {variable1.id: "> 30"}. It works by appending the filter SQL operators to the WHERE clause of the resulting enrichment SQL with the AND operator (in the example: WHERE {variable1.column_name} > 30). To filter the same variable several times, use a list as the dict value: {variable1.id: ["> 30", "< 100"]}. The variables used to filter results must also be present in the variables list.

  • aggregation (str, None, list, dict, optional) –

    sets the data aggregation. The polygons in the source DataFrame can intersect with one or more polygons from the Data Observatory. With this parameter you can select how to aggregate the resulting data.

    An aggregation method can be one of these values: 'MIN', 'MAX', 'SUM', 'AVG', 'COUNT', 'ARRAY_AGG', 'ARRAY_CONCAT_AGG', 'STRING_AGG', but check the documentation for a complete list of aggregate functions.

    The options are:

      • 'default' (str, the default): most Variable instances have a default aggregation method in the Variable.agg_method property, and it will be used to aggregate the data (a variable may not have agg_method defined, in which case the variable will be skipped).

      • None: use this option to do the aggregation locally by yourself. You will receive a row of data for each polygon intersected, together with the polygons intersected and the areas of the intersections.

      • str: if you want to overwrite every default aggregation method, pass a string with the aggregation method to use.

      • dict: if you want to overwrite some default aggregation methods of your selected variables, use Variable.id: aggregation method pairs, for example: {variable1.id: 'SUM', variable3.id: 'AVG'}. If you want to use several aggregation methods for one variable, use a list as the dict value: {variable1.id: ['SUM', 'AVG'], variable3.id: 'AVG'}.

Returns

A geopandas.GeoDataFrame enriched with the variables passed as argument.

Raises

EnrichmentError – if there is an error in the enrichment process.

Note that if the geometry of the `dataframe` you provide intersects with more than one geometry in the enrichment dataset, the number of rows of the returned `GeoDataFrame` may differ from the number of rows of the `dataframe` argument.

Examples

Enrich a polygons dataframe with one Variable:

>>> df = pandas.read_csv('path/to/local/csv')
>>> variable = Catalog().country('usa').category('demographics').datasets[0].variables[0]
>>> variables = [variable]
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, geom_col='the_geom')

Enrich a polygons dataframe with all Variables from a Catalog Dataset:

>>> df = pandas.read_csv('path/to/local/csv')
>>> variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, geom_col='the_geom')

Enrich a polygons dataframe with several Variables using their ids:

>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = [all_variables[0].id, all_variables[1].id]
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, geom_col='the_geom')

Enrich a polygons dataframe with filters:

>>> df = pandas.read_csv('path/to/local/csv')
>>> variable = Catalog().country('usa').category('demographics').datasets[0].variables[0]
>>> filters = {variable.id: "= '2019-09-01'"}
>>> gdf_enrich = Enrichment().enrich_polygons(
...     df,
...     variables=[variable],
...     filters=filters,
...     geom_col='the_geom')

Enrich a polygons dataframe overwriting every variables aggregation method to use SUM function:

>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = all_variables[:3]
>>> gdf_enrich = Enrichment().enrich_polygons(
...     df,
...     variables,
...     aggregation='SUM',
...     geom_col='the_geom')

Enrich a polygons dataframe overwriting some of the variables aggregation methods:

>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variable1 = all_variables[0]  # variable1.agg_method is 'AVG' but you want 'SUM'
>>> variable2 = all_variables[1]  # variable2.agg_method is 'AVG' and it is what you want
>>> variable3 = all_variables[2]  # variable3.agg_method is 'SUM' but you want 'AVG'
>>> variables = [variable1, variable2, variable3]
>>> aggregation = {
...     variable1.id: 'SUM',
...     variable3.id: 'AVG'
... }
>>> gdf_enrich = Enrichment().enrich_polygons(
...     df,
...     variables,
...     aggregation=aggregation,
...     geom_col='the_geom')

Enrich a polygons dataframe using several aggregation methods for a variable:

>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variable1 = all_variables[0]  # variable1.agg_method is 'AVG' but you want 'SUM' and 'AVG'
>>> variable2 = all_variables[1]  # variable2.agg_method is 'AVG' and it is what you want
>>> variable3 = all_variables[2]  # variable3.agg_method is 'SUM' but you want 'AVG'
>>> variables = [variable1, variable2, variable3]
>>> aggregation = {
...     variable1.id: ['SUM', 'AVG'],
...     variable3.id: 'AVG'
... }
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, aggregation=aggregation)

Enrich a polygons dataframe without aggregating variables (because you want to do it yourself, for example, with a custom function for aggregating the data):

>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = all_variables[:3]
>>> gdf_enrich = Enrichment().enrich_polygons(
...     df,
...     variables,
...     aggregation=None,
...     geom_col='the_geom')

The next example uses filters to calculate the SUM of the car-free households Variable of the Catalog for each polygon of the my_local_dataframe pandas DataFrame, only for areas with more than 100 car-free households:

>>> variable = Variable.get('no_cars_d19dfd10')
>>> gdf_enrich = Enrichment().enrich_polygons(
...     my_local_dataframe,
...     variables=[variable],
...     aggregation={variable.id: 'SUM'},
...     filters={variable.id: '> 100'},
...     geom_col='the_geom')
class cartoframes.data.observatory.CatalogEntity(data)

Bases: abc.ABC

This is an internal class the rest of the classes related to the catalog discovery extend.

It contains:
  • Properties: id, slug (a shorter ID).

  • Static methods: get, get_all, get_list to retrieve elements or lists of objects in the catalog such as datasets, categories, variables, etc.

  • Instance methods to convert to pandas Series, Python dict, compare instances, etc.

As a rule of thumb you don’t directly use this class, it is documented for inheritance purposes.

id_field = 'id'
export_excluded_fields = ['summary_json', 'available_in', 'geom_coverage']
property id

The ID of the entity.

property slug

The slug (short ID) of the entity.

classmethod get(id_)

Get an instance of an entity by ID or slug.

Parameters

id_ (str) – ID or slug of a catalog entity.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no entities are found.

classmethod get_all(filters=None)

List all instances of an entity.

Parameters

filters (dict, optional) – Dict containing pairs of entity properties and their values, to be used as filters to query the available entities. If none is provided, no filters will be applied to the query.

classmethod get_list(id_list)

Get a list of instances of an entity from a list of IDs or slugs.

Parameters

id_list (list) – list of IDs or slugs of entities in the catalog to retrieve instances of.

Raises

CatalogError – if there’s a problem when connecting to the catalog or no entities are found.
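
Example

Retrieve several catalog entities at once (the variable slugs are borrowed from other examples in this reference):

>>> variables = Variable.get_list(['RT_CI_95050c10', 'no_cars_d19dfd10'])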

to_series()

Converts the entity instance to a pandas Series.

to_dict()

Converts the entity instance to a Python dict.

class cartoframes.data.observatory.CatalogList(data)

Bases: list

This is an internal class that represents a list of entities in the catalog of the same type.

It contains:
  • Instance methods to get an instance of the entity by ID and to convert the list to a pandas DataFrame for further filtering and exploration.

As a rule of thumb you don’t directly use this class, it is documented for inheritance purposes.

to_dataframe()

Converts a list to a pandas DataFrame.

Examples

>>> catalog = Catalog()
>>> catalog.categories.to_dataframe()
class cartoframes.data.observatory.Subscriptions(datasets, geographies)

Bases: object

This class is used to list the datasets and geographies you have acquired a subscription (or valid license) for.

This class won’t show any dataset or geography tagged in the catalog as public data (is_public_data set to True), since those data do not require a subscription.

property datasets

List of Dataset you have a subscription for.

Raises

CatalogError – if there’s a problem when connecting to the catalog.

property geographies

List of Geography you have a subscription for.

Raises

CatalogError – if there’s a problem when connecting to the catalog.
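
Example

List your current subscriptions through the Catalog (assumes valid CARTO credentials, as described above):

>>> catalog = Catalog()
>>> subscriptions = catalog.subscriptions(credentials)
>>> subscriptions.datasets
>>> subscriptions.geographies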

class cartoframes.data.observatory.SubscriptionInfo(raw_data)

Bases: object

This class represents the SubscriptionInfo of datasets and geographies in the Catalog.

It contains private metadata (you need a CARTO account to query it) that is useful when you want a subscription license for a specific dataset or geography.

property id

The ID of the dataset or geography.

property estimated_delivery_days

Estimated days in which, once you Dataset.subscribe or Geography.subscribe, you’ll get a license.

Your licensed datasets and geographies will be returned by the catalog.subscriptions method.

For the datasets and geographies listed by the catalog.subscriptions method you can:

  • Download their data with Dataset.download or Geography.download

  • Use their Dataset.variables in the Enrichment functions

property subscription_list_price

Price in $ for a one year subscription for this dataset.

property tos

Legal Terms Of Service.

Link to additional information for the legal Terms Of Service.

property licenses

Description of the licenses.

Link to additional information about the available licenses.

property rights

Rights over the dataset or geography when you buy a license by means of a subscription.

Data Services

class cartoframes.data.services.Geocoding(credentials=None)

Bases: cartoframes.data.services.service.Service

Geocoding using CARTO data services.

This requires a CARTO account with an API key that allows for using geocoding services (provided through an explicit argument in the constructor or via the default credentials).

To prevent having to geocode records that have been previously geocoded, and thus spend quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process. This will happen automatically if your input is a table from CARTO processed in place (i.e. without a table_name parameter) or if you save your results in a CARTO table using the table_name parameter, and only use the resulting table for any further geocoding.

In case you’re geocoding local data from a DataFrame that you plan to re-geocode (e.g. because you’re making your work reproducible by saving all the data preparation steps in a notebook), we advise saving the geocoding results immediately to the same store the data is originally taken from, for example:

>>> df = pandas.read_csv('my_data')
>>> geocoded_df = Geocoding().geocode(df, 'address').data
>>> geocoded_df.to_csv('my_data')

As an alternative, you can use the cached option to store geocoding results in a CARTO table and reuse them in later geocodings. To do this, you need to use the table_name parameter with the name of the table used to cache the results.

If the same dataframe is geocoded repeatedly no credits will be spent, but note there is a time overhead related to uploading the dataframe to a temporary table for checking for changes.

>>> df = pandas.read_csv('my_data')
>>> geocoded_df = Geocoding().geocode(df, 'address', table_name='my_data', cached=True).data

If you execute the previous code multiple times it will only spend credits on the first geocoding; later ones will reuse the results stored in the my_data table. This will require extra processing time. If the CSV file should ever change, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records.
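The change-detection idea behind the carto_geocode_hash column can be sketched in plain Python. This is a simplified illustration, not CARTOframes’ actual implementation; the helper names are hypothetical:

```python
import hashlib

def row_hash(address):
    """Hypothetical stand-in for carto_geocode_hash: fingerprint the
    geocoded fields so unchanged rows can be detected later."""
    return hashlib.sha256(address.encode('utf-8')).hexdigest()

def rows_to_geocode(rows, cached_hashes):
    """Return only the rows whose fingerprint is not cached, i.e. the
    new or modified records that would actually spend quota."""
    return [r for r in rows if row_hash(r) not in cached_hashes]

# One address was geocoded before; only the new one needs geocoding
cached = {row_hash('Gran Via 46, Madrid')}
pending = rows_to_geocode(['Gran Via 46, Madrid', 'Ebro 1, Sevilla'], cached)
```

This is why preserving the hash column matters: without it, every record looks new and is re-geocoded.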

geocode(source, street, city=None, state=None, country=None, status={'gc_status_rel': 'relevance'}, table_name=None, if_exists='fail', dry_run=False, cached=None)

Geocode method.

Parameters
  • source (str, pandas.DataFrame, geopandas.GeoDataFrame) – table, SQL query or DataFrame object to be geocoded.

  • street (str) – name of the column containing postal addresses

  • city (dict, optional) – dictionary with either a column key with the name of a column containing the addresses’ city names or a value key with a literal city value, e.g. ‘New York’. It also accepts a string, in which case column is implied.

  • state (dict, optional) – dictionary with either a column key with the name of a column containing the addresses’ state names or a value key with a literal state value, e.g. ‘WA’. It also accepts a string, in which case column is implied.

  • country (dict, optional) – dictionary with either a column key with the name of a column containing the addresses’ country names or a value key with a literal country value, e.g. ‘US’. It also accepts a string, in which case column is implied.

  • status (dict, optional) – dictionary that defines a mapping from geocoding state attributes (‘relevance’, ‘precision’, ‘match_types’) to column names. (See https://carto.com/developers/data-services-api/reference/) Columns will be added to the result data for the requested attributes. By default a column gc_status_rel will be created for the geocoding _relevance_. The special attribute ‘*’ refers to all the status attributes as a JSON object.

  • table_name (str, optional) – the geocoding results will be placed in a new CARTO table with this name.

  • if_exists (str, optional) – Behavior for creating new datasets, only applicable if table_name isn’t None; Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’.

  • cached (bool, optional) – Use cache geocoding results, saving the results in a table. This parameter should be used along with table_name.

  • dry_run (bool, optional) – no actual geocoding will be performed (useful to check the needed quota)

Returns

A named-tuple (data, metadata) containing a data geopandas.GeoDataFrame and a metadata dictionary with global information about the geocoding process.

The data contains a geometry column with point locations for the geocoded addresses and also a carto_geocode_hash that, if preserved, can avoid re-geocoding unchanged data in future calls to geocode.

The metadata, as described in https://carto.com/developers/data-services-api/reference/, contains the following information:

  • precision (text): precise or interpolated

  • relevance (number): 0 to 1, higher being more relevant

  • match_types (array): list of match type strings: point_of_interest, country, state, county, locality, district, street, intersection, street_number, postal_code

By default the relevance is stored in an output column named gc_status_rel. The column names, and in general which attributes are added as columns, can be configured by using a status dictionary associating column names to status attributes.
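As a minimal sketch of the status mapping described above (the validator helper is ours; only the attribute names and the default mapping come from the docs):

```python
# Status attributes exposed by the geocoder; '*' means all of them
# as a JSON object.
STATUS_ATTRIBUTES = {'relevance', 'precision', 'match_types', '*'}

def validate_status(status):
    """Check a status mapping of the form {column_name: attribute},
    matching the shape of the default {'gc_status_rel': 'relevance'}."""
    for column, attribute in status.items():
        if attribute not in STATUS_ATTRIBUTES:
            raise ValueError('unknown status attribute: %s' % attribute)
    return status

# The default: store the relevance in a column named gc_status_rel
default_status = validate_status({'gc_status_rel': 'relevance'})
```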

Raises

ValueError – if the cached param is set without table_name.

Examples

Geocode a DataFrame:

>>> df = pandas.DataFrame([['Gran Vía 46', 'Madrid'], ['Ebro 1', 'Sevilla']], columns=['address','city'])
>>> geocoded_gdf, metadata = Geocoding().geocode(
...     df, street='address', city='city', country={'value': 'Spain'})
>>> geocoded_gdf.head()

Geocode a table from CARTO:

>>> gdf = read_carto('table_name')
>>> geocoded_gdf, metadata = Geocoding().geocode(gdf, street='address')
>>> geocoded_gdf.head()

Geocode a query against a table from CARTO:

>>> gdf = read_carto('SELECT * FROM table_name WHERE value > 1000')
>>> geocoded_gdf, metadata = Geocoding().geocode(gdf, street='address')
>>> geocoded_gdf.head()

Obtain the number of credits needed to geocode a CARTO table:

>>> gdf = read_carto('table_name')
>>> geocoded_gdf, metadata = Geocoding().geocode(gdf, street='address', dry_run=True)
>>> print(metadata['required_quota'])

Filter results by relevance:

>>> df = pandas.DataFrame([['Gran Vía 46', 'Madrid'], ['Ebro 1', 'Sevilla']], columns=['address','city'])
>>> geocoded_gdf, metadata = Geocoding().geocode(
...     df,
...     street='address',
...     city='city',
...     country={'value': 'Spain'},
...     status=['relevance'])
>>> # show rows with relevance greater than 0.7:
>>> print(geocoded_gdf[geocoded_gdf['carto_geocode_relevance'] > 0.7])
class cartoframes.data.services.Isolines(credentials=None)

Bases: cartoframes.data.services.service.Service

Time and distance Isoline services using CARTO dataservices.

isochrones(source, ranges, **args)

Compute isochrone areas.

This method computes areas delimited by isochrone lines (lines of constant travel time) based upon public roads.

Parameters
  • source (str, pandas.DataFrame, geopandas.GeoDataFrame) – table, SQL query or DataFrame containing the source points for the isochrones: travel routes from the source points are computed to determine areas within specified travel times.

  • ranges (list) – travel time values in seconds; for each range value and source point a result polygon will be produced enclosing the area within range of the source.

  • exclusive (bool, optional) – when False (the default), inclusive range areas are generated, each one containing the areas for smaller time values (so the area is reachable from the source within the given time). When True, areas are exclusive, each one corresponding time values between the immediately smaller range value (or zero) and the area range value.

  • ascending (bool, optional) – when True, the isochrones are sorted ascending by travel time; when False (the default), they are sorted descending.

  • table_name (str, optional) – the resulting areas will be saved in a new CARTO table with this name.

  • if_exists (str, optional) – Behavior for creating new datasets, only applicable if table_name isn’t None; Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’.

  • dry_run (bool, optional) – no actual computation will be performed, and metadata will be returned including the required quota.

  • mode (str, optional) – defines the travel mode: 'car' (the default) or 'walk'.

  • is_destination (bool, optional) – indicates that the source points are to be taken as destinations for the routes used to compute the area, rather than origins.

  • mode_type (str, optional) – type of routes computed: 'shortest' (default) or 'fastest'.

  • mode_traffic (str, optional) – use traffic data to compute routes: 'disabled' (default) or 'enabled'.

  • resolution (float, optional) – level of detail of the polygons in meters per pixel. Higher resolution may increase the response time of the service.

  • maxpoints (int, optional) – Allows limiting the number of points in the returned polygons. Increasing maxpoints may increase the response time of the service.

  • quality (int, optional) – Allows you to reduce the quality of the polygons in favor of the response time. Admitted values: 1/2/3.

  • geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.

  • source_col (str, optional) – string indicating the source column name. This column will be used to relate the generated isolines to the original geometry. By default it uses the cartodb_id column if it exists, or the index of the source DataFrame.

Returns

A named-tuple (data, metadata) containing a data geopandas.GeoDataFrame and a metadata dictionary. For dry runs the data will be None. The data contains a range_data column with a numeric value and a the_geom geometry with the corresponding area. It will also contain a source_id column that identifies the source point corresponding to each area if the source has a cartodb_id column.
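The inclusive vs. exclusive semantics of the exclusive flag above can be illustrated in plain Python (a conceptual sketch, not service code; the helper name is ours):

```python
def exclusive_bands(ranges):
    """Turn inclusive range values (e.g. travel times in seconds) into
    exclusive (lower, upper) bands: each band covers the values between
    the immediately smaller range value (or zero) and its own value."""
    bounds = [0] + sorted(ranges)
    return list(zip(bounds, bounds[1:]))

# Inclusive ranges 300, 600, 900 become three non-overlapping bands
bands = exclusive_bands([900, 300, 600])
# → [(0, 300), (300, 600), (600, 900)]
```

With exclusive=False each area contains the smaller ones; with exclusive=True you get one ring-like area per band.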

isodistances(source, ranges, **args)

Compute isodistance areas.

This method computes areas delimited by isodistance lines (lines of constant travel distance) based upon public roads.

Parameters
  • source (str, pandas.DataFrame, geopandas.GeoDataFrame) – table, SQL query or DataFrame containing the source points for the isodistances: travel routes from the source points are computed to determine areas within specified travel distances.

  • ranges (list) – travel distance values in meters; for each range value and source point a result polygon will be produced enclosing the area within range of the source.

  • exclusive (bool, optional) – when False (the default), inclusive range areas are generated, each one containing the areas for smaller distance values (so the area is reachable from the source within the given distance). When True, areas are exclusive, each one corresponding distance values between the immediately smaller range value (or zero) and the area range value.

  • ascending (bool, optional) – when True, the isodistances are sorted ascending by travel distance; when False (the default), they are sorted descending.

  • table_name (str, optional) – the resulting areas will be saved in a new CARTO table with this name.

  • if_exists (str, optional) – Behavior for creating new datasets, only applicable if table_name isn’t None; Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’.

  • dry_run (bool, optional) – no actual computation will be performed, and metadata will be returned including the required quota.

  • mode (str, optional) – defines the travel mode: 'car' (the default) or 'walk'.

  • is_destination (bool, optional) – indicates that the source points are to be taken as destinations for the routes used to compute the area, rather than origins.

  • mode_type (str, optional) – type of routes computed: 'shortest' (default) or 'fastest'.

  • mode_traffic (str, optional) – use traffic data to compute routes: 'disabled' (default) or 'enabled'.

  • resolution (float, optional) – level of detail of the polygons in meters per pixel. Higher resolution may increase the response time of the service.

  • maxpoints (int, optional) – Allows limiting the number of points in the returned polygons. Increasing maxpoints may increase the response time of the service.

  • quality (int, optional) – Allows you to reduce the quality of the polygons in favor of the response time. Admitted values: 1/2/3.

  • geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.

  • source_col (str, optional) – string indicating the source column name. This column will be used to relate the generated isolines to the original geometry. By default it uses the cartodb_id column if it exists, or the index of the source DataFrame.

Returns

A named-tuple (data, metadata) containing a data geopandas.GeoDataFrame and a metadata dictionary. For dry runs the data will be None. The data contains a range_data column with a numeric value and a the_geom geometry with the corresponding area. It will also contain a source_id column that identifies the source point corresponding to each area if the source has a cartodb_id column.

Raises
  • Exception – if the available quota is less than the required quota.

  • ValueError – if there is no valid geometry found in the dataframe.

Data Clients

class cartoframes.data.clients.SQLClient(credentials=None)

Bases: object

SQLClient class is a client to run SQL queries in a CARTO account.

Parameters

credentials (Credentials) – A Credentials instance can be used in place of a base_url/username and api_key combination.

Example

>>> sql = SQLClient(credentials)
>>> sql.query('SELECT * FROM table_name')
>>> sql.execute('DROP TABLE table_name')
>>> sql.distinct('table_name', 'column_name')
>>> sql.count('table_name')
query(query, verbose=False)

Run a SQL query. It returns a list with the content of the response. If the verbose param is True it returns the full SQL response in a dict. For more information check the SQL API documentation <https://carto.com/developers/sql-api/reference/#tag/Single-SQL-Statement>.

Parameters
  • query (str) – SQL query.

  • verbose (bool, optional) – flag to return all the response. Default False.

execute(query)

Run a long running query. It returns an object with the status and information of the job. For more information check the Batch API documentation <https://carto.com/developers/sql-api/reference/#tag/Batch-Queries>.

Parameters

query (str) – SQL query.

distinct(table_name, column_name)

Get the distinct values and their count in a table for a specific column.

Parameters
  • table_name (str) – name of the table.

  • column_name (str) – name of the column.

count(table_name)

Get the number of elements of a table.

Parameters

table_name (str) – name of the table.

bounds(query)

Get the bounds of the geometries in a table.

Parameters

query (str) – SQL query containing a “the_geom” column.

schema(table_name, raw=False)

Show information about the schema of a table.

Parameters
  • table_name (str) – name of the table.

  • raw (bool, optional) – return raw dict data if set to True. Default False.

describe(table_name, column_name)

Show information about a column in a specific table. It returns the COUNT of the table. If the column type is number it also returns the AVG, MIN and MAX.

Parameters
  • table_name (str) – name of the table.

  • column_name (str) – name of the column.

create_table(table_name, columns, cartodbfy=True)

Create a table with a specific table name and columns.

Parameters
  • table_name (str) – name of the table.

  • columns (list of tuples) – columns to be created, as (column name, column type) pairs.

  • cartodbfy (bool, optional) – convert the table to CARTO format. Default True. More info here <https://carto.com/developers/sql-api/guides/creating-tables/#create-tables>.

insert_table(table_name, column_names, column_values)

Insert a row to the table.

Parameters
  • table_name (str) – name of the table.

  • column_names (str, list of str) – names of the columns.

  • column_values (str, list of str) – values of the columns.

update_table(table_name, column_name, column_value, condition)

Update the column’s value for the rows that match the condition.

Parameters
  • table_name (str) – name of the table.

  • column_name (str) – name of the column.

  • column_value (str) – value of the column.

  • condition (str) – “where” condition of the request.
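A hypothetical illustration of the kind of statement update_table composes from these arguments (the real client handles quoting and API transport; this is only a sketch):

```python
def build_update(table_name, column_name, column_value, condition):
    """Compose an UPDATE statement from the update_table arguments.
    NOTE: naive quoting for illustration only; never use this with
    untrusted input."""
    return "UPDATE %s SET %s = '%s' WHERE %s" % (
        table_name, column_name, column_value, condition)

sql = build_update('cities', 'reviewed', 'true', "name = 'Madrid'")
# → UPDATE cities SET reviewed = 'true' WHERE name = 'Madrid'
```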

rename_table(table_name, new_table_name)

Rename a table from its table name.

Parameters
  • table_name (str) – name of the original table.

  • new_table_name (str) – name of the new table.

drop_table(table_name)

Remove a table from its table name.

Parameters

table_name (str) – name of the table.

class cartoframes.data.clients.DataObsClient(credentials=None)

Bases: object

Data Observatory v1 class. Data Observatory documentation.

This class provides the following methods to interact with Data Observatory:
  • boundaries: returns a geopandas.GeoDataFrame with the geographic boundaries (geometries) or their metadata.

  • discovery: returns a pandas.DataFrame with the measures found.

  • augment: returns a geopandas.GeoDataFrame with the augmented data.

Parameters

credentials (Credentials) – A Credentials instance can be used in place of a base_url/username and api_key combination.

boundaries(boundary=None, region=None, decode_geom=False, timespan=None, include_nonclipped=False)

Find all boundaries available for the world or a region. If boundary is specified, get all available boundary polygons for the region specified (if any). This method is especially useful for getting boundaries for a region and, with DataObsClient.augment and DataObsClient.discovery, getting tables of geometries and the corresponding raw measures, for example to analyze how median income has changed in a region (see the examples section for more).

Examples

Find all boundaries available for Australia. The columns geom_name gives us the name of the boundary and geom_id is what we need for the boundary argument.

>>> do = DataObsClient(credentials)
>>> au_boundaries = do.boundaries(region='Australia')
>>> au_boundaries[['geom_name', 'geom_id']]

Get the boundaries for Australian Postal Areas and map them.

>>> au_postal_areas = do.boundaries(boundary='au.geo.POA')
>>> Map(Layer(au_postal_areas))

Get census tracts around Idaho Falls, Idaho, USA, and add median income from the US census. Without limiting the metadata, we get median income measures for each census in the Data Observatory.

>>> # Note: default credentials will be supported in a future release
>>> do = DataObsClient(credentials)
>>> # will return GeoDataFrame with columns `the_geom` and `geom_ref`
>>> tracts = do.boundaries(
...     boundary='us.census.tiger.census_tract',
...     region=[-112.096642,43.429932,-111.974213,43.553539])
>>> # write geometries to a CARTO table
>>> tracts.upload('idaho_falls_tracts')
>>> # gather metadata needed to look up median income
>>> median_income_meta = do.discovery(
...     'idaho_falls_tracts',
...     keywords='median income',
...     boundaries='us.census.tiger.census_tract')
>>> # get median income data and original table as new GeoDataFrame
>>> idaho_falls_income = do.augment(
...     'idaho_falls_tracts',
...     median_income_meta,
...     how='geom_refs')
>>> # overwrite existing table with newly-enriched GeoDataFrame
>>> idaho_falls_income.upload('idaho_falls_tracts', if_exists='replace')
Parameters
  • boundary (str, optional) – Boundary identifier for the boundaries that are of interest. For example, US census tracts have a boundary ID of us.census.tiger.census_tract, and Brazilian Municipios have an ID of br.geo.municipios. Find IDs by running DataObsClient.boundaries without any arguments, or by looking in the Data Observatory catalog.

  • region (str, optional) –

    Region where boundary information or, if boundary is specified, boundary polygons are of interest. region can be one of the following:

    • table name (str):

      Name of a table in user’s CARTO account

    • bounding box (list of float):

      List of four values (two lng/lat pairs) in the following order: western longitude, southern latitude, eastern longitude, and northern latitude. For example, Switzerland fits in [5.9559111595,45.8179931641,10.4920501709,47.808380127]

  • timespan (str, optional) – Specific timespan to get geometries from. Defaults to use the most recent. See the Data Observatory catalog for more information.

  • decode_geom (bool, optional) – Whether to return the geometries as Shapely objects or keep them encoded as EWKB strings. Defaults to False.

  • include_nonclipped (bool, optional) – Optionally include non-shoreline-clipped boundaries. These boundaries are the raw boundaries provided by, for example, US Census Tiger.

Returns

If boundary is specified, then all available boundaries and accompanying geom_refs in region (or the world if region is None or not specified) are returned. If boundary is not specified, then a GeoDataFrame of all available boundaries in region (or the world if region is None).

Return type

geopandas.GeoDataFrame

discovery(region, keywords=None, regex=None, time=None, boundaries=None, include_quantiles=False)

Discover Data Observatory measures. This method returns the full Data Observatory metadata model for each measure or measures that match the conditions from the inputs. The full metadata in each row uniquely defines a measure based on the timespan, geographic resolution, and normalization (if any). Read more about the metadata response in Data Observatory documentation.

Internally, this method finds all measures in region that match the conditions set in keywords, regex, time, and boundaries (if any of them are specified). Then, if boundaries is not specified, a geographical resolution for that measure will be chosen subject to the type of region specified:

  1. If region is a table name, then a geographical resolution that is roughly equal to region size / number of subunits.

  2. If region is a country name or bounding box, then a geographical resolution will be chosen roughly equal to region size / 500.

Since a given measure exists at some geographic resolutions and not others, different measures are oftentimes returned at different geographical resolutions.
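The resolution heuristic described above can be sketched as follows (the helper is hypothetical; only the subunit rule and the divisor 500 come from the text):

```python
def target_resolution(region_size, n_subunits=None):
    """Hypothetical sketch of discovery's resolution choice: pick a
    target geographical resolution (region size per unit).
    - table regions: region size / number of subunits
    - country names or bounding boxes: region size / 500"""
    if n_subunits:
        return region_size / n_subunits
    return region_size / 500
```

Passing one or more boundaries bypasses this guesswork entirely, as the Tip below the original text notes.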

Tip

To remove the guesswork in how geographical resolutions are selected, specify one or more boundaries in boundaries. See the boundaries section for each region in the Data Observatory catalog.

The metadata returned from this method can then be used to create raw tables or for augmenting an existing table from these measures using DataObsClient.augment. For the full Data Observatory catalog, visit https://cartodb.github.io/bigmetadata/. When working with the metadata DataFrame returned from this method, be careful to only remove rows not columns as DataObsClient.augment <cartoframes.client.DataObsClient.augment> generally needs the full metadata.

Note

Narrowing down a discovery query using the keywords, regex, and time filters is important for getting a manageable metadata set. Besides there being a large number of measures in the DO, a metadata response includes acceptable combinations of measures with denominators (normalization and density), and the same measure from other years.

For example, setting the region to be United States counties with no filter values set will result in many thousands of measures.

Examples

Get all European Union measures that mention freight.

>>> freight_meta = do.discovery('European Union',
...                        keywords='freight',
...                        time='2010')
>>> freight_meta['numer_name'].head()
Parameters
  • region (str or list of float) –

    Information about the region of interest. region can be one of three types:

    • region name (str):

      Name of region of interest. Acceptable values are limited to: ‘Australia’, ‘Brazil’, ‘Canada’, ‘European Union’, ‘France’, ‘Mexico’, ‘Spain’, ‘United Kingdom’, ‘United States’.

    • table name (str):

      Name of a table in user’s CARTO account with geometries. The region will be the bounding box of the table.

    Note

    If a table name is also a valid Data Observatory region name, the Data Observatory name will be chosen over the table.

    • bounding box (list of float):

      List of four values (two lng/lat pairs) in the following order: western longitude, southern latitude, eastern longitude, and northern latitude. For example, Switzerland fits in [5.9559111595,45.8179931641,10.4920501709,47.808380127]

    Note

    Geometry levels are generally chosen by subdividing the region into the next smallest administrative unit. To override this behavior, specify the boundaries flag. For example, set boundaries to 'us.census.tiger.census_tract' to choose US census tracts.

  • keywords (str or list of str, optional) – Keyword or list of keywords in measure description or name. Response will be matched on any of the keywords listed (boolean or).

  • regex (str, optional) – A regular expression to search the measure descriptions and names. Note that this relies on PostgreSQL’s case insensitive operator ~*. See PostgreSQL docs for more information.

  • boundaries (str or list of str, optional) – Boundary or list of boundaries that specify the measure resolution. See the boundaries section for each region in the Data Observatory catalog.

  • include_quantiles (bool, optional) – Include quantiles calculations which are a calculation of how a measure compares to all measures in the full GeoDataFrame. Defaults to False. If True, quantiles columns will be returned for each column which has it pre-calculated.

Returns

A DataFrame of the complete metadata model for specific measures based on the search parameters.

Return type

pandas.DataFrame

Raises
  • ValueError – If region is a list and does not consist of four elements, or if region is not an acceptable region

  • CartoException – If region is not a table in user account

augment(table_name, metadata, persist_as=None, how='the_geom')

Get an augmented CARTO dataset with Data Observatory measures. Use DataObsClient.discovery to search for available measures, or see the full Data Observatory catalog. Optionally persist the data as a new table.

Example

Get a DataFrame with Data Observatory measures based on the geometries in a CARTO table.

>>> do = DataObsClient(credentials)
>>> median_income = do.discovery(
...     'transaction_events',
...     regex='.*median income.*',
...     time='2011 - 2015')
>>> ds = do.augment('transaction_events', median_income)

Pass in cherry-picked measures from the Data Observatory catalog. The rest of the metadata will be filled in, but it’s important to specify the geographic level as this will not show up in the column name.

>>> median_income = [{'numer_id': 'us.census.acs.B19013001',
...                   'geom_id': 'us.census.tiger.block_group',
...                   'numer_timespan': '2011 - 2015'}]
>>> ds = do.augment('transaction_events', median_income)
Parameters
  • table_name (str) – Name of table on CARTO account that Data Observatory measures are to be added to.

  • metadata (pandas.DataFrame) – List of all measures to add to table_name. See DataObsClient.discovery outputs for a full list of metadata columns.

  • persist_as (str, optional) – Output the results of augmenting table_name to persist_as as a persistent table on CARTO. Defaults to None, which will not create a table.

  • how (str, optional) – Column name for identifying the geometry from which to fetch the data. Defaults to the_geom, which results in measures that are spatially interpolated (e.g., a neighborhood boundary’s population will be calculated from underlying census tracts). Specifying a column that has the geometry identifier (for example, GEOID for US Census boundaries) results in measures directly from the Census for that GEOID, normalized as specified in the metadata.

Returns

A GeoDataFrame representation of table_name which has new columns for each measure in metadata.

Return type

geopandas.GeoDataFrame

Raises
  • NameError – If the columns in table_name are in the suggested_name column of metadata.

  • ValueError – If metadata object is invalid or empty, or if the number of requested measures exceeds 50.

  • CartoException – If the user account has consumed all of its Data Observatory quota

Viz

Viz namespace contains all the classes to create visualizations based on data

Map

class cartoframes.viz.Map(layers=None, basemap='Positron', bounds=None, size=None, viewport=None, show_info=None, theme=None, title=None, description=None, is_static=None, layer_selector=False, **kwargs)

Bases: object

Map to display a data visualization. It can contain one or more Layer instances. It provides control of the basemap, bounds and properties of the visualization.

Parameters
  • layers (list of Layer) – List of layers. Zero or more of Layer.

  • basemap (str, optional) –

    • if a str, name of a CARTO vector basemap. One of positron, voyager, or darkmatter from the BaseMaps class, or a hex, rgb or named color value.

    • if a dict, Mapbox or other style as the value of the style key.

      If a Mapbox style, the access token is the value of the token key.

  • bounds (dict or list, optional) – a dict with west, south, east, north keys, or an array of floats in the following structure: [[west, south], [east, north]]. If not provided the bounds will be automatically calculated to fit all features.

  • size (tuple, optional) – a (width, height) pair for the size of the map. Default is (1024, 632).

  • viewport (dict, optional) – Properties for display of the map viewport. Keys can be bearing or pitch.

  • show_info (bool, optional) – Whether to display center and zoom information in the map or not. It is False by default.

  • is_static (bool, optional) – Default False. If True, instead of showing an interactive map, a png image will be displayed. Warning: UI components are not properly rendered in the static view; we recommend removing legends and widgets before rendering a static map.

  • theme (string, optional) – Use a different UI theme (legends, widgets, popups). Available themes are dark and light. By default, it is light for Positron and Voyager basemaps and dark for DarkMatter basemap.

  • title (string, optional) – Title to label the map; it will be displayed in the default legend.

  • description (string, optional) – Text that describes the map and will be displayed in the default legend after the title.
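The two accepted bounds forms listed above can be converted into one another; a small sketch (the helper name is ours, not part of the API):

```python
def bounds_to_array(bounds):
    """Convert a {'west', 'south', 'east', 'north'} dict into the
    [[west, south], [east, north]] structure Map also accepts."""
    return [[bounds['west'], bounds['south']],
            [bounds['east'], bounds['north']]]

array = bounds_to_array({'west': -10, 'south': -10, 'east': 10, 'north': 10})
# → [[-10, -10], [10, 10]]
```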

Raises

ValueError – if input parameters are not valid.

Examples

Basic usage.

>>> Map(Layer('table in your account'))

Display more than one layer on a map.

>>> Map(layers=[
...     Layer('table1'),
...     Layer('table2')
... ])

Change the CARTO basemap style.

>>> Map(Layer('table in your account'), basemap=basemaps.darkmatter)

Choose a custom basemap style. Here we use the Mapbox streets style, which requires an access token.

>>> basemap = {
...     'style': 'mapbox://styles/mapbox/streets-v9',
...     'token': 'your Mapbox token'
... }
>>> Map(Layer('table in your account'), basemap=basemap)

Remove basemap and show a custom color.

>>> Map(Layer('table in your account'), basemap='yellow')  # None, False, 'white', 'rgb(255, 255, 0)'

Set custom bounds.

>>> bounds = {
...     'west': -10,
...     'east': 10,
...     'north': 10,
...     'south': -10
... } # or bounds = [[-10, -10], [10, 10]]
>>> Map(Layer('table in your account'), bounds=bounds)

Show the map center and zoom value on the map (lower left-hand corner).

>>> Map(Layer('table in your account'), show_info=True)
publish(name, password, credentials=None, if_exists='fail', maps_api_key=None)

Publish the map visualization as a CARTO custom visualization.

Parameters
  • name (str) – The visualization name on CARTO.

  • password (str) – If set, the visualization will be password protected: when someone tries to open the visualization, the password will be requested. Set it to None to disable password protection.

  • credentials (Credentials, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available. It is used to create the publication and also to save local data (if exists) into your CARTO account.

  • if_exists (str, optional) – ‘fail’ or ‘replace’. Behavior in case a publication with the same name already exists in your account. Default is ‘fail’.

  • maps_api_key (str, optional) – The Maps API key used for private datasets.

Example

Publishing the map visualization.

>>> tmap = Map(Layer('tablename'))
>>> tmap.publish('Custom Map Title', password=None)
delete_publication()

Delete the published map visualization.

update_publication(name, password, if_exists='fail')

Update the published map visualization.

Parameters
  • name (str) – The visualization name on CARTO.

  • password (str) – If set, the visualization will be protected by password; if None, the visualization will be public.

  • if_exists (str, optional) – ‘fail’ or ‘replace’. Behavior in case a publication with the same name already exists in your account. Default is ‘fail’.

Raises

PublishError – if the map has not been published yet.

static all_publications(credentials=None)

Get all map visualizations published by the current user.

Parameters

credentials (Credentials, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available.
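
Example

Listing the current user's published maps (a sketch, assuming default credentials have been set):

>>> Map.all_publications()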

Layer

class cartoframes.viz.Layer(source, style=None, legends=None, widgets=None, popup_hover=None, popup_click=None, credentials=None, bounds=None, geom_col=None, default_legend=True, default_widget=False, default_popup_hover=True, default_popup_click=False, title=None, parent_map=None, encode_data=True)

Bases: object

Layer to display data on a map. This class can be used as one or more layers in Map or on its own in a Jupyter notebook to get a preview of a Layer.

Note: in a Jupyter notebook, it is not required to explicitly add a Layer to a Map if only visualizing data as a single layer.

Parameters
  • source (str, pandas.DataFrame, geopandas.GeoDataFrame) – The source data: table name, SQL query or a dataframe.

  • style (dict or Style, optional) – The style of the visualization.

  • legends (bool, Legend list, optional) – The legends definition for a layer. It contains a list of legend helpers. See Legend for more information.

  • widgets (bool, list, or WidgetList, optional) – Widget or list of widgets for a layer. It contains the information to display different widget types on the top right of the map. See WidgetList for more information.

  • popup_click (popup_element <cartoframes.viz.popup_element> list, optional) – Set up a popup to be displayed on a click event.

  • popup_hover (bool, popup_element <cartoframes.viz.popup_element> list, optional) – Set up a popup to be displayed on a hover event. Style helpers include a default hover popup; set popup_hover=False to remove it.

  • credentials (Credentials, optional) – A Credentials instance. This is only used for the simplified Source API. When a Source is passed as source, these credentials are ignored. If not provided, the credentials will be automatically obtained from the default credentials.

  • bounds (dict or list, optional) – a dict with west, south, east, north keys, or an array of floats in the following structure: [[west, south], [east, north]]. If not provided the bounds will be automatically calculated to fit all features.

  • geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.

  • default_legend (bool, optional) – flag to set the default legend. This only works when using a style helper. Default True.

  • default_widget (bool, optional) – flag to set the default widget. This only works when using a style helper. Default False.

  • default_popup_hover (bool, optional) – flag to set the default popup hover. This only works when using a style helper. Default True.

  • default_popup_click (bool, optional) – flag to set the default popup click. This only works when using a style helper. Default False.

  • title (str, optional) – title for the default legend, widget and popups.

  • encode_data (bool, optional) – By default, local data is encoded in order to save local space. However, when using very large files, it might not be possible to encode all the data. By disabling this parameter with encode_data=False the resulting notebook will be large, but there will be no encoding issues.

Raises

ValueError – if the source is not valid.

Examples

Create a layer with the defaults (style, legend).

>>> Layer('table_name')  # or Layer(gdf)

Create a layer with a custom style, legend, widget and popups.

>>> Layer(
...     'table_name',
...     style=color_bins_style('column_name'),
...     legends=color_bins_legend(title='Legend title'),
...     widgets=histogram_widget('column_name', title='Widget title'),
...     popup_click=popup_element('column_name', title='Popup title'),
...     popup_hover=popup_element('column_name', title='Popup title'))

Create a layer specifically tied to a Credentials.

>>> Layer(
...     'table_name',
...     credentials=Credentials.from_file('creds.json'))

Source

class cartoframes.viz.Source(source, credentials=None, geom_col=None, encode_data=True)

Bases: object

Parameters
  • data (str, pandas.DataFrame, geopandas.GeoDataFrame) – a table name, SQL query, DataFrame, GeoDataFrame instance.

  • credentials (Credentials, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available.

  • bounds (dict or list, optional) – a dict with west, south, east, north keys, or an array of floats in the following structure: [[west, south], [east, north]]. If not provided the bounds will be automatically calculated to fit all features.

  • geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.

Example

Table name.

>>> Source('table_name')

SQL query.

>>> Source('SELECT * FROM table_name')

DataFrame object.

>>> Source(df, geom_col='my_geom')

GeoDataFrame object.

>>> Source(gdf)

Setting the credentials.

>>> Source('table_name', credentials)

Layout

class cartoframes.viz.Layout(maps, n_size=None, m_size=None, viewport=None, map_height=250, is_static=True)

Bases: object

Create a layout of visualizations in order to compare them.

Parameters
  • maps (list of Map) – List of zero or more Map instances.

  • n_size (number, optional) – Number of columns of the layout.

  • m_size (number, optional) – Number of rows of the layout.

  • viewport (dict, optional) – Properties for display of the maps viewport. Keys can be zoom, bearing or pitch.

Raises

ValueError – if the input elements are not instances of Map.

Examples

Basic usage.

>>> Layout([
...    Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account')),
...    Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account'))
... ])

Display a 2x2 layout.

>>> Layout([
...     Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account')),
...     Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account'))
... ], 2, 2)

Custom Titles.

>>> Layout([
...     Map(Layer('table_in_your_account'), title="Visualization 1 custom title"),
...     Map(Layer('table_in_your_account'), title="Visualization 2 custom title")
... ])

Viewport.

>>> Layout([
...     Map(Layer('table_in_your_account')),
...     Map(Layer('table_in_your_account')),
...     Map(Layer('table_in_your_account')),
...     Map(Layer('table_in_your_account'))
... ], viewport={ 'zoom': 2 })
>>> Layout([
...     Map(Layer('table_in_your_account'), viewport={ 'zoom': 0.5 }),
...     Map(Layer('table_in_your_account')),
...     Map(Layer('table_in_your_account')),
...     Map(Layer('table_in_your_account'))
... ], viewport={ 'zoom': 2 })

Styles

cartoframes.viz.basic_style(color=None, size=None, opacity=None, stroke_color=None, stroke_width=None)

Helper function for quickly creating a basic style.

Parameters
  • color (str, optional) – Hex, rgb or named color value. Default is ‘#FFB927’ for point geometries and ‘#4CC8A3’ for lines.

  • size (int, optional) – Size of point or line features.

  • opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

Returns

cartoframes.viz.style.Style
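
Example

A minimal usage sketch, assuming a 'table_name' table in your account:

>>> Layer('table_name', style=basic_style(color='#4CC8A3', size=6))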

cartoframes.viz.color_bins_style(value, method='quantiles', bins=5, breaks=None, palette=None, size=None, opacity=None, stroke_color=None, stroke_width=None, animate=None)

Helper function for quickly creating a color bins style.

Parameters
  • value (str) – Column to symbolize by.

  • method (str, optional) – Classification method of data: “quantiles”, “equal”, “stdev”. Default is “quantiles”.

  • bins (int, optional) – Number of size classes (bins) for map. Default is 5.

  • breaks (list<int>, optional) – Assign manual class break values.

  • palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “purpor”.

  • size (int, optional) – Size of point or line features.

  • opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
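
Example

A usage sketch, assuming a 'table_name' table with a numeric 'column_name' column:

>>> Layer('table_name', style=color_bins_style('column_name', bins=7))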

cartoframes.viz.color_category_style(value, top=11, cat=None, palette=None, size=None, opacity=None, stroke_color=None, stroke_width=None, animate=None)

Helper function for quickly creating a color category style.

Parameters
  • value (str) – Column to symbolize by.

  • top (int, optional) – Number of categories. Default is 11. Values can range from 1 to 16.

  • cat (list<str>, optional) – Category list. Must be a valid list of categories.

  • palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “bold”.

  • size (int, optional) – Size of point or line features.

  • opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
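
Example

A usage sketch, assuming a 'table_name' table with a categorical 'column_name' column:

>>> Layer('table_name', style=color_category_style('column_name', top=5))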

cartoframes.viz.color_continuous_style(value, size=None, range_min=None, range_max=None, palette=None, opacity=None, stroke_color=None, stroke_width=None, animate=None)

Helper function for quickly creating a color continuous style.

Parameters
  • value (str) – Column to symbolize by.

  • range_min (int, optional) – The minimum value of the data range for the continuous color ramp. Defaults to the global MIN of the dataset.

  • range_max (int, optional) – The maximum value of the data range for the continuous color ramp. Defaults to the global MAX of the dataset.

  • palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “bluyl”.

  • size (int, optional) – Size of point or line features.

  • opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
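
Example

A usage sketch, assuming a 'table_name' table with a numeric 'column_name' column:

>>> Layer('table_name', style=color_continuous_style('column_name'))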

cartoframes.viz.cluster_size_style(value, operation='count', resolution=32, color=None, opacity=None, stroke_color=None, stroke_width=None, animate=None)

Helper function for quickly creating a cluster map with continuously sized points.

Parameters
  • value (str) – Numeric column to aggregate.

  • operation (str, optional) – Cluster operation, defaults to ‘count’. Other options available are ‘avg’, ‘min’, ‘max’, and ‘sum’.

  • resolution (int, optional) – Resolution of aggregation grid cell. Set to 32 by default.

  • color (str, optional) – Hex, rgb or named color value. Default is ‘#FFB927’ for point geometries.

  • opacity (float, optional) – Opacity value. Default is 0.8.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
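
Example

A usage sketch, assuming a point table 'table_name' with a numeric 'column_name' column to aggregate:

>>> Layer('table_name', style=cluster_size_style('column_name', operation='sum', resolution=16))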

cartoframes.viz.isolines_style(value='range_label', top=11, cat=None, palette='pinkyl', size=None, opacity=0.8, stroke_color='rgba(150, 150, 150, 0.4)', stroke_width=None)

Helper function for quickly creating an isolines style. Based on the color category style.

Parameters
  • value (str, optional) – Column to symbolize by. Default is “range_label”.

  • top (int, optional) – Number of categories. Default is 11. Values can range from 1 to 16.

  • cat (list<str>, optional) – Category list. Must be a valid list of categories.

  • palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “pinkyl”.

  • size (int, optional) – Size of point or line features.

  • opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘rgba(150,150,150,0.4)’.

  • stroke_width (int, optional) – Size of the stroke on point features.

Returns

cartoframes.viz.style.Style
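
Example

A usage sketch, assuming 'isolines_gdf' is a GeoDataFrame produced by the Isolines service (which includes the default 'range_label' column):

>>> Layer(isolines_gdf, style=isolines_style())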

cartoframes.viz.size_bins_style(value, method='quantiles', bins=5, breaks=None, size_range=None, color=None, opacity=None, stroke_width=None, stroke_color=None, animate=None)

Helper function for quickly creating a size bins style with a classification method and buckets.

Parameters
  • value (str) – Column to symbolize by.

  • method (str, optional) – Classification method of data: “quantiles”, “equal”, “stdev”. Default is “quantiles”.

  • bins (int, optional) – Number of size classes (bins) for map. Default is 5.

  • breaks (list<int>, optional) – Assign manual class break values.

  • size_range (list<int>, optional) – Min/max size array. Default is [2, 14] for point geometries and [1, 10] for lines.

  • color (str, optional) – Hex, rgb or named color value. Default is ‘#EE5D5A’ for point geometries and ‘#4CC8A3’ for lines.

  • opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
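
Example

A usage sketch, assuming a 'table_name' table with a numeric 'column_name' column:

>>> Layer('table_name', style=size_bins_style('column_name', size_range=[4, 20]))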

cartoframes.viz.size_category_style(value, top=5, cat=None, size_range=None, color=None, opacity=None, stroke_color=None, stroke_width=None, animate=None)

Helper function for quickly creating a size category style.

Parameters
  • value (str) – Column to symbolize by.

  • top (int, optional) – Number of size categories. Default is 5. Values can range from 1 to 16.

  • cat (list<str>, optional) – Category list. Must be a valid list of categories.

  • size_range (list<int>, optional) – Min/max size array. Default is [2, 20] for point geometries and [1, 10] for lines.

  • color (str, optional) – hex, rgb or named color value. Default is ‘#F46D43’ for point geometries and ‘#4CC8A3’ for lines.

  • opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
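
Example

A usage sketch, assuming a 'table_name' table with a categorical 'column_name' column:

>>> Layer('table_name', style=size_category_style('column_name', top=3))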

cartoframes.viz.size_continuous_style(value, size_range=None, range_min=None, range_max=None, color=None, opacity=None, stroke_color=None, stroke_width=None, animate=None, credentials=None)

Helper function for quickly creating a size continuous style.

Parameters
  • value (str) – Column to symbolize by.

  • size_range (list<int>, optional) – Min/max size array. Default is [2, 40] for point geometries and [1, 10] for lines.

  • range_min (int, optional) – The minimum value of the data range for the continuous size ramp. Defaults to the global MIN of the dataset.

  • range_max (int, optional) – The maximum value of the data range for the continuous size ramp. Defaults to the global MAX of the dataset.

  • color (str, optional) – Hex, rgb or named color value. Default is ‘#FFB927’ for point geometries and ‘#4CC8A3’ for lines.

  • opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • animate (str, optional) – Animate features by date/time or other numeric field.

Returns

cartoframes.viz.style.Style
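
Example

A usage sketch, assuming a 'table_name' table with a numeric 'column_name' column:

>>> Layer('table_name', style=size_continuous_style('column_name', size_range=[2, 30]))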

cartoframes.viz.animation_style(value, duration=20, fade_in=1, fade_out=1, color=None, size=None, opacity=None, stroke_color=None, stroke_width=None)

Helper function for quickly creating an animated style.

Parameters
  • value (str) – Column to symbolize by.

  • duration (float, optional) – Time of the animation in seconds. Default is 20s.

  • fade_in (float, optional) – Time of fade in transitions in seconds. Default is 1s.

  • fade_out (float, optional) – Time of fade out transitions in seconds. Default is 1s.

  • color (str, optional) – Hex, rgb or named color value. Default is ‘#EE5D5A’ for points, ‘#4CC8A3’ for lines and ‘#826DBA’ for polygons.

  • size (int, optional) – Size of point or line features.

  • opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.

  • stroke_width (int, optional) – Size of the stroke on point features.

  • stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.

Returns

cartoframes.viz.style.Style
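
Example

A usage sketch, assuming a 'table_name' table with a date or numeric 'column_name' column to animate by:

>>> Layer('table_name', style=animation_style('column_name', duration=10))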

Legends

cartoframes.viz.basic_legend(title=None, description=None, footer=None)

Helper function for quickly creating a basic legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

Returns

cartoframes.viz.legend.Legend

Example

>>> basic_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer')
cartoframes.viz.color_bins_legend(title=None, description=None, footer=None, prop='color', variable=None, dynamic=True, ascending=False)

Helper function for quickly creating a color bins legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

  • prop (str, optional) – Allowed properties are ‘color’ and ‘stroke_color’. It is ‘color’ by default.

  • variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.

  • dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.

  • ascending (boolean, optional) – If set to True the values are sorted in ascending order. Defaults to False.

Returns

cartoframes.viz.legend.Legend

Example

>>> color_bins_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer',
...     dynamic=False)
cartoframes.viz.color_category_legend(title=None, description=None, footer=None, prop='color', variable=None, dynamic=True)

Helper function for quickly creating a color category legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

  • prop (str, optional) – Allowed properties are ‘color’ and ‘stroke_color’. It is ‘color’ by default.

  • variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.

  • dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.

Returns

cartoframes.viz.legend.Legend

Example

>>> color_category_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer',
...     dynamic=False)
cartoframes.viz.color_continuous_legend(title=None, description=None, footer=None, prop='color', variable=None, dynamic=True, ascending=False)

Helper function for quickly creating a color continuous legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

  • prop (str, optional) – Allowed properties are ‘color’ and ‘stroke_color’. It is ‘color’ by default.

  • variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.

  • dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.

  • ascending (boolean, optional) – If set to True the values are sorted in ascending order. Defaults to False.

Returns

cartoframes.viz.legend.Legend

Example

>>> color_continuous_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer',
...     dynamic=False)
cartoframes.viz.size_bins_legend(title=None, description=None, footer=None, prop='size', variable=None, dynamic=True, ascending=False)

Helper function for quickly creating a size bins legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

  • prop (str, optional) – Allowed properties are ‘size’ and ‘stroke_width’. It is ‘size’ by default.

  • variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.

  • dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.

  • ascending (boolean, optional) – If set to True the values are sorted in ascending order. Defaults to False.

Returns

cartoframes.viz.legend.Legend

Example

>>> size_bins_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer',
...     dynamic=False)
cartoframes.viz.size_category_legend(title=None, description=None, footer=None, prop='size', variable=None, dynamic=True)

Helper function for quickly creating a size category legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

  • prop (str, optional) – Allowed properties are ‘size’ and ‘stroke_width’. It is ‘size’ by default.

  • variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.

  • dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.

Returns

cartoframes.viz.legend.Legend

Example

>>> size_category_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer',
...     dynamic=False)
cartoframes.viz.size_continuous_legend(title=None, description=None, footer=None, prop='size', variable='size_value', dynamic=True, ascending=False)

Helper function for quickly creating a size continuous legend.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

  • prop (str, optional) – Allowed properties are ‘size’ and ‘stroke_width’. It is ‘size’ by default.

  • variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.

  • dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.

  • ascending (boolean, optional) – If set to True the values are sorted in ascending order. Defaults to False.

Returns

cartoframes.viz.legend.Legend

Example

>>> size_continuous_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer',
...     dynamic=False)
cartoframes.viz.default_legend(title=None, description=None, footer=None, **kwargs)

Helper function for quickly creating a default legend based on the style. A style helper is required.

Parameters
  • title (str, optional) – Title of legend.

  • description (str, optional) – Description in legend.

  • footer (str, optional) – Footer of legend. This is often used to attribute data sources.

Returns

cartoframes.viz.legend.Legend

Example

>>> default_legend(
...     title='Legend title',
...     description='Legend description',
...     footer='Legend footer')

Widgets

cartoframes.viz.basic_widget(title=None, description=None, footer=None)

Helper function for quickly creating a default widget.

The default widget is a general purpose widget that can be used to provide additional information about your map.

Parameters
  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom.

Returns

cartoframes.viz.widget.Widget

Example

>>> basic_widget(
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer')
cartoframes.viz.category_widget(value, title=None, description=None, footer=None, read_only=False)

Helper function for quickly creating a category widget.

Parameters
  • value (str) – Column name of the category value.

  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom.

  • read_only (boolean, optional) – Interactively filter a category by selecting it in the widget. Set to “False” by default.

Returns

cartoframes.viz.widget.Widget

Example

>>> category_widget(
...     'column_name',
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer')
cartoframes.viz.formula_widget(value, operation=None, title=None, description=None, footer=None, is_global=False)

Helper function for quickly creating a formula widget.

Formula widgets calculate aggregated values (‘avg’, ‘max’, ‘min’, ‘sum’) from numeric columns or counts of features (‘count’) in a dataset.

A formula widget’s aggregations can be calculated on ‘global’ or ‘viewport’ based values. If you want the values in a formula widget to update on zoom and/or pan, use viewport based aggregations.

Parameters
  • value (str) – Column name of the numeric value.

  • operation (str) – attribute for widget’s aggregated value (‘count’, ‘avg’, ‘max’, ‘min’, ‘sum’).

  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom.

  • is_global (boolean, optional) – Account for calculations based on the entire dataset (‘global’) vs. the default of ‘viewport’ features.

Returns

cartoframes.viz.widget.Widget

Example

>>> formula_widget(
...     'column_name',
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer')
>>> formula_widget(
...     'column_name',
...     operation='sum',
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer')
cartoframes.viz.histogram_widget(value, title=None, description=None, footer=None, read_only=False, buckets=20)

Helper function for quickly creating a histogram widget.

Histogram widgets display the distribution of a numeric attribute, in buckets, to group ranges of values in your data.

By default, you can hover over each bar to see each bucket’s values and count, and also filter your map’s data within a given range.

Parameters
  • value (str) – Column name of the numeric or date value.

  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom.

  • read_only (boolean, optional) – Interactively filter a range of numeric values by selecting them in the widget. Set to “False” by default.

  • buckets (number, optional) – Number of histogram buckets. Set to 20 by default.

Returns

cartoframes.viz.widget.Widget

Example

>>> histogram_widget(
...     'column_name',
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer',
...     buckets=9)
cartoframes.viz.time_series_widget(value, title=None, description=None, footer=None, read_only=False, buckets=20)

Helper function for quickly creating a time series widget.

The time series widget enables you to display animated data (by aggregation) over a specified date or numeric field. Time series widgets provide a status bar of the animation, controls to play or pause, and the ability to filter on a range of values.

Parameters
  • value (str) – Column name of the numeric or date value.

  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom

  • read_only (boolean, optional) – Interactively filter a range of numeric values by selecting them in the widget. Set to “False” by default.

  • buckets (number, optional) – Number of histogram buckets. Set to 20 by default.

Returns

cartoframes.viz.widget.Widget

Example

>>> time_series_widget(
...     'column_name',
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer',
...     buckets=10)
cartoframes.viz.animation_widget(title=None, description=None, footer=None, prop='filter')

Helper function for quickly creating an animated widget.

The animation widget includes an animation status bar as well as controls to play or pause animated data. The filter property of your map’s style, applied to either a date or numeric field, drives both the animation and the widget. Only one animation can be controlled per layer.

Parameters
  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom.

  • prop (str, optional) – Property of the style to get the animation. Default “filter”.

Returns

cartoframes.viz.widget.Widget

Example

>>> animation_widget(
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer')
cartoframes.viz.default_widget(title=None, description=None, footer=None, **kwargs)

Helper function for quickly creating a default widget based on the style. A style helper is required.

Parameters
  • title (str, optional) – Title of widget.

  • description (str, optional) – Description text widget placed under widget title.

  • footer (str, optional) – Footer text placed on the widget bottom.

Returns

cartoframes.viz.widget.Widget

Example

>>> default_widget(
...     title='Widget title',
...     description='Widget description',
...     footer='Widget footer')

Popups

cartoframes.viz.popup_element(value, title=None, operation=None)

Helper function for quickly adding a popup element to a layer.

Parameters
  • value (str) – Column name to display the value for each feature.

  • title (str, optional) – Title for the given value. By default, it’s the name of the value.

  • operation (str, optional) – Cluster operation, defaults to ‘count’. Other options available are ‘avg’, ‘min’, ‘max’, and ‘sum’.

Example

>>> popup_element('column_name', title='Popup title')

cartoframes.viz.default_popup_element(title=None, operation=None)

Helper function for quickly adding a default popup element based on the style. A style helper is required.

Parameters
  • title (str, optional) – Title for the given value. By default, it’s the name of the value.

  • operation (str, optional) – Cluster operation, defaults to ‘count’. Other options available are ‘avg’, ‘min’, ‘max’, and ‘sum’.

Example

>>> default_popup_element(title='Popup title')
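
To make the parameters above concrete, the information a popup element carries can be sketched as a plain value/title/operation triple. This is a hypothetical stand-in for illustration only, not the actual CARTOframes implementation:

```python
# Hypothetical sketch of the data a popup element carries; NOT the
# CARTOframes implementation, just an illustration of the parameters
# described above.
VALID_OPERATIONS = ('count', 'avg', 'min', 'max', 'sum')

def popup_element(value, title=None, operation=None):
    """Build a popup-element configuration for a column."""
    if operation is not None and operation not in VALID_OPERATIONS:
        raise ValueError(
            'Wrong operation. Valid operations are: {}'.format(', '.join(VALID_OPERATIONS)))
    return {
        'value': value,
        'title': title if title is not None else value,  # default title is the column name
        'operation': operation,
    }
```

Note how the title falls back to the column name when not provided, matching the default described above.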

Utils

cartoframes.utils.set_log_level(level)

Set the log level of the library.

Parameters
  • level (str) – Log level name. By default it's set to "info". Valid log levels are: "critical", "error", "warning", "info", "debug", "notset".

cartoframes.utils.decode_geometry(geom_col)

Decodes a DataFrame geometry column. It detects the geometry encoding and decodes the column if required. Supported geometry encodings are:

  • WKB (Bytes, Hexadecimal String, Hexadecimal Bytestring)

  • Extended WKB (Bytes, Hexadecimal String, Hexadecimal Bytestring)

  • WKT (String)

  • Extended WKT (String)

Parameters

geom_col (array) – Column containing the encoded geometry.

Example

>>> decode_geometry(df['the_geom'])
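
The encodings listed above can be told apart before decoding: raw bytes are binary WKB, WKT values start with a geometry type keyword, extended WKT carries an SRID= prefix, and hex-encoded WKB is a pure hexadecimal string. The heuristic below is an illustrative sketch of that detection step, not the CARTOframes implementation:

```python
import binascii

# Illustrative sketch (NOT the CARTOframes implementation) of how a
# geometry encoding can be detected before decoding.
WKT_TYPES = ('POINT', 'LINESTRING', 'POLYGON', 'MULTIPOINT',
             'MULTILINESTRING', 'MULTIPOLYGON', 'GEOMETRYCOLLECTION')

def detect_encoding(value):
    """Classify one geometry value as wkb, wkb-hex, wkt, or ewkt."""
    if isinstance(value, (bytes, bytearray)):
        return 'wkb'                       # binary WKB/EWKB
    text = value.strip().upper()
    if text.startswith('SRID='):
        return 'ewkt'                      # extended WKT carries an SRID prefix
    if text.startswith(WKT_TYPES):
        return 'wkt'                       # WKT starts with a geometry keyword
    try:
        binascii.unhexlify(text)           # hex-encoded WKB decodes cleanly
        return 'wkb-hex'
    except (binascii.Error, ValueError):
        raise ValueError('Unknown geometry encoding')
```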

cartoframes.utils.setup_metrics(enabled)

Update the metrics configuration.

Parameters

enabled (bool) – Flag to enable/disable metrics collection.

Exceptions

exception cartoframes.exceptions.DOError(message)

Bases: Exception

This exception is raised when a problem is encountered while using DO functions.

exception cartoframes.exceptions.CatalogError(message)

Bases: cartoframes.exceptions.DOError

This exception is raised when a problem is encountered while using catalog functions.

exception cartoframes.exceptions.EnrichmentError(message)

Bases: cartoframes.exceptions.DOError

This exception is raised when a problem is encountered while using enrichment functions.

exception cartoframes.exceptions.PublishError(message)

Bases: Exception

This exception is raised when a problem is encountered while publishing visualizations.
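
Because CatalogError and EnrichmentError subclass DOError, a single `except DOError` clause handles both, while PublishError must be caught separately. The hierarchy is reproduced locally below so the catching behaviour can be illustrated without importing cartoframes:

```python
# Local reproduction of the exception hierarchy documented above, used
# only to illustrate catching behaviour (the real classes live in
# cartoframes.exceptions).
class DOError(Exception):
    """Raised when a problem is encountered while using DO functions."""

class CatalogError(DOError):
    """Raised when a problem is encountered while using catalog functions."""

class EnrichmentError(DOError):
    """Raised when a problem is encountered while using enrichment functions."""

class PublishError(Exception):
    """Raised when a problem is encountered while publishing visualizations."""

def classify(exc):
    """Return which family of errors an exception belongs to."""
    if isinstance(exc, DOError):
        return 'data-observatory'   # catches DOError and all its subclasses
    return 'other'
```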