The CARTOframes API is organized into three parts: auth, data, and viz.
It is possible to use CARTOframes without having a CARTO account. However, to access data enrichment or to discover useful datasets, being a CARTO user offers many advantages. This module is responsible for connecting users with their CARTO account through given user credentials.
From discovering and enriching data to applying data analysis and geocoding methods, the CARTOframes API is built with the purpose of managing data without leaving the context of your notebook.
The viz API is designed to create useful, beautiful and straightforward visualizations. It is both predefined and flexible, giving advanced users the possibility of building specific visualizations, but also offering multiple built-in methods to work faster with a few lines of code.
The auth namespace contains the class to manage authentication: cartoframes.auth.Credentials. It also includes the utility functions cartoframes.auth.set_default_credentials() and cartoframes.auth.get_default_credentials().
Bases: object
The Credentials class is used for managing and storing user CARTO credentials. The arguments are listed in order of precedence: Credentials instances are first, key and base_url/username are taken next, and config_file (if given) is taken last. The config file is creds.json by default. If no arguments are passed, an attempt is made to retrieve credentials from a previously saved session. One of the above scenarios needs to be met to successfully instantiate a Credentials object.
username (str, optional) – Username of CARTO account.
api_key (str, optional) – API key of user’s CARTO account. If the dataset is public, it can be set to ‘default_public’.
base_url (str, optional) – Base URL used for API calls. This is usually of the form https://johnsmith.carto.com/ for user johnsmith. On-premises installations (and others) have a different URL pattern.
session (requests.Session, optional) – requests session. See requests documentation for more information.
allow_non_secure (bool, optional) – Allow non-secure HTTP connections. Not allowed by default.
ValueError – if no valid username or base_url is found.
Example
>>> creds = Credentials(username='johnsmith', api_key='abcdefg')
api_key – the credentials API key.
username – the credentials username.
base_url – the credentials base URL.
allow_non_secure – whether non-secure connections over HTTP are allowed.
session – the credentials requests session.
user_id – the credentials user ID.
Retrieves credentials from a file. Defaults to the user config directory.
config_file (str, optional) – Location where credentials are loaded from. If no argument is provided, it will be loaded from the default location.
session (requests.Session, optional) – requests session. See requests documentation for more information.
A Credentials instance.
Example
>>> creds = Credentials.from_file('creds.json')
Retrieves credentials from another Credentials object.
credentials (Credentials) – instance of Credentials.
A Credentials instance.
ValueError – if the credentials argument is not an instance of Credentials.
Example
>>> creds = Credentials.from_credentials(orig_creds)
Saves current user credentials to user directory.
config_file (str, optional) – Location where credentials are to be stored. If no argument is provided, they will be saved to the default location (creds.json).
Example
>>> credentials = Credentials(username='johnsmith', api_key='abcdefg')
>>> credentials.save('creds.json')
User credentials for `johnsmith` were successfully saved to `creds.json`
Deletes the credentials file specified in config_file. If no file is specified, it deletes the default user credentials file (creds.json).
config_file (str) – Path to configuration file. Defaults to delete the user default location if None.
Tip
To see if there is a default user credential file stored, do the following:
>>> print(Credentials.from_file())
Credentials(username='johnsmith', api_key='abcdefg',
base_url='https://johnsmith.carto.com/')
Returns whether the user has instant licensing activated for the Data Observatory v2.
Returns the Data Observatory v2 Google Cloud Platform project and token.
Example
>>> from cartoframes.auth import Credentials
>>> from google.oauth2.credentials import Credentials as GoogleCredentials
>>> creds = Credentials(username='johnsmith', api_key='abcdefg')
>>> gcloud_project, gcloud_token = creds.get_gcloud_credentials()
>>> gcloud_credentials = GoogleCredentials(gcloud_token)
Set default credentials for all operations that require authentication against a CARTO account.
credentials (Credentials, optional) – A Credentials instance can be used in place of a username | base_url/api_key combination.
filepath (str, optional) – Location where credentials are stored as a JSON file.
username (str, optional) – CARTO user name of the account.
base_url (str, optional) – Base URL of CARTO user account. Cloud-based accounts should use the form https://{username}.carto.com (e.g., https://johnsmith.carto.com for user johnsmith) whether on a personal or multi-user account. On-premises installation users should ask their admin.
api_key (str, optional) – CARTO API key. Depending on the application, this can be a project API key or the account master API key.
session (requests.Session, optional) – requests session. See requests documentation for more information.
allow_non_secure (bool, optional) – Allow non-secure HTTP connections. Not allowed by default.
Note
The recommended way to authenticate in CARTOframes is to read user credentials from a JSON file that is structured like this:
{
"username": "your user name",
"api_key": "your api key",
"base_url": "https://your_username.carto.com"
}
Note that the ``base_url`` will be different for on-premises installations.
By using the cartoframes.auth.Credentials.save() method, this file will automatically be created for you in a default location depending on your operating system. A custom location can also be specified as an argument to the method.
This file can then be read in the following ways:
>>> set_default_credentials('./carto-project-credentials.json')
Example
Create Credentials from a username, api_key pair.
>>> set_default_credentials('johnsmith', 'your api key')
Create credentials from only a username (only works with public datasets and those marked public with link). If the API key is not provided, the public API key default_public is used. With this setting, only read-only operations can occur (e.g., no publishing of maps, reading data from the Data Observatory, or creating new hosted datasets).
>>> set_default_credentials('johnsmith')
From a base_url, api_key pair.
>>> set_default_credentials('https://johnsmith.carto.com', 'your api key')
From a base_url (for public datasets). The API key default_public is used by default.
>>> set_default_credentials('https://johnsmith.carto.com')
From a Credentials instance.
>>> credentials = Credentials(
... base_url='https://johnsmith.carto.com',
... api_key='your api key')
>>> set_default_credentials(credentials)
Retrieve the default credentials if previously set with cartoframes.auth.set_default_credentials() in the current Python session.
Example
>>> set_default_credentials('creds.json')
>>> current_creds = get_default_credentials()
Default credentials previously set in current Python session. None will be returned if default credentials were not previously set.
cartoframes.auth.Credentials
Unset the default credentials if previously set with cartoframes.auth.set_default_credentials() in the current Python session.
Example
>>> set_default_credentials('creds.json')
>>> unset_default_credentials()
Read a table or a SQL query from the CARTO account.
source (str) – table name or SQL query.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
limit (int, optional) – The number of rows to download. Default is to download all rows.
retry_times (int, optional) – Number of times to retry the download in case it fails. Default is 3.
schema (str, optional) – prefix of the table. By default, it gets the current_schema() using the credentials.
index_col (str, optional) – name of the column to be loaded as index. It can be used also to set the index name.
decode_geom (bool, optional) – convert the “the_geom” column into a valid geometry column.
null_geom_value (Object, optional) – value for the the_geom column when it is null. Defaults to None.
geopandas.GeoDataFrame
ValueError – if the source is not a valid table_name or SQL query.
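A minimal usage sketch of read_carto (the table name, query, and credentials here are hypothetical, and running it requires a CARTO account):

```python
>>> from cartoframes import read_carto
>>> from cartoframes.auth import Credentials
>>> creds = Credentials(username='johnsmith', api_key='abcdefg')
>>> # read a whole table as a GeoDataFrame
>>> gdf = read_carto('my_table', credentials=creds)
>>> # or read the result of a SQL query, keeping only the first 100 rows
>>> gdf = read_carto('SELECT * FROM my_table WHERE value > 1000', credentials=creds, limit=100)
```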
Upload a DataFrame to CARTO. The geometry’s CRS must be WGS 84 (EPSG:4326) so you can use it on CARTO.
dataframe (pandas.DataFrame, geopandas.GeoDataFrame) – data to be uploaded.
table_name (str) – name of the table to upload the data.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
if_exists (str, optional) – ‘fail’, ‘replace’, ‘append’. Default is ‘fail’.
geom_col (str, optional) – name of the geometry column of the dataframe.
index (bool, optional) – write the index in the table. Default is False.
index_label (str, optional) – name of the index column in the table. By default it uses the name of the index from the dataframe.
cartodbfy (bool, optional) – convert the table to CARTO format. Default is True. More info: https://carto.com/developers/sql-api/guides/creating-tables/#create-tables.
log_enabled (bool, optional) – enable the logging mechanism. Default is True.
retry_times (int, optional) – Number of times to retry the upload in case it fails. Default is 3.
max_upload_size (int, optional) – defines the maximum size of the dataframe to be uploaded. Default is 2GB.
skip_quota_warning (bool, optional) – skip the quota exceeded check and force the upload. (The upload will still fail if the size of the dataset exceeds the remaining DB quota). Default is False.
the table name, normalized.
str
ValueError – if the dataframe or table name provided are wrong or the if_exists param is not valid.
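A usage sketch of to_carto (the table name and geometry column are hypothetical; gdf is assumed to be a GeoDataFrame in WGS 84, and running it requires a CARTO account):

```python
>>> from cartoframes import to_carto
>>> # upload the data, replacing the table if it already exists
>>> table_name = to_carto(gdf, 'my_table', if_exists='replace', geom_col='geometry')
```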
List all of the tables in the CARTO account.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
A DataFrame with all the table names for the given credentials.
DataFrame
Check if the table exists in the CARTO account.
table_name (str) – name of the table.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
schema (str, optional) – prefix of the table. By default, it gets the current_schema() using the credentials.
True if the table exists, False otherwise.
bool
ValueError – if the table name is not a valid table name.
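A sketch of listing tables and checking for one (the table name is hypothetical, default credentials are assumed to be set, and running it requires a CARTO account):

```python
>>> from cartoframes import list_tables, has_table
>>> list_tables()   # DataFrame with all the table names in the account
>>> has_table('my_table')   # True if the table exists, False otherwise
```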
Delete the table from the CARTO account.
table_name (str) – name of the table.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
log_enabled (bool, optional) – enable the logging mechanism. Default is True.
ValueError – if the table name is not a valid table name.
Rename a table in the CARTO account.
table_name (str) – name of the table.
new_table_name (str) – new name for the table.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
if_exists (str, optional) – ‘fail’, ‘replace’. Default is ‘fail’.
log_enabled (bool, optional) – enable the logging mechanism. Default is True.
ValueError – if the table names provided are wrong or the if_exists param is not valid.
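A sketch of renaming a table (the table names are hypothetical, default credentials are assumed to be set, and running it requires a CARTO account):

```python
>>> from cartoframes import rename_table
>>> # fail if a table named 'my_table_renamed' already exists
>>> rename_table('my_table', 'my_table_renamed', if_exists='fail')
```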
Copy a table into a new table in the CARTO account.
table_name (str) – name of the original table.
new_table_name (str, optional) – name for the new table.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
if_exists (str, optional) – ‘fail’, ‘replace’, ‘append’. Default is ‘fail’.
log_enabled (bool, optional) – enable the logging mechanism. Default is True.
cartodbfy (bool, optional) – convert the table to CARTO format. Default is True. More info: https://carto.com/developers/sql-api/guides/creating-tables/#create-tables.
ValueError – if the table names provided are wrong or the if_exists param is not valid.
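A sketch of copying a table (the table names are hypothetical, default credentials are assumed to be set, and running it requires a CARTO account):

```python
>>> from cartoframes import copy_table
>>> copy_table('my_table', 'my_table_copy', if_exists='fail')
```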
Create a new table from an SQL query in the CARTO account.
query (str) – SQL query
new_table_name (str) – name for the new table.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
if_exists (str, optional) – ‘fail’, ‘replace’, ‘append’. Default is ‘fail’.
log_enabled (bool, optional) – enable the logging mechanism. Default is True.
cartodbfy (bool, optional) – convert the table to CARTO format. Default is True. More info: https://carto.com/developers/sql-api/guides/creating-tables/#create-tables.
ValueError – if the query or table name provided is wrong or the if_exists param is not valid.
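A sketch of creating a table from a query (the query and table names are hypothetical, default credentials are assumed to be set, and running it requires a CARTO account):

```python
>>> from cartoframes import create_table_from_query
>>> create_table_from_query('SELECT * FROM my_table WHERE value > 1000', 'my_filtered_table')
```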
Describe the table in the CARTO account.
table_name (str) – name of the table.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
schema (str, optional) – prefix of the table. By default, it gets the current_schema() using the credentials.
A dict with the privacy, num_rows and geom_type of the table.
ValueError – if the table name is not a valid table name.
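A sketch of describing a table (the table name is hypothetical, default credentials are assumed to be set, and running it requires a CARTO account):

```python
>>> from cartoframes import describe_table
>>> info = describe_table('my_table')
>>> info['privacy'], info['num_rows'], info['geom_type']
```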
Update the table information in the CARTO account.
table_name (str) – name of the table.
privacy (str) – privacy of the table: ‘private’, ‘public’, ‘link’.
credentials (Credentials, optional) – instance of Credentials (username, api_key, etc).
log_enabled (bool, optional) – enable the logging mechanism. Default is True.
ValueError – if the table name is wrong or the privacy name is not ‘private’, ‘public’, or ‘link’.
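A sketch of updating table privacy (the table name is hypothetical, default credentials are assumed to be set, and running it requires a CARTO account):

```python
>>> from cartoframes import update_privacy_table
>>> # make the table accessible to anyone with the link
>>> update_privacy_table('my_table', 'link')
```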
Bases: cartoframes.data.services.service.Service
Geocoding using CARTO data services.
This requires a CARTO account with an API key that allows for using geocoding services (through an explicit argument in the constructor or via the default credentials).
To prevent having to geocode records that have been previously geocoded, and thus spend quota unnecessarily, you should always preserve the the_geom and carto_geocode_hash columns generated by the geocoding process. This will happen automatically if your input is a table from CARTO processed in place (i.e. without a table_name parameter) or if you save your results in a CARTO table using the table_name parameter, and only use the resulting table for any further geocoding.
In case you’re geocoding local data from a DataFrame that you plan to re-geocode again (e.g. because you’re making your work reproducible by saving all the data preparation steps in a notebook), we advise saving the geocoding results immediately to the same store the data is originally taken from, for example:
>>> df = pandas.read_csv('my_data')
>>> geocoded_df = Geocoding().geocode(df, 'address').data
>>> geocoded_df.to_csv('my_data')
As an alternative, you can use the cached option to store geocoding results in a CARTO table and reuse them in later geocodings. To do this, you need to use the table_name parameter with the name of the table used to cache the results. If the same dataframe is geocoded repeatedly, no credits will be spent, but note there is a time overhead related to uploading the dataframe to a temporary table for checking for changes.
>>> df = pandas.read_csv('my_data')
>>> geocoded_df = Geocoding().geocode(df, 'address', table_name='my_data', cached=True).data
If you execute the previous code multiple times it will only spend credits on the first geocoding; later ones will reuse the results stored in the my_data table. This will require extra processing time. If the CSV file should ever change, cached results will only be applied to unmodified records, and new geocoding will be performed only on new or changed records.
Geocode method.
source (str, pandas.DataFrame, geopandas.GeoDataFrame) – table, SQL query or DataFrame object to be geocoded.
street (str) – name of the column containing postal addresses.
city (dict, optional) – dictionary with either a column key with the name of a column containing the addresses’ city names or a value key with a literal city value, e.g. ‘New York’. It also accepts a string, in which case column is implied.
state (dict, optional) – dictionary with either a column key with the name of a column containing the addresses’ state names or a value key with a literal state value, e.g. ‘WA’. It also accepts a string, in which case column is implied.
country (dict, optional) – dictionary with either a column key with the name of a column containing the addresses’ country names or a value key with a literal country value, e.g. ‘US’. It also accepts a string, in which case column is implied.
status (dict, optional) – dictionary that defines a mapping from geocoding state attributes (‘relevance’, ‘precision’, ‘match_types’) to column names. (See https://carto.com/developers/data-services-api/reference/) Columns will be added to the result data for the requested attributes. By default a column gc_status_rel will be created for the geocoding relevance. The special attribute ‘*’ refers to all the status attributes as a JSON object.
table_name (str, optional) – the geocoding results will be placed in a new CARTO table with this name.
if_exists (str, optional) – Behavior for creating new datasets, only applicable if table_name isn’t None; Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’.
cached (bool, optional) – Use cached geocoding results, saving the results in a table. This parameter should be used along with table_name.
dry_run (bool, optional) – no actual geocoding will be performed (useful to check the needed quota).
null_geom_value (Object, optional) – value for the the_geom column when it is null. Defaults to None.
A named-tuple (data, metadata) containing a data geopandas.GeoDataFrame and a metadata dictionary with global information about the geocoding process. The data contains a geometry column with point locations for the geocoded addresses and also a carto_geocode_hash column that, if preserved, can avoid re-geocoding unchanged data in future calls to geocode. The metadata, as described in https://carto.com/developers/data-services-api/reference/, contains the following information:
| Name | Type | Description |
|---|---|---|
| precision | text | precise or interpolated |
| relevance | number | 0 to 1, higher being more relevant |
| match_types | array | list of match type strings: point_of_interest, country, state, county, locality, district, street, intersection, street_number, postal_code |
By default the relevance is stored in an output column named gc_status_rel. The name of the column, and in general which attributes are added as columns, can be configured by using a status dictionary associating column names to status attributes.
ValueError – if the cached param is set without table_name.
Examples
Geocode a DataFrame:
>>> df = pandas.DataFrame([['Gran Vía 46', 'Madrid'], ['Ebro 1', 'Sevilla']], columns=['address','city'])
>>> geocoded_gdf, metadata = Geocoding().geocode(
... df, street='address', city='city', country={'value': 'Spain'})
>>> geocoded_gdf.head()
Geocode a table from CARTO:
>>> gdf = read_carto('table_name')
>>> geocoded_gdf, metadata = Geocoding().geocode(gdf, street='address')
>>> geocoded_gdf.head()
Geocode a query against a table from CARTO:
>>> gdf = read_carto('SELECT * FROM table_name WHERE value > 1000')
>>> geocoded_gdf, metadata = Geocoding().geocode(gdf, street='address')
>>> geocoded_gdf.head()
Obtain the number of credits needed to geocode a CARTO table:
>>> gdf = read_carto('table_name')
>>> geocoded_gdf, metadata = Geocoding().geocode(gdf, street='address', dry_run=True)
>>> print(metadata['required_quota'])
Filter results by relevance:
>>> df = pandas.DataFrame([['Gran Vía 46', 'Madrid'], ['Ebro 1', 'Sevilla']], columns=['address','city'])
>>> geocoded_gdf, metadata = Geocoding().geocode(
... df,
... street='address',
... city='city',
... country={'value': 'Spain'},
... status=['relevance'])
>>> # show rows with relevance greater than 0.7:
>>> print(geocoded_gdf[geocoded_gdf['carto_geocode_relevance'] > 0.7])
Bases: cartoframes.data.services.service.Service
Time and distance isoline services using CARTO Data Services.
Compute isochrone areas.
This method computes areas delimited by isochrone lines (lines of constant travel time) based upon public roads.
source (str, pandas.DataFrame, geopandas.GeoDataFrame) – table, SQL query or DataFrame containing the source points for the isochrones: travel routes from the source points are computed to determine areas within specified travel times.
ranges (list) – travel time values in seconds; for each range value and source point a result polygon will be produced enclosing the area within range of the source.
exclusive (bool, optional) – when False (the default), inclusive range areas are generated, each one containing the areas for smaller time values (so the area is reachable from the source within the given time). When True, areas are exclusive, each one corresponding to time values between the immediately smaller range value (or zero) and the area range value.
ascending (bool, optional) – when True, the isochrones are sorted ascending by travel time; False (the default) for the opposite case.
table_name (str, optional) – the resulting areas will be saved in a new CARTO table with this name.
if_exists (str, optional) – Behavior for creating new datasets, only applicable if table_name isn’t None; Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’.
dry_run (bool, optional) – no actual computation will be performed, and metadata will be returned including the required quota.
mode (str, optional) – defines the travel mode: 'car' (the default) or 'walk'.
is_destination (bool, optional) – indicates that the source points are to be taken as destinations for the routes used to compute the area, rather than origins.
mode_type (str, optional) – type of routes computed: 'shortest' (default) or 'fastests'.
mode_traffic (str, optional) – use traffic data to compute routes: 'disabled' (default) or 'enabled'.
resolution (float, optional) – level of detail of the polygons in meters per pixel. Higher resolution may increase the response time of the service.
maxpoints (int, optional) – Allows limiting the number of points in the returned polygons. Increasing maxpoints may increase the response time of the service.
quality (int, optional) – Allows you to reduce the quality of the polygons in favor of response time. Admitted values: 1/2/3.
geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.
source_col (str, optional) – string indicating the source column name. This column will be used to reference the generated isolines with the original geometry. By default it uses the cartodb_id column if it exists, or the index of the source DataFrame.
A named-tuple (data, metadata) containing a data geopandas.GeoDataFrame and a metadata dictionary. For dry runs the data will be None. The data contains a range_data column with a numeric value and a the_geom geometry with the corresponding area. It will also contain a source_id column that identifies the source point corresponding to each area if the source has a cartodb_id column.
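A sketch of computing isochrones (gdf is assumed to be a GeoDataFrame of source points; running it requires a CARTO account with isolines quota):

```python
>>> from cartoframes.data.services import Isolines
>>> iso_service = Isolines()
>>> # areas reachable within 5, 15 and 30 minutes driving from each source point
>>> result = iso_service.isochrones(gdf, ranges=[300, 900, 1800], mode='car')
>>> areas_gdf, metadata = result.data, result.metadata
```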
Compute isodistance areas.
This method computes areas delimited by isodistance lines (lines of constant travel distance) based upon public roads.
source (str, pandas.DataFrame, geopandas.GeoDataFrame) – table, SQL query or DataFrame containing the source points for the isodistances: travel routes from the source points are computed to determine areas within specified travel distances.
ranges (list) – travel distance values in meters; for each range value and source point a result polygon will be produced enclosing the area within range of the source.
exclusive (bool, optional) – when False (the default), inclusive range areas are generated, each one containing the areas for smaller distance values (so the area is reachable from the source within the given distance). When True, areas are exclusive, each one corresponding to distance values between the immediately smaller range value (or zero) and the area range value.
ascending (bool, optional) – when True, the isodistances are sorted ascending by travel distance; False (the default) for the opposite case.
table_name (str, optional) – the resulting areas will be saved in a new CARTO table with this name.
if_exists (str, optional) – Behavior for creating new datasets, only applicable if table_name isn’t None; Options are ‘fail’, ‘replace’, or ‘append’. Defaults to ‘fail’.
dry_run (bool, optional) – no actual computation will be performed, and metadata will be returned including the required quota.
mode (str, optional) – defines the travel mode: 'car' (the default) or 'walk'.
is_destination (bool, optional) – indicates that the source points are to be taken as destinations for the routes used to compute the area, rather than origins.
mode_type (str, optional) – type of routes computed: 'shortest' (default) or 'fastests'.
mode_traffic (str, optional) – use traffic data to compute routes: 'disabled' (default) or 'enabled'.
resolution (float, optional) – level of detail of the polygons in meters per pixel. Higher resolution may increase the response time of the service.
maxpoints (int, optional) – Allows limiting the number of points in the returned polygons. Increasing maxpoints may increase the response time of the service.
quality (int, optional) – Allows you to reduce the quality of the polygons in favor of response time. Admitted values: 1/2/3.
geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.
source_col (str, optional) – string indicating the source column name. This column will be used to reference the generated isolines with the original geometry. By default it uses the cartodb_id column if it exists, or the index of the source DataFrame.
A named-tuple (data, metadata) containing a data geopandas.GeoDataFrame and a metadata dictionary. For dry runs the data will be None. The data contains a range_data column with a numeric value and a the_geom geometry with the corresponding area. It will also contain a source_id column that identifies the source point corresponding to each area if the source has a cartodb_id column.
Exception – if the available quota is less than the required quota.
ValueError – if there is no valid geometry found in the dataframe.
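A sketch of computing isodistances (gdf is assumed to be a GeoDataFrame of source points; running it requires a CARTO account with isolines quota):

```python
>>> from cartoframes.data.services import Isolines
>>> iso_service = Isolines()
>>> # areas reachable within 500 m and 1 km walking from each source point
>>> result = iso_service.isodistances(gdf, ranges=[500, 1000], mode='walk')
>>> areas_gdf, metadata = result.data, result.metadata
```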
With CARTOframes it is possible to enrich your data by using our Data Observatory Catalog through the enrichment methods.
Bases: object
This class represents the Data Observatory metadata Catalog. The catalog contains metadata that helps to discover and understand the data available in the Data Observatory for Dataset.download and Enrichment purposes.
You can get more information about the Data Observatory catalog from the CARTO website and in your CARTO user account dashboard.
Explore and discover the datasets available in the repository (both public and premium datasets).
Subscribe to some premium datasets and manage your datasets licenses.
Download data and use your licensed datasets and variables to enrich your own data by means of the Enrichment functions.
The Catalog is public and can be explored without a CARTO account. Once you discover a Dataset of interest and want to acquire a license to use it, you’ll need a CARTO account to subscribe to it, by means of the Dataset.subscribe or Geography.subscribe functions.
Dataset: the main CatalogEntity. It contains metadata of the actual data you can use to Dataset.download or for Enrichment purposes.
Geography: datasets in the Data Observatory are aggregated by different geographic boundaries. The Geography entity contains metadata to understand the boundaries of a Dataset. It’s used for enrichment, and you can also Geography.download the underlying data.
Variable: variables contain metadata about the columns available in each dataset for enrichment. Let’s say you explore a dataset with demographic data for the whole US at the Census tract level. The variables give you information about the actual columns you have available, such as total_population, total_males, etc. You can use lists of Variable instances, Variable.id, or Variable.slug to enrich your own data.
Every Dataset is related to a Geography. You can have, for example, demographics data at the Census tract, block group or block levels.
When subscribing to a premium dataset, you should subscribe to both the Dataset (via Dataset.subscribe) and its Geography (via Geography.subscribe) to be able to access both tables to enrich your own data.
The two main entities of the Catalog (Dataset and Geography) are related to other entities, that are useful for a hierarchical categorization and discovery of available data in the Data Observatory:
Category: groups datasets of the same topic, for example, demographics, financial, etc.
Country: groups datasets available by country.
Provider: gives you information about the provider of the source data.
You can simply list all the grouping entities. Take into account that this is not the preferred way to discover the catalog metadata, since there can be thousands of entities in it:
>>> Category.get_all()
[<Category.get('demographics')>, ...]
>>> Country.get_all()
[<Country.get('usa')>, ...]
>>> Provider.get_all()
[<Provider.get('mrli')>, ...]
Or you can get them by ID:
>>> Category.get('demographics')
<Category.get('demographics')>
>>> Country.get('usa')
<Country.get('usa')>
>>> Provider.get('mrli')
<Provider.get('mrli')>
Examples
The preferred way of discovering the available datasets in the Catalog is through nested filters:
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').datasets
[<Dataset.get('acs_sociodemogr_b758e778')>, ...]
You can include the geography as part of the nested filter like this:
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
If a filter is already applied to a Catalog instance and you want to do a new hierarchical search, clear the previous filters with the Catalog().clear_filters() method:
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
>>> catalog.clear_filters()
>>> catalog.country('esp').category('demographics').datasets
Otherwise the filters accumulate and you’ll get unexpected results.
During the discovery process, it’s useful to understand the metadata related to a given Geography or Dataset. A useful way of reading or filtering by metadata values consists of converting the entities to a pandas DataFrame:
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets.to_dataframe()
For each dataset in the Catalog, you can explore its variables, get a summary of its stats, etc.
>>> dataset = Dataset.get('od_acs_13345497')
>>> dataset.variables
[<Variable.get('dwellings_2_uni_fb8f6cfb')> #'Two-family (two unit) dwellings', ...]
See the Catalog guides and examples in our public documentation website for more information.
Get all the countries with datasets available in the Catalog.
CatalogList
CatalogError – if there’s a problem when connecting to the catalog or no countries are found.
Get all the categories in the Catalog.
CatalogList
CatalogError – if there’s a problem when connecting to the catalog or no categories are found.
Get all the providers in the Catalog.
CatalogList
CatalogError – if there’s a problem when connecting to the catalog or no providers are found.
Get all the datasets in the Catalog.
CatalogList
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Get all the geographies in the Catalog.
CatalogList
CatalogError – if there’s a problem when connecting to the catalog or no geographies are found.
Add a country filter to the current Catalog instance.
country_id (str) – ID of the country to be used for filtering the Catalog.
Catalog
Add a category filter to the current Catalog instance.
category_id (str) – ID of the category to be used for filtering the Catalog.
Catalog
Add a geography filter to the current Catalog instance.
geography_id (str) – ID or slug of the geography to be used for filtering the Catalog.
Catalog
Add a provider filter to the current Catalog instance.
provider_id (str) – ID of the provider to be used for filtering the Catalog.
Catalog
Add a public filter to the current Catalog instance.
is_public (str, optional) – Flag to filter public (True) or private (False) datasets. Default is True.
Catalog
Remove the current filters from this Catalog instance.
Get all the subscriptions in the Catalog. You’ll get all the Dataset or Geography instances you have previously subscribed to.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
Subscriptions
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Get all the datasets in the Catalog with the current filters applied.
CatalogList
List of Dataset instances.
Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity
This class represents a Category
in the Catalog
. Catalog datasets (Dataset
class) are grouped by categories, so you can filter available datasets and geographies that belong (or are related) to a given Category.
Examples
List the available categories in the Catalog
>>> catalog = Catalog()
>>> categories = catalog.categories
Get a Category
from the Catalog
given its ID
>>> category = Category.get('demographics')
Get the list of Dataset
related to this category.
CatalogList
List of Dataset instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Examples
Get all the datasets Dataset
available in the catalog for a Category
instance
>>> category = Category.get('demographics')
>>> datasets = category.datasets
Same example as above but using nested filters:
>>> catalog = Catalog()
>>> datasets = catalog.category('demographics').datasets
You can perform other operations with a CatalogList
:
>>> catalog = Catalog()
>>> datasets = catalog.category('demographics').datasets
>>> # convert the list of datasets into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = datasets.to_dataframe()
>>> # get a dataset by ID or slug
>>> dataset = Dataset.get(A_VALID_ID_OR_SLUG)
Get the list of Geography
related to this category.
CatalogList
List of Geography instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Examples
Get all the geographies Dataset
available in the catalog for a Category
instance
>>> category = Category.get('demographics')
>>> geographies = category.geographies
Same example as above but using nested filters:
>>> catalog = Catalog()
>>> geographies = catalog.category('demographics').geographies
You can perform these other operations with a CatalogList
:
>>> catalog = Catalog()
>>> geographies = catalog.category('demographics').geographies
>>> # convert the list of datasets into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = geographies.to_dataframe()
>>> # get a geography by ID or slug
>>> geography = Geography.get(A_VALID_ID_OR_SLUG)
Name of this category instance.
Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity
This class represents a Country
in the Catalog
. Catalog datasets (Dataset
class) belong to a country, so you can filter available datasets and geographies that belong (or are related) to a given Country.
Examples
List the available countries in the Catalog
>>> catalog = Catalog()
>>> countries = catalog.countries
Get a Country
from the Catalog
given its ID
>>> # country ID is a lowercase ISO Alpha 3 Code
>>> country = Country.get('usa')
Get the list of Dataset
covering data for this country.
CatalogList
List of Dataset instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Examples
Get all the datasets Dataset
available in the catalog for a Country
instance
>>> country = Country.get('usa')
>>> datasets = country.datasets
Same example as above but using nested filters:
>>> catalog = Catalog()
>>> datasets = catalog.country('usa').datasets
You can perform these other operations with a CatalogList
:
>>> datasets = catalog.country('usa').datasets
>>> # convert the list of datasets into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = datasets.to_dataframe()
>>> # get a dataset by ID or slug
>>> dataset = Dataset.get(A_VALID_ID_OR_SLUG)
Get the list of Geography
covering data for this country.
CatalogList
List of Geography instances.
CatalogError – if there’s a problem when connecting to the catalog or no geographies are found.
Examples
Get all the geographies Geography
available in the catalog for a Country
instance
>>> country = Country.get('usa')
>>> geographies = country.geographies
Same example as above but using nested filters:
>>> catalog = Catalog()
>>> geographies = catalog.country('usa').geographies
You can perform these other operations with a CatalogList
:
>>> geographies = catalog.country('usa').geographies
>>> # convert the list of geographies into a pandas DataFrame
>>> # for further filtering and exploration
>>> dataframe = geographies.to_dataframe()
>>> # get a geography by ID or slug
>>> geography = Geography.get(A_VALID_ID_OR_SLUG)
Get the list of Category
that are assigned to Dataset
that cover data for this country.
CatalogList
List of Category instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Examples
Get all the categories Category
available in the catalog for a Country
instance
>>> country = Country.get('usa')
>>> categories = country.categories
Same example as above but using nested filters:
>>> catalog = Catalog()
>>> categories = catalog.country('usa').categories
Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity
A Dataset represents the metadata of a particular dataset in the catalog.
If you have Data Observatory enabled in your CARTO account you can:
Use any public dataset to enrich your data with the variables in it by means of the Enrichment
functions.
Subscribe (Dataset.subscribe
) to any premium dataset to get a license that grants you the right to enrich your data with the variables (Variable
) in it.
See the enrichment guides for more information about datasets, variables and enrichment functions.
The metadata of a dataset allows you to understand the underlying data, from variables (the actual columns in the dataset, data types, etc.), to a description of the provider, source, country, geography available, etc.
See the attributes reference in this class to understand the metadata available for each dataset in the catalog.
Examples
There are many different ways to explore the available datasets in the catalog.
You can just list all the available datasets:
>>> catalog = Catalog()
>>> datasets = catalog.datasets
Since the catalog contains thousands of datasets, you can convert the list of datasets to a pandas DataFrame for further filtering:
>>> catalog = Catalog()
>>> dataframe = catalog.datasets.to_dataframe()
The catalog supports nested filters for a hierarchical exploration. This way you could list the datasets available for different hierarchies: country, provider, category, geography, or a combination of them.
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
Get the list of Variable
that corresponds to this dataset. Variables are used in the Enrichment
functions to augment your local DataFrames with columns from a Dataset in the Data Observatory.
CatalogList
List of Variable instances.
CatalogError – if there’s a problem when connecting to the catalog.
Get the list of VariableGroup
related to this dataset.
CatalogList
List of VariableGroup instances.
CatalogError – if there’s a problem when connecting to the catalog.
Name of this dataset.
Description of this dataset.
ID of the Provider
of this dataset.
Name of the Provider
of this dataset.
Get the Category
ID assigned to this dataset.
Name of the Category
assigned to this dataset.
ID of the data source of this dataset.
ISO 3166-1 alpha-3 code of the Country
of this dataset. More info in: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3.
ISO 639-3 code of the language that corresponds to the data of this dataset. More info in: https://en.wikipedia.org/wiki/ISO_639-3.
Get the Geography
ID associated to this dataset.
Get the name of the Geography
associated to this dataset.
Description of the Geography
associated to this dataset.
Time interval over which the data in this dataset is aggregated.
This is a free text field in this form: seconds, daily, hourly, monthly, yearly, etc.
Time range that covers the data of this dataset.
List of str
Example: [2015-01-01,2016-01-01)
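The half-open interval string can be unpacked with the standard library; a small sketch assuming the bracketed format shown in the example:

```python
from datetime import date

coverage = '[2015-01-01,2016-01-01)'

# '[' means the start date is included, ')' means the end date is excluded
start_str, end_str = coverage.strip('[)').split(',')
start, end = date.fromisoformat(start_str), date.fromisoformat(end_str)
print(start, end)
```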
Frequency in which the dataset is updated.
str
Example: monthly, yearly, etc.
Internal version info of this dataset.
str
Indicates whether the content of this dataset can be accessed with public credentials or whether it is a premium dataset that needs a subscription.
True if the dataset is public, False if the dataset is premium (it requires Dataset.subscribe
).
A boolean value
JSON object with extra metadata that summarizes different properties of the dataset content.
Returns a sample of the first 10 rows of the dataset data.
If a dataset has fewer than 10 rows (e.g., zip codes of small countries), this method will return None.
pandas.DataFrame
Returns the last 10 rows of the dataset data.
If a dataset has fewer than 10 rows (e.g., zip codes of small countries), this method will return None.
pandas.DataFrame
Returns a summary of different counts over the actual dataset data.
pandas.Series
Example
# rows: number of rows in the dataset
# cells: number of cells in the dataset (rows * columns)
# null_cells: number of cells with null value in the dataset
# null_cells_percent: percent of cells with null value in the dataset
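These figures are easy to reproduce locally with pandas; the sketch below shows what each count means on a toy DataFrame (illustrative, not the actual implementation behind the method):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, None, 3], 'b': [None, 5, 6]})

rows = len(df)                                 # number of rows
cells = df.size                                # rows * columns
null_cells = int(df.isnull().sum().sum())      # cells with null value
null_cells_percent = 100 * null_cells / cells  # percent of null cells
print(rows, cells, null_cells, null_cells_percent)
```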
Returns a summary of the number of columns per data type in the dataset.
pandas.Series
Example
# float number of columns with type float in the dataset
# string number of columns with type string in the dataset
# integer number of columns with type integer in the dataset
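A pandas analogue of this per-type summary (the column names below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'population': [100, 200],       # integer column
    'median_income': [55.2, 48.9],  # float column
    'geoid': ['01001', '01003'],    # string column (object dtype in pandas)
})

# Number of columns per data type
per_type = df.dtypes.astype(str).value_counts()
print(per_type.to_dict())
```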
Shows a map to visualize the geographical coverage of the dataset.
Map
Shows a summary of the actual stats of the variables (columns) of the dataset. Some of the stats provided per variable are: avg, max, min, sum, range, stdev, q1, q3, median and interquartile_range
autoformat (boolean) – set automatic format for values. Default is True.
pandas.DataFrame
Example
# avg average value
# max max value
# min min value
# sum sum of all values
# range
# stdev standard deviation
# q1 first quartile
# q3 third quartile
# median median value
# interquartile_range
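The same stats can be computed on any numeric column with pandas; a sketch of what each entry represents (illustrative, not the method’s implementation):

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
stats = {
    'avg': values.mean(),
    'max': values.max(),
    'min': values.min(),
    'sum': values.sum(),
    'range': values.max() - values.min(),
    'stdev': values.std(),
    'q1': q1,
    'q3': q3,
    'median': median,
    'interquartile_range': q3 - q1,
}
print(stats)
```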
Get all the Dataset instances that comply with the indicated filters (or all of them if no filters are passed). If credentials are given, only the datasets granted for those credentials are returned.
credentials (Credentials
, optional) – credentials of CARTO user account. If provided, only datasets granted for those credentials are returned.
filters (dict, optional) – Dict containing pairs of dataset properties and its value to be used as filters to query the available datasets. If none is provided, no filters will be applied to the query.
CatalogList
List of Dataset instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
DOError – if DO is not enabled.
Download dataset data as a local CSV file. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.
For premium datasets (those with is_public_data set to False), you need a subscription to the dataset. Check the subscription guides for more information.
file_path (str) – the file path where the dataset (CSV file) will be saved.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
limit (int, optional) – The number of rows to download. Default is to download all rows.
order_by (str, optional) – Field(s) used to order the rows to download. Default is unordered.
sql_query (str, optional) – a query to select, filter or aggregate the content of the dataset. For instance, to download just one row: select * from $dataset$ limit 1. The placeholder $dataset$ is mandatory and it will be replaced by the actual dataset before running the query. You can build any arbitrary query.
add_geom (boolean, optional) – to include the geography when using the sql_query argument. Default to True.
DOError – if you do not have a valid license for the dataset being downloaded, DO is not enabled, or there is an issue downloading the data.
ValueError – if the credentials argument is not valid.
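The $dataset$ placeholder in the sql_query argument is a plain textual substitution; a sketch with a hypothetical resolved table name:

```python
# Hypothetical table name the placeholder would resolve to
actual_table = 'carto_do.provider.dataset_table'

sql_query = 'select * from $dataset$ limit 1'
resolved = sql_query.replace('$dataset$', actual_table)
print(resolved)
```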
Download dataset data as a geopandas.GeoDataFrame. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.
For premium datasets (those with is_public_data set to False), you need a subscription to the dataset. Check the subscription guides for more information.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
limit (int, optional) – The number of rows to download. Default is to download all rows.
order_by (str, optional) – Field(s) used to order the rows to download. Default is unordered.
sql_query (str, optional) – a query to select, filter or aggregate the content of the dataset. For instance, to download just one row: select * from $dataset$ limit 1. The placeholder $dataset$ is mandatory and it will be replaced by the actual dataset before running the query. You can build any arbitrary query.
add_geom (boolean, optional) – to include the geography when using the sql_query argument. Default to True.
geopandas.GeoDataFrame
DOError – if you do not have a valid license for the dataset being downloaded, DO is not enabled, or there is an issue downloading the data.
ValueError – if the credentials argument is not valid.
Subscribe to a dataset. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.
Datasets with is_public_data set to True do not need a license (i.e., a subscription) to be used. Datasets with is_public_data set to False do need a license (i.e., a subscription) to be used. You’ll get a license to use this dataset depending on the estimated_delivery_days set for this specific dataset.
See subscription_info
for more info
Once you subscribe to a dataset, you can download its data with Dataset.to_csv
or Dataset.to_dataframe
and use the Enrichment
functions. See the enrichment guides for more info.
You can check the status of your subscriptions by calling the subscriptions
method in the Catalog
with your CARTO Credentials
.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
CatalogError – if there’s a problem when connecting to the catalog.
DOError – if DO is not enabled.
Get the subscription information of a Dataset, which includes the license, Terms of Service, rights, price, and estimated time of delivery, among other metadata of interest during the Dataset.subscription
process.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
SubscriptionInfo
SubscriptionInfo instance.
CatalogError – if there’s a problem when connecting to the catalog.
DOError – if DO is not enabled.
Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity
A Geography represents the metadata of a particular geography dataset in the catalog.
If you have Data Observatory enabled in your CARTO account you can:
Use any public geography to enrich your data with the variables in it by means of the Enrichment
functions.
Subscribe (Geography.subscribe
) to any premium geography to get a license that grants you the right to enrich your data with the variables in it.
See the enrichment guides for more information about geographies, variables, and enrichment functions.
The metadata of a geography allows you to understand the underlying data, from variables (the actual columns in the geography, data types, etc.), to a description of the provider, source, country, geography available, etc.
See the attributes reference in this class to understand the metadata available for each geography in the catalog.
Examples
There are many different ways to explore the available geographies in the catalog.
You can just list all the available geographies:
>>> catalog = Catalog()
>>> geographies = catalog.geographies
Since the catalog contains thousands of geographies, you can convert the list of geographies to a pandas DataFrame for further filtering:
>>> catalog = Catalog()
>>> dataframe = catalog.geographies.to_dataframe()
The catalog supports nested filters for a hierarchical exploration. This way you could list the geographies available for different hierarchies: country, provider, category or a combination of them.
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geographies
Usually you use a geography ID as an intermediate filter to get a list of datasets with aggregate data for that geographical resolution:
>>> catalog = Catalog()
>>> catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets
Get the list of datasets related to this geography.
CatalogList
List of Dataset instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Name of this geography.
Description of this geography.
Code (ISO 3166-1 alpha-3) of the country of this geography.
Code (ISO 639-3) of the language that corresponds to the data of this geography.
ID of the Provider of this geography.
Name of the Provider of this geography.
Geographical coverage geometry encoded in WKB.
Info about the type of geometry of this geography.
Frequency in which the geography data is updated.
Example: monthly, yearly, etc.
Internal version info of this geography.
Indicates whether the content of this geography can be accessed with public credentials or whether it is a premium geography that needs a subscription.
True if the geography is public, False if the geography is premium (it requires Geography.subscribe
).
A boolean value
dict with extra metadata that summarizes different properties of the geography content.
Get all the Geography instances that comply with the indicated filters (or all of them if no filters are passed). If credentials are given, only the geographies granted for those credentials are returned.
credentials (Credentials
, optional) – credentials of CARTO user account. If provided, only geographies granted for those credentials are returned.
filters (dict, optional) – Dict containing pairs of geography properties and its value to be used as filters to query the available geographies. If none is provided, no filters will be applied to the query.
CatalogList
List of Geography instances.
CatalogError – if there’s a problem when connecting to the catalog or no geographies are found.
DOError – if DO is not enabled.
Download geography data as a local CSV file. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.
For premium geographies (those with is_public_data set to False), you need a subscription to the geography. Check the subscription guides for more information.
file_path (str) – the file path where the dataset (CSV file) will be saved.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
limit (int, optional) – The number of rows to download. Default is to download all rows.
order_by (str, optional) – Field(s) used to order the rows to download. Default is unordered.
sql_query (str, optional) – a query to select, filter or aggregate the content of the geography dataset. For instance, to download just one row: select * from $geography$ limit 1. The placeholder $geography$ is mandatory and it will be replaced by the actual geography dataset before running the query. You can build any arbitrary query.
DOError – if you do not have a valid license for the geography being downloaded, DO is not enabled, or there is an issue downloading the data.
ValueError – if the credentials argument is not valid.
Download geography data as a pandas.DataFrame. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.
For premium geographies (those with is_public_data set to False), you need a subscription to the geography. Check the subscription guides for more information.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
limit (int, optional) – The number of rows to download. Default is to download all rows.
order_by (str, optional) – Field(s) used to order the rows to download. Default is unordered.
sql_query (str, optional) – a query to select, filter or aggregate the content of the geography dataset. For instance, to download just one row: select * from $geography$ limit 1. The placeholder $geography$ is mandatory and it will be replaced by the actual geography dataset before running the query. You can build any arbitrary query.
pandas.DataFrame
DOError – if you do not have a valid license for the geography being downloaded, DO is not enabled, or there is an issue downloading the data.
ValueError – if the credentials argument is not valid.
Subscribe to a Geography. You need Data Observatory enabled in your CARTO account; please contact us at support@carto.com for more information.
Geographies with is_public_data set to True do not need a license (i.e., a subscription) to be used. Geographies with is_public_data set to False do need a license (i.e., a subscription) to be used. You’ll get a license to use this geography depending on the estimated_delivery_days set for this specific geography.
See subscription_info
for more info
Once you subscribe (Geography.subscribe
) to a geography, you can download its data with Geography.to_csv
or Geography.to_dataframe
and use the enrichment functions. See the enrichment guides for more info.
You can check the status of your subscriptions by calling the subscriptions
method in the Catalog
with your CARTO credentials.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
CatalogError – if there’s a problem when connecting to the catalog.
DOError – if DO is not enabled.
Get the subscription information of a Geography, which includes the license, Terms of Service, rights, price, and estimated time of delivery, among other metadata of interest during the subscription process.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
SubscriptionInfo
SubscriptionInfo instance.
CatalogError – if there’s a problem when connecting to the catalog.
DOError – if DO is not enabled.
Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity
This class represents a Provider
of datasets and geographies in the Catalog
.
Examples
List the available providers in the Catalog
in combination with nested filters (categories, countries, etc.)
>>> providers = Provider.get_all()
Get a Provider
from the Catalog
given its ID
>>> catalog = Catalog()
>>> provider = catalog.provider('mrli')
Get the list of datasets related to this provider.
CatalogList
List of Dataset instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Examples
>>> provider = Provider.get('mrli')
>>> datasets = provider.datasets
Same example as above but using nested filters:
>>> catalog = Catalog()
>>> datasets = catalog.provider('mrli').datasets
Name of this provider.
Bases: cartoframes.data.observatory.catalog.entity.CatalogEntity
This class represents a Variable
of datasets in the Catalog
.
Variables contain the column name, description, data type, aggregation method, and some other metadata that is useful to understand the underlying data inside a Dataset.
Examples
List the variables of a Dataset
in combination with nested filters (categories, countries, etc.)
>>> dataset = Dataset.get('mbi_retail_turn_705247a')
>>> dataset.variables
[<Variable.get('RT_CI_95050c10')> #'Retail Turnover: index (country eq.100)', ...]
Get the list of datasets related to this variable.
CatalogList
List of Dataset instances.
CatalogError – if there’s a problem when connecting to the catalog or no datasets are found.
Name of this variable.
Description of this variable.
Column name of the actual table related to the variable in the Dataset
.
Type in the database.
str
Examples: INTEGER, STRING, FLOAT, GEOGRAPHY, JSON, BOOL, etc.
ID of the Dataset
to which this variable belongs.
Text representing a description of the aggregation method used to compute the values in this Variable
If any, ID of the variable group to which this variable belongs.
JSON object with extra metadata that summarizes different properties of this variable.
Shows a summary of the actual stats of the variable (column) of the dataset. Some of the stats provided per variable are: avg, max, min, sum, range, stdev, q1, q3, median and interquartile_range
autoformat (boolean) – set automatic format for values. Default is True.
Example
# avg average value
# max max value
# min min value
# sum sum of all values
# range
# stdev standard deviation
# q1 first quartile
# q3 third quartile
# median median value
# interquartile_range
Returns a sample of the first 10 values of the variable data.
For datasets with fewer than 10 rows (e.g., zip codes of small countries), this method won’t return anything.
Returns a sample of the last 10 values of the variable data.
For datasets with fewer than 10 rows (e.g., zip codes of small countries), this method won’t return anything.
Returns a summary of different counts over the actual variable values.
Example
# all total number of values
# null total number of null values
# zero number of zero-valued entries
# extreme number of values more than 3 × stdev outside the interquartile range
# distinct number of distinct (unique) entries
# outliers number of outliers (more than 1.5 × stdev outside the interquartile range)
# zero_percent percent of values that are zero
# distinct_percent percent of values that are distinct
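A local pandas sketch of the same counts on a toy variable column (illustrative only, not the actual implementation):

```python
import pandas as pd

values = pd.Series([0, 0, 1, 2, 2, None])

counts = {
    'all': len(values),                        # total number of values
    'null': int(values.isnull().sum()),        # null values
    'zero': int((values == 0).sum()),          # zero-valued entries
    'distinct': values.nunique(dropna=True),   # distinct (unique) entries
}
counts['zero_percent'] = 100 * counts['zero'] / counts['all']
counts['distinct_percent'] = 100 * counts['distinct'] / counts['all']
print(counts)
```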
Returns the quantiles of the variable data.
Returns information about the top values of the variable data.
Plots a histogram of the variable data.
Bases: cartoframes.data.observatory.enrichment.enrichment_service.EnrichmentService
This is the main class to enrich your own data with data from the Data Observatory
To be able to use the Enrichment functions you need a CARTO account with Data Observatory v2 enabled. Contact us at support@carto.com for more information about this.
Please, see the Catalog
discovery and subscription guides, to understand how to explore the Data Observatory repository and subscribe to premium datasets to be used in your enrichment workflows.
credentials (Credentials
, optional) – credentials of CARTO user account. If not provided, the default credentials (if set with set_default_credentials
) will be used.
Enrich your points DataFrame with columns (Variable
) from one or more Dataset
in the Data Observatory, intersecting the points in the source DataFrame with the geographies in the Data Observatory.
Extra columns, such as area and population, will be provided in the resulting DataFrame for normalization purposes.
dataframe (pandas.DataFrame, geopandas.GeoDataFrame) – a DataFrame instance to be enriched.
variables (Variable
, list, str) – variable ID, slug or Variable
instance or list of variable IDs, slugs or Variable
instances taken from the Data Observatory Catalog
.
geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.
filters (dict, optional) – dictionary to filter results by variable values. As a key it receives the variable ID, and as a value it receives a SQL operator, for example: {variable1.id: "> 30"}. It works by appending the filter SQL operators to the WHERE clause of the resulting enrichment SQL with the AND operator (in the example: WHERE {variable1.column_name} > 30). If you want to filter the same variable several times you can use a list as the dict value: {variable1.id: ["> 30", "< 100"]}. The variables used to filter results must be included in the variables argument.
A geopandas.GeoDataFrame enriched with the variables passed as argument.
EnrichmentError – if there is an error in the enrichment process.
Note that if the points of the `dataframe` you provide are contained in more than one geometry in the enrichment dataset, the number of rows of the returned `GeoDataFrame` may differ from the number of rows of the `dataframe` argument.
Examples
Enrich a points DataFrame with Catalog classes:
>>> df = pandas.read_csv('path/to/local/csv')
>>> variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> gdf_enrich = Enrichment().enrich_points(df, variables, geom_col='the_geom')
Enrich a points dataframe with several Variables using their ids:
>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = all_variables[:2]
>>> gdf_enrich = Enrichment().enrich_points(df, variables, geom_col='the_geom')
Enrich a points dataframe with filters:
>>> df = pandas.read_csv('path/to/local/csv')
>>> variable = Catalog().country('usa').category('demographics').datasets[0].variables[0]
>>> filters = {variable.id: "= '2019-09-01'"}
>>> gdf_enrich = Enrichment().enrich_points(
... df,
... variables=[variable],
... filters=filters,
... geom_col='the_geom')
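The filters mechanics described above boil down to composing a WHERE clause from column/operator pairs. A sketch of that composition (the column names here are hypothetical; in the real API the dict keys are variable IDs that resolve to column names):

```python
# Hypothetical column names mapped to their SQL filter operators
filters = {
    'poverty_rate': '> 30',
    'median_income': ['> 30000', '< 100000'],  # same variable filtered twice
}

conditions = []
for column, ops in filters.items():
    if not isinstance(ops, list):
        ops = [ops]
    conditions.extend('{} {}'.format(column, op) for op in ops)

# Filters are joined with the AND operator in the enrichment SQL
where_clause = 'WHERE ' + ' AND '.join(conditions)
print(where_clause)
```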
Enrich your polygons DataFrame with columns (Variable
) from one or more Dataset
in the Data Observatory by intersecting the polygons in the source DataFrame with geographies in the Data Observatory.
When a polygon intersects with multiple geographies, the proportional part of the intersection will be used to interpolate the quantity of the polygon value intersected, aggregating the results. Most Variable
instances have a Variable.agg_method
property which is used by default as the aggregation function, but you can override it using the aggregation parameter (or skip aggregation entirely). If a variable does not have the agg_method property set and you do not override it (with the aggregation parameter), the variable column will be skipped in the enrichment.
dataframe (pandas.DataFrame, geopandas.GeoDataFrame) – a DataFrame instance to be enriched.
variables (Variable
, list, str) – variable ID, slug or Variable
instance or list of variable IDs, slugs or Variable
instances taken from the Data Observatory Catalog
.
geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.
filters (dict, optional) – dictionary to filter results by variable values. As a key it receives the variable ID, and as a value it receives a SQL operator, for example: {variable1.id: "> 30"}. It works by appending the filter SQL operators to the WHERE clause of the resulting enrichment SQL with the AND operator (in the example: WHERE {variable1.column_name} > 30). If you want to filter the same variable several times you can use a list as the dict value: {variable1.id: ["> 30", "< 100"]}. The variables used to filter results must be included in the variables argument.
aggregation (None, str, list, optional) –
sets the data aggregation. The polygons in the source DataFrame can intersect with one or more polygons from the Data Observatory. With this method you can select how to aggregate the resulting data.
An aggregation method can be one of these values: ‘MIN’, ‘MAX’, ‘SUM’, ‘AVG’, ‘COUNT’, ‘ARRAY_AGG’, ‘ARRAY_CONCAT_AGG’, ‘STRING_AGG’; this is not an exhaustive list, so check the aggregate functions documentation for a complete one.
The options are:
- ‘default’ (default): most Variable instances have a default aggregation method in the Variable.agg_method property, and it will be used to aggregate the data (a variable may not have agg_method defined; in that case, the variable will be skipped).
- None: use this option to do the aggregation locally by yourself. You will receive one row of data per intersected polygon, along with the areas of the intersections and the intersected polygons.
- str: to override every default aggregation method, pass a string with the aggregation method to use.
- dict: to override some of the default aggregation methods of your selected variables, use Variable.id: aggregation method pairs, for example: {variable1.id: 'SUM', variable3.id: 'AVG'}. To use several aggregation methods for one variable, use a list as the dict value: {variable1.id: ['SUM', 'AVG'], variable3.id: 'AVG'}.
A geopandas.GeoDataFrame enriched with the variables passed as argument.
EnrichmentError – if there is an error in the enrichment process.
Note that if the geometry of the `dataframe` you provide intersects with more than one geometry in the enrichment dataset, the number of rows of the returned `GeoDataFrame` may differ from the number of rows of the `dataframe` argument.
Examples
Enrich a polygons dataframe with one Variable:
>>> df = pandas.read_csv('path/to/local/csv')
>>> variable = Catalog().country('usa').category('demographics').datasets[0].variables[0]
>>> variables = [variable]
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, geom_col='the_geom')
Enrich a polygons dataframe with all Variables from a Catalog Dataset:
>>> df = pandas.read_csv('path/to/local/csv')
>>> variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, geom_col='the_geom')
Enrich a polygons dataframe with several Variables using their ids:
>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = [all_variables[0].id, all_variables[1].id]
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, geom_col='the_geom')
Enrich a polygons dataframe with filters:
>>> df = pandas.read_csv('path/to/local/csv')
>>> variable = Catalog().country('usa').category('demographics').datasets[0].variables[0]
>>> filters = {variable.id: "= '2019-09-01'"}
>>> gdf_enrich = Enrichment().enrich_polygons(
... df,
... variables=[variable],
... filters=filters,
... geom_col='the_geom')
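As a side note, the filter semantics described above can be sketched in plain Python. The helper below is hypothetical (not part of CARTOframes); the variable id `var1` and column name `no_cars` are made up for illustration:

```python
def build_where(filters, column_names):
    """Expand a {variable_id: operator or list of operators} dict into
    an SQL WHERE clause, joining all fragments with AND."""
    fragments = []
    for var_id, ops in filters.items():
        if isinstance(ops, str):
            ops = [ops]  # a single operator behaves like a one-item list
        for op in ops:
            fragments.append('{} {}'.format(column_names[var_id], op))
    return 'WHERE ' + ' AND '.join(fragments) if fragments else ''

# Filtering the same (hypothetical) variable twice, as in the filters docs
clause = build_where({'var1': ['> 30', '< 100']}, {'var1': 'no_cars'})
print(clause)  # WHERE no_cars > 30 AND no_cars < 100
```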
Enrich a polygons dataframe overriding every variable’s aggregation method to use the SUM function:
>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = all_variables[:3]
>>> gdf_enrich = Enrichment().enrich_polygons(
... df,
... variables,
... aggregation='SUM',
... geom_col='the_geom')
Enrich a polygons dataframe overriding some of the variables’ aggregation methods:
>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variable1 = all_variables[0]  # variable1.agg_method is 'AVG' but you want 'SUM'
>>> variable2 = all_variables[1]  # variable2.agg_method is 'AVG' and it is what you want
>>> variable3 = all_variables[2]  # variable3.agg_method is 'SUM' but you want 'AVG'
>>> variables = [variable1, variable2, variable3]
>>> aggregation = {
... variable1.id: 'SUM',
... variable3.id: 'AVG'
... }
>>> gdf_enrich = Enrichment().enrich_polygons(
... df,
... variables,
... aggregation=aggregation,
... geom_col='the_geom')
Enrich a polygons dataframe using several aggregation methods for a variable:
>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variable1 = all_variables[0]  # variable1.agg_method is 'AVG' but you want 'SUM' and 'AVG'
>>> variable2 = all_variables[1]  # variable2.agg_method is 'AVG' and it is what you want
>>> variable3 = all_variables[2]  # variable3.agg_method is 'SUM' but you want 'AVG'
>>> variables = [variable1, variable2, variable3]
>>> aggregation = {
... variable1.id: ['SUM', 'AVG'],
... variable3.id: 'AVG'
... }
>>> gdf_enrich = Enrichment().enrich_polygons(df, variables, aggregation=aggregation)
Enrich a polygons dataframe without aggregating the data (in case you want to use your own custom function to aggregate the data):
>>> df = pandas.read_csv('path/to/local/csv')
>>> all_variables = Catalog().country('usa').category('demographics').datasets[0].variables
>>> variables = all_variables[:3]
>>> gdf_enrich = Enrichment().enrich_polygons(
... df,
... variables,
... aggregation=None,
... geom_col='the_geom')
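With aggregation=None, as above, you receive one row per intersection together with the intersection areas, and the local aggregation is up to you. Below is a minimal sketch of an area-weighted average in plain Python; the tuple layout (polygon id, variable value, intersected area) is made up for illustration and does not match the actual enrichment output columns:

```python
# Each tuple: (source polygon id, variable value, intersected area)
# These positions are hypothetical, for illustration only.
rows = [
    (1, 100.0, 30.0),
    (1, 200.0, 70.0),
    (2, 50.0, 100.0),
]

def weighted_avg(rows):
    """Aggregate variable values per polygon, weighted by intersection area."""
    totals = {}
    for poly_id, value, area in rows:
        acc = totals.setdefault(poly_id, [0.0, 0.0])
        acc[0] += value * area  # accumulate value weighted by area
        acc[1] += area          # accumulate total intersected area
    return {poly_id: s / w for poly_id, (s, w) in totals.items()}

print(weighted_avg(rows))  # {1: 170.0, 2: 50.0}
```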
The next example uses filters to calculate the SUM of the car-free households Variable of the Catalog for each polygon of the my_local_dataframe pandas DataFrame, only for areas with more than 100 car-free households:
>>> variable = Variable.get('no_cars_d19dfd10')
>>> gdf_enrich = Enrichment().enrich_polygons(
... my_local_dataframe,
... variables=[variable],
... aggregation={variable.id: 'SUM'},
... filters={variable.id: '> 100'},
... geom_col='the_geom')
Bases: object
This class is used to list the datasets and geographies you have acquired a subscription (or valid license) for.
This class won’t show any dataset or geography tagged in the catalog as is_public_data since those data do not require a subscription.
List of Dataset
you have a subscription for.
CatalogError – if there’s a problem when connecting to the catalog.
List of Geography
you have a subscription for.
CatalogError – if there’s a problem when connecting to the catalog.
Bases: object
This class represents the SubscriptionInfo of datasets and geographies in the Catalog.
It contains private metadata (you need a CARTO account to query it) that is useful when you want a subscription license for a specific dataset or geography.
The ID of the dataset or geography.
Estimated number of days, after calling Dataset.subscribe or Geography.subscribe, until you get a license.
Your licensed datasets and geographies will be returned by the catalog.subscriptions
method.
For the datasets and geographies listed in the catalog.subscriptions method you can:
- Dataset.download or Geography.download them
- Use their Dataset.variables in the Enrichment functions
Price in $ for a one year subscription for this dataset.
Legal Terms Of Service.
Link to additional information for the legal Terms Of Service.
Description of the licenses.
Link to additional information about the available licenses.
Rights over the dataset or geography when you buy a license by means of a subscription.
Bases: abc.ABC
This is an internal class the rest of the classes related to the catalog discovery extend.
Properties: id, slug (a shorter ID).
Static methods: get, get_all, get_list to retrieve elements or lists of objects in the catalog such as datasets, categories, variables, etc.
Instance methods to convert to pandas Series, Python dict, compare instances, etc.
As a rule of thumb you don’t directly use this class, it is documented for inheritance purposes.
The ID of the entity.
The slug (short ID) of the entity.
Get an instance of an entity by ID or slug.
id (str) – ID or slug of a catalog entity.
CatalogError – if there’s a problem when connecting to the catalog or no entities are found.
List all instances of an entity.
filters (dict, optional) – Dict containing pairs of entity properties and its value to be used as filters to query the available entities. If none is provided, no filters will be applied to the query.
Get a list of instances of an entity by a list of IDs or slugs.
id_list (list) – List of IDs or slugs of entities in the catalog to retrieve instances for.
CatalogError – if there’s a problem when connecting to the catalog or no entities are found.
Converts the entity instance to a pandas Series.
Converts the entity instance to a Python dict.
Check if the entity is subscribed.
Bases: list
This is an internal class that represents a list of entities in the catalog of the same type.
Instance methods to get an instance of the entity by ID and to convert the list to a pandas DataFrame for further filtering and exploration.
As a rule of thumb you don’t directly use this class, it is documented for inheritance purposes.
Converts a list to a pandas DataFrame.
Examples
>>> catalog = Catalog()
>>> catalog.categories.to_dataframe()
Bases: object
SQLClient class is a client to run SQL queries in a CARTO account. It also provides basic SQL utilities for analyzing and managing tables.
credentials (Credentials) – A Credentials instance, which can be used in place of a username/base_url and api_key combination.
Example
>>> sql = SQLClient(credentials)
Run a SQL query. It returns a list with the content of the response. If the verbose param is True, it returns the full SQL response in a dict. For more information check the SQL API documentation: https://carto.com/developers/sql-api/reference/#tag/Single-SQL-Statement.
query (str) – SQL query.
verbose (bool, optional) – flag to return all the response. Default False.
Example
>>> sql.query('SELECT * FROM table_name')
Run a long-running query. It returns an object with the status and information of the job. For more information check the Batch API documentation: https://carto.com/developers/sql-api/reference/#tag/Batch-Queries.
query (str) – SQL query.
Example
>>> sql.execute('DROP TABLE table_name')
Get the distinct values and their counts in a table for a specific column.
table_name (str) – name of the table.
column_name (str) – name of the column.
Example
>>> sql.distinct('table_name', 'column_name')
[('value1', 10), ('value2', 5)]
Get the number of elements of a table.
table_name (str) – name of the table.
Example
>>> sql.count('table_name')
15
Get the bounds of the geometries in a table.
table_name (str) – name of the table containing a “the_geom” column.
Example
>>> sql.bounds('table_name')
[[-1,-1], [1,1]]
Show information about the schema of a table.
table_name (str) – name of the table.
raw (bool, optional) – return raw dict data if set to True. Default False.
Example
>>> sql.schema('table_name')
Column name Column type
-------------------------------------
cartodb_id number
the_geom geometry
the_geom_webmercator geometry
column1 string
column2 number
Show information about a column in a specific table. It returns the COUNT of the table. If the column type is number it also returns the AVG, MIN and MAX.
table_name (str) – name of the table.
column_name (str) – name of the column.
Example
>>> sql.describe('table_name', 'column_name')
count 1.00e+03
avg 2.00e+01
min 0.00e+00
max 5.00e+01
type: number
Create a table with a specific table name and columns.
table_name (str) – name of the table.
column_types (dict) – dictionary with the column names and types.
if_exists (str, optional) – collision strategy if the table already exists in CARTO. Options are ‘fail’ or ‘replace’. Default ‘fail’.
cartodbfy (bool, optional) – convert the table to CARTO format. Default True. More info here: https://carto.com/developers/sql-api/guides/creating-tables/#create-tables.
Example
>>> sql.create_table('table_name', {'column1': 'text', 'column2': 'integer'})
Insert one or more rows into the table.
table_name (str) – name of the table.
columns_values (dict) – dictionary with the column names and values.
Example
>>> sql.insert_table('table_name', {'column1': ['value1', 'value2'], 'column2': [1, 2]})
Update the column’s value for the rows that match the condition.
table_name (str) – name of the table.
column_name (str) – name of the column.
column_value (str) – value of the column.
condition (str) – “where” condition of the request.
Example
>>> sql.update_table('table_name', 'column1', 'VALUE1', "column1='value1'")
Rename a table from its table name.
table_name (str) – name of the original table.
new_table_name (str) – name of the new table.
Example
>>> sql.rename_table('table_name', 'table_name2')
Remove a table from its table name.
table_name (str) – name of the table.
Example
>>> sql.drop_table('table_name')
Viz namespace contains all the classes to create visualizations based on data
Bases: object
Map to display a data visualization. It contains one or multiple Layer instances and provides control of the basemap, bounds and properties of the visualization.
layers (list of Layer) – List of layers. Zero or more Layer instances.
basemap (str, optional) – positron, voyager, or darkmatter from the BaseMaps class, or a hex, rgb or named color value. It can also be a Mapbox style dict, in which case the access token is the value of the token key.
bounds (dict or list, optional) – a dict with west, south, east, north keys, or an array of floats in the following structure: [[west, south], [east, north]]. If not provided the bounds will be automatically calculated to fit all features.
size (tuple, optional) – a (width, height) pair for the size of the map. Default is (1024, 632).
viewport (dict, optional) – Properties for display of the map viewport. Keys can be bearing or pitch.
show_info (bool, optional) – Whether to display center and zoom information in the map or not. It is False by default.
is_static (bool, optional) – Default False. If True, instead of showing an interactive map, a PNG image will be displayed. Warning: UI components are not properly rendered in the static view; we recommend removing legends and widgets before rendering a static map.
theme (string, optional) – Use a different UI theme (legends, widgets, popups). Available themes are dark and light. By default, it is light for the Positron and Voyager basemaps and dark for the DarkMatter basemap.
title (string, optional) – Title to label the map; it will be displayed in the default legend.
description (string, optional) – Text that describes the map and will be displayed in the default legend after the title.
ValueError – if input parameters are not valid.
Examples
Basic usage.
>>> Map(Layer('table in your account'))
Display more than one layer on a map.
>>> Map(layers=[
... Layer('table1'),
... Layer('table2')
... ])
Change the CARTO basemap style.
>>> Map(Layer('table in your account'), basemap=basemaps.darkmatter)
Choose a custom basemap style. Here we use the Mapbox streets style, which requires an access token.
>>> basemap = {
... 'style': 'mapbox://styles/mapbox/streets-v9',
... 'token': 'your Mapbox token'
... }
>>> Map(Layer('table in your account'), basemap=basemap)
Remove basemap and show a custom color.
>>> Map(Layer('table in your account'), basemap='yellow') # None, False, 'white', 'rgb(255, 255, 0)'
Set custom bounds.
>>> bounds = {
... 'west': -10,
... 'east': 10,
... 'north': 10,
... 'south': -10
... } # or bounds = [[-10, -10], [10, 10]]
>>> Map(Layer('table in your account'), bounds=bounds)
Show the map center and zoom value on the map (lower left-hand corner).
>>> Map(Layer('table in your account'), show_info=True)
Publish the map visualization as a CARTO custom visualization.
name (str) – The visualization name on CARTO.
password (str) – By setting it, your visualization will be protected by password. When someone tries to show the visualization, the password will be requested. To disable password you must set it to None.
credentials (Credentials
, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available. It is used to create the publication and also to save local data (if exists) into your CARTO account.
if_exists (str, optional) – ‘fail’ or ‘replace’. Behavior in case a publication with the same name already exists in your account. Default is ‘fail’.
maps_api_key (str, optional) – The Maps API key used for private datasets.
Example
Publishing the map visualization.
>>> tmap = Map(Layer('tablename'))
>>> tmap.publish('Custom Map Title', password=None)
Update the published map visualization.
name (str) – The visualization name on CARTO.
password (str) – if set, your visualization will be protected by password; if None, the visualization will be public.
if_exists (str, optional) – ‘fail’ or ‘replace’. Behavior in case a publication with the same name already exists in your account. Default is ‘fail’.
PublishError – if the map has not been published yet.
Bases: object
Layer to display data on a map. This class can be used as one or more layers in a Map, or on its own in a Jupyter notebook to get a preview of the data; there is no need to create a Map if you are only visualizing data as a single layer.
source (str, pandas.DataFrame, geopandas.GeoDataFrame) – The source data: table name, SQL query or a dataframe. If dataframe, the geometry’s CRS must be WGS 84 (EPSG:4326).
style (dict, or Style
, optional) – The style of the visualization.
legends (bool, Legend
list, optional) – The legends definition for a layer. It contains a list of legend helpers. See Legend
for more information.
widgets (bool, list, or WidgetList
, optional) – Widget or list of widgets for a layer. It contains the information to display different widget types on the top right of the map. See WidgetList
for more information.
popup_click (list of popup_element, optional) – Set up a popup to be displayed on a click event.
popup_hover (bool, or list of popup_element, optional) – Set up a popup to be displayed on a hover event. Style helpers include a default hover popup; set popup_hover=False to remove it.
credentials (Credentials
, optional) – A Credentials instance. This is only used for the simplified Source API. When a Source
is passed as source, these credentials are ignored. If not provided, the credentials will be automatically obtained from the default credentials.
bounds (dict or list, optional) – a dict with west, south, east, north keys, or an array of floats in the following structure: [[west, south], [east, north]]. If not provided the bounds will be automatically calculated to fit all features.
geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.
default_legend (bool, optional) – flag to set the default legend. This only works when using a style helper. Default True.
default_widget (bool, optional) – flag to set the default widget. This only works when using a style helper. Default False.
default_popup_hover (bool, optional) – flag to set the default popup hover. This only works when using a style helper. Default True.
default_popup_click (bool, optional) – flag to set the default popup click. This only works when using a style helper. Default False.
title (str, optional) – title for the default legend, widget and popups.
encode_data (bool, optional) – By default, local data is encoded in order to save local space. However, when using very large files, it might not be possible to encode all the data. By disabling this parameter with encode_data=False the resulting notebook will be large, but there will be no encoding issues.
ValueError – if the source is not valid.
Examples
Create a layer with the defaults (style, legend).
>>> Layer('table_name') # or Layer(gdf)
Create a layer with a custom style, legend, widget and popups.
>>> Layer(
... 'table_name',
... style=color_bins_style('column_name'),
... legends=color_bins_legend(title='Legend title'),
... widgets=histogram_widget('column_name', title='Widget title'),
... popup_click=popup_element('column_name', title='Popup title'),
... popup_hover=popup_element('column_name', title='Popup title'))
Create a layer specifically tied to a Credentials instance.
>>> Layer(
... 'table_name',
... credentials=Credentials.from_file('creds.json'))
Layer map index
Bases: object
source (str, pandas.DataFrame, geopandas.GeoDataFrame) – a table name, SQL query, DataFrame, GeoDataFrame instance.
credentials (Credentials
, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available.
geom_col (str, optional) – string indicating the geometry column name in the source DataFrame.
encode_data (bool, optional) – Indicates whether the data needs to be encoded. Default is True.
Example
Table name.
>>> Source('table_name')
SQL query.
>>> Source('SELECT * FROM table_name')
DataFrame object.
>>> Source(df, geom_col='my_geom')
GeoDataFrame object.
>>> Source(gdf)
Setting the credentials.
>>> Source('table_name', credentials)
Bases: object
Create a layout of visualizations in order to compare them.
maps (list of Map) – List of maps. Zero or more Map instances.
n_size (number, optional) – Number of columns of the layout.
m_size (number, optional) – Number of rows of the layout.
viewport (dict, optional) – Properties for display of the maps viewport. Keys can be bearing or pitch.
is_static (boolean, optional) – Default is False; all the maps in each visualization are interactive. To render them as static images for performance reasons, set is_static to True.
map_height (number, optional) – Height in pixels for each visualization. Default is 250.
full_height (boolean, optional) – When a layout visualization is published, it will fit the screen height. Otherwise, each visualization height will be map_height. Default True.
ValueError – if the input elements are not instances of Map.
Examples
Basic usage.
>>> Layout([
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account'))
... ])
Display a 2x2 layout.
>>> Layout([
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account'))
... ], 2, 2)
Custom Titles.
>>> Layout([
... Map(Layer('table_in_your_account'), title="Visualization 1 custom title"),
... Map(Layer('table_in_your_account'), title="Visualization 2 custom title")
... ])
Viewport.
>>> Layout([
... Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account'))
... ], viewport={ 'zoom': 2 })
>>> Layout([
... Map(Layer('table_in_your_account'), viewport={ 'zoom': 0.5 }),
... Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account'))
... ], viewport={ 'zoom': 2 })
Create a static layout.
>>> Layout([
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account'))
... ], is_static=True)
Publish the layout visualization as a CARTO custom visualization.
name (str) – The visualization name on CARTO.
password (str) – By setting it, your visualization will be protected by password. When someone tries to show the visualization, the password will be requested. To disable password you must set it to None.
credentials (Credentials
, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available. It is used to create the publication and also to save local data (if exists) into your CARTO account.
if_exists (str, optional) – ‘fail’ or ‘replace’. Behavior in case a publication with the same name already exists in your account. Default is ‘fail’.
maps_api_key (str, optional) – The Maps API key used for private datasets.
Example
Publishing the map visualization.
>>> tlayout = Layout([
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account')),
... Map(Layer('table_in_your_account')), Map(Layer('table_in_your_account'))
... ])
>>> tlayout.publish('Custom Map Title', password=None)
Update the published layout visualization.
name (str) – The visualization name on CARTO.
password (str) – if set, your visualization will be protected by password; if None, the visualization will be public.
if_exists (str, optional) – ‘fail’ or ‘replace’. Behavior in case a publication with the same name already exists in your account. Default is ‘fail’.
PublishError – if the map has not been published yet.
Helper function for quickly creating an animated style.
value (str) – Column to symbolize by.
duration (float, optional) – Time of the animation in seconds. Default is 20s.
fade_in (float, optional) – Time of fade in transitions in seconds. Default is 1s.
fade_out (float, optional) – Time of fade out transitions in seconds. Default is 1s.
color (str, optional) – Hex, rgb or named color value. Default is ‘#EE5D5A’ for points, ‘#4CC8A3’ for lines and ‘#826DBA’ for polygons.
size (int, optional) – Size of point or line features.
opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.
stroke_width (int, optional) – Size of the stroke on point features.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
cartoframes.viz.style.Style
Helper function for quickly creating a basic style.
color (str, optional) – hex, rgb or named color value. Default is ‘#FFB927’ for point geometries and ‘#4CC8A3’ for lines.
size (int, optional) – Size of point or line features.
opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
cartoframes.viz.style.Style
Helper function for quickly creating a cluster map with continuously sized points. Cluster operations are performed in the back-end, so this helper can be used only with CARTO tables or SQL queries. It cannot be used with GeoDataFrames.
value (str) – Numeric column to aggregate.
operation (str, optional) – Cluster operation, defaults to ‘count’. Other options available are ‘avg’, ‘min’, ‘max’, and ‘sum’.
resolution (int, optional) – Resolution of aggregation grid cell. Set to 32 by default.
color (str, optional) – Hex, rgb or named color value. Default is ‘#FFB927’ for point geometries.
opacity (float, optional) – Opacity value. Default is 0.8.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
Helper function for quickly creating a color bins style.
value (str) – Column to symbolize by.
method (str, optional) – Classification method of data: “quantiles”, “equal”, “stdev”. Default is “quantiles”.
bins (int, optional) – Number of size classes (bins) for map. Default is 5.
breaks (list<int>, optional) – Assign manual class break values.
palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “purpor”.
size (int, optional) – Size of point or line features.
opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
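To make the classification options above more concrete, here is a stdlib sketch (illustrative only, not CARTOframes internals) of how a ‘quantiles’ method places break values so that each bin holds roughly the same number of features:

```python
import statistics

# A hypothetical numeric column with values 1..100
values = list(range(1, 101))

# 5 bins need 4 interior break values, at the 20th/40th/60th/80th percentiles
breaks = statistics.quantiles(values, n=5, method='inclusive')
print(len(breaks), breaks == sorted(breaks))  # 4 True
```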
Helper function for quickly creating a color category style.
value (str) – Column to symbolize by.
top (int, optional) – Number of categories. Default is 11. Values can range from 1 to 16.
cat (list<str>, optional) – Category list. Must be a valid list of categories.
palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “bold”.
size (int, optional) – Size of point or line features.
opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
Helper function for quickly creating a color continuous style.
value (str) – Column to symbolize by.
range_min (int, optional) – The minimum value of the data range for the continuous color ramp. Defaults to the global MIN of the dataset.
range_max (int, optional) – The maximum value of the data range for the continuous color ramp. Defaults to the global MAX of the dataset.
palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “bluyl”.
opacity (float, optional) – Opacity value. Default is 1 for points and lines and 0.9 for polygons.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
Helper function for quickly creating an isolines style. Based on the color category style.
value (str, optional) – Column to symbolize by. Default is “range_label”.
top (int, optional) – Number of categories. Default is 11. Values can range from 1 to 16.
cat (list<str>, optional) – Category list. Must be a valid list of categories.
palette (str, optional) – Palette that can be a named cartocolor palette or other valid color palette. Use help(cartoframes.viz.palettes) to get more information. Default is “pinkyl”.
size (int, optional) – Size of point or line features.
opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘rgba(150,150,150,0.4)’.
stroke_width (int, optional) – Size of the stroke on point features.
cartoframes.viz.style.Style
Helper function for quickly creating a size bins style with a classification method and buckets.
value (str) – Column to symbolize by.
method (str, optional) – Classification method of data: “quantiles”, “equal”, “stdev”. Default is “quantiles”.
bins (int, optional) – Number of size classes (bins) for map. Default is 5.
breaks (list<int>, optional) – Assign manual class break values.
size_range (list<int>, optional) – Min/max size array. Default is [2, 14] for point geometries and [1, 10] for lines.
color (str, optional) – Hex, rgb or named color value. Default is ‘#EE5D5A’ for point geometries and ‘#4CC8A3’ for lines.
opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
Helper function for quickly creating a size category style.
value (str) – Column to symbolize by.
top (int, optional) – Number of size categories. Default is 5. Values can range from 1 to 16.
cat (list<str>, optional) – Category list. Must be a valid list of categories.
size_range (list<int>, optional) – Min/max size array. Default is [2, 20] for point geometries and [1, 10] for lines.
color (str, optional) – hex, rgb or named color value. Default is ‘#F46D43’ for point geometries and ‘#4CC8A3’ for lines.
opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
Helper function for quickly creating a size continuous style.
value (str) – Column to symbolize by.
size_range (list<int>, optional) – Min/max size array. Default is [2, 40] for point geometries and [1, 10] for lines.
range_min (int, optional) – The minimum value of the data range for the continuous size ramp. Defaults to the global MIN of the dataset.
range_max (int, optional) – The maximum value of the data range for the continuous size ramp. Defaults to the global MAX of the dataset.
color (str, optional) – Hex, rgb or named color value. Default is ‘#FFB927’ for point geometries and ‘#4CC8A3’ for lines.
opacity (float, optional) – Opacity value for point color and line features. Default is 0.8.
stroke_color (str, optional) – Color of the stroke on point features. Default is ‘#222’.
stroke_width (int, optional) – Size of the stroke on point features.
animate (str, optional) – Animate features by date/time or other numeric field.
cartoframes.viz.style.Style
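The continuous ramp interpolates linearly between the two `size_range` endpoints over the `[range_min, range_max]` data range; a hypothetical stdlib sketch of that mapping (the real interpolation happens in the rendering engine):

```python
def continuous_size(value, range_min, range_max, size_range=(2, 40)):
    # Normalize into [0, 1], clamping values outside the data range.
    t = (value - range_min) / (range_max - range_min)
    t = max(0.0, min(1.0, t))
    # Linear ramp from the minimum to the maximum point size.
    return size_range[0] + t * (size_range[1] - size_range[0])

print(continuous_size(50, 0, 100))   # 21.0
print(continuous_size(150, 0, 100))  # 40.0 (clamped to range_max)
```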
Helper function for quickly creating a basic legend.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
cartoframes.viz.legend.Legend
Example
>>> basic_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer')
Helper function for quickly creating a color bins legend.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
prop (str, optional) – Allowed properties are ‘color’ and ‘stroke_color’. It is ‘color’ by default.
variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.
dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.
ascending (boolean, optional) – If set to True, the values are sorted in ascending order. Defaults to False.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
cartoframes.viz.legend.Legend
Example
>>> color_bins_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... dynamic=False,
... format='.2~s')
Helper function for quickly creating a color category legend.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
prop (str, optional) – Allowed properties are ‘color’ and ‘stroke_color’. It is ‘color’ by default.
variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.
dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.
cartoframes.viz.legend.Legend
Example
>>> color_category_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... dynamic=False)
Helper function for quickly creating a color continuous legend.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
prop (str, optional) – Allowed properties are ‘color’ and ‘stroke_color’. It is ‘color’ by default.
variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.
dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.
ascending (boolean, optional) – If set to True, the values are sorted in ascending order. Defaults to False.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
cartoframes.viz.legend.Legend
Example
>>> color_continuous_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... dynamic=False,
... format='.2~s')
Helper function for quickly creating a default legend based on the style. A style helper is required.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
cartoframes.viz.legend.Legend
Example
>>> default_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... format='.2~s')
Helper function for quickly creating a size bins legend.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
prop (str, optional) – Allowed properties are ‘size’ and ‘stroke_width’. It is ‘size’ by default.
variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.
dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.
ascending (boolean, optional) – If set to True, the values are sorted in ascending order. Defaults to False.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
cartoframes.viz.legend.Legend
Example
>>> size_bins_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... dynamic=False,
... format='.2~s')
Helper function for quickly creating a size category legend.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
prop (str, optional) – Allowed properties are ‘size’ and ‘stroke_width’. It is ‘size’ by default.
variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.
dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.
cartoframes.viz.legend.Legend
Example
>>> size_category_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... dynamic=False)
Helper function for quickly creating a size continuous legend.
prop (str, optional) – Allowed properties are ‘size’ and ‘stroke_width’.
dynamic (boolean, optional) – Update and render the legend depending on viewport changes. Defaults to True.
title (str, optional) – Title of legend.
description (str, optional) – Description in legend.
footer (str, optional) – Footer of legend. This is often used to attribute data sources.
variable (str, optional) – If the information in the legend depends on a different value than the information set to the style property, it is possible to set an independent variable.
ascending (boolean, optional) – If set to True, the values are sorted in ascending order. Defaults to False.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
cartoframes.viz.legend.Legend
Example
>>> size_continuous_legend(
... title='Legend title',
... description='Legend description',
... footer='Legend footer',
... dynamic=False,
... format='.2~s')
Helper function for quickly creating an animated widget.
The animation widget includes an animation status bar as well as controls to play or pause animated data. The filter property of your map’s style, applied to either a date or numeric field, drives both the animation and the widget. Only one animation can be controlled per layer.
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
prop (str, optional) – Property of the style to get the animation. Default is “filter”.
cartoframes.viz.widget.Widget
Example
>>> animation_widget(
... title='Widget title',
... description='Widget description',
... footer='Widget footer')
Helper function for quickly creating a basic widget.
The basic widget is a general purpose widget that can be used to provide additional information about your map.
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
cartoframes.viz.widget.Widget
Example
>>> basic_widget(
... title='Widget title',
... description='Widget description',
... footer='Widget footer')
Helper function for quickly creating a category widget.
value (str) – Column name of the category value.
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
read_only (boolean, optional) – Interactively filter a category by selecting it in the widget. Set to “False” by default.
weight (int, optional) – Weight of the category widget. Default value is 1.
cartoframes.viz.widget.Widget
Example
>>> category_widget(
... 'column_name',
... title='Widget title',
... description='Widget description',
... footer='Widget footer')
Helper function for quickly creating a default widget based on the style. A style helper is required.
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
cartoframes.viz.widget.Widget
Example
>>> default_widget(
... title='Widget title',
... description='Widget description',
... footer='Widget footer')
Helper function for quickly creating a formula widget.
Formula widgets calculate aggregated values (‘avg’, ‘max’, ‘min’, ‘sum’) from numeric columns or counts of features (‘count’) in a dataset.
A formula widget’s aggregations can be calculated on ‘global’ or ‘viewport’ based values. If you want the values in a formula widget to update on zoom and/or pan, use viewport based aggregations.
value (str) – Column name of the numeric value.
operation (str) – Attribute for the widget’s aggregated value (‘count’, ‘avg’, ‘max’, ‘min’, ‘sum’).
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
is_global (boolean, optional) – Account for calculations based on the entire dataset (‘global’) vs. the default of ‘viewport’ features.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
cartoframes.viz.widget.Widget
Example
>>> formula_widget(
... 'column_name',
... title='Widget title',
... description='Widget description',
... footer='Widget footer')
>>> formula_widget(
... 'column_name',
... operation='sum',
... title='Widget title',
... description='Widget description',
... footer='Widget footer',
... format='.2~s')
Helper function for quickly creating a histogram widget.
Histogram widgets display the distribution of a numeric attribute, in buckets, to group ranges of values in your data.
By default, you can hover over each bar to see each bucket’s values and count, and also filter your map’s data within a given range.
value (str) – Column name of the numeric or date value.
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
read_only (boolean, optional) – Interactively filter a range of numeric values by selecting them in the widget. Set to “False” by default.
buckets (number, optional) – Number of histogram buckets. Set to 20 by default.
weight (int, optional) – Weight of the histogram widget. Default value is 1.
cartoframes.viz.widget.Widget
Example
>>> histogram_widget(
... 'column_name',
... title='Widget title',
... description='Widget description',
... footer='Widget footer',
... buckets=9)
Helper function for quickly creating a time series widget.
The time series widget enables you to display animated data (by aggregation) over a specified date or numeric field. Time series widgets provide a status bar of the animation, controls to play or pause, and the ability to filter on a range of values.
value (str) – Column name of the numeric or date value.
title (str, optional) – Title of widget.
description (str, optional) – Description text widget placed under widget title.
footer (str, optional) – Footer text placed on the widget bottom.
read_only (boolean, optional) – Interactively filter a range of numeric values by selecting them in the widget. Set to “False” by default.
buckets (number, optional) – Number of histogram buckets. Set to 20 by default.
weight (int, optional) – Weight of the time series widget. Default value is 1.
cartoframes.viz.widget.Widget
Example
>>> time_series_widget(
... 'column_name',
... title='Widget title',
... description='Widget description',
... footer='Widget footer',
... buckets=10)
Helper function for quickly adding a default popup element based on the style. A style helper is required.
title (str, optional) – Title for the given value. By default, it’s the name of the value.
operation (str, optional) – Cluster operation, defaults to ‘count’. Other options available are ‘avg’, ‘min’, ‘max’, and ‘sum’.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
Example
>>> default_popup_element(title='Popup title', format='.2~s')
Helper function for quickly adding a popup element to a layer.
value (str) – Column name to display the value for each feature.
title (str, optional) – Title for the given value. By default, it’s the name of the value.
format (str, optional) – Format to apply to number values in the widget, based on d3-format specifier (https://github.com/d3/d3-format#locale_format).
Example
>>> popup_element('column_name', title='Popup title', format='.2~s')
Get all map visualizations published by the current user.
credentials (Credentials, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available.
Delete a map visualization published by name.
name (str) – Name of the publication to be deleted.
credentials (Credentials, optional) – A Credentials instance. If not provided, the credentials will be automatically obtained from the default credentials if available.
Update the metrics configuration.
enabled (bool) – Flag to enable/disable metrics.
Set the level of the log in the library.
level (str) – Log level name. By default it is set to “info”. Valid log levels are: “critical”, “error”, “warning”, “info”, “debug”, “notset”.
Decodes a DataFrame column. It detects the geometry encoding and decodes the column if required. Supported geometry encodings are:
WKB (Bytes, Hexadecimal String, Hexadecimal Bytestring)
Extended WKB (Bytes, Hexadecimal String, Hexadecimal Bytestring)
WKT (String)
Extended WKT (String)
geom_col (array) – Column containing the encoded geometry.
Example
>>> decode_geometry(df['the_geom'])
Bases: Exception
This exception is raised when a problem is encountered while using DO functions.
Bases: cartoframes.exceptions.DOError
This exception is raised when a problem is encountered while using catalog functions.
Bases: cartoframes.exceptions.DOError
This exception is raised when a problem is encountered while using enrichment functions.
Bases: Exception
This exception is raised when a problem is encountered while publishing visualizations.
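Since the catalog and enrichment exceptions both derive from DOError, a single except clause covers all Data Observatory failures. A hypothetical local re-creation of the class tree (the real classes live in cartoframes.exceptions) shows the pattern:

```python
# Hypothetical local re-creation of the hierarchy described above;
# the real classes live in cartoframes.exceptions.
class DOError(Exception):
    """Raised when a problem is encountered while using DO functions."""

class CatalogError(DOError):
    """Raised for problems in catalog functions."""

class EnrichmentError(DOError):
    """Raised for problems in enrichment functions."""

class PublishError(Exception):
    """Raised for problems while publishing visualizations."""

try:
    raise CatalogError("catalog lookup failed")
except DOError as exc:
    # Catching the base class also catches catalog/enrichment errors.
    caught = type(exc).__name__

print(caught)  # CatalogError
```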