• ## A

• ### Agglomerative clustering

Type of hierarchical clustering where clusters are built from the bottom up. The algorithm starts with each object in its own cluster; clusters are then recursively merged (agglomerated) according to a 'linkage strategy', such as minimizing the sum of squared distances within a cluster.
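
The bottom-up merging can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes one-dimensional values and uses single linkage (smallest pairwise distance) in place of the sum-of-squares criterion mentioned above, and the function names are hypothetical.

```python
from itertools import combinations


def single_linkage_distance(a, b):
    """Smallest pairwise distance between two clusters of 1-D values."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(values, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[v] for v in values]  # start: every object is its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage_distance(
                clusters[pair[0]], clusters[pair[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


result = agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
```

Swapping `single_linkage_distance` for another function changes the linkage strategy without touching the merge loop.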

• ### Areal data

Data associated with a fixed set of locations, with well-defined boundaries. The boundaries can be irregular, as in the case of administrative units (e.g. districts, regions, counties), or can be defined by a regular grid, as in the case of raster data. Typical applications consist of model inference, prediction at unsampled locations, and spatial smoothing.

• ## B

• ### Bayes theorem

Provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
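
As a small worked example, the posterior probability of a hypothesis can be computed directly from the three ingredients listed above. The numbers below are purely illustrative and the function name is hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Return P(H | D) given P(H), P(D | H) and P(D | not H)."""
    # Total probability of observing the data under both alternatives.
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# Illustrative values: a 1% prior, data seen 90% of the time when the
# hypothesis holds and 5% of the time when it does not.
p = posterior(prior=0.01, likelihood=0.9, likelihood_if_false=0.05)
```

With these inputs the posterior is 2/13, roughly 0.15: the data raise the probability of the hypothesis well above its 1% prior, but it remains far from certain.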

• ### Bayesian Methods

Use Bayes' theorem to compute and update probabilities after obtaining new data.

• ## C

• ### Cartography

The science and technique of building maps to communicate spatial information.

• ### Clustering

Statistical technique for grouping data in such a way that observations belonging to the same group (cluster) are more similar to each other than to those in other clusters.

• ### Complete Spatial Randomness (CSR)

If data is distributed randomly and uniformly over the study area, it is said to exhibit CSR.

• ### Conditional Autoregressive Models (CAR) / Simultaneous Autoregressive Models (SAR)

Conditional (CAR) and simultaneous (SAR) random spatial effects for fitting hierarchical Bayesian models.

• ### Cross-validation

Model validation technique for assessing how the results of a statistical model will generalize to new data. It involves partitioning a sample of data into complementary subsets, performing the analysis on one subset and validating the analysis on the other subset.
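
The partitioning step can be sketched as a k-fold split over sample indices. This is a minimal sketch assuming contiguous, unshuffled folds; the helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
```

Each sample appears in exactly one test subset, so every observation is used once for validation and `k - 1` times for fitting.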

• ## D

• ### Data Analysis

A process for cleaning & transforming data to extract useful insights for making decisions.

• ### Density-based spatial clustering of applications with noise (DBSCAN)

Clustering method that groups together data points that are close to each other, based on a distance metric and a minimum number of data points. With an appropriate metric, it can be applied to the coordinates of point-referenced data to perform spatial clustering.
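
The two parameters, a distance threshold and a minimum number of points, can be seen at work in the following compact sketch. It assumes Euclidean distance and a brute-force neighbour search; the names `eps` and `min_samples` are illustrative choices, and the code is not optimized for large data sets.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

In this toy example the two dense groups receive their own labels, while the isolated point at `(50, 50)` is marked as noise.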

• ## F

• ### Fisher-Jenks algorithm

Clustering method for one-dimensional data, designed to determine the best arrangement of values into different classes by minimizing the variance within each class.

• ## G

• ### Gaussian Markov Random Field (GMRF)

Gaussian stochastic process that satisfies the Markov property: the parameters of the i-th area are conditionally independent of all the other parameters given the set of its neighbours.

• ### Gaussian Process (GP)

Stochastic process whose finite-dimensional marginal distributions are Gaussian. It is parametrized by a mean function and a covariance function: applied to a vector of input points, they return the mean vector and covariance matrix of the outputs at those points for functions drawn from the process.
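
To make the role of the two functions concrete, the sketch below evaluates a mean function and a covariance function on a vector of inputs. It assumes a zero mean and a squared-exponential (RBF) covariance, which is only one common choice; all names are hypothetical.

```python
import math


def rbf(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def gp_prior(inputs, mean_fn, cov_fn):
    """Mean vector and covariance matrix of the process at the given inputs."""
    mean = [mean_fn(x) for x in inputs]
    cov = [[cov_fn(a, b) for b in inputs] for a in inputs]
    return mean, cov


# Zero-mean process evaluated at three input points.
mean, cov = gp_prior([0.0, 1.0, 2.0], mean_fn=lambda x: 0.0, cov_fn=rbf)
```

The resulting covariance matrix is symmetric, and its entries shrink as the inputs move further apart, which encodes the assumption that nearby points have similar outputs.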

• ### Geocode

Create point geometries in your data.

• ### Geocoding

The process of converting addresses, such as a street address, into latitude and longitude coordinates, which you can use to position a marker on a map.

• ### Geographic Information System (GIS)

A system to capture and analyze spatial data.

• ### Geographically Weighted Regression (GWR)

Spatially varying coefficient model used as an exploratory technique intended to indicate where non-stationarity is taking place.

• ### GeoJSON

An open format designed for encoding spatial data.
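
A single point can be encoded as a GeoJSON `Feature` with Python's standard `json` module, as sketched below. The location and property values are illustrative; note that GeoJSON positions are written in longitude, latitude order.

```python
import json

# A point feature; the name and coordinates are example values only.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-3.7038, 40.4168],  # [longitude, latitude]
    },
    "properties": {"name": "Example location"},
}

text = json.dumps(feature, indent=2)  # serialize to a GeoJSON string
decoded = json.loads(text)  # parse it back into a dictionary
```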

• ### Geospatial data

Data with a geographic component.

• ### Graphical model

Collection of random variables that are associated with the nodes of a graph.

• ## H

• ### Hierarchical Bayesian Model

Statistical model written in a hierarchical (multi-level) form that estimates the parameters of the posterior distribution using the Bayesian method.

• ## I

• ### Integrated Nested Laplace Approximation (INLA)

Combines analytical approximations and efficient numerical integration schemes to achieve highly accurate deterministic approximations to the posterior distribution.

• ### Intrinsic spatial stationarity

A stochastic process is said to be intrinsically stationary if the variance of its increments (i.e. its variogram) does not change when shifted in space.

• ### Isochrone

Isoline for travel time, that is a curve of equal travel time.

• ## J

• ### Jupyter notebook

An open-source web application that allows data scientists to create and share documents that contain live code, equations, visualizations, and text.

• ## K

• ### K-means

Non-spatial clustering method that aims to partition the data into a fixed number of clusters in which each data point belongs to the cluster with the closest mean.
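
The alternation between assigning points to the closest mean and recomputing the means can be sketched as follows. This is a minimal version of Lloyd's algorithm; it assumes Euclidean distance and, for reproducibility, initializes the centroids with the first `k` points, which is a simplification rather than a recommended practice.

```python
import math


def kmeans(points, k, n_iter=100):
    """Return (labels, centroids) after at most n_iter iterations."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the closest mean.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, k=2)
```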

• ### Kriging

Spatial interpolation method used to derive predictions at unmeasured locations based on a GP. The covariance function is usually derived from a variogram analysis.

• ## L

• ### Open Data

Data freely available for everyone to use without restrictions.

• ### Points-of-Interest (POI)

Location that someone may find useful or interesting, such as restaurants, monuments, parks, or schools.

• ### Location Intelligence

The methodology for transforming your location data into business outcomes. Location data can be anything from addresses and latitude/longitude coordinates, to existing points, lines, and polygons.

• ## M

• ### Markov Chain Monte Carlo (MCMC)

Class of simulation methods used to approximate the posterior distribution by randomly sampling in a probabilistic space.
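
One of the simplest members of this class is the random-walk Metropolis sampler, sketched below. The target here is a standard normal density, chosen only for illustration; the function name, step size, and seed are assumptions.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Draw n_samples from the density whose logarithm is log_density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, density ratio); otherwise keep x.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


# Unnormalized log-density of a standard normal distribution.
samples = metropolis(lambda x: -0.5 * x * x, start=0.0, n_samples=5000)
sample_mean = sum(samples) / len(samples)
```

Only the ratio of densities is needed, so the normalizing constant of the posterior never has to be computed, which is what makes the approach practical for Bayesian models.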

• ### Markov chains

Stochastic model describing a sequence of possible states in which the probability of each state depends only on the previous state.
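
A short simulation illustrates the dependence on the previous state only. The two-state weather chain and its transition probabilities below are made-up example values, and `simulate_chain` is a hypothetical helper.

```python
import random


def simulate_chain(transition, start, n_steps, seed=0):
    """Simulate a path; the next state depends only on the current one."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(n_steps):
        next_states = list(transition[state])
        weights = [transition[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
        path.append(state)
    return path


# Each row gives the probabilities of moving from one state to the others.
weather = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
path = simulate_chain(weather, start="sunny", n_steps=10)
```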

• ### Matplotlib

Comprehensive library for creating static, animated, and interactive visualizations in Python.

• ### Model

The model is the formulation of the problem.

• ### Moran Statistics

Measure of global and local spatial autocorrelation for areal data.
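
The global statistic, Moran's I, can be computed from the values and a spatial weights matrix as sketched below. The example assumes binary contiguity weights for four areas arranged in a line; the function name is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values with spatial weights matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    # Cross-products of deviations between neighbouring areas.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Four areas in a row: each area neighbours the one next to it.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)
```

A smooth trend yields a positive value (similar values are neighbours), whereas the alternating pattern yields a negative value (dissimilar values are neighbours).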

• ## N

• ### Network data

Data associated with a set of ordered points, connected by straight lines. Examples include data from mobility networks, internet, and mobile phone networks. Typical applications include the analysis of spatial networks and route optimization.

• ## P

• ### Pandas

Fast and open source data analysis and manipulation tool, built on top of the Python programming language.

• ### Point patterns

Data representing occurrences of events where locations themselves are random. In this context, this data is useful in evaluating possible clustering or inhibition between the observations.

• ### Point-referenced data

Data associated with a spatial index that varies continuously across space. Examples include data from GPS tracking, fixed devices, and high-resolution satellites. This data is often useful for model inference and prediction at unsampled locations.

• ### PostgreSQL

PostgreSQL is a general-purpose, object-relational database management system.

• ### Python

Programming language.

• ## R

• ### R

Programming language.

• ### Regionalization

Type of clustering that enforces contiguity constraints on the geographies. That means that smaller geographies can be put together to form larger, contiguous regions that are constructed to optimize for qualities such as similar populations, homogeneous measures (e.g., similar socio-demographic characteristics), and compactness, among others.

• ### Root Mean Square Error (RMSE)

RMSE is the standard deviation of the prediction errors.
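
In practice it is computed as the square root of the mean squared difference between observed and predicted values. The sketch below uses illustrative numbers and a hypothetical function name.

```python
import math


def rmse(observed, predicted):
    """Square root of the mean squared prediction error."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors are 1, 0 and -2, so the mean squared error is 5 / 3.
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```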

• ## S

• ### Skater

Regionalization method that works by constructing a contiguity-based minimum spanning tree that ensures homogeneity within trees by minimizing costs that are the inverse of the similarity of joined regions.

• ### Spatial clustering

Clustering methods accounting for the spatial relationships inherent in spatial data.

• ### Spatial Confounding

Spatial confounding occurs when adding a spatially-correlated error term changes the estimates of the fixed-effect coefficients, especially when the fixed effects are highly correlated with the spatially structured random effect.

• ### Spatial cross-validation

Cross-validation technique that uses the spatial information to partition the data into subsets.
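
One simple way to use the spatial information is to hold out whole grid blocks, so that test points are not surrounded by nearby training points. The sketch below assumes planar coordinates and square blocks; the helper name and block size are hypothetical.

```python
def spatial_block_folds(coords, block_size):
    """Yield (block, train, test) with one fold per occupied grid block."""
    blocks = {}
    for idx, (x, y) in enumerate(coords):
        # Points falling in the same grid cell share a block key.
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(idx)
    for key, test in blocks.items():
        held_out = set(test)
        train = [i for i in range(len(coords)) if i not in held_out]
        yield key, train, test


coords = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.6, 1.8), (0.1, 1.2)]
folds = list(spatial_block_folds(coords, block_size=1.0))
```

The first two points fall into the same block and are therefore always held out together.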

• ### Spatial index

Data structure that allows spatial objects to be accessed efficiently.

• ### Spatial Modelling

Consists of the analysis of spatial data (i.e. data that exhibits spatial dependence) to make inferences about the model parameters, to predict at unsampled locations, and for spatial smoothing.

• ### Spatial stationarity

A stochastic process is said to be stationary if its joint probability distribution does not change when shifted in space.

• ### Spatio-temporal modelling

Consists of the analysis of spatio-temporal data, i.e. data defined by a process indexed by space and time.

• ### Stochastic Partial Differential Equation (SPDE) approximation

Consists of representing a GP (a continuous spatial process) using a GMRF (a discretely indexed spatial process).

• ## U

• ### Uniform Manifold Approximation and Projection (UMAP)

Dimensionality reduction technique, often used as a preprocessing step before clustering.

• ## V

• ### Variogram

Defines the variability between data points as a function of distance only.

• ### Variogram analysis

Consists of computing the experimental (empirical) variogram from the data and fitting a variogram model to it to infer the parameters characterizing the spatial dependence.
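
The first step, computing the experimental variogram, can be sketched as below: for each distance bin, the semivariance is half the average squared difference between all pairs of values whose separation falls in that bin. The binning scheme, function name, and data are illustrative assumptions; fitting a variogram model is not shown.

```python
import math
from itertools import combinations


def empirical_variogram(coords, values, bin_width, n_bins):
    """Semivariance per distance bin; None where a bin has no pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])  # separation distance of the pair
        b = int(h // bin_width)
        if b < n_bins:
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Four measurements along a line, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_variogram(coords, values, bin_width=1.5, n_bins=2)
```

Here the semivariance grows with distance, the typical signature of spatially dependent data.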

• ## W

• ### Weak spatial stationarity

A stochastic process is said to be weakly stationary if its mean is constant and its covariance function does not change when shifted in space.