Boost your Spatial Analytics with Snowflake ML in CARTO

Bringing machine learning to spatial analytics has often been a challenge, requiring complex coding and deep technical expertise. For GIS and data professionals looking to integrate forecasting and classification models into their workflows, the learning curve and implementation barriers can be steep. With the new Snowflake ML extension for CARTO Workflows, users can now build, train, and deploy ML models, all inside their lakehouse. In this post, we’ll explore how this extension simplifies machine learning in spatial analytics, walking through a real-world example of crime forecasting in Chicago.
Snowflake ML is the latest addition to our set of pre-built extension packages for CARTO Workflows. Extension packages enable CARTO users to design and introduce additional functions to Workflows, expanding on their off-the-shelf functionality. Below, we showcase some of the functions that this extension provides, along with a complete use case that shows how easy it is to roll out your own forecasting models in Workflows.
This new extension package allows users to take advantage of Snowflake ML functions, which include high-level abstractions to create, train and use both classification and forecasting models. With this approach, users can manage the complete ML pipeline within CARTO Workflows, without writing a single line of code. Training and evaluation are as simple as using the predefined components, and inference can be performed right after training or in a completely different Workflow where end-users consume the predictions of a pre-trained model, inside or outside CARTO Workflows!
The functions available in the Snowflake ML package include both classification and forecasting tools. Learn more about what each of them does in our documentation.

This extension package calls the underlying Snowflake ML functions from Workflows. For further information about the models used and their parameters, please refer to the original Snowflake ML documentation.
The best way to understand how easy it is to work with Snowflake ML from Workflows is to look at a real-world use case. In this example, we will be using the Chicago crime dataset, an open dataset that tracks crimes in the city of Chicago weekly from 2001 to the present day. We will train a forecasting model, check its performance against an evaluation dataset, and save the forecast for 2024 to visualize it in CARTO Builder, our tool for creating dynamic data visualizations.
Want to follow along? Sign up here for a free 14-day CARTO trial!
Being able to forecast this data is very valuable in the insurance industry, where it supports risk assessment and premium adjustment, since crime counts are a good proxy for the frequency of vandalism and property damage. It is also useful for understanding which variables are relevant to the model and how the crime rate changes across neighborhoods over time.
To achieve these goals, we’ll build the workflow below:

Before getting started with the actual analysis, we need to install the extension package in our account so that the components are available. To do so, open any Workflow and navigate to the Components ribbon in the left sidebar, where you will find the Manage Extension Packages button at the bottom. Click it to find the Snowflake ML for Workflows package, which you can install or update from there. After installing, a new section appears in the Components tab with all the Snowflake ML components, ready to drag and drop onto your canvas.

First, we need to import the data into Snowflake. The data is readily available in a bucket (accessible here), and we can load it with an Import from URL component. It contains violent crime data for a selection of 18 H3 cells across Chicago, and we will attempt to develop a model able to forecast it. This can be useful for the insurance sector when performing risk assessment for different assets in these zones. We will train the model on the weekly data from 2001 to 2023 and set aside 2024 to plot the forecast and compare it on a map with the real data. We apply this split with a Date Cutoff component.
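To make the split concrete, here is a minimal Python sketch of what a date cutoff does to weekly rows. It is only an illustration, not the component's implementation: the function name `date_cutoff`, the toy rows and the `cell_a` identifier are hypothetical, and only the `H3`, `WEEK` and `COUNTS` column names come from the dataset described above.

```python
from datetime import date


def date_cutoff(rows, cutoff, date_key="WEEK"):
    """Split rows into (strictly before cutoff, on or after cutoff)."""
    before = [r for r in rows if r[date_key] < cutoff]
    on_or_after = [r for r in rows if r[date_key] >= cutoff]
    return before, on_or_after


# Toy weekly rows; "cell_a" stands in for a real H3 index (hypothetical data).
rows = [
    {"H3": "cell_a", "WEEK": date(2023, 12, 18), "COUNTS": 4},
    {"H3": "cell_a", "WEEK": date(2023, 12, 25), "COUNTS": 6},
    {"H3": "cell_a", "WEEK": date(2024, 1, 1), "COUNTS": 5},
    {"H3": "cell_a", "WEEK": date(2024, 1, 8), "COUNTS": 3},
]

# Everything before 2024 is used for training; 2024 is held out.
train, holdout = date_cutoff(rows, date(2024, 1, 1))
print(len(train), len(holdout))
```

Keeping the held-out year strictly after the training period matters for time series: a random split would leak future information into training.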

After importing and splitting the data, we can use the Edit Schema component to select the variables we want to include in the model. Three of those columns are compulsory: a time series ID (the H3 column), a timestep ID (the WEEK column) and the dependent variable (called COUNTS in this case). We will also include an exogenous variable in the model, the holiday calendar, which could be useful and is already known for future dates.
We configure all of these choices in the Create Forecasting Model component, along with the data frequency (weekly), and the resulting model is ready to use. We will use the unmatch output of the Date Cutoff component to predict our 2024 data.
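For readers curious about what this configuration maps to in Snowflake, the sketch below builds SQL statements in the style of Snowflake's `SNOWFLAKE.ML.FORECAST` interface. This is an assumption about the general shape of the underlying calls, not the statements the extension actually issues. The model and table names (`crime_forecast_model`, `crime_train`, `crime_2024_features`) are hypothetical, and the `frequency` configuration key should be checked against the Snowflake ML documentation. As far as we understand that documentation, any extra column in the training table (here, the holiday flag) is treated as an exogenous feature.

```python
def create_forecast_sql(model, training_table, series_col, timestamp_col,
                        target_col, frequency):
    """Build a CREATE SNOWFLAKE.ML.FORECAST statement as a string.

    Names are interpolated as-is, so only pass trusted identifiers.
    """
    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model}(\n"
        f"  INPUT_DATA => TABLE({training_table}),\n"
        f"  SERIES_COLNAME => '{series_col}',\n"
        f"  TIMESTAMP_COLNAME => '{timestamp_col}',\n"
        f"  TARGET_COLNAME => '{target_col}',\n"
        f"  CONFIG_OBJECT => {{'frequency': '{frequency}'}}\n"
        f");"
    )


def forecast_call_sql(model, future_table, series_col, timestamp_col):
    """Build the call that scores future timestamps (with their features)."""
    return (
        f"CALL {model}!FORECAST(\n"
        f"  INPUT_DATA => TABLE({future_table}),\n"
        f"  SERIES_COLNAME => '{series_col}',\n"
        f"  TIMESTAMP_COLNAME => '{timestamp_col}'\n"
        f");"
    )


# Hypothetical model and table names; column names follow the post.
create_sql = create_forecast_sql(
    "crime_forecast_model", "crime_train", "H3", "WEEK", "COUNTS", "1 week"
)
forecast_sql = forecast_call_sql(
    "crime_forecast_model", "crime_2024_features", "H3", "WEEK"
)
print(create_sql)
print(forecast_sql)
```

In Workflows none of this needs to be written by hand; the sketch only shows how the three compulsory columns and the frequency line up with parameters of the underlying function.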

Before moving on, we can use two components to better understand the resulting model. First, Feature Importance provides a score per variable and time series, which helps us understand the effect of each input feature. In this case we can see that some of the time series do use the holiday information we included earlier, while others ignore it altogether and rely solely on the characteristics of the time series itself. We can also check the Evaluation Forecast component, which returns metrics that were computed during the Create Forecasting Model phase. The component shows commonly used metrics such as MAE, MAPE and 95% confidence interval coverage. That way, we can check whether our model is good enough for our use case, or track how different tuning options or inputs affect its performance. In the screenshot below we can see that the mean absolute errors for each series are reasonable for our use case, so we can accept the current model configuration and training.
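As a reminder of what these metrics measure, here is a small Python sketch that computes MAE, MAPE and interval coverage for a single series using their textbook definitions. The numbers are made up for illustration, the function name `forecast_metrics` is hypothetical, and Snowflake may compute its evaluation metrics differently (for example, over validation splits).

```python
def forecast_metrics(actual, forecast, lower, upper):
    """Return MAE, MAPE and prediction-interval coverage for one series."""
    n = len(actual)
    # Mean absolute error: average size of the miss, in crime counts.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / n
    # Mean absolute percentage error, skipping weeks with zero actuals
    # because the ratio is undefined there.
    nonzero = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if nonzero:
        mape = sum(abs(a - f) / abs(a) for a, f in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    # Coverage: share of actual values that fall inside [lower, upper].
    covered = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return {"MAE": mae, "MAPE": mape, "COVERAGE": covered / n}


# Made-up weekly counts for one H3 cell.
actual = [10, 8, 12, 5]
forecast = [9, 10, 12, 4]
lower = [7, 8.5, 10, 2]
upper = [11, 12, 14, 6]

metrics = forecast_metrics(actual, forecast, lower, upper)
print(metrics)
```

For a well-calibrated 95% interval, coverage should sit close to 0.95 over a long enough evaluation period; a much lower value suggests the intervals are too narrow.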

Finally, we can visualize the resulting forecast by aggregating all the 2024 information (right) and plotting it side by side with our ground truth (left). The map below (or open in full-screen here) shows that the model captures the general dynamics across the different zones of the city fairly well.
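The aggregation behind this comparison is a per-cell sum of the weekly values. A minimal Python sketch follows, with made-up numbers and a hypothetical helper `total_by_cell`. The `FORECAST` column name is an assumption about the forecast output, while `H3` and `COUNTS` follow the dataset described above.

```python
from collections import defaultdict


def total_by_cell(rows, value_key, cell_key="H3"):
    """Sum a numeric column per H3 cell."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[cell_key]] += row[value_key]
    return dict(totals)


# Made-up 2024 rows: observed counts and forecast values per week and cell.
actual_rows = [
    {"H3": "cell_a", "COUNTS": 5},
    {"H3": "cell_a", "COUNTS": 3},
    {"H3": "cell_b", "COUNTS": 2},
]
forecast_rows = [
    {"H3": "cell_a", "FORECAST": 4.5},
    {"H3": "cell_a", "FORECAST": 3.5},
    {"H3": "cell_b", "FORECAST": 1.0},
]

actual = total_by_cell(actual_rows, "COUNTS")
predicted = total_by_cell(forecast_rows, "FORECAST")

# One record per cell, ready to style as two side-by-side map layers.
comparison = {
    cell: {"actual": actual[cell], "forecast": predicted.get(cell, 0.0)}
    for cell in actual
}
print(comparison)
```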
Snowflake ML in CARTO Workflows makes machine learning more accessible, empowering users to integrate predictive analytics into their spatial data workflows. By leveraging pre-built components, users can train, evaluate, and deploy models for tasks like crime forecasting, risk assessment, and beyond. As machine learning continues to shape spatial analytics, tools like this extension will be key in making advanced modeling techniques available to a wider audience.
Want to try it yourself? Don’t forget to sign up for a free 14-day CARTO trial today!








