Urban Mobility Insights with MovingPandas & CARTO in Snowflake
Urban mobility is a complex and ever-changing landscape. Understanding the flow of people and vehicles through city streets is crucial for city planners, transportation engineers, and researchers. With urban populations growing, the need to optimize transportation systems and enhance urban livability has never been more pressing.
We’ve been working on combining the strengths of MovingPandas, a powerful Python library for movement data analysis, and CARTO, a user-friendly platform for scalable spatial analysis and visualization. Together, within the Snowflake Lakehouse environment, they offer a framework for analyzing mobility patterns. This integration enables users to uncover hidden traffic hotspots, optimize transportation networks, and ultimately make smarter urban planning decisions.
Previously, we explored how taxi trip data from Porto, Portugal, could be used to analyze mobility hotspots. By combining MovingPandas and CARTO, we demonstrated how to extract patterns and insights from extensive datasets. The taxi trip data consisted of GPS measurements taken every 15 seconds over a week.
We used this data to create a heatmap that visualized high-traffic density areas. From there, we expanded our analysis by processing the taxi trajectories with MovingPandas in Python. Finally, we used CARTO’s Spacetime hotspot tools to identify areas and times of high and low activity.
There were some limitations to this approach, such as running time and parallelization constraints, which led us to explore a more seamless integration with Snowflake.
Building on our previous work, we have now developed a robust framework that integrates MovingPandas and CARTO within the Snowflake Lakehouse environment. This integration provides a powerful toolset for analyzing large-scale mobility datasets with efficiency and precision. The trajectories are processed with the MovingPandas library inside Snowflake, while CARTO Workflows orchestrates the steps and runs the spacetime hotspot analysis.
- Data Ingestion and Preprocessing: Using Snowflake's User Defined Functions (UDFs) in Python, we preprocess and transform trajectory data. MovingPandas is available in Snowflake, allowing us to leverage its functionalities directly within the cloud environment.
- Spatial and Temporal Analysis: We perform detailed spatial and temporal analysis using MovingPandas, splitting trajectories into sub-trajectories and hourly segments to ensure comprehensive coverage.
- CARTO Integration: The processed data is seamlessly integrated with CARTO via its cloud-native connection to Snowflake. This enables more advanced visualizations and further spatial analysis, so users can explore the results interactively. Furthermore, with the no-code tool Workflows, we can easily design and orchestrate the whole pipeline within the CARTO platform.
The overall architecture is illustrated below:
- Creating UDFs in Snowflake: We developed three UDFs to process and transform the trajectory data (the code of the UDFs is available here):
- OutlierCleaner: Removes GPS points that are outliers due to incorrect location retrieval.
- TrajectoryClip: Splits trajectories into smaller segments where each segment intersects only with one H3 cell.
- TemporalSplitter: Further divides sub-trajectories into hourly segments, ensuring detailed temporal analysis (see below).
- Connecting CARTO to Snowflake: A connection to the Snowflake database is established within the CARTO platform. Custom components are created in CARTO’s Workflow to utilize these UDFs, enabling a smooth processing pipeline from raw data to actionable insights.
- Running SpaceTime Hotspots Analysis: The final step involves running the SpaceTime Getis Ord function in CARTO to identify areas of high and low traffic intensity, both spatially and temporally. This analysis highlights critical hotspots and coldspots, allowing for targeted urban planning interventions.
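To make the hourly split performed by the TemporalSplitter step more concrete, here is a minimal pure-Python sketch of the idea. It is not the code of the actual UDF, which relies on MovingPandas inside Snowflake. The function name `split_by_hour` and the sample coordinates are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

def split_by_hour(points):
    """Group time-ordered (timestamp, lon, lat) points into hourly segments.

    Returns a dict mapping the start of each hour to the points inside it.
    """
    segments = {}
    for ts, lon, lat in points:
        # Truncate the timestamp to the start of its hour
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        segments.setdefault(hour_start, []).append((ts, lon, lat))
    return segments

# Hypothetical sub-trajectory sampled every 15 seconds across an hour boundary
start = datetime(2024, 1, 1, 8, 59, 30)
points = [
    (start + timedelta(seconds=15 * i), -8.61 + 0.0001 * i, 41.15)
    for i in range(5)
]
segments = split_by_hour(points)
for hour_start, pts in segments.items():
    print(hour_start, len(pts))
```

This simplified version only assigns existing GPS points to hours. It does not interpolate a new point exactly at the hour boundary.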
The Workflow is shown below and it is available here.
The outcome of this analysis is visualized in CARTO, where users can interactively explore the results. Examples of the intermediate steps are shown as well.
First, we visualize the output of the first component, the OutlierCleaner. The result of passing this trajectory through the OutlierCleaner component is shown in the right-hand figure, where the outlier GPS point has been removed.
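One common way to detect such outliers is a speed check: a point is dropped if reaching it from the previous valid point would require an implausible speed. The sketch below illustrates that idea in plain Python. It is an assumption about how such a cleaner can work, not the code of the OutlierCleaner UDF, and the 50 m/s threshold, the function names, and the sample track are all hypothetical.

```python
import math
from datetime import datetime, timedelta

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def drop_speed_outliers(points, v_max_ms=50.0):
    """Keep a point only if the speed from the last kept point is <= v_max_ms."""
    cleaned = [points[0]]
    for ts, lon, lat in points[1:]:
        prev_ts, prev_lon, prev_lat = cleaned[-1]
        dt = (ts - prev_ts).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(prev_lon, prev_lat, lon, lat) / dt
        if speed <= v_max_ms:
            cleaned.append((ts, lon, lat))
    return cleaned

# Hypothetical track with one GPS jump of several kilometres within 15 seconds
t0 = datetime(2024, 1, 1, 8, 0, 0)
track = [
    (t0, -8.6100, 41.1500),
    (t0 + timedelta(seconds=15), -8.6110, 41.1505),
    (t0 + timedelta(seconds=30), -8.7000, 41.2500),  # outlier
    (t0 + timedelta(seconds=45), -8.6130, 41.1515),
]
cleaned = drop_speed_outliers(track)
print(len(track), "->", len(cleaned))
```

Note that this sketch trusts the first point of the track. If the first fix is itself the outlier, a more careful approach is needed.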
The second component of the workflow, TrajectoryClip, splits each trajectory into smaller segments where each segment intersects only with one H3 cell. An example of this can be seen in the following map.
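The splitting logic can be approximated as follows. This is a simplified sketch, not the TrajectoryClip UDF itself: a square longitude/latitude grid (`cell_id`, a hypothetical helper) stands in for the H3 index so that the example runs without extra dependencies, and whole GPS points are assigned to cells instead of cutting the line geometry exactly at the cell boundary.

```python
import math
from itertools import groupby

def cell_id(lon, lat, size_deg=0.01):
    """Hypothetical stand-in for an H3 index: a square lon/lat grid cell."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))

def split_by_cell(points, size_deg=0.01):
    """Split time-ordered (lon, lat) points into consecutive runs sharing one cell."""
    keyed = groupby(points, key=lambda p: cell_id(p[0], p[1], size_deg))
    return [(cell, list(run)) for cell, run in keyed]

# Hypothetical trajectory that leaves a cell and later returns to it
trajectory = [
    (-8.615, 41.155),
    (-8.612, 41.156),
    (-8.608, 41.157),
    (-8.604, 41.158),
    (-8.613, 41.154),
]
cell_segments = split_by_cell(trajectory)
for cell, run in cell_segments:
    print(cell, len(run))
```

Because only consecutive points are grouped, a trajectory that re-enters a cell yields a separate segment for each visit, which keeps the time spent per visit distinguishable.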
Lastly, the outcome of this analysis is shown below. The H3/hour duration data is shown alongside the output of the SpaceTime Getis Ord analysis, revealing clear correlations between high-traffic areas and specific time windows.
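For readers unfamiliar with the statistic, the sketch below computes a Getis-Ord Gi* z-score for every (cell, hour) bin of a small synthetic dataset. It is an illustration under simplifying assumptions, not CARTO's implementation: cells are integers along a line instead of H3 indexes, the weights are binary (1 for bins within one cell and one hour, including the bin itself, and 0 otherwise), and all values are made up.

```python
import math

def getis_ord_gi_star(values, space_k=1, time_k=1):
    """Gi* z-score per (cell, hour) key using binary space-time weights.

    values: dict mapping (cell, hour) -> observed value, e.g. a duration.
    """
    n = len(values)
    mean = sum(values.values()) / n
    s = math.sqrt(sum(v * v for v in values.values()) / n - mean ** 2)
    scores = {}
    for (ci, ti) in values:
        # The neighbourhood includes the focal bin itself (the "star" in Gi*)
        neigh = [
            v for (cj, tj), v in values.items()
            if abs(ci - cj) <= space_k and abs(ti - tj) <= time_k
        ]
        w_sum = len(neigh)  # with binary weights, sum(w) equals sum(w^2)
        num = sum(neigh) - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores[(ci, ti)] = num / den if den > 0 else 0.0
    return scores

# Synthetic data: 5 cells x 4 hours, with a busy cluster at cells 2-3, hours 1-2
values = {(c, t): 1.0 for c in range(5) for t in range(4)}
for c in (2, 3):
    for t in (1, 2):
        values[(c, t)] = 10.0

scores = getis_ord_gi_star(values)
hottest = max(scores, key=scores.get)
print(hottest, round(scores[hottest], 2))
```

Large positive scores mark bins surrounded by high values in both space and time (hotspots), and large negative scores mark coldspots. In this toy grid the four bins of the busy cluster receive the highest scores, while bins far from it score below zero.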
The integration of MovingPandas with CARTO in Snowflake represents a significant advancement in urban mobility analysis. Here’s why it’s a game-changer:
- Scalability and Performance: Snowflake's cloud environment allows for the efficient processing of large datasets, making it possible to analyze mobility patterns at scale with greater accuracy.
- User-Friendly Tools: By leveraging CARTO’s no-code platform, even non-technical users can easily access and analyze complex spatial data, broadening the scope of who can contribute to urban planning decisions.
- Temporal Precision: The ability to analyze mobility data not just spatially but also temporally provides deeper insights, enabling more effective traffic management and urban planning.
- Future-Ready: As cities continue to evolve, this framework is equipped to scale and adapt, ensuring it remains a valuable tool for future urban mobility challenges.
Our work with MovingPandas and CARTO within the Snowflake Lakehouse environment illustrates the power of integrated tools for analyzing urban mobility. By seamlessly combining data processing, spatial and temporal analysis, and visualization, we provide a comprehensive framework that empowers users to extract valuable insights from complex mobility datasets.
From identifying high-demand areas to optimizing routes for transportation services, this integrated approach paves the way for smarter, more efficient urban transportation systems. As we continue to refine and expand this framework, its potential to transform urban mobility analysis is boundless.
This work was developed as part of the EMERALDS project, funded by the Horizon Europe R&I program under GA No. 101093051.