Data storytelling is the process of translating and presenting data analysis as meaningful information. Whether your data story promotes environmental conservation or a change in your organization’s go-to-market strategies, the goal for data storytellers remains the same: tell a story that engages audiences and encourages action.
Following these principles, our research team recently published a data story called A Million Walks in the Park.
Working with one month of nation-wide GPS data from LiveRamp, A Million Walks in the Park explores the use of public spaces, specifically public parks across New York City, to understand social interactions among New Yorkers.
Wenfei Xu, who led data analysis and visualization, and Stuart Lynn, who supervised the data engineering, had two additional objectives for this story:
Below, let’s take a closer look at the story behind our new data story.
One of the primary objectives behind A Million Walks in the Park was to demonstrate the difference between raw data and information, which Wenfei and Stuart define as:
Raw data refers to unprocessed, unorganized facts that are effectively meaningless as no useful information can be extracted.
Information refers to processed, organized facts that are presented in a meaningful manner to persuade an audience to act on data-driven insights.
The image below, for example, displays one month of raw data for cell phone pings in Manhattan’s Central Park.
In visualizing raw data, the map above presents readers with a lot of noise from which few, if any, meaningful information can be heard. Also, because GPS signals are not 100% accurate, some of the raw data is inaccurate or misrepresented. Notice the amount of cell phone pings located in the middle of the reservoir, which are likely not accurate.
Instead, take a look at the data visualization below displaying the same GPS data structured into spatial clusters. Using 1-dimensional range searches to calculate dwell time, Wenfei was able to identify several common points of interest (POI) including organic space, near the Metropolitan Museum of Art, or near the Delacorte Theater, Great Lawn, and Belvedere Castle.
Building a typology of spatial clusters and processing cell phone pings accordingly allowed Wenfei to add meaning to this GPS data. While the raw data treated all cell phone pings equally, Wenfei’s structured data visualization presents actual information about the data, like:
This type of information provides descriptive insights into how New Yorkers share public spaces allowing for more specific (and predictive!) questions to be asked of this GPS data set.
The first obstacle that Stuart and Wenfei faced when working with a month of nationwide GPS data was determining a manageable scale on which to work with this massive amount of cell phone data.
They decided to only focus on GPS data fromthe two largest parks in New York City, Manhattan’s Central Park and Brooklyn’s Prospect Park.
In the second part of A Million Walks in the Park, Wenfei and Stuart zoomed in using a more granular scale to explore 270 smaller parks found across Manhattan and Brooklyn.
To help viewers navigate this more granular GPS data, Wenfei and Stuart built an interactive tool that filters the data according to:
The image below shows a Hotspot map for Fort Greene Park in Brooklyn, New York, where you can see:
This interactive tool presents more granular information on activity in Fort Greene Park, for example:
This method, as it is now, is imperfect, but it is a huge advance and in the not so distant future we will be able to build a model that saves time and money for population count. Traditionally, this type of count data would be collected manually, resulting in far more subjective estimates that would also deplete resources in terms of time and costs. Urban planners and local city officials may find this information useful when determining future resource allocation to parks across the city.
The final section of A Million Walks in the Park is an overview that ranks each park according to a classification designed according to the volume of foot traffic each park received compared to its size.
Although a correlation exists between park size and park area, Wenfei discovered that park sizes scale sublinearly to visitor count. In other words, every 10-fold increase in park size yielded only an 8-fold increase in the number of visitors which means that some smaller parks are receiving a high volume of visitors than more expansive parks with far less foot traffic.
To reexamine and revise our previous understanding about park size and foot traffic, Wenfei devised the following classification system to rank each park.
What’s meaningful about this presentation of information involves the insights into the use of space throughout New York City.
If you’re looking to find an escape from the wear and tear of the city, then some of the Big and Beautiful parks (Spring Creek Park, Riverside Park South, Fresh Creek Nature Preserve, etc.) may be the perfect destination. If you’re looking for more well-known landmarks, then you may instead want to check out some Reliable Favorites (Fort Washington Park, The Battery, Randall’s Island Park, etc.).
There’s a lot more to say about A Million Walks in the Park so make sure to visit the page and learn how New Yorkers take a walk in the park!
Raster is faster but vector is corrector!
Data VisualizationMany comparisons have been made between these two iconic cities, not least in terms of size, population and other factors like quality of life and the best pizza slice, but...
Data VisualizationMeetup is the most popular platform to organize community events. With more than 40 million members and 320,000 Meetup groups, the site facilitates 12,000 meetups per day a...
Data VisualizationPlease fill out the below form and we'll be in touch real soon.