CartoDB is more than just a useful tool for visualizing spreadsheets. When you use CartoDB you have the ability to tap into powerful data manipulation and GIS tools as well. To showcase these capabilities, we are creating a short series, Mapping the invisible. As the title may suggest, we’ll be exploring different datasets, doing analysis and visualizations to tell stories that are hidden in that data, so you can learn and get inspired to work with CartoDB.
In our first installment, we’re going to dig through some publically available data from Citi Bike. The data are all recorded bike trips over a short period of time. Like many datasets that governments and businesses publish, there is often more than meets the eye. Here we are going to show you how you can transform data into entirely new and insightful visualizations. If you work with data and find this interesting, stay tuned and follow us on Twitter to stay on top of the series. Let’s go!
Since its inception last year, Citi Bike has been an active participant in the open data movement, prompting data enthusiasts to engage in analyzing the granular data taken from each of their 7.5 million rides. However, despite its popularity with the data community and bike riders, the company has been going through higher-than-expected costs as it considers rate hikes and additional outside investments. Although it is difficult to know the breakdown of the costs of running the program, the cost of transporting bikes between stations is likely to be significant. Here’s a closer look at how Citi Bike transfers their bikes from one location to another using some simple analysis of the information through maps.
Being a bike share, Citi Bike allows their riders to pick up and drop off a bike from any of their stations. Although this may seem obvious to many, car shares (i.e. Zipcar) generally require the users to return vehicles to the original pickup location. As a result, to make sure bikes are available where needed, Citi Bike needs to move bikes from one location to another depending on the availability and demand of bikes as each station. In general, one would expect to have fewer costs the less often the company needs to move the bikes.
Bike #17310 is the most widely used bike among all the bikes in the bike share, with over 1,600 rides as of May. The visualization below shows all the rides taken with the bike: red indicating trips by a rider and blue indicating trips done by a Citi Bike truck. In general, it looks like the bike is moved around without any extra help in the western part of Manhattan, but frequently needs to be moved by truck in the East Village and Brooklyn.
While the original data only records when the bike was riden by a user, you can use SQL to identify gaps in the network of Bike #17310’s movement:
This is the SQL we used to create the blue lines above. Our analysis assumes that all unrecorded moves are relocation events, but clearly there could be cases where the bike needed to be taken in for maintainance or other factors. Assumptions aside, this simple SQL statement let’s us start to look at the efficiency of the system.
To minimize costs, it would be important to find ways for the riders to do the majority of the moving, while the trucks move as few bikes as possible. On the other hand, the worst case scenario is where bikes need to be constantly ferried across the city. If this is spread among many stations, this will cost Citi Bike a significant amount of money to run their program. This begs the question, where should one place a station to minimize management effort?
The visualization below shows the commuter flow over various periods of time (use the layer toggle for different time periods), generated by this SQL
The map helps demonstrate the general trends of where bikes are flowing from one locaton to another, minus maintenance work.
Not surprisingly we can see that the largest flows are the morning and evening weekday commute from Times Square, Penn Station, and Grand Central Station. What is surprising is Bryant Park, a major hub of work commuters and very close to Grand Central and Port Authority, generated bike flow in the opposite directions. We wonder if there could be simple solutions for this cluster.
…which is not exactly what is happenening in the chart below. The visualization below shows actual pickup and dropoff times…
…highlighting a surprisingly large number of blinking lights in many of the stations. One notable example is the stations around Grand Central Station, around 42nd and Park Avenue. If this really is trucks ferrying bikes, wow! There, you can see alternating red and blue lights indicating constant ferring of bikes in and out of a single station by vans or other non-riders. It makes us want to dig deeper to find out which of these are vans and which are oddities in the data.
This is just a simple example of how you can use CartoDB to turn data into insights. If you’re interested in checking it out create a free account and start mapping your own data today.
One of my favorite parts about working at CARTO is getting to test out the new features that our product team is constantly shipping. A month or so ago, it was our Kepler c...Core Tech
Our primary goal at CARTO is to be the world’s leading Location Intelligence platform, empowering our clients with the best data and the best spatial analysis. We frequentl...Core Tech
Please fill out the below form and we'll be in touch real soon.