Map of the Week: Safest and riskiest areas of New York subway system by the New York Daily News
Welcome Sarah Ryley to our Map of the Week series. Sarah is the data projects editor at the New York Daily News, which has the second-largest newspaper website in the United States. Crime reporting is the bread and butter of the Daily News, so Sarah has been focused on making crime data interactive and accessible to the publication's millions of readers.
The Daily News' transit reporter, Pete Donohue, recently received a response to a Freedom of Information Law request: police data on all felonies and misdemeanors over a roughly five-year period, from July 1, 2008, to Aug. 4, 2013. The data included the time, location, category and crime classification of 48,749 incidents. Sarah developed an interactive web package that allows readers to explore this data in several different ways (by station, crime type, hour, day, month and quarter) while also uncovering trends such as the hours when the most crimes occur and the crowded train line that is the source of half the system's groping complaints. She also calculated a station-by-station crime rate by merging the dataset with the Metropolitan Transportation Authority's (MTA) ridership statistics. The CartoDB map displays both the rate and total of various crime categories by station.
Prepping the data
The raw data, which can be downloaded here, needed a lot of work before I could load it into CartoDB. First, I removed all records after June 30, 2013, so we would have a clean five-year period.
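A minimal sketch of that date cutoff, assuming each record carries an ISO-formatted `date` field. The field name and the sample records are hypothetical; the raw export may be laid out differently.

```javascript
// Hypothetical sample records; the `date` field name is an assumption.
const records = [
  { id: 1, date: "2008-07-01" },
  { id: 2, date: "2013-06-30" },
  { id: 3, date: "2013-08-04" }
];

// Keep July 1, 2008 through June 30, 2013, inclusive.
const PERIOD_START = "2008-07-01";
const PERIOD_END = "2013-06-30";

// ISO dates (YYYY-MM-DD) compare correctly as plain strings.
const fiveYearRecords = records.filter(
  r => r.date >= PERIOD_START && r.date <= PERIOD_END
);

// The Aug. 4, 2013 record (id 3) is dropped.
console.log(fiveYearRecords.map(r => r.id));
```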
The train lines in the New York City Police Department data are based on an old map, possibly this one from 1998, when an S train stopped at Lexington Avenue and Roosevelt Island and the Q ran along the B/D line in Manhattan. So I did a "change alias" on many line and station dimensions to match them up with the current system. I did this manually because many of the routes only half-matched the current ones, and figuring out the rest took some research.
[Image: https://i.imgur.com/IwZU8DN.png]
The locations are by post, not station. A post is roughly equivalent to a platform, so a multi-platform station like Union Square has multiple posts, and in some cases an additional post if there is a district office located in the station. I grouped the posts into stations the same way the MTA does for the purpose of calculating ridership.
Then there are 123 crime classifications. I grouped these dimensions based on New York State Penal Law. Then I was ready to run many of the more granular queries for the story and the other two interactives. For the map, I made a separate set of broader groupings: violent crimes, property crimes, drug crimes, weapon possession and misdemeanor sex crimes. I then queried the number of records for each broader group by station and exported the results into a spreadsheet.
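The grouping and counting step can be sketched roughly as follows. This is an illustration, not the queries I actually ran: the classification labels, the `GROUP_OF` lookup, the field names and the sample incidents are all made up for the example.

```javascript
// Hypothetical lookup from a detailed classification to a broad map group.
const GROUP_OF = {
  "ROBBERY": "violent",
  "GRAND LARCENY": "property",
  "CRIMINAL POSSESSION OF A CONTROLLED SUBSTANCE": "drug",
  "CRIMINAL POSSESSION OF A WEAPON": "weapon",
  "FORCIBLE TOUCHING": "sex_misdemeanor"
};

// Made-up incidents, already assigned to a station.
const incidents = [
  { station: "Union Square", classification: "GRAND LARCENY" },
  { station: "Union Square", classification: "GRAND LARCENY" },
  { station: "Union Square", classification: "ROBBERY" },
  { station: "Broad Channel", classification: "ROBBERY" }
];

// Count incidents per station and broad group.
function countByStation(rows, groupOf) {
  const counts = {};
  for (const { station, classification } of rows) {
    const group = groupOf[classification];
    if (!group) continue; // skip classifications with no broad group
    counts[station] = counts[station] || {};
    counts[station][group] = (counts[station][group] || 0) + 1;
  }
  return counts;
}

const totals = countByStation(incidents, GROUP_OF);
console.log(totals);
```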
I then joined the exported data with annual ridership figures and station entrance geocodes from the MTA's website.
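A join like this can be sketched in a few lines, assuming both tables share a station name. The station names, field names, ridership numbers and coordinates below are placeholders, not MTA figures.

```javascript
// Placeholder crime totals exported from the grouping step.
const crimeTotals = [
  { station: "Station A", crime_total: 12 },
  { station: "Station B", crime_total: 3 },
  { station: "Station C (old name)", crime_total: 7 }
];

// Placeholder ridership and entrance coordinates.
const ridership = [
  { station: "Station A", annual_entries: 1000000, lat: 40.70, lon: -73.90 },
  { station: "Station B", annual_entries: 250000, lat: 40.80, lon: -73.95 }
];

// Index ridership rows by station name for fast lookup.
const ridershipByStation = new Map(ridership.map(r => [r.station, r]));

// Join matching rows; keep a list of stations that found no match.
const joined = crimeTotals
  .filter(c => ridershipByStation.has(c.station))
  .map(c => ({ ...c, ...ridershipByStation.get(c.station) }));
const unmatched = crimeTotals
  .filter(c => !ridershipByStation.has(c.station))
  .map(c => c.station);

console.log(joined.length, unmatched);
```

Listing the unmatched stations is useful here because mismatched names, like the outdated line and station labels in the police data, would otherwise silently drop out of the map.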
Calculating the rate
Once I had all of this in one spreadsheet, I could begin to calculate a rate. I consulted several academics on the best methodology and finally decided on a calculation per 100,000 trips.
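In other words, the rate is crimes divided by trips, multiplied by 100,000. A minimal sketch, assuming "trips" means the station's total entries over the same period as the crime count; the numbers in the usage lines are invented for illustration.

```javascript
// Crimes per 100,000 trips. Returns null when there is no ridership
// to divide by, rather than Infinity or NaN.
function ratePer100k(crimes, trips) {
  if (!(trips > 0)) return null;
  return (crimes * 100000) / trips;
}

// Invented example: 50 crimes over 2,000,000 trips.
const exampleRate = ratePer100k(50, 2000000);
console.log(exampleRate); // 2.5

// A station with no recorded entries gets no rate at all.
const noRidership = ratePer100k(5, 0);
console.log(noRidership); // null
```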
A few things to consider:
MTA ridership data is a count of the number of people swiping into a station. That doesn't include transfers at the station, which the MTA has no good way of tracking, or people exiting the station.
We decided against using the term "riders" to avoid giving the impression we were talking about a population of individuals rather than the traffic in a station.
Even though the lack of transfer figures obviously skews the data for express and transfer stations, the academics I consulted still agreed it was a worthwhile and interesting exercise, as long as the deficiencies in the analysis were clearly noted. And there are so many stations where riders can transfer between lines that it somewhat evens out the analysis. Further, a city’s crime rate is based on its resident population but does not include commuters or tourists, even though they can also become the victim or perpetrator of a crime. So rates are rarely a perfect science.
The station with the highest crime rate, Broad Channel, is a required transfer station for those continuing on to the last four stops on a branch of the A train, but not many people actually swipe into the station (only an average of 224 per day). So the lack of transfers in the analysis has a significant impact. But even when factoring in the total ridership on the four stops where riders must use Broad Channel as a transfer during non-peak hours, the station still has one of the highest crime rates in the system.
Interestingly, most of the stations with the highest crime rate are not express or transfer stops, indicating that overall you are most likely to encounter crime in low-volume or far-flung stations. This again contradicts the conventional wisdom, long held by many academics, that packed stations were the most dangerous, before some studies like this began debunking that myth. You can see on the map that, for all crime, the stations in the heart of Manhattan have among the lowest rates per 100,000 trips, and that’s not even taking into account transfer traffic.
I also found it interesting that the rates for misdemeanor sex crimes, violent crime and, to a certain extent, property crime are more evenly distributed throughout the system. But drug crimes and weapon possession incidents (crimes that are more likely to generate reports from police stops than from straphanger reports) have rates weighted more heavily toward low-income minority communities.
CartoDB time!
Finally, I had a table to import into CartoDB, with a column for each of the rate and total categories. I had an ambitious vision for this map given my very novice status as a coder, so here I need to extend my bottomless thanks to Michael Keller from Al Jazeera America and Matt Clark from Newsday for their help in making this happen.
I wanted a mode change to switch between the total and rate, and buttons to switch between crime categories. Here is a link to a full tutorial on how to make buttons that toggle between map views, though it does not cover a mode change.
To solve the double button issue, Michael made LayerActions into an object that, under each of the six crime categories, has a key called "rate" and another called "total." The variable "mode" tells it which function to use.
##_INIT_REPLACE_ME_PRE_##
var LayerActions = {
  crime: {
    // Size each station marker by its total number of crimes.
    total: function() {
      sublayers[0].set({
        sql: "SELECT * FROM mtacartodb_1",
        cartocss: "#mtacartodb_1{marker-fill-opacity:0.9;marker-line-color:#960916;marker-line-width:1;marker-line-opacity:1;marker-placement:point;marker-multi-policy:largest;marker-type:ellipse;marker-fill:#eb1024;marker-allow-overlap:true;marker-clip:false;}#mtacartodb_1 [ crime_total <= 1810] { marker-width: 40.6;} … #mtacartodb_1 [ crime_total <= 5] { marker-width: 2.1;}"
      });
      return true;
    },
    // Size each station marker by its crime rate per 100,000 trips.
    rate: function() {
      sublayers[0].set({
        sql: "SELECT * FROM mtacartodb_1",
        cartocss: "#mtacartodb_1{marker-fill-opacity:0.9;marker-line-color:#960916;marker-line-width:1;marker-line-opacity:1;marker-placement:point;marker-multi-policy:largest;marker-type:ellipse;marker-fill:#eb1024;marker-allow-overlap:true;marker-clip:false;}#mtacartodb_1 [ crime_rate <= 27.4] { marker-width: 50;} … #mtacartodb_1 [ crime_rate <= 0.0] { marker-width: 0;} "
      });
      return true;
    }
  }
};
##_END_REPLACE_ME_PRE_##
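To illustrate how the nested keys get used, here is a stripped-down, standalone sketch of the lookup: the category picks the inner object and `mode` picks the function. The `sublayers` stub, the placeholder CartoCSS strings, and the names `activeFilter`, `applyCurrentView` and `setMode` are assumptions made for this example, not the code from the published map.

```javascript
// Stand-in for the CartoDB sublayer so the sketch runs on its own;
// it just records what would have been sent to the map.
const applied = [];
const sublayers = [{ set: options => applied.push(options) }];

// Same shape as LayerActions above, with the CartoCSS replaced by placeholders.
const LayerActions = {
  crime: {
    total: function() {
      sublayers[0].set({ sql: "SELECT * FROM mtacartodb_1", cartocss: "/* crime_total styling */" });
      return true;
    },
    rate: function() {
      sublayers[0].set({ sql: "SELECT * FROM mtacartodb_1", cartocss: "/* crime_rate styling */" });
      return true;
    }
  }
};

let activeFilter = "crime"; // which crime category is showing
let mode = "total";         // "total" or "rate"

// Look up the function for the current category and mode, then run it.
function applyCurrentView() {
  const category = LayerActions[activeFilter];
  const action = category && category[mode];
  return typeof action === "function" ? action() : false;
}

// Switching mode keeps the category and re-applies the view.
function setMode(newMode) {
  mode = newMode;
  return applyCurrentView();
}

applyCurrentView(); // applies crime totals
setMode("rate");    // applies crime rates
```

Because every category exposes the same two keys, a category button only has to change the category and a mode button only has to change `mode`; neither needs to know about the other.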
The following code sets the active filter (all crime) and mode (total), and lets you click a different mode to make it the current one:
##_INIT_REPLACE_ME_PRE_##