Traffic Incident Reports in San Francisco

In [1]:
from cartoframes.auth import set_default_context, Context
from cartoframes.viz import Map, Layer, Legend, Source
from cartoframes.data import Dataset
import pandas

If you have a CARTO account, you can set your credentials in the following cell. This allows you to upload the dataset and share the final visualization through your account.

In [2]:
# username = '' # <-- insert your username here
# api_key = '' # <-- insert your API key here

# context = Context('https://{}.carto.com/'.format(username), api_key)
# set_default_context(context)

Load incident reports

Using pandas, we can read an external CSV file directly into a DataFrame. Let's look at the first rows and then at the columns we have:

In [3]:
incident_reports_df = pandas.read_csv('https://data.sfgov.org/resource/wg3w-h783.csv')
incident_reports_df.head()
Out[3]:
incident_datetime incident_date incident_time incident_year incident_day_of_week report_datetime row_id incident_id incident_number cad_number ... longitude point :@computed_region_6qbp_sg9q :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md :@computed_region_nqbw_i6c3 :@computed_region_2dwj_jsy4 :@computed_region_h4ep_8xdi :@computed_region_y6ts_4iup
0 2018-07-18T13:30:00.000 2018-07-18T00:00:00.000 13:30 2018 Wednesday 2018-07-18T13:31:00.000 69250964070 692509 180536729 182001522.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2019-04-08T05:25:00.000 2019-04-08T00:00:00.000 05:25 2019 Monday 2019-04-13T13:34:00.000 79165671000 791656 196076240 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2019-06-05T14:00:00.000 2019-06-05T00:00:00.000 14:00 2019 Wednesday 2019-06-05T14:30:00.000 81006072000 810060 190416337 191610728.0 ... -122.422464 \n, \n(37.78268536745206, -122.42246374465972) 100.0 4.0 11.0 39.0 NaN NaN NaN NaN
3 2019-04-16T20:20:00.000 2019-04-16T00:00:00.000 20:20 2019 Tuesday 2019-04-17T00:21:00.000 79171306244 791713 196076024 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2019-06-10T11:00:00.000 2019-06-10T00:00:00.000 11:00 2019 Monday 2019-06-10T11:00:00.000 81012372000 810123 190393440 191521989.0 ... -122.464145 \n, \n(37.779090726308574, -122.46414497098554) 5.0 8.0 4.0 11.0 NaN NaN NaN NaN

5 rows × 34 columns
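
A note on size: this endpoint is served through Socrata's SODA API, which, as far as we know, returns only the first 1,000 rows unless a `$limit` query parameter is supplied. The sketch below shows how such a URL could be built with the standard library. The value 5000 is an arbitrary choice for illustration, and the names `base_url` and `limited_url` are ours:

```python
from urllib.parse import urlencode

base_url = 'https://data.sfgov.org/resource/wg3w-h783.csv'

# '$limit' is a SODA query parameter; 5000 is an arbitrary example value.
# safe='$' keeps the dollar sign readable instead of encoding it as %24.
query = urlencode({'$limit': 5000}, safe='$')
limited_url = '{}?{}'.format(base_url, query)
```

The resulting string can be passed to `pandas.read_csv` in place of the original URL.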

In [4]:
incident_reports_df.columns
Out[4]:
Index(['incident_datetime', 'incident_date', 'incident_time', 'incident_year',
       'incident_day_of_week', 'report_datetime', 'row_id', 'incident_id',
       'incident_number', 'cad_number', 'report_type_code',
       'report_type_description', 'filed_online', 'incident_code',
       'incident_category', 'incident_subcategory', 'incident_description',
       'resolution', 'intersection', 'cnn', 'police_district',
       'analysis_neighborhood', 'supervisor_district', 'latitude', 'longitude',
       'point', ':@computed_region_6qbp_sg9q', ':@computed_region_qgnn_b9vv',
       ':@computed_region_26cr_cadq', ':@computed_region_ajp5_b2md',
       ':@computed_region_nqbw_i6c3', ':@computed_region_2dwj_jsy4',
       ':@computed_region_h4ep_8xdi', ':@computed_region_y6ts_4iup'],
      dtype='object')

Some of the latitude and longitude values are NaN, so in the next step we drop those rows. After that, we create a Dataset from the DataFrame and use it in a Layer to visualize the data:

In [5]:
# Keep only the rows that have both coordinates
incident_reports_df = incident_reports_df.dropna(subset=['longitude', 'latitude'])

incident_reports_data = Dataset(incident_reports_df)

Map(Layer(incident_reports_data))
Out[5]:
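
NaN is the only float value that does not compare equal to itself, which is why a self-comparison such as `x == x` can serve as a NaN filter. Here is a minimal standard-library sketch of that behavior, using made-up coordinate pairs rather than the real dataset, next to an explicit `math.isnan` check:

```python
import math

# Hypothetical rows (not taken from the real dataset): (latitude, longitude)
rows = [
    (37.7827, -122.4225),
    (math.nan, math.nan),
    (37.7791, -122.4641),
]

# NaN != NaN, so comparing a value with itself is False only for NaN.
kept = [(lat, lon) for lat, lon in rows if lat == lat and lon == lon]

# The same filter written explicitly with math.isnan.
kept_explicit = [
    (lat, lon) for lat, lon in rows
    if not (math.isnan(lat) or math.isnan(lon))
]
```

Both lists end up with the same two valid pairs.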

Now we are going to use a helper method to color the points by category. The category is the day of the week (the incident_day_of_week column), and 'Day of Week' is the title we give it:

In [6]:
from cartoframes.viz.helpers import color_category_layer

Map(
    color_category_layer(incident_reports_data, 'incident_day_of_week', 'Day of Week', top=7)
)
Out[6]:

As we can see in the legend, the days are sorted by frequency, which means that there are fewer incidents on Thursdays and more on Tuesdays. Since our purpose is not to visualize the frequency, and we want the legend to list the days in order from Monday to Sunday, we can pass the helper the categories we want to visualize in the desired order:

In [7]:
from cartoframes.viz.helpers import color_category_layer

Map(
    color_category_layer(incident_reports_data, 'incident_day_of_week', 'Day of Week', cat=[
        'Monday',
        'Tuesday',
        'Wednesday',
        'Thursday',
        'Friday',
        'Saturday',
        'Sunday'
    ])
)
Out[7]:
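
The difference between the two legends comes down to ordering: one ranks the categories by how often they occur, the other follows a list fixed by hand. The sketch below shows both orderings with the standard library, using a made-up sample of weekday values. The helper's internal logic is not shown in this notebook, so this only illustrates the idea:

```python
from collections import Counter

# Hypothetical sample of weekday values (not real counts from the dataset)
days = ['Tuesday', 'Monday', 'Tuesday', 'Sunday', 'Tuesday', 'Monday', 'Friday']

# Frequency order: the most common value comes first
by_frequency = [day for day, _ in Counter(days).most_common()]

# Calendar order: keep only the days that occur, in a fixed Monday-to-Sunday sequence
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
present = set(days)
by_calendar = [day for day in week if day in present]
```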

Next, we look for the categories that describe traffic incidents, so that we can use them to visualize only those incidents:

In [8]:
incident_reports_df.incident_category.unique()
Out[8]:
array(['Non-Criminal', 'Assault', 'Malicious Mischief', 'Larceny Theft',
       'Recovered Vehicle', 'Robbery', 'Other Miscellaneous',
       'Drug Offense', 'Disorderly Conduct', 'Burglary', 'Suspicious Occ',
       'Traffic Collision', 'Warrant', 'Arson',
       'Offences Against The Family And Children',
       'Traffic Violation Arrest', 'Forgery And Counterfeiting',
       'Missing Person', 'Lost Property', 'Motor Vehicle Theft',
       'Case Closure', 'Weapons Carrying Etc', 'Other',
       'Miscellaneous Investigation', 'Weapons Offense', 'Fraud',
       'Civil Sidewalks', 'Other Offenses', 'Prostitution',
       'Juvenile Offenses', 'Homicide', 'Vandalism', 'Sex Offense',
       'Courtesy Report', 'Stolen Property', 'Family Offense'],
      dtype=object)
In [9]:
from cartoframes.viz.helpers import size_category_layer

Map(
    size_category_layer(
        incident_reports_data,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest'])
)
Out[9]:
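
The two values passed to `cat` were picked by reading the category list. If you would rather derive them, one option is to match on the name. The sketch below assumes that every traffic-related category starts with the word 'Traffic', which holds for the list printed above but may not hold for future versions of the data:

```python
# A few of the category names from the incident_category column
categories = ['Non-Criminal', 'Assault', 'Traffic Collision', 'Warrant',
              'Traffic Violation Arrest', 'Motor Vehicle Theft', 'Vandalism']

# Keep only the names that start with 'Traffic' (an assumed selection rule)
traffic_categories = [name for name in categories if name.startswith('Traffic')]
```

Note that a category such as 'Motor Vehicle Theft' is not selected by this rule, which matches the manual choice made here.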

CARTO has a dataset named 'sfcta_congestion_roads' that we can use for the next step. To read it, we create a Source with its own Context. If you have a CARTO account and want more control over this dataset, you can import it into your account to keep everything together, in which case you don't need to create a separate source for it.

Once the data source is created, we combine two helper methods. The first one uses the Source with the roads data from CARTO, and the second one uses the traffic incident reports.

In [10]:
from cartoframes.viz.helpers import color_continuous_layer

sfcta_congestion_roads_source = Source(
    'sfcta_congestion_roads',
    Context(
        base_url='https://cartovl.carto.com',
        api_key='default_public'
    )
)

Map([
    color_continuous_layer(sfcta_congestion_roads_source, 'auto_speed', 'Recorded vehicle speeds'),
    size_category_layer(
        incident_reports_data,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest'])
])
Out[10]:

Next, we add information about traffic signals, which comes from a different source:

In [11]:
traffic_signals_df = pandas.read_csv('https://data.sfgov.org/resource/c8ue-f4py.csv')
traffic_signals_df.head()
Out[11]:
objectid cnn code cnn_1 street1 street2 detection street3 sup_dist street4 ... last_edited_date point point_address point_zip point_city point_state :@computed_region_6qbp_sg9q :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md
0 14201 23163000 CALTRANS 23163000 21ST AVE SLOAT NaN NaN 4,7 NaN ... 2019-04-18T14:45:34.000 POINT (-122.477080829761 37.734617663245) NaN NaN NaN NaN 40.0 10.0 8.0 35.0
1 13946 23799001 MESSAGE SIGN 23799001 03RD ST KING 6 ... 2019-04-18T14:45:34.000 POINT (-122.392013643268 37.77809930415) NaN NaN NaN NaN 34.0 1.0 10.0 4.0
2 14161 1278800 Speed Radar 1278800 MADRONE ULLOA - WB 7 ... 2019-04-18T14:45:34.000 POINT (-122.467683831285 37.741501987719) NaN NaN NaN NaN 46.0 10.0 8.0 41.0
3 13949 7210000 CALTRANS 7210000 ALEMANY CUT THROUGH 9 ... 2019-04-18T14:45:34.000 POINT (-122.407917735682 37.737729505444) NaN NaN NaN NaN 83.0 2.0 2.0 25.0
4 13198 26528000 Fix 26528000 OCTAVIA PINE 2,5 ... 2019-04-18T14:45:13.000 POINT (-122.427059672706 37.788786895296) NaN NaN NaN NaN 102.0 4.0 11.0 30.0

5 rows × 61 columns

In [12]:
traffic_signals_df.columns
Out[12]:
Index(['objectid', 'cnn', 'code', 'cnn_1', 'street1', 'street2', 'detection',
       'street3', 'sup_dist', 'street4', 'sig_num', 'type', 'ic_make', 'model',
       'cabinet_si', 'system_num', 'master', 'bbs', 'veh_actuat', 'ped_signal',
       'ped_actuat', 'manual_con', 'tbc', 'interconne', 'preempt_pr',
       'd_ate2070', 'project_ne', 'project_ol', 'upgraded', 'yr_of_cont',
       'last_upgra', 'new_signal', 'mod_projec', 'full_upgra', 'beacon_fla',
       'funding', 'rlcam', 'startyear', 'caltrans_r', 'caltrans', 'percent_c',
       'sf', 'percent_sf', 'percent_po', 'lamp_assgn', 'lamp_sort', 'sort_me',
       'aps', 'created_user', 'created_date', 'last_edited_user',
       'last_edited_date', 'point', 'point_address', 'point_zip', 'point_city',
       'point_state', ':@computed_region_6qbp_sg9q',
       ':@computed_region_qgnn_b9vv', ':@computed_region_26cr_cadq',
       ':@computed_region_ajp5_b2md'],
      dtype='object')
In [13]:
traffic_signals_df = traffic_signals_df.rename(columns={'type': 'signal_type'})
traffic_signals_df.signal_type.unique()
Out[13]:
array(['CALTRANS - HAWK', 'MESSAGE SIGN', 'RADAR SPEED SIGN', 'CALTRANS',
       'SIGNAL', 'BEACON - RRFB', 'PENDING SIGNAL', 'BEACON',
       'FUTURE SIGNAL', 'CALTRANS (BY CONTRACTOR CONSORTIUM GLC)',
       'PENDING BEACON', 'OTHER', 'LANE CONTROL', 'FLASHER',
       'LIGHTED CROSSWALK', 'DALY CITY'], dtype=object)

Since there are no latitude and longitude columns, we can use the point column to create a GeoDataFrame.

In [14]:
import geopandas
from shapely import wkt

traffic_signals_df['point'] = traffic_signals_df['point'].apply(wkt.loads)
traffic_signals_df = traffic_signals_df.rename(columns={'point': 'geometry'}).set_geometry('geometry')
trafic_signals_gdf = geopandas.GeoDataFrame(traffic_signals_df, geometry='geometry')
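
The `point` column holds WKT (well-known text) strings such as `POINT (-122.477080829761 37.734617663245)`, with the x coordinate (longitude) first and the y coordinate (latitude) second. `wkt.loads` handles the full format. The hypothetical `parse_wkt_point` helper below is only a standard-library sketch of what gets extracted in the point case:

```python
import re

def parse_wkt_point(text):
    """Return (x, y) from a WKT string such as 'POINT (-122.4 37.7)'."""
    match = re.fullmatch(r'\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*', text)
    if match is None:
        raise ValueError('not a WKT point: {!r}'.format(text))
    return float(match.group(1)), float(match.group(2))

# Value taken from the first row of the traffic signals table
longitude, latitude = parse_wkt_point('POINT (-122.477080829761 37.734617663245)')
```

Keeping the x, y order in mind helps when comparing these values with datasets that list latitude first.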
In [15]:
traffic_signals_data = Dataset(trafic_signals_gdf)

Map(Layer(traffic_signals_data))
Out[15]:

Now we keep only the signal types we want to visualize, and this time we build a Layer that uses cross symbols:

In [16]:
signal_types = ['RADAR SPEED SIGN', 'FLASHER', 'LIGHTED CROSSWALK']
signal_gdf = trafic_signals_gdf[trafic_signals_gdf['signal_type'].isin(signal_types)]
signal_data = Dataset(signal_gdf)

Map(
    Layer(
        signal_data,
        '''
        color: ramp($signal_type, bold)
        width: 15
        symbol: cross
        ''',
        legend={
           'type': 'color-category',
           'title': 'Radar'
        })
)
Out[16]:

All together:

In [17]:
Map([
    color_continuous_layer(
        sfcta_congestion_roads_source, 'auto_speed', 'Recorded vehicle speeds'),
    size_category_layer(
        incident_reports_data,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest']),
    Layer(
        signal_data,
        '''
        color: ramp($signal_type, bold)
        width: 15
        symbol: cross
        ''',
        legend={
           'type': 'color-category',
           'title': 'Radar'
        })
])
Out[17]: