Traffic Incident Reports in San Francisco

Visualize traffic incident reports in San Francisco.

Data sources:

from cartoframes.auth import set_default_context, Context
from cartoframes.viz import Map, Layer, Legend, Source
from cartoframes.data import Dataset
import pandas

If you have a CARTO account, you can set your credentials in the following cell. This allows you to upload the dataset and share the final visualization through your account.

# username = ''  # <-- insert your username here
# api_key = ''  # <-- insert your API key here

# context = Context('https://{}.carto.com/'.format(username), api_key)
# set_default_context(context)
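
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names CARTO_USERNAME and CARTO_API_KEY and the helper carto_credentials are hypothetical and not part of cartoframes. When no key is set, it falls back to the 'default_public' key that this notebook uses later for a dataset hosted in CARTO.

```python
import os


def carto_credentials(env=None):
    """Return (base_url, api_key) built from environment-style settings.

    The variable names used here are assumptions made for this example.
    """
    env = os.environ if env is None else env
    username = env.get('CARTO_USERNAME', '')
    api_key = env.get('CARTO_API_KEY', 'default_public')
    base_url = 'https://{}.carto.com/'.format(username)
    return base_url, api_key


# Example with an explicit mapping instead of the real environment
base_url, api_key = carto_credentials({'CARTO_USERNAME': 'my-user'})
print(base_url)  # https://my-user.carto.com/
print(api_key)   # default_public
```

The two values can then be passed to Context(base_url, api_key) in the same way as in the commented cell above.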

Load incident reports

Using pandas, we can read an external data source, which is converted to a dataframe. Let's see which columns we have:

incident_reports_df = pandas.read_csv('https://data.sfgov.org/resource/wg3w-h783.csv')
incident_reports_df.head()

incident_reports_df.columns

Index(['incident_datetime', 'incident_date', 'incident_time', 'incident_year',
       'incident_day_of_week', 'report_datetime', 'row_id', 'incident_id',
       'incident_number', 'cad_number', 'report_type_code',
       'report_type_description', 'filed_online', 'incident_code',
       'incident_category', 'incident_subcategory', 'incident_description',
       'resolution', 'intersection', 'cnn', 'police_district',
       'analysis_neighborhood', 'supervisor_district', 'latitude', 'longitude',
       'point', ':@computed_region_6qbp_sg9q', ':@computed_region_qgnn_b9vv',
       ':@computed_region_26cr_cadq', ':@computed_region_ajp5_b2md',
       ':@computed_region_nqbw_i6c3', ':@computed_region_2dwj_jsy4',
       ':@computed_region_h4ep_8xdi', ':@computed_region_y6ts_4iup'],
      dtype='object')

Some of the latitude and longitude values are NaN, so in the next step we drop those rows. After that, we create a Dataset from the dataframe and use it in a Layer to visualize the data:

# Drop rows where either coordinate is missing (NaN)
incident_reports_df = incident_reports_df.dropna(subset=['longitude', 'latitude'])
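
Rows with missing coordinates can be removed either by comparing a column with itself (NaN is never equal to itself) or with pandas' dropna. The following is a minimal sketch on made-up toy data, not the real incident reports, showing that both approaches keep the same rows:

```python
import pandas

# Toy data standing in for the incident reports (values are made up)
toy_df = pandas.DataFrame({
    'incident_id': [1, 2, 3, 4],
    'latitude': [37.78, None, 37.77, None],
    'longitude': [-122.42, -122.46, None, None],
})

# NaN is not equal to itself, so the comparison is False for missing values
by_comparison = toy_df[toy_df.longitude == toy_df.longitude]
by_comparison = by_comparison[by_comparison.latitude == by_comparison.latitude]

# dropna states the same intent explicitly
by_dropna = toy_df.dropna(subset=['latitude', 'longitude'])

print(by_comparison.equals(by_dropna))  # True
print(by_dropna.incident_id.tolist())   # [1]
```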

incident_reports_data = Dataset(incident_reports_df)

Map(Layer(incident_reports_data))

Next, we use a helper method to color the points by category. The category here is the day of the week (incident_day_of_week), titled 'Day of Week' in the legend:

from cartoframes.viz.helpers import color_category_layer

Map(
    color_category_layer(incident_reports_data, 'incident_day_of_week', 'Day of Week', top=7)
)

As the legend shows, the days are sorted by frequency: there are fewer incidents on Thursdays and more on Tuesdays. Since our goal is not to visualize frequency, and we want the legend to list the days in order from Monday to Sunday, we can pass the helper the categories we want in the desired order:

from cartoframes.viz.helpers import color_category_layer

Map(
    color_category_layer(incident_reports_data, 'incident_day_of_week', 'Day of Week', cat=[
        'Monday',
        'Tuesday',
        'Wednesday',
        'Thursday',
        'Friday',
        'Saturday',
        'Sunday'
    ])
)
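
To make the difference between the two orderings concrete, here is a small sketch using only the standard library. The list of days is a made-up sample, not the real incident data, and the variable names are chosen just for this example:

```python
from collections import Counter

# Made-up sample of weekday values, standing in for incident_day_of_week
days = ['Tuesday', 'Monday', 'Tuesday', 'Thursday', 'Friday',
        'Tuesday', 'Monday', 'Sunday', 'Friday', 'Monday', 'Tuesday']

counts = Counter(days)

# Frequency order: most common value first
by_frequency = [day for day, _ in counts.most_common()]

# Calendar order: a fixed list, keeping only the days present in the data
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']
by_calendar = [day for day in week if day in counts]

print(by_frequency[0])  # Tuesday
print(by_calendar)      # ['Monday', 'Tuesday', 'Thursday', 'Friday', 'Sunday']
```

Passing an explicit list, as the cat argument does in the cell above, fixes the order regardless of how often each value occurs.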

Now we look through the incident categories for the ones related to traffic, and then use those categories to visualize the incidents:

incident_reports_df.incident_category.unique()

array(['Non-Criminal', 'Assault', 'Malicious Mischief', 'Larceny Theft',
       'Recovered Vehicle', 'Robbery', 'Other Miscellaneous',
       'Drug Offense', 'Disorderly Conduct', 'Burglary', 'Suspicious Occ',
       'Traffic Collision', 'Warrant', 'Arson',
       'Offences Against The Family And Children',
       'Traffic Violation Arrest', 'Forgery And Counterfeiting',
       'Missing Person', 'Lost Property', 'Motor Vehicle Theft',
       'Case Closure', 'Weapons Carrying Etc', 'Other',
       'Miscellaneous Investigation', 'Weapons Offense', 'Fraud',
       'Civil Sidewalks', 'Other Offenses', 'Prostitution',
       'Juvenile Offenses', 'Homicide', 'Vandalism', 'Sex Offense',
       'Courtesy Report', 'Stolen Property', 'Family Offense'],
      dtype=object)
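
Rather than scanning the array by eye, the traffic-related categories can be picked out with a simple text match. This is a minimal sketch over a handful of the values listed above; matching on the word "traffic" is an assumption about what counts as a traffic incident:

```python
# A few of the values returned by incident_category.unique() above
categories = ['Non-Criminal', 'Assault', 'Larceny Theft', 'Recovered Vehicle',
              'Traffic Collision', 'Warrant', 'Traffic Violation Arrest',
              'Motor Vehicle Theft', 'Vandalism']

# Keep the categories whose name mentions traffic (case-insensitive)
traffic_categories = [c for c in categories if 'traffic' in c.lower()]

print(traffic_categories)  # ['Traffic Collision', 'Traffic Violation Arrest']
```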

from cartoframes.viz.helpers import size_category_layer

Map(
    size_category_layer(
        incident_reports_data,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest'])
)

CARTO hosts a dataset we can use for the next step, named 'sfcta_congestion_roads'. We create a Source for it with its own Context. If you have a CARTO account, you can instead import the dataset into your account to keep everything together, in which case there is no need to create a separate source for it.

Once the data source is created, we combine two helper methods in a single map. The first uses the Source with the roads data from CARTO, and the second uses the traffic incident reports.

from cartoframes.viz.helpers import color_continuous_layer

sfcta_congestion_roads_source = Source(
    'sfcta_congestion_roads',
    Context(
        base_url='https://cartovl.carto.com',
        api_key='default_public'
    )
)

Map([
    color_continuous_layer(sfcta_congestion_roads_source, 'auto_speed', 'Recorded vehicle speeds'),
    size_category_layer(
        incident_reports_data,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest'])
])

Next, we add information about traffic signals, loaded from a different source:

traffic_signals_df = pandas.read_csv('https://data.sfgov.org/resource/c8ue-f4py.csv')
traffic_signals_df.head()

traffic_signals_df.columns

Index(['objectid', 'cnn', 'code', 'cnn_1', 'street1', 'street2', 'detection',
       'street3', 'sup_dist', 'street4', 'sig_num', 'type', 'ic_make', 'model',
       'cabinet_si', 'system_num', 'master', 'bbs', 'veh_actuat', 'ped_signal',
       'ped_actuat', 'manual_con', 'tbc', 'interconne', 'preempt_pr',
       'd_ate2070', 'project_ne', 'project_ol', 'upgraded', 'yr_of_cont',
       'last_upgra', 'new_signal', 'mod_projec', 'full_upgra', 'beacon_fla',
       'funding', 'rlcam', 'startyear', 'caltrans_r', 'caltrans', 'percent_c',
       'sf', 'percent_sf', 'percent_po', 'lamp_assgn', 'lamp_sort', 'sort_me',
       'aps', 'created_user', 'created_date', 'last_edited_user',
       'last_edited_date', 'point', 'point_address', 'point_zip', 'point_city',
       'point_state', ':@computed_region_6qbp_sg9q',
       ':@computed_region_qgnn_b9vv', ':@computed_region_26cr_cadq',
       ':@computed_region_ajp5_b2md'],
      dtype='object')

traffic_signals_df = traffic_signals_df.rename(columns={'type': 'signal_type'})
traffic_signals_df.signal_type.unique()

array(['CALTRANS - HAWK', 'MESSAGE SIGN', 'RADAR SPEED SIGN', 'CALTRANS',
       'SIGNAL', 'BEACON - RRFB', 'PENDING SIGNAL', 'BEACON',
       'FUTURE SIGNAL', 'CALTRANS (BY CONTRACTOR CONSORTIUM GLC)',
       'PENDING BEACON', 'OTHER', 'LANE CONTROL', 'FLASHER',
       'LIGHTED CROSSWALK', 'DALY CITY'], dtype=object)

Since there are no latitude and longitude columns, we can use the point column, which holds WKT strings, to create a GeoDataFrame.

import geopandas
from shapely import wkt

traffic_signals_df['point'] = traffic_signals_df['point'].apply(wkt.loads)
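# A plain pandas DataFrame has no set_geometry method; the GeoDataFrame
# constructor on the next line sets 'geometry' as the active geometry column
traffic_signals_df = traffic_signals_df.rename(columns={'point': 'geometry'})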
trafic_signals_gdf = geopandas.GeoDataFrame(traffic_signals_df, geometry='geometry')

traffic_signals_data = Dataset(trafic_signals_gdf)

Map(Layer(traffic_signals_data))
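
The point column stores each location as a WKT string such as 'POINT (-122.477080829761 37.734617663245)', with x (longitude) before y (latitude). In this notebook shapely's wkt.loads does the parsing. The sketch below is only an illustration of what that string contains: parse_wkt_point is a hypothetical helper that handles plain 2D points and is not a replacement for shapely.

```python
import re

# Matches 'POINT (x y)' where x and y are signed decimal numbers
POINT_RE = re.compile(
    r'^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$')


def parse_wkt_point(text):
    """Return (longitude, latitude) from a 2D WKT point string."""
    match = POINT_RE.match(text)
    if match is None:
        raise ValueError('not a 2D WKT point: {!r}'.format(text))
    x, y = match.groups()
    return float(x), float(y)


# Value taken from the first row of the traffic signals table
lon, lat = parse_wkt_point('POINT (-122.477080829761 37.734617663245)')
print(lon, lat)  # -122.477080829761 37.734617663245
```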

Next, we keep only the signal types we want to visualize, and this time we build a Layer that draws them as cross symbols:

signal_gdf = trafic_signals_gdf[trafic_signals_gdf['signal_type'].isin(['RADAR SPEED SIGN', 'FLASHER', 'LIGHTED CROSSWALK'])]
signal_data = Dataset(signal_gdf)
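
The filter above relies on isin, which returns a boolean mask that is then used to select rows. Here is a minimal sketch on a toy dataframe; the signal types come from the unique() output above, while the sig_num values are made up for the example:

```python
import pandas

# Toy rows: real signal type names, made-up sig_num values
toy_signals = pandas.DataFrame({
    'sig_num': [1, 2, 3, 4, 5],
    'signal_type': ['SIGNAL', 'FLASHER', 'CALTRANS',
                    'RADAR SPEED SIGN', 'LIGHTED CROSSWALK'],
})

wanted = ['RADAR SPEED SIGN', 'FLASHER', 'LIGHTED CROSSWALK']

# isin builds a boolean mask that is True where the value is in the list
mask = toy_signals['signal_type'].isin(wanted)
selected = toy_signals[mask]

print(mask.tolist())              # [False, True, False, True, True]
print(selected.sig_num.tolist())  # [2, 4, 5]
```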

Map(
    Layer(
        signal_data,
        '''
        color: ramp($signal_type, bold)
        width: 15
        symbol: cross
        ''',
        legend={
           'type': 'color-category',
           'title': 'Radar'
        })
)

All together:

Map([
    color_continuous_layer(
        sfcta_congestion_roads_source, 'auto_speed', 'Recorded vehicle speeds'),
    size_category_layer(
        incident_reports_data,
        'incident_category',
        'Traffic Incidents',
        cat=['Traffic Collision', 'Traffic Violation Arrest']),
    Layer(
        signal_data,
        '''
        color: ramp($signal_type, bold)
        width: 15
        symbol: cross
        ''',
        legend={
           'type': 'color-category',
           'title': 'Radar'
        })
])

Output of traffic_signals_df.head() from the traffic signals step above (middle columns elided):

	objectid	cnn	code	cnn_1	street1	street2	detection	street3	sup_dist	street4	...	last_edited_date	point	point_address	point_zip	point_city	point_state	:@computed_region_6qbp_sg9q	:@computed_region_qgnn_b9vv	:@computed_region_26cr_cadq	:@computed_region_ajp5_b2md
0	14201	23163000	CALTRANS	23163000	21ST AVE	SLOAT	NaN	NaN	4,7	NaN	...	2019-04-18T14:45:34.000	POINT (-122.477080829761 37.734617663245)	NaN	NaN	NaN	NaN	40.0	10.0	8.0	35.0
1	13946	23799001	MESSAGE SIGN	23799001	03RD ST	KING			6		...	2019-04-18T14:45:34.000	POINT (-122.392013643268 37.77809930415)	NaN	NaN	NaN	NaN	34.0	1.0	10.0	4.0
2	14161	1278800	Speed Radar	1278800	MADRONE	ULLOA - WB			7		...	2019-04-18T14:45:34.000	POINT (-122.467683831285 37.741501987719)	NaN	NaN	NaN	NaN	46.0	10.0	8.0	41.0
3	13949	7210000	CALTRANS	7210000	ALEMANY	CUT THROUGH			9		...	2019-04-18T14:45:34.000	POINT (-122.407917735682 37.737729505444)	NaN	NaN	NaN	NaN	83.0	2.0	2.0	25.0
4	13198	26528000	Fix	26528000	OCTAVIA	PINE			2,5		...	2019-04-18T14:45:13.000	POINT (-122.427059672706 37.788786895296)	NaN	NaN	NaN	NaN	102.0	4.0	11.0	30.0

Output of incident_reports_df.head() from the "Load incident reports" step above (middle columns elided):

	incident_datetime	incident_date	incident_time	incident_year	incident_day_of_week	report_datetime	row_id	incident_id	incident_number	cad_number	...	longitude	point	:@computed_region_6qbp_sg9q	:@computed_region_qgnn_b9vv	:@computed_region_26cr_cadq	:@computed_region_ajp5_b2md	:@computed_region_nqbw_i6c3	:@computed_region_2dwj_jsy4	:@computed_region_h4ep_8xdi	:@computed_region_y6ts_4iup
0	2018-07-18T13:30:00.000	2018-07-18T00:00:00.000	13:30	2018	Wednesday	2018-07-18T13:31:00.000	69250964070	692509	180536729	182001522.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	2019-04-08T05:25:00.000	2019-04-08T00:00:00.000	05:25	2019	Monday	2019-04-13T13:34:00.000	79165671000	791656	196076240	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	2019-06-05T14:00:00.000	2019-06-05T00:00:00.000	14:00	2019	Wednesday	2019-06-05T14:30:00.000	81006072000	810060	190416337	191610728.0	...	-122.422464	\n, \n(37.78268536745206, -122.42246374465972)	100.0	4.0	11.0	39.0	NaN	NaN	NaN	NaN
3	2019-04-16T20:20:00.000	2019-04-16T00:00:00.000	20:20	2019	Tuesday	2019-04-17T00:21:00.000	79171306244	791713	196076024	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	2019-06-10T11:00:00.000	2019-06-10T00:00:00.000	11:00	2019	Monday	2019-06-10T11:00:00.000	81012372000	810123	190393440	191521989.0	...	-122.464145	\n, \n(37.779090726308574, -122.46414497098554)	5.0	8.0	4.0	11.0	NaN	NaN	NaN	NaN