Traffic Incident Reports in San Francisco

In [1]:
from cartoframes.auth import set_default_context, Context
from cartoframes.viz import Map, Layer, Legend, Source
from cartoframes.data import Dataset
import pandas

If you have a CARTO account, you can set your credentials in the following cell. This allows you to upload the dataset and share the final visualization through your account.

In [2]:
# username = '' # <-- insert your username here
# api_key = ''# <-- insert your API key here

# context = Context('https://{}.carto.com/'.format(username), api_key)
# set_default_context(context)

Load incident reports

Using pandas, we can read an external CSV source straight into a dataframe. Let's see which columns we have:

In [3]:
incident_reports_df = pandas.read_csv('https://data.sfgov.org/resource/wg3w-h783.csv')
incident_reports_df.head()
Out[3]:
incident_datetime incident_date incident_time incident_year incident_day_of_week report_datetime row_id incident_id incident_number cad_number ... longitude point :@computed_region_6qbp_sg9q :@computed_region_qgnn_b9vv :@computed_region_26cr_cadq :@computed_region_ajp5_b2md :@computed_region_nqbw_i6c3 :@computed_region_2dwj_jsy4 :@computed_region_h4ep_8xdi :@computed_region_y6ts_4iup
0 2018-07-18T13:30:00.000 2018-07-18T00:00:00.000 13:30 2018 Wednesday 2018-07-18T13:31:00.000 69250964070 692509 180536729 182001522.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2019-04-08T05:25:00.000 2019-04-08T00:00:00.000 05:25 2019 Monday 2019-04-13T13:34:00.000 79165671000 791656 196076240 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2019-06-05T14:00:00.000 2019-06-05T00:00:00.000 14:00 2019 Wednesday 2019-06-05T14:30:00.000 81006072000 810060 190416337 191610728.0 ... -122.422464 \n, \n(37.78268536745206, -122.42246374465972) 100.0 4.0 11.0 39.0 NaN NaN NaN NaN
3 2019-04-16T20:20:00.000 2019-04-16T00:00:00.000 20:20 2019 Tuesday 2019-04-17T00:21:00.000 79171306244 791713 196076024 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2019-06-10T11:00:00.000 2019-06-10T00:00:00.000 11:00 2019 Monday 2019-06-10T11:00:00.000 81012372000 810123 190393440 191521989.0 ... -122.464145 \n, \n(37.779090726308574, -122.46414497098554) 5.0 8.0 4.0 11.0 NaN NaN NaN NaN

5 rows × 34 columns
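
Note that head() only previews the first five rows of whatever was downloaded. The URL looks like a Socrata (SODA) resource, and such endpoints usually cap the number of rows returned per request unless a `$limit` query parameter is supplied. The sketch below builds such a URL; `build_incident_url` is a hypothetical helper, and the `$limit` behavior is an assumption about the service, not something this notebook demonstrates.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the cell above.
BASE_URL = 'https://data.sfgov.org/resource/wg3w-h783.csv'


def build_incident_url(limit):
    """Hypothetical helper: append a SODA-style $limit parameter to the endpoint."""
    params = {'$limit': limit}
    # safe='$' keeps the dollar sign readable instead of percent-encoding it.
    return BASE_URL + '?' + urlencode(params, safe='$', quote_via=quote)


url = build_incident_url(5000)
print(url)
# pandas.read_csv(url) would then request up to 5000 rows (requires network access).
```

Building the query string with urlencode rather than string concatenation keeps the URL valid if more parameters are added later.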

In [4]:
incident_reports_df.columns
Out[4]:
Index(['incident_datetime', 'incident_date', 'incident_time', 'incident_year',
       'incident_day_of_week', 'report_datetime', 'row_id', 'incident_id',
       'incident_number', 'cad_number', 'report_type_code',
       'report_type_description', 'filed_online', 'incident_code',
       'incident_category', 'incident_subcategory', 'incident_description',
       'resolution', 'intersection', 'cnn', 'police_district',
       'analysis_neighborhood', 'supervisor_district', 'latitude', 'longitude',
       'point', ':@computed_region_6qbp_sg9q', ':@computed_region_qgnn_b9vv',
       ':@computed_region_26cr_cadq', ':@computed_region_ajp5_b2md',
       ':@computed_region_nqbw_i6c3', ':@computed_region_2dwj_jsy4',
       ':@computed_region_h4ep_8xdi', ':@computed_region_y6ts_4iup'],
      dtype='object')
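
The preview shows NaN in the coordinate columns for several rows. One quick way to see how many rows are affected is to count missing values per column. The sketch below uses a small stand-in dataframe loosely based on the preview (the name `sample_df` and the rounded values are illustrative only); the same two lines should work on `incident_reports_df`.

```python
import pandas

nan = float('nan')

# Stand-in for incident_reports_df; values are rounded from the preview above.
sample_df = pandas.DataFrame({
    'incident_id': [692509, 791656, 810060, 791713, 810123],
    'latitude': [nan, nan, 37.782685, nan, 37.779091],
    'longitude': [nan, nan, -122.422464, nan, -122.464145],
})

# isna() flags missing cells; summing the booleans counts them per column.
missing_counts = sample_df[['latitude', 'longitude']].isna().sum()
print(missing_counts)
```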

Some of the latitude and longitude values are NaN, so in the next step we drop those rows. After that, we create a Dataset from the dataframe and use it in a Layer to visualize the data:

In [5]:
# Keep only rows that have both coordinates; rows with NaN cannot be placed on the map
incident_reports_df = incident_reports_df.dropna(subset=['longitude', 'latitude'])

incident_reports_data = Dataset(incident_reports_df)

Map(Layer(incident_reports_data))
Out[5]:
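
The filtering step in the cell above can be checked on a small made-up dataframe. The sketch compares two ways of removing rows with missing coordinates: `dropna` with a column subset, and a mask that relies on NaN never being equal to itself. The names `toy_df`, `dropped`, and `masked` are illustrative only.

```python
import pandas

nan = float('nan')

# Made-up rows: one complete, one missing both coordinates, one missing only latitude.
toy_df = pandas.DataFrame({
    'latitude': [37.78, nan, nan],
    'longitude': [-122.42, nan, -122.46],
})

# Option 1: drop rows where either coordinate is missing.
dropped = toy_df.dropna(subset=['latitude', 'longitude'])

# Option 2: NaN never equals itself, so a self-comparison is False for missing cells.
mask = (toy_df.latitude == toy_df.latitude) & (toy_df.longitude == toy_df.longitude)
masked = toy_df[mask]

# Both approaches keep exactly the same rows.
assert dropped.equals(masked)
print(len(toy_df), '->', len(dropped))  # 3 -> 1
```

Both options give the same result here; `dropna` tends to be easier to read because it states the intent directly.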