When I saw this tweet from Jorge Arévalo I couldn’t resist.

The blog post linked in Jorge’s tweet nicely explains how to process NBA statistics data in order to analyse it and generate some charts of different metrics. When I was reading the blog post, I immediately saw coordinates and time and thought that it would be cool to try visualizing it with Torque, our own temporal mapping library.

Using Jorge’s methods, I processed the data, but instead of plotting the data, I exported it to CSV. As always, uploading the CSV to CartoDB was just a quick drag-n-drop and immediately I finally had an nba_moment. Because CartoDB didn’t recognize immediately the coordinates of the players as something to map, it was a table without geometries. However, with the x_loc and y_loc fields it was just waiting a little converstion to be used on a visualization using SQL. Here is the final result:

With my new CartoDB Map, the first thing was to remove the basemap and use the same court png used at the blog post, uploading it as a custom basemap. You could do something more advanced, but for testing purposes I allowed to just repeat over the entire background.

With the court in place the next step was to generate the data for Torque. I came up with this SQL:

SELECT
  cartodb_id,
  game_clock,
  CASE
    WHEN team_id = -1 THEN 'The Ball'
    WHEN team_id = 1610612746 THEN 'Los Angeles Clippers'
    ELSE 'Houston Rockets'
  END team,         -- Gate team names from ids
  ST_SetSRID(
    ST_TransScale(
      ST_MakePoint( -- Create a point from
        x_loc,      -- same x from the table
        50 - y_loc  -- but revert the Y coordinate
      ),
      0.0,0.0,      -- Don't translate
      1.45,1.45     -- and scale a bit to adapt to the court
    ),
    3257            -- Fake a Web Mercator
  ) AS the_geom_webmercator
FROM nba_moment

Torque usually works well with dates and I could actually have had exact timestamps but for this short period, but for a quick prototype the game_clock field worked just as well. For the categories I used a CASE statement to organize the data and make it easier to produce a nice output.

With the data ready, the next step was symbology. Using the wizard I selected the Torque Cat option which allows you to apply variable colors to different categories. I defined fields to be used, colors, steps, etc. and fine tuned the settings in the CartoCSS tab to give a better aspect to the trails. I added a few elements on top of the map and a proper description for the visualization and that was it.

Following the blog post, once you have the data in a well defined structure, you can start digging into on the numbers - that’s what they were collected for after all. With that table in CartoDB, you can run all kinds of aggregations and statistics but I wanted to try one that takes into account the spatial aspect. Watching the cool visualizations produced by the NBA portal, it seems the area covered by the team members is an important metric. That’s a Convex Hull of our positions grouped by moment and team!

WITH positions AS (
  SELECT
    game_clock,
    team_id,
  ST_MakePoint(x_loc,y_loc) as position
  FROM nba_moment
  WHERE team_id != -1  -- we don't need the ball
)
SELECT
  ST_Area(          -- get the area
    ST_ConvexHull(  -- of the polygon formed
      ST_Collect(   -- by the aggregation
        position    -- of the positions
  ))) area,
  game_clock, team_id
FROM positions
GROUP BY game_clock, team_id
ORDER BY game_clock, team_id

Plotting this query using web technologies is pretty straight forward, check the code of this jsfiddle and you’ll see that running the query and displaying this graph in 40 lines of code is easy as pie.

This is just an example, good sports geeks will know way better than me how to extract more knowledge from these datasets that technology applied to sports brings to us.

Happy data mapping!