Site Planning and Revenue Prediction: Optimizing Food Truck Locations in New York City

Summary

Learn why new data streams are ushering in a new era of site planning in our latest post about optimizing food truck locations in New York City.

This post may describe functionality for an old version of CARTO.

Advances made possible by location data continue to transform business practices and processes in site planning. And now, with insights from new data streams, it is even possible to determine which sites are most likely to increase sales for seasonal, temporary, and mobile businesses.

Food trucks, a lunchtime staple for many, operate on a location-dependent business model. Generally speaking, food trucks offer similar lunch options for roughly the same price, which makes it difficult for businesses to differentiate themselves from nearby competitors. As a result, a food truck’s location can determine whether the business succeeds or fails.

Recently, we helped a local food truck business find the prime spots for its trucks with revenue prediction models. The company provided one month’s worth of anonymized transaction data for each of its 10 food carts, and with this information our team of data scientists was able to measure current performance, build increasingly confident revenue models, and finally predict the six best-performing food truck locations.

Measuring current performance

Before we could predict which locations should be selected to drive future sales, we first needed a way to measure the current performance of each site in Manhattan and Brooklyn.

To get started, Wenfei and Dongjie, two of our data scientists, aggregated the data by truck and by hour to measure the average spend per hour.
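The hourly aggregation can be sketched in Python as below. The post does not describe the transaction schema, so the (truck_id, timestamp, amount) record layout, the function name, and the sample rows are all hypothetical. The sketch first totals spend within each calendar hour, then averages those totals per truck and hour of day.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical record layout: (truck_id, ISO-8601 timestamp, amount in USD).
Transaction = tuple[str, str, float]


def average_spend_per_hour(transactions: list[Transaction]) -> dict[tuple[str, int], float]:
    """Average revenue for each (truck, hour of day) pair."""
    # Step 1: total spend within each calendar hour (truck, date, hour).
    hourly_totals: dict[tuple[str, str, int], float] = defaultdict(float)
    for truck, ts, amount in transactions:
        t = datetime.fromisoformat(ts)
        hourly_totals[(truck, t.date().isoformat(), t.hour)] += amount

    # Step 2: average those totals across days for each hour of day.
    # Hours with no sales on a given day are simply absent here rather than
    # counted as zero; a fuller version would fill them in.
    by_slot: dict[tuple[str, int], list[float]] = defaultdict(list)
    for (truck, _day, hour), total in hourly_totals.items():
        by_slot[(truck, hour)].append(total)
    return {slot: mean(totals) for slot, totals in by_slot.items()}


# Made-up sample rows for illustration only.
sample = [
    ("truck_a", "2019-05-06T12:10:00", 9.50),
    ("truck_a", "2019-05-06T12:45:00", 11.00),
    ("truck_a", "2019-05-07T12:05:00", 8.00),
    ("truck_a", "2019-05-07T08:30:00", 4.50),
]
print(average_spend_per_hour(sample))  # {('truck_a', 12): 14.25, ('truck_a', 8): 4.5}
```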

[Figure: Average spend per hour]

The graphs show that hourly revenue for each food truck usually peaks around lunchtime, although there are sometimes spikes in sales around breakfast as well. Next, Wenfei and Mamata, our head of cartography, mapped food truck sales across Manhattan and Brooklyn using proportional circles that reflect the revenue at each location.
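The post does not say how the circles were scaled. A common cartographic convention, assumed in this sketch, is to make each circle’s area (not its radius) proportional to revenue, so the radius grows with the square root of the value. The function name and the 30-pixel maximum radius are hypothetical.

```python
import math


def proportional_radius(value: float, max_value: float, max_radius_px: float = 30.0) -> float:
    """Radius for a proportional circle whose area scales with value.

    Area = pi * r**2, so r must grow with sqrt(value); scaling r linearly
    would visually exaggerate the largest values.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius_px * math.sqrt(value / max_value)


# Hypothetical revenue figures for two trucks.
revenue = {"truck_a": 400.0, "truck_b": 100.0}
largest = max(revenue.values())
for truck, amount in revenue.items():
    print(truck, proportional_radius(amount, largest))
```

With this scaling, a truck earning a quarter of the top revenue gets a circle with half the radius and a quarter of the area.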

[Figure: Food Truck Sales]

As expected, food trucks located in high-traffic areas (Grand Central Station, SoHo, Times Square, etc.) are the most lucrative locations for this company.

Next, we want to figure out where the best locations are for increasing sales, which means we need to identify variables near and around our current locations that can serve as predictors in our revenue model. Traditionally, these predictors are identified using census data and points of interest (POI) data.

The demographic insights available from census data are helpful for segmenting target customers, but this use case illustrates one of the significant limitations of working with census data.

[Figure: Census Tracts]

The census provides residential data for our area of operations; in the image above, this information is presented at the census tract level. However, many food truck customers are workers who commute into the city or tourists visiting New York landmarks, which is likely why Grand Central Station and Times Square are among the most profitable locations. As such, residential data offers few insights relevant to increasing sales among this target customer base.

POI data is more useful here: it lets us find patterns of nearby attractions around high-performing food trucks that can serve as predictors for our models.

[Figure: Points of Interest]

The first map shows every POI in Manhattan and Brooklyn, but there is so much noise that it is hard to determine which attractions appear repeatedly near and around each of our food trucks. Since many customers choose food trucks based on proximity, we created a 200-meter radius buffer around each cart (about a 2½ to 3 minute walk) so that predictor features could be identified more easily in the second map.
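A minimal sketch of the buffer step, assuming POIs arrive as (latitude, longitude, category) tuples and using great-circle (haversine) distance. The post does not say which GIS tool built the buffers, and every name and coordinate here is hypothetical. The walking-time helper checks the 2½ to 3 minute figure under an assumed pedestrian pace of roughly 1.1 to 1.3 m/s.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pois_in_buffer(cart, pois, radius_m: float = 200.0) -> dict[str, int]:
    """Count POIs by category that fall within radius_m of a cart."""
    counts: dict[str, int] = {}
    for lat, lon, category in pois:
        if haversine_m(cart[0], cart[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts


def walking_minutes(distance_m: float, speed_m_per_s: float) -> float:
    """Minutes needed to walk distance_m at a constant pace."""
    return distance_m / speed_m_per_s / 60


# Hypothetical cart and POIs: the cafe is ~111 m away, the museum ~556 m away.
cart = (40.75, -73.98)
pois = [(40.751, -73.98, "cafe"), (40.755, -73.98, "museum")]
print(pois_in_buffer(cart, pois))
print(round(walking_minutes(200, 1.3), 1), round(walking_minutes(200, 1.1), 1))
```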

Building more precise models with new data streams

Now we’re ready to build a gradient boosted regression (GBR) model to determine which features in this data are most important when deciding where to place our food trucks. In short, the GBR model ranks features by importance, giving us a list of predictors to look for when considering a potential food truck location.
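To make feature importance concrete, here is a from-scratch sketch of gradient boosting for regression with squared-error loss and one-split trees (stumps). It is not the team’s code. Each round fits a stump to the current residuals, and a feature’s importance is its share of the total squared-error reduction, which is similar in spirit to the impurity-based `feature_importances_` that scikit-learn’s `GradientBoostingRegressor` reports. Function names, toy data, and hyperparameters are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single split (feature, threshold) by squared-error reduction."""
    n = len(residuals)
    mean_all = sum(residuals) / n
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (gain, feature_index, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, j, thr, lm, rm)
    return best


def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss and depth-1 trees."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        gain, j, thr, lv, rv = stump
        gains[j] += gain
        stumps.append((j, thr, lv, rv))
        preds = [p + learning_rate * (lv if row[j] <= thr else rv) for p, row in zip(preds, X)]
    total = sum(gains) or 1.0
    return {"base": base, "stumps": stumps, "lr": learning_rate,
            "importance": [g / total for g in gains]}


def predict(model, row):
    """Sum the base value and every shrunken stump contribution."""
    return model["base"] + sum(model["lr"] * (lv if row[j] <= thr else rv)
                               for j, thr, lv, rv in model["stumps"])


def top_features(names, importance, k):
    """Names of the k most important features, highest first."""
    ranked = sorted(zip(names, importance), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy data: feature 0 drives revenue, feature 1 is unrelated noise.
X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [10, 10, 10, 20, 20, 20]
model = fit_gbr(X, y)
print(top_features(["foot_traffic", "noise"], model["importance"], k=1))
```

Sorting the importance scores, as `top_features` does, is the kind of ranking that turns a fitted model into a shortlist of predictors.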

The first revenue model was built using only traditional data sources, specifically census and POI data:

[Figure: Model One]

The GBR model returned an R-squared score of .38. R-squared (the coefficient of determination) measures the proportion of the variance in sales that the model explains, usually on a scale from 0 to 1, and so gauges how much confidence we can place in the model. A score of .38 means the model explains only about 38% of the variability in the data, so more data is needed to determine with greater confidence which features are most important when selecting a food truck location.
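R-squared is computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observed values from their mean. A small, dependency-free sketch (the function name is hypothetical):

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Equals 1.0 for perfect predictions and 0.0 for always predicting the mean;
    it can go negative when a model does worse than the mean.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two or more paired observations")
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0:
        raise ValueError("y_true has no variance")
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [1, 2, 3, 5]))  # one miss of 1.0 against SS_tot = 5
```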

To improve the model, Mastercard spend data was added and the model was retrained the same way to see whether the R-squared score would increase.

[Figure: Spend Score]

Mastercard spend scores provide aggregated and anonymized merchant-level transaction insights on where, when, and how people spend money. The transaction percentile score in particular provides an important frequency measure: because most food carts offer similar types of food for around the same price, transaction frequency gives insight into each cart’s customer volume.
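The post does not explain how Mastercard computes its transaction percentile score. As a rough illustration only, a percentile rank places one merchant’s transaction count among those of comparable merchants. The function name and peer counts below are hypothetical and do not represent Mastercard’s methodology.

```python
from bisect import bisect_right


def percentile_score(value: float, peer_values: list[float]) -> float:
    """Percentage (0-100) of peer values that are <= value."""
    if not peer_values:
        raise ValueError("peer_values must not be empty")
    ranked = sorted(peer_values)
    return 100.0 * bisect_right(ranked, value) / len(ranked)


# Hypothetical weekly transaction counts for nearby merchants.
peers = [10, 20, 30, 40]
print(percentile_score(30, peers))
```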

[Figure: Model Two]

Here we see a sizeable score increase and tighter alignment among the points in the scatter plot. However, the R-squared score could still be stronger, so a layer of foot traffic data was added to the model.

[Figure: Model Three]

Here the R-squared score has risen to .63, up from .38 in model one, which makes sense and confirms the assumption behind our earlier POI buffers: food trucks rely on foot traffic from nearby customers. Notably, each additional derivative data layer we added improved the R-squared score. Without these new data streams, we would not be in a position to identify with much confidence where the best locations are for each food truck.

[Figure: Feature Importance]

The image above presents the 12 features that our model ranked as most important for food truck sales. The top four were selected to serve as predictors for identifying new locations:

  1. Foot traffic from the previous hour
  2. Foot traffic from the current hour
  3. Day of the week
  4. Mastercard frequency score

Revenue Predictions

Now it is time to map the selected predictors across New York City using 100 x 100 meter grid tiles (roughly the size of a city block). Next, using a histogram, we looked at the sales distribution across the city and calculated the weekly sales average per truck to be approximately $2,786.
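A sketch of the tiling and histogram steps, assuming point locations in latitude and longitude. It uses an equirectangular approximation (adequate over a single city) rather than a proper projected coordinate system. The reference origin near the city’s south-west corner, the function names, and the bin width are hypothetical choices.

```python
import math
from collections import Counter

METERS_PER_DEG_LAT = 111_320  # approximate length of one degree of latitude


def grid_tile(lat: float, lon: float, origin=(40.47, -74.27), tile_m: float = 100.0) -> tuple[int, int]:
    """(column, row) index of the tile_m x tile_m cell containing the point."""
    lat0, lon0 = origin
    # Equirectangular approximation: shrink longitude by cos(latitude).
    y_m = (lat - lat0) * METERS_PER_DEG_LAT
    x_m = (lon - lon0) * METERS_PER_DEG_LAT * math.cos(math.radians(lat0))
    return int(x_m // tile_m), int(y_m // tile_m)


def histogram(values: list[float], bin_width: float) -> dict[float, int]:
    """Count of values per bin, keyed by each bin's lower edge."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# A point 150 m north of the origin lands in the second row of tiles.
print(grid_tile(40.47 + 150 / METERS_PER_DEG_LAT, -74.27))
print(histogram([120, 180, 250, 260], 100))
```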

Since the goal is to find new locations that are likely to increase sales revenue, we selected tiles from the higher end of the revenue distribution and clustered them into revenue areas. Because the model’s R-squared score was .63, there is not quite enough confidence to pinpoint the exact location for each truck. Instead, clustering the tiles into revenue areas locates regions within a neighborhood that have a higher likelihood of being profitable.
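The post does not name its clustering method. As a simple stand-in, this sketch keeps tiles at or above a revenue quantile cutoff and groups touching tiles (including diagonals) into connected areas, then ranks areas by mean predicted revenue. Density-based methods such as DBSCAN would be alternatives. The function name, the 0.9 default quantile, and the tile values are hypothetical.

```python
from collections import deque


def high_revenue_areas(tile_revenue: dict[tuple[int, int], float], quantile: float = 0.9):
    """Group top-quantile tiles into contiguous areas, best area first.

    tile_revenue maps (column, row) tile indices to predicted revenue.
    """
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be between 0 and 1")
    if not tile_revenue:
        return []
    values = sorted(tile_revenue.values())
    cutoff = values[min(int(quantile * len(values)), len(values) - 1)]
    hot = {t for t, v in tile_revenue.items() if v >= cutoff}

    areas, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        # Breadth-first search over the 8 neighbors of each hot tile.
        seen.add(start)
        queue, area = deque([start]), []
        while queue:
            c, r = queue.popleft()
            area.append((c, r))
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    nb = (c + dc, r + dr)
                    if nb in hot and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        areas.append(area)

    # Rank areas by mean predicted revenue, highest first.
    areas.sort(key=lambda a: sum(tile_revenue[t] for t in a) / len(a), reverse=True)
    return areas


predicted = {(0, 0): 900, (1, 0): 950, (5, 5): 800, (2, 7): 100, (3, 3): 120, (8, 1): 90}
print(high_revenue_areas(predicted, quantile=0.5))
```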

[Figure: Model Three]

The image above shows the changes to our map that each of these operations yielded. In the end, six locations were identified, each with a revenue prediction. Below, the six locations are ranked from highest to lowest by weekly sales average.

  1. Corona Park: $6,128 weekly sales average
  2. Penn Station: $5,975 weekly sales average
  3. SoHo: $5,911 weekly sales average
  4. Grand Central Station: $5,766 weekly sales average
  5. West Village: $5,234 weekly sales average
  6. DUMBO: $5,193 weekly sales average

While the usual suspects appear on this list (Penn Station, Grand Central, etc.), it is surprising that Corona Park turns out to be the best location for increasing food truck sales revenue. When nearby tourist attractions and the area’s population density are taken into consideration, however, the result makes sense.

A new era of site planning

New data streams are ushering in a new era of site planning, making previously impossible solutions possible. As this food truck example highlights, the future of site planning depends on accessing and working with many types of data, from traditional sources to new derivative datasets, to identify, understand, and quantify the impact that mobility patterns will have on your sales revenue.

Want to learn more about working with new data streams to modernize your site planning? Check out our recent webinar on working with Mastercard Retail Location Insights.