Location data continues to transform how businesses approach site planning. With insights from new data streams, it is now possible to determine which sites are most likely to increase sales, even for seasonal, temporary, and mobile businesses.
Food trucks, a lunchtime staple for many, operate on a location-dependent business model. Generally speaking, food trucks offer similar lunch options for roughly the same price, which makes it difficult for a business to differentiate itself from nearby competitors. As a result, location can determine whether a food truck business succeeds or fails.
Recently, we helped a local food truck business determine the prime spots for its trucks using revenue prediction models. The company provided one month’s worth of anonymized transaction data for each of its 10 food carts, and with this information our team of data scientists was able to measure current performance, build increasingly confident revenue models, and, finally, predict the six best-performing food truck locations.
Before we could predict which locations should be selected to drive future sales, we first needed a way to measure the current performance of each site in Manhattan and Brooklyn.
To get started, Wenfei and Dongjie, two of our data scientists, aggregated the transaction data by truck and by hour to measure each truck’s average revenue per hour.
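The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the team's actual pipeline: the record layout, the truck IDs, and the sample amounts are all hypothetical, and the sketch assumes "average per hour" means the mean of daily totals for each truck and hour of day.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction records: (truck_id, ISO timestamp, amount in dollars).
transactions = [
    ("truck_01", "2018-06-04T12:05:00", 9.50),
    ("truck_01", "2018-06-04T12:40:00", 11.00),
    ("truck_01", "2018-06-05T12:15:00", 8.50),
    ("truck_01", "2018-06-05T08:30:00", 4.00),
    ("truck_02", "2018-06-04T13:10:00", 10.00),
]

def average_hourly_revenue(records):
    """Return {(truck_id, hour_of_day): mean revenue over the days with sales in that hour}."""
    # Step 1: total revenue for each truck, calendar day, and hour of day.
    totals = defaultdict(float)
    for truck_id, timestamp, amount in records:
        ts = datetime.fromisoformat(timestamp)
        totals[(truck_id, ts.date(), ts.hour)] += amount

    # Step 2: average those daily totals across days for each truck and hour.
    # Assumption: days with no sales in an hour are skipped, not counted as zero.
    by_hour = defaultdict(list)
    for (truck_id, _day, hour), total in totals.items():
        by_hour[(truck_id, hour)].append(total)
    return {key: sum(vals) / len(vals) for key, vals in by_hour.items()}

hourly = average_hourly_revenue(transactions)
for (truck_id, hour), revenue in sorted(hourly.items()):
    print(f"{truck_id} {hour:02d}:00  ${revenue:.2f}")
```

Plotting these averages by hour for each truck gives the kind of curve where a lunchtime peak becomes visible.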
The graphs show that hourly revenue for each food truck usually peaks around lunchtime, although sometimes there are spikes in sales around breakfast time as well. Next, Wenfei and Mamata, our head of cartography, mapped food truck sales using proportional circles reflecting revenue amounts for each location across Manhattan and Brooklyn.
As expected, high-traffic areas such as Grand Central Station, SoHo, and Times Square are the most lucrative locations for this company.
Now we want to figure out where the best locations are for increasing sales, which means we’ll need to identify variables around our current locations that can serve as predictors in our revenue model. Traditionally, these predictors are identified using census data and points of interest (POI) data.
The demographic insights available from census data are helpful for segmenting target customers, but this use case illustrates one of the significant limitations of working with census data.
The census provides residential data for our area of operations, and in the image above this information is presented at the census tract level. However, many food truck customers are workers who commute into the city or tourists visiting New York landmarks, which is likely why Grand Central Station and Times Square are among the most profitable locations. As such, residential data offers few insights relevant to increasing sales among this target customer base.
POI data will be more useful here for finding patterns of nearby attractions around high-performing food trucks that can serve as a predictor for our models.
The first map shows every POI in Manhattan and Brooklyn, but there’s so much noise that it’s hard to determine which attractions recur around each of our food trucks. Since many customers choose food trucks based on proximity, we created buffers with a 200-meter radius (about a 2½ to 3 minute walk) around each cart so that predictor features could be identified more easily in the second map.
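As a rough illustration of the buffering step, the sketch below counts POIs by category within 200 meters of a cart using the haversine (great-circle) distance. The coordinates and categories are made up for the example, and a GIS tool would normally do this with true buffer geometries rather than a distance check.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def poi_counts_within(cart, pois, radius_m=200):
    """Count POIs by category inside a circular buffer around one cart."""
    lat, lon = cart
    return Counter(
        category
        for category, plat, plon in pois
        if haversine_m(lat, lon, plat, plon) <= radius_m
    )

# Hypothetical cart location and POIs (category, lat, lon); not the company's data.
cart = (40.7527, -73.9772)
pois = [
    ("transit", 40.7530, -73.9770),  # a few tens of meters from the cart
    ("office", 40.7535, -73.9780),   # a little over 100 m from the cart
    ("museum", 40.7794, -73.9632),   # several kilometers away, outside the buffer
]
counts = poi_counts_within(cart, pois)
print(counts)
```

The per-category counts for each cart can then be used as candidate predictor features.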
Now we’re ready to start building a gradient boosted regression (GBR) model that will allow us to determine which features from this data are most important when considering where to place our food trucks. In short, the GBR model ranks features by importance, which gives us a list of predictors to look for when evaluating a potential food truck location.
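To make the idea of ranking feature importance concrete, here is a deliberately tiny gradient boosting sketch in plain Python. It is not the model used in this project: it boosts one-split trees (stumps) on squared error, the feature names and data are invented, and importance is taken as each feature's share of the total error reduction. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would be the usual choice.

```python
def fit_stump(X, residuals):
    """Find the single (feature, threshold) split that most reduces squared error."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        # Skip the largest value so the right-hand side is never empty.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, j, threshold, lm, rm)
    return best

def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals; return (base, stumps, normalized importances)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # every feature is constant, nothing left to split
            break
        gain, j, threshold, lm, rm = stump
        importance[j] += gain  # credit the error reduction to the split feature
        stumps.append((j, threshold, learning_rate * lm, learning_rate * rm))
        preds = [p + (learning_rate * lm if row[j] <= threshold else learning_rate * rm)
                 for row, p in zip(X, preds)]
    total = sum(importance) or 1.0
    return base, stumps, [g / total for g in importance]

def predict(base, stumps, row):
    """Sum the base value and every stump's shrunken contribution for one row."""
    return base + sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)

# Invented toy data: revenue depends mostly on foot traffic, slightly on POI count.
feature_names = ["foot_traffic", "poi_count"]
X = [[10, 3], [20, 1], [30, 4], [40, 1], [50, 5], [60, 9], [70, 2], [80, 6]]
y = [5 * foot + 2 * poi for foot, poi in X]

base, stumps, importances = fit_gbr(X, y)
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

On this toy data the ranking puts foot traffic first, which is the kind of ordered list the model provides for real predictors.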
The first revenue model was created using only traditional data sources, specifically census and POI data:
The GBR model returned an R-squared score of .38. R-squared measures the share of the variation in revenue that the model explains, typically on a scale from 0 to 1, so it can be used to gauge confidence in the model. A score of .38 means the model accounts for only about 38% of that variation, so more data is needed before we can say with confidence which features matter most when selecting a food truck location.
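For readers who want to see how the score is computed, the sketch below implements the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The revenue figures are invented for illustration only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot

# Invented hourly revenue figures and model predictions.
actual = [100, 200, 300, 400]
predicted = [150, 150, 350, 350]
print(round(r_squared(actual, predicted), 2))
```

A model that always predicts the mean scores 0, a perfect model scores 1, and a model that does worse than the mean on held-out data can score below 0.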
To improve the model, Mastercard spend data was added and the model was refit to see whether the R-squared score would increase.
Mastercard spend scores provide aggregated and anonymized merchant-level transaction insights on where, when, and how people spend money. The transaction percentile score is especially useful here because it measures transaction frequency: since most food carts offer similar types of food for around the same price, frequency gives us insight into customer volume for each cart.
Here we see a sizeable score increase and greater alignment among points in the scatter plot. However, the R-squared score could still be stronger, so a layer of foot traffic data was added to the model.
Here the R-squared score has risen to .63, 25 points above model one’s .38, which makes sense and confirms our earlier assumption with POI buffers that food trucks rely on foot traffic from nearby customers. It is worth noting that each additional derivative data layer improved the model’s R-squared score. Without these new data streams we would not be in a position to identify with much confidence where the best locations are for each food truck.
The image above presents the 12 features that our model identified as having the greatest impact on food truck sales. The top four were selected to serve as predictors for identifying new locations: 1. foot traffic from the previous hour, 2. foot traffic from the current hour, 3. day of the week, and 4. Mastercard frequency score.
Now it is time to start mapping the selected predictors across New York City using 100x100 meter grid tiles (roughly the size of a city block). Next, using a histogram, we looked at the sales distribution across the city and calculated the weekly sales average per truck to be approximately $2,786.
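One simple way to assign a point to a 100 x 100 meter tile is sketched below. It assumes a local flat-earth approximation around a hypothetical grid origin near lower Manhattan; the origin, the constant for meters per degree, and the function name are illustrative choices, not the grid definition used in the project.

```python
import math

TILE_SIZE_M = 100
METERS_PER_DEG_LAT = 111_320  # approximate length of one degree of latitude

def tile_index(lat, lon, origin_lat=40.70, origin_lon=-74.02):
    """Map a (lat, lon) point to the (col, row) of its 100 m tile.

    Uses a flat-earth approximation, which is reasonable over a single city
    but is not a substitute for a proper projected coordinate system.
    """
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    x = (lon - origin_lon) * meters_per_deg_lon  # meters east of the origin
    y = (lat - origin_lat) * METERS_PER_DEG_LAT  # meters north of the origin
    return math.floor(x / TILE_SIZE_M), math.floor(y / TILE_SIZE_M)

print(tile_index(40.7015, -74.0180))
```

With every observation assigned to a tile, the predictor values can be summarized per tile and fed to the model to get a predicted revenue for each tile.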
Since the goal is to find new locations that are likely to increase sales revenue, we selected the tiles at the higher end of the revenue distribution and then clustered them into revenue areas. Because the model’s R-squared score was .63, there’s not quite enough confidence to pinpoint an exact location for each truck. Instead, the clusters identify regions within a neighborhood that have a higher likelihood of being profitable.
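The selection and clustering step could look something like the sketch below. The clustering method used in the project is not stated, so this stands in a simple connected-components grouping of touching tiles; the predicted revenues are invented, and the $2,786 weekly average is used as the cutoff purely for illustration.

```python
from collections import deque

def revenue_areas(predicted, threshold):
    """Group tiles with predicted revenue >= threshold into clusters of touching tiles."""
    hot = {tile for tile, revenue in predicted.items() if revenue >= threshold}
    areas, seen = [], set()
    for start in sorted(hot):
        if start in seen:
            continue
        seen.add(start)
        queue, area = deque([start]), []
        while queue:  # breadth-first search over the 8 neighboring tiles
            col, row = queue.popleft()
            area.append((col, row))
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    neighbor = (col + dc, row + dr)
                    if neighbor in hot and neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        areas.append(sorted(area))
    # Rank areas from highest to lowest mean predicted revenue.
    areas.sort(key=lambda a: sum(predicted[t] for t in a) / len(a), reverse=True)
    return areas

# Invented weekly revenue predictions keyed by (col, row) tile index.
predicted = {
    (0, 0): 3200, (0, 1): 3100, (1, 1): 2900,  # three touching high-revenue tiles
    (5, 5): 3500, (5, 6): 2000,                # one high tile next to a low one
    (9, 9): 2500,                              # below the cutoff
}
areas = revenue_areas(predicted, threshold=2786)
for area in areas:
    mean = sum(predicted[t] for t in area) / len(area)
    print(f"{len(area)} tile(s), mean predicted revenue ${mean:,.0f}")
```

Working with areas rather than single tiles reflects the point above: the model is confident enough to rank neighborhoods, not exact street corners.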
The image above shows the changes to our map that each of these operations yielded. In the end, six locations were identified, each with its own revenue prediction. Below, the six locations are ranked from highest to lowest by weekly sales average.
While there are the usual suspects on this list (Penn Station, Grand Central, etc.), it is surprising that Corona Park turns out to be the best location for increasing food truck sales revenue. When nearby tourist attractions and the area’s population density are taken into consideration, however, the results make sense.
New data streams are ushering in a new era of site planning, making previously impossible solutions possible. Indeed, as this food truck example highlights, the future of site planning depends on accessing and working with various types of data, from traditional sources to new derivative datasets, to identify, understand, and quantify the impact that mobility patterns will have on your sales revenue.
Want to learn more about working with new data streams to modernize your site planning? Then check out our recent webinar on working with Mastercard Retail Location Insights. Watch today!