Local Hotspots and Travel Flows, Directly from the Data

at October 29th, 2014

It’s a crucial question in ecommerce: How likely is customer X to buy product Y? For Local, we must of course consider the physical locations of both X and Y. This is the location relevance problem, which is one of the most important ingredients in determining the best deals for each of our users. When we send out emails or return search query results, the deals that we display have to be relevant. To solve this problem we need to know our users’ propensity to travel for the different services that we offer, and having an accurate measurement of these travel patterns helps us to understand demand and thus optimize our sales force.

We cannot simply assume that users will want to stay in their home neighborhoods. People want to get out and explore, and we want them to check Groupon first. One approach is to determine location relevance based on simple distance, but this is an over-simplification. We know that people flock to local hotspots and avoid certain neighborhoods. They are also more likely to travel farther for a rare service, a pricey restaurant, and many leisure activities such as museums and waterparks. Fortunately, we can capture these trends directly from the data. Here’s how.

Getting the data

For this analysis, we require data pertaining to where our users are located and what they have purchased. For this we leverage the fact that users can voluntarily provide us with their locations in the form of zipcodes or full addresses. For each historical order data point, there are several variables that we want to track due to their importance in determining whether or not someone is willing to travel for a Local deal. For this analysis we consider the following:

  1. the location of the user
  2. the location of the merchant
  3. the service that the merchant provides
  4. the price of the deal

In order to compute useful empirical probabilities regarding how likely users are to travel, we need to group the data points into bins. For the locations, we partition the world into markets (metropolitan areas plus the surrounding suburbs), which are further divided into “submarkets.” For the deals, we have a hierarchical taxonomy with three levels of merchant attribution: the type of services offered (e.g. milkshakes or body wraps), the merchant type (e.g. “Bakery & Desserts” or “Spa Services”), and the general merchant category (e.g. “Food & Drink” or “Beauty / Wellness / Healthcare”). We further define price bins such as “$0-25” on the low end and “$100+” on the high end.

Thus for each user we have a market and a submarket, and for each deal we also have a market and a submarket, in addition to a service (with the associated merchant type and category) and a price bin. Then for each <service, price bin, deal location, user location> combination we can empirically determine the odds that the user will be willing to travel for the type of deal in question, based on what has occurred in the past.
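To make the binning concrete, here is a minimal sketch in Python. It assumes that the empirical odds for a combination are simply the share of a deal bin's historical orders that came from each user submarket; the post does not spell out the exact estimator, so treat that definition, the record layout, the sample orders, and the "$25-50" bin as illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical order records: (service, price_bin, deal_submarket, user_submarket).
# The "$25-50" bin and all submarket names are made up for illustration.
orders = [
    ("Cupcake", "$0-25", "Downtown", "Downtown"),
    ("Cupcake", "$0-25", "Downtown", "North Suburbs"),
    ("Cupcake", "$0-25", "Downtown", "Downtown"),
    ("Cupcake", "$0-25", "Downtown", "West Suburbs"),
    ("Yoga", "$25-50", "North Side", "North Side"),
]

def travel_shares(orders):
    """Share of each <service, price bin, deal location> bin's orders
    that came from each user location (an assumed estimator)."""
    by_bin = defaultdict(Counter)
    for service, price_bin, deal_loc, user_loc in orders:
        by_bin[(service, price_bin, deal_loc)][user_loc] += 1

    shares = {}
    for deal_bin, counts in by_bin.items():
        total = sum(counts.values())
        for user_loc, n in counts.items():
            # Key is the full <service, price bin, deal location, user location> tuple.
            shares[deal_bin + (user_loc,)] = n / total
    return shares

shares = travel_shares(orders)
```

With the sample orders above, half of the low-price downtown cupcake orders come from downtown users and a quarter each from the North and West suburbs.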

Based on these odds we can determine the most popular travel patterns, which will tell us where each city’s hotspots are located. We can further define an effective radius for each individual service, thereby determining how far users are typically willing to travel for, say, aerobics versus paintball.

Not so fast…

There are several subtleties to this analysis. For starters, many users and merchants will have multiple locations in our database. This can happen for instance when a user has multiple addresses registered with us and when merchants have multiple locations. To work around this, we assume the <user location, deal location> pair that corresponds to the shortest distance. In other words, if a user buys a deal that is closer to their home than to their workplace, we assume that they’re traveling from the former. Similarly, if someone buys a deal that has multiple locations, we assume that they’re going to redeem it at the location that is closest to them.
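The shortest-distance rule can be sketched as follows. This assumes locations are available as (latitude, longitude) pairs and uses the great-circle (haversine) distance; the post does not say which distance measure was actually used, and the function names here are hypothetical.

```python
import math
from itertools import product

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def closest_pair(user_locations, deal_locations):
    """Pick the <user location, deal location> pair with the shortest distance."""
    return min(
        product(user_locations, deal_locations),
        key=lambda pair: haversine_miles(*pair),
    )
```

For a user with a home and a work address buying a deal redeemable at two locations, `closest_pair` checks all four combinations and keeps the nearest one.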

A bigger problem is data sparsity. Given the extremely broad variety of services that we offer, we find that some of our <service, price bin, deal location> combinations have too few data points, and thus a poor sampling of the relevant locations of the travel-willing users. For instance, in Chicago and the surrounding suburbs we have 14 submarkets. Thus if we want to determine which submarkets’ users are buying mid-price downtown yoga deals, we need far more than 14 data points to get a good sampling. We work around this problem by using geography-independent fallbacks, utilizing our taxonomy. For instance, if we lack sufficient data at the <service, price bin> level, then we collapse out the price bin and only consider the service. If we still lack sufficient data, we then fall back a level in our taxonomy and use the merchant type or the even broader merchant category.
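One way to picture the fallback chain is as a walk from the most specific taxonomy key to the most general one, stopping at the first level with enough orders. The sketch below is only that; the minimum order count, the dictionary layout, and the example names are assumptions, since the post gives no concrete numbers.

```python
MIN_ORDERS = 50  # hypothetical cutoff; the post does not state one

def pick_level(deal, counts, min_orders=MIN_ORDERS):
    """Return (level, key) for the most specific taxonomy level that has
    at least min_orders historical orders, or None if none qualifies."""
    candidates = [
        ("service+price", (deal["service"], deal["price_bin"])),
        ("service", deal["service"]),
        ("merchant_type", deal["merchant_type"]),
        ("merchant_category", deal["merchant_category"]),
    ]
    for level, key in candidates:
        if counts.get(level, {}).get(key, 0) >= min_orders:
            return level, key
    return None

# Hypothetical deal and order counts for a single deal location.
deal = {
    "service": "Yoga",
    "price_bin": "$25-50",
    "merchant_type": "Fitness Classes",
    "merchant_category": "Health & Fitness",
}
counts = {
    "service+price": {("Yoga", "$25-50"): 9},
    "service": {"Yoga": 40},
    "merchant_type": {"Fitness Classes": 320},
    "merchant_category": {"Health & Fitness": 5000},
}
chosen = pick_level(deal, counts)
```

Here neither the <service, price bin> level nor the service level clears the cutoff, so the sketch falls back to the merchant type.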

Another important issue is outlier deals. Especially amazing deals might draw users from a much wider radius than is typical, which would skew our results. To deal with this we use outlier removal to exclude the very top- and bottom-performing historical deals from our dataset.
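A simple form of this trimming is sketched below. It assumes that "performance" means units sold and that a fixed percentage is cut from each end of the ranking; both the metric and the 5% default are illustrative assumptions, not the values used in the analysis.

```python
def trim_outlier_deals(deals, trim_percent=5):
    """Drop the top and bottom trim_percent of deals ranked by units sold.

    deals: list of (deal_id, units_sold) tuples.
    Returns the remaining deals in ascending order of units sold.
    """
    ranked = sorted(deals, key=lambda d: d[1])
    k = len(ranked) * trim_percent // 100  # number of deals cut from each end
    if k == 0:
        return ranked
    return ranked[k:len(ranked) - k]

# Hypothetical deals: ids 1..20 with units sold equal to the id.
sample_deals = [(i, i) for i in range(1, 21)]
kept = trim_outlier_deals(sample_deals)
```

With twenty deals and a 5% trim, one deal is removed from each end of the ranking.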


For each <service, price bin, user location, deal location> combination, our result is the probability that a user from that location would be willing to travel for a deal of that service, price bin, and location. To be sure that we’re not just seeing noise and that these travel flows are actual organic tendencies, we say that a flow is important enough to be deemed “travel-worthy” if this probability reaches a threshold of 20% or more. This level was found to be aggressive enough to leave us with only the truly statistically significant flows, yet low enough to give us sufficient useful information on the travel patterns for each city and service.
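Applying the threshold is then a simple filter over the computed probabilities. The sketch below assumes they are stored in a dictionary keyed by the combination tuple; that layout and the sample values are hypothetical.

```python
TRAVEL_WORTHY_THRESHOLD = 0.20  # the 20% level described above

def travel_worthy_flows(probabilities, threshold=TRAVEL_WORTHY_THRESHOLD):
    """Keep only the combinations whose probability meets the threshold."""
    return {combo: p for combo, p in probabilities.items() if p >= threshold}

# Hypothetical probabilities keyed by
# (service, price_bin, user_submarket, deal_submarket).
probabilities = {
    ("Cupcake", "$0-25", "North Suburbs", "Downtown"): 0.25,
    ("Cupcake", "$0-25", "West Suburbs", "Downtown"): 0.20,
    ("Cupcake", "$0-25", "South Suburbs", "Downtown"): 0.05,
}
worthy = travel_worthy_flows(probabilities)
```

Only the first two flows survive; the flow from the South suburbs falls below the 20% level and is treated as noise.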

As expected, we find that users are indeed inclined to travel beyond their home neighborhoods, and that those travel propensities depend on where they live. For instance, the median distance Chicago users are willing to travel is about 5 miles. However, this median depends strongly on user submarket, and is under 2 miles for downtown users but approximately 12 miles for users from the South suburbs.


Where are these users traveling, and for what? Our results tell us these hotspots as a function of service, and we find that they depend on a combination of merchant density and merchant quality. For more common services, such as steakhouses, we find that users generally travel from areas of lower to higher merchant density. However, users are also willing to travel for particularly great merchants, and this preference is more likely to dictate the travel patterns for more unique services, like museums.

For example, one service in our “Food & Drink” category is “Cupcake.” We can query our results to give us the travel-worthy patterns for this service for Chicago and the lowest price bin.


Chicago’s cupcake hotspots, here marked with red dots and defined as submarkets having at least 3 travel patterns ending there, all contain regions with the highest densities of cupcake-providing merchants. Similarly, for steakhouses we find that the submarkets containing the Magnificent Mile and Naperville are the major hotspots, as any Chicagoan might expect.
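The hotspot definition translates directly into a count of inbound travel patterns per submarket. In this sketch each travel-worthy pattern is an (origin submarket, destination submarket) pair, and a pattern that starts and ends in the same submarket is not counted as travel; that exclusion and the sample flows are assumptions made for illustration.

```python
from collections import Counter

def hotspots(flows, min_inbound=3):
    """Submarkets with at least min_inbound distinct travel patterns ending there.

    flows: iterable of (origin_submarket, destination_submarket) pairs.
    """
    inbound = Counter(
        dest for origin, dest in set(flows) if origin != dest
    )
    return sorted(dest for dest, n in inbound.items() if n >= min_inbound)

# Hypothetical travel-worthy patterns for one service and price bin.
flows = [
    ("North Suburbs", "Downtown"),
    ("West Suburbs", "Downtown"),
    ("South Suburbs", "Downtown"),
    ("Downtown", "Downtown"),
    ("West Suburbs", "Naperville"),
    ("South Suburbs", "Naperville"),
]
cupcake_hotspots = hotspots(flows)
```

In this made-up example only Downtown qualifies, since it has three inbound patterns while Naperville has two.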

Which <service, deal location> combinations draw the most travelers? In Chicago in particular, we find that people are flocking to a Murder Mystery dinner spot in the West suburbs, a Hawaiian restaurant in the North suburbs, and the Field Museum and Segway tours downtown.

Despite our strict travel-worthy threshold, we still must verify that these travel patterns are actual organic travel tendencies, as opposed to being due to gaps in our inventory. Fortunately, using census data for each submarket, we find a strong correlation between the resident and business densities and the number of travel patterns that end there, indicating that our hotspots reflect actual travel patterns and are not biased by Groupon offerings. Still, there are two behaviors that we need to stay on the lookout for:

  1. regions with low business density but high travel worthiness that are getting more than their fair share of deals
  2. regions with high business density but low travel worthiness that are getting less than their fair share of deals

In this way we can keep an eye on our inventory and troubleshoot as needed.

As mentioned above, we can also define an effective radius for each <service, price bin, location> combination, to determine how far users from said location are willing to travel for certain types of deals. We define this to be the 75th percentile of all of the relevant user-deal distances in our historical data set. By doing so we find that the lowest-radius services tend to include everyday fitness activities like aerobics classes, gym memberships, and spinning classes, whereas the highest-radius services tend to include weekend leisure activities such as white water rafting, off-roading, and skiing.
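The effective radius can be computed with the standard library alone. The sketch below uses `statistics.quantiles` with the inclusive method, which is one of several common percentile definitions; the choice of method and the sample distances are assumptions, and each group needs at least two distances.

```python
from statistics import quantiles

def effective_radius(distances_miles):
    """75th percentile of user-deal distances (inclusive method)."""
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the third quartile, i.e. the 75th percentile.
    return quantiles(distances_miles, n=4, method="inclusive")[2]

def effective_radius_by_group(distances_by_group):
    """Map each <service, price bin, location> key to its effective radius."""
    return {
        key: effective_radius(distances)
        for key, distances in distances_by_group.items()
    }

# Hypothetical distances in miles for two services.
radii = effective_radius_by_group({
    "aerobics": [1, 2, 3, 4, 5],
    "skiing": [10, 20, 30, 40, 50],
})
```

A higher percentile would give a more generous radius; the 75th percentile keeps the radius from being dominated by the few users who travel unusually far.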

Next steps

This analysis was performed entirely with subscription and order data, and thus it was limited to a study of the interplay of merchant and user home locations. The expansion of our mobile business provides a huge opportunity for further tracking of travel patterns. Assuming users have given us their explicit consent to track it, GPS data gives us a much finer-grained picture of their behavior, thereby enabling us to learn where users are when they open the app and where they are when they place orders. Coupled with Gnome, we can further empower merchants to build stronger ties with customers who routinely travel to their neighborhoods.

Meet Groupon Engineering: Tim Macdonald

at October 23rd, 2014


LR: How long have you been at Groupon? TM: Just over a year; I started part-time work in September 2013, and then switched to full-time last January.

LR: What are some of the challenges and problems you get to solve in your work as a software engineer at Groupon? TM: I’ve been on several teams, but so far have spent most of my time doing some interesting work with analyzing past deals we’ve run and using that data to predict how new deals would do in the market. It’s of course both challenging and exciting to be working on a suite of new products, and more specifically there were some good problems related to parsing deals (that is, figuring out what exactly they’re selling and how that stacks up to other ones). From a more technical standpoint, I’ve learned a new language that I really love (Clojure), massively improved my skills with another (Javascript), and generally become more involved with full-stack development.

LR: What do you enjoy most about working at Groupon? TM: Lots of great things about working at Groupon but I think the best part is the people I get to work with. It amazes me that the same group of people making great technical decisions and giving great product pushback can always make me laugh or be up for a beer!

LR: You recently took first place at the 2014 U.S. Scottish Fiddle Championship! How did you get into fiddle playing? TM: I’ve been playing the (classical) violin since I was four—I went to see the Tegucigalpa Philharmonic and was entranced. When I was twelve, I was at a Scottish festival, saw the fiddle competition, and thought, “That’s awesome! Wait a second…I play the violin…I could totally learn how to play like that.” So I took private lessons for a year and a half and started going to a summer fiddle camp, and now here I am! For this contest, I was specifically trying to make a point about historically informed performance. All of my pieces were written before 1793, and I was playing on period instruments. The baroque violin is very different from the modern one.

Tim’s first-place finish qualifies him to compete in the 2015 Glenfiddich Fiddle Championship in Scotland next October when he’ll go up against other maestros from around the world.

Tune in to Tim’s winning performance below, check out this article featuring the fiddle aficionado himself and leave any and all accolades in the comments.

Meet Groupon Engineering Operations: Erica Geil

at October 22nd, 2014

Erica Geil has been at Groupon for over four years. “The thing I love most about my job is what I learn from the people I work with at Groupon. Everyone has different experiences, different interests and that keeps me engaged with the initiatives I oversee. I never have the same day twice,” Erica said. As Sr. Director, Global Engineering Operations, Erica’s role gives input to the project management team responsible for driving global, cross-functional initiatives. She also oversees ETHOS, Groupon’s team focused on engineering training and culture, and she gives guidance to Salesforce engineering and product development – these groups provide critical data to many Groupon tools and systems.

Groupon will be 6 years old in November and Erica has seen the company grow exponentially in her time. Groupon started as a daily deals business and is evolving to a Pull model where users can check Groupon first to discover an inventory of services, goods and travel deals. Watch below to learn more about how Erica’s work in Engineering Operations is part of that growth.

Sharing is Caring: Open Source at Groupon

at October 7th, 2014

Groupon is fueled by open source software. We run on software built in the open, supported by the community, and shared to move technology forward. While we give back when we can to the projects we use and share new creations, the true value resides in the people who make it happen, and that’s why we are pleased to announce Groupon’s new OSS home. It pulls information from Github, Open Source Report Card, Stack Overflow, and Lanyrd, with plans to add more information in the future. The purpose is to highlight the people who are actively contributing within Groupon Engineering and celebrate the things they do. This page is the result of Groupon Engineering’s most recent GEEKon, our bi-annual internal hackathon that gives our devs a chance to innovate from the bottom up. We feel that it is important to share this information not only to show the world that we care about open source software, but to encourage more participation and find other amazing people who want to work with us.

Of course, this wouldn’t be in good taste unless we open sourced the code that generates this page. You can find it on Github.

We are excited to continue work on this project and even more excited to see additional contributions from folks outside of Groupon! We hope that it can help highlight the people in every company that make open source software what it is and find more meaningful connections between other like-minded people in their communities.

To get a complete picture of what we have already released you can visit the Groupon Github page. Be sure to keep an eye out for new releases and updates to our projects. And if you like the work that we have done and would like to work with us, check out our Groupon tech jobs.

Meet Groupon Engineering: Candice Savino

at October 7th, 2014


LR: How long have you been at Groupon? CS: I’ve been at Groupon for almost 3 years! I started the day after we IPO’d.

LR: What do you do at Groupon? CS: Currently I’m a Sr. Engineering Manager, but I started here as a Software Engineer.

LR: What are some of the challenges you get to solve in your work? CS: Building platforms that allow Groupon to give customers the best experience possible has been challenging but rewarding work. My team built the platform that powers the Holiday shop, Valentine’s day shop and countless other themes in 33 countries. The challenge behind this was to build a tool flexible and useful to the business without needing constant deployment cycles from engineering teams. This platform gave the control and flexibility to the business while keeping the engineers out of monotonous work.

LR: What do you like most about working at Groupon? CS: The best thing about Groupon is getting the opportunity to rebuild the tech platform at the ground level. Groupon is the fastest growing company ever and keeping up with the load from a technical perspective is a challenge. Being at Groupon gives you the opportunity to be part of rebuilding a company which you don’t get to do everywhere. My team is one of the best teams I’ve worked with in my career. Getting to work with each of these talented engineers is always the best part of my day!

Meet Groupon Engineering: Trina Gizel

at October 7th, 2014

Trina Gizel recently joined Groupon as Global Director of Information Technology. Among other responsibilities, Trina is the driver for the backend productivity systems for all employees at Groupon around the world. As the fastest growing company in history, Groupon has a complex IT landscape that Trina is optimizing. Click below to learn more about her work and how it fits within Groupon’s context of innovation:

Meet Groupon Engineering: Macey Briggs

at October 6th, 2014

“I like working on things that are highly visible to our customers,” said Macey Briggs, who drives project management at Groupon. In this role she works on cross-functional projects that touch more than five development teams across the company. Macey has been at Groupon for three years and currently leads teams in Chicago and Europe. She is in the middle of one of our biggest projects to date – work on our international web frontend across 25 countries in Europe. “Any customer in Europe is being touched by the current project I’m working on and rolling out right now,” said Macey. When done, the European site will match the web frontend we have in the US today and will be in line with our global technology.

Meet Groupon Engineering: Ali Vanderveld

at October 6th, 2014

Ali Vanderveld is a Data Scientist at Groupon. Prior to Groupon, she was at the University of Chicago where she was a research fellow. Ali holds a PhD in Physics from Cornell. Read below to find out more about her work.


LR: How long have you been at Groupon? AV: A little over one year.

LR: What interests you most about data science as a field? AV: I was attracted to data science because of the fast pace, the broad variety of problems to tackle, and the opportunity to create significant, tangible change.

LR: What are some of the challenges and problems you’re working to solve as a Data Scientist at Groupon? AV: The Chicago Data Science team works mostly on the Local and Sales side of the business. We work on answering Local-related business questions and on developing tools to optimize our sales force. For example, we forecast demand using a combination of sales velocity and search query data, and we determine user travel patterns using historical orders.

LR: Tell us a bit about your work as an astrophysicist and how that has parlayed into your work at Groupon? AV: I spent several years in academic research, studying theoretical cosmology. In particular, I studied gravitational lensing and worked on ESA’s Euclid mission, for which I developed numerical simulations and worked with Hubble Space Telescope legacy data. This is work that requires a lot of programming and statistics, and I now use those same tools to study e-commerce.

LR: What do you enjoy most about working at Groupon? AV: The people and the atmosphere are great. My team is filled with smart people from a wide variety of backgrounds, and we’re given ample opportunities both to be creative and for professional development.

LR: What do you like to do outside of work? AV: Work-life balance is incredibly important to me, and as such I always have a few hobbies. I play the cello and I’m currently working on my first solo aerial silks act. I also volunteer at the Adler Planetarium.

Meet Groupon Engineering: Ushashi Chakraborty

at October 6th, 2014

Ushashi joined Groupon just under a year ago as a Software Development Engineer in Test on the Internal Tools team. Ushashi works on backend tools and the testing of Groupon products. She collaborates extensively with the data mining team, maintaining a test automation framework and writing tests to ensure that the predictive and recommendation algorithms are working correctly. According to Ushashi, “Groupon as a company is young and vibrant, and there is so much to do.”
