DMARC at Groupon

at December 17th, 2014

At Groupon we are a global company sending email in 47 countries. Our mission is to connect our customers with our merchant partners through price and discovery, and email is one of our key communication channels. Given the global reach and strength of our brand, “bad actors” have attempted to misuse our brand and email domains in phishing campaigns that trick unsuspecting users into providing sensitive personal information. To combat these “bad actors,” we began implementing Domain-based Message Authentication, Reporting & Conformance (DMARC) policies globally.

DMARC is a policy and reporting layer built on top of two standard email authentication protocols: Sender Policy Framework (SPF) and DomainKeys Identified Mail (DKIM). At a high level, SPF lets a receiving email server check whether email claiming to come from a domain was sent from infrastructure or IP addresses that the domain has approved. DKIM also works at the domain level, but it uses a private/public key pair. The sender signs pre-defined portions of each message (selected headers and the body) with a private key, and receivers validate that signature using the public key published for the domain. In practice, SPF and DKIM both rely on DNS lookups to function correctly.
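The following Python sketch makes the SPF half concrete. It checks whether a sending IP falls inside the ip4: ranges listed in an SPF record. The record and addresses are hypothetical documentation values, and the function name is ours. Real SPF evaluation (RFC 7208) also resolves include:, a and mx terms and applies qualifiers; this sketch skips all of that.

```python
import ipaddress


def spf_ip4_allows(spf_record: str, sender_ip: str) -> bool:
    """Return True if sender_ip falls inside any ip4: mechanism of spf_record.

    Deliberately simplified: a real SPF evaluator must also resolve include:,
    a, mx and redirect= terms, honour qualifiers (+ - ~ ?) and apply "all".
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        if term.startswith("ip4:"):
            # strict=False tolerates host bits set in the prefix, e.g. 192.0.2.1/24
            network = ipaddress.ip_network(term[len("ip4:"):], strict=False)
            if ip in network:  # an IPv6 sender is never "in" an IPv4 network
                return True
    return False


record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all"
print(spf_ip4_allows(record, "192.0.2.45"))   # True
print(spf_ip4_allows(record, "203.0.113.9"))  # False
```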

At Groupon, SPF and DKIM are standard in every country where we operate. We therefore took the next step and implemented DMARC around the world, both to fight phishing and to create a feedback loop on how our email domains are used in the wild. DMARC operates through a DNS record. In that record we tell participating email providers such as Gmail, Hotmail, and Yahoo which policy action (none, quarantine, or reject) to take for email that fails SPF and DKIM.
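One nuance is worth spelling out. Under DMARC, “failing” means that neither SPF nor DKIM produced a pass for a domain aligned with the visible From: address. The sketch below shows that decision; its function name is hypothetical. It uses strict alignment only. Relaxed alignment, the DMARC default, compares organizational domains and needs the Public Suffix List, so it is deliberately left out.

```python
def dmarc_passes(from_domain: str,
                 spf_pass: bool, spf_domain: str,
                 dkim_pass: bool, dkim_domain: str) -> bool:
    """Simplified DMARC verdict using strict alignment only.

    A message passes when SPF or DKIM passes AND the domain that mechanism
    authenticated exactly matches the From: domain (case-insensitive).
    """
    header_from = from_domain.lower()
    spf_aligned = spf_pass and spf_domain.lower() == header_from
    dkim_aligned = dkim_pass and dkim_domain.lower() == header_from
    return spf_aligned or dkim_aligned


# DKIM passes, but for a different domain than the From: header, so it is not aligned
print(dmarc_passes("example.com", False, "example.com", True, "mailer.example.net"))  # False
```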

When declaring a policy of “none”, defined as “p=none” in the example below, we instruct participating email providers to take no action on messages failing authentication. Even though no action is taken, those providers still send us reports on how our email is passing or failing authentication. The reports go to the addresses defined in the “rua=” and “ruf=” tags:

- The “rua” tag is the destination for aggregate reports. These are periodic, high-level summaries of authentication results (passes as well as failures), grouped by sending source.
- The “ruf” tag is the more detailed reporting path. It receives forensic (failure) reports on individual messages that fail authentication. With “fo=1”, as in our records, providers are asked to generate a failure report whenever SPF or DKIM fails to produce an aligned pass.

At Groupon we work with Agari, an email security company, to compile this data into human-readable reports that support our DMARC work globally. Overall, the “p=none” step is key in our DMARC rollout process. We use this data to create a baseline for authentication performance and to make sure we will not block legitimate email when we choose to enforce a “quarantine” or “reject” policy.

v=DMARC1; p=none; fo=1; rua=mailto:example@example.com; ruf=mailto:example@example.com; rf=afrf; pct=100
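Records like the one above are published as DNS TXT records at _dmarc.<domain>. The sketch below illustrates the tag=value format with a minimal parser that checks v, p and pct. It is not a complete RFC 7489 validator; for example, it does not validate the rua/ruf URIs.

```python
VALID_POLICIES = {"none", "quarantine", "reject"}


def parse_dmarc(record: str) -> dict:
    """Parse a DMARC TXT record into a tag -> value dict and sanity-check it."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[key.strip()] = value.strip()
    # v=DMARC1 must be present and must be the first tag
    if next(iter(tags), None) != "v" or tags["v"] != "DMARC1":
        raise ValueError("record must start with v=DMARC1")
    if tags.get("p") not in VALID_POLICIES:
        raise ValueError("p must be none, quarantine or reject")
    pct = int(tags.get("pct", "100"))  # pct defaults to 100 when omitted
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return tags


record = ("v=DMARC1; p=none; fo=1; rua=mailto:example@example.com; "
          "ruf=mailto:example@example.com; rf=afrf; pct=100")
print(parse_dmarc(record)["p"])  # none
```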

After a complete and thorough audit at the “p=none” stage, we move to publishing a “quarantine” policy, defined as “p=quarantine” in the example below. A quarantine policy instructs email providers to send email failing SPF and DKIM to the spam folder, which keeps it out of users’ inboxes. It is at this stage that we take advantage of the “pct” tag. This tag tells email providers what percentage of the email failing authentication the policy should apply to. At Groupon we found that anything less than 50% does not provide a large enough sample to decide when to move to publishing a “reject” policy.

v=DMARC1; p=quarantine; fo=1; rua=mailto:example@example.com; ruf=mailto:example@example.com; rf=afrf; pct=50
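The effect of pct can be simulated. The sketch below follows our reading of RFC 7489. A failing message that is not selected by the pct sample falls back one step: “reject” becomes “quarantine”, and “quarantine” becomes normal delivery and filtering (represented here as “none”). The function name and seed are illustrative.

```python
import random


def receiver_disposition(policy: str, pct: int, rng: random.Random) -> str:
    """Disposition a receiver applies to one message that has FAILED DMARC.

    Only pct% of failing mail is subject to the published policy; messages
    outside the sample fall back one step ("reject" -> "quarantine",
    "quarantine" -> normal local handling, shown as "none").
    """
    if policy == "none":
        return "none"
    if rng.random() * 100 < pct:  # message selected: apply the published policy
        return policy
    return "quarantine" if policy == "reject" else "none"


rng = random.Random(42)  # fixed seed so the simulation is repeatable
outcomes = [receiver_disposition("quarantine", 50, rng) for _ in range(10_000)]
print(outcomes.count("quarantine"))  # close to 5,000 with pct=50
```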

Once any remaining issues have been corrected at the quarantine stage, we publish a “reject” policy, represented as “p=reject” in the example below. A reject policy instructs participating email providers to block all email failing authentication, so that it reaches neither the inbox nor the spam folder. As a practice at Groupon, we set “pct” to 100 at this stage, which instructs participating providers to block 100% of email failing authentication. This lets us take full advantage of the anti-phishing benefits DMARC provides. It is possible because of the work completed beforehand to ensure no legitimate email is blocked by accident.

v=DMARC1; p=reject; fo=1; rua=mailto:example@example.com; ruf=mailto:example@example.com; rf=afrf; pct=100

Throughout the DMARC process we have alerts set to trigger if failures on legitimate email exceed our internal thresholds. These alerts take center stage when we reach the “reject” phase. If a predefined threshold is met, we roll back the DMARC policy in the affected region from “quarantine” or “reject” to “none”, so that email is not inadvertently blocked.
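The rollback alert can be sketched against the aggregate (rua) reports. The example below makes two assumptions: “legitimate email” means mail from a known list of approved sending IPs, and the threshold is a hypothetical value. It reads the DMARC-aligned dkim and spf results from each row of a simplified aggregate report. Real reports also carry report metadata, the published policy and identifiers, which are omitted here.

```python
import xml.etree.ElementTree as ET


def legit_failure_rate(report_xml: str, approved_ips: set) -> float:
    """Share of mail from approved sending IPs that failed DMARC in one aggregate report."""
    root = ET.fromstring(report_xml)
    total = failed = 0
    for row in root.iter("row"):
        if row.findtext("source_ip") not in approved_ips:
            continue  # unknown senders (e.g. phishers) are expected to fail
        count = int(row.findtext("count", default="0"))
        evaluated = row.find("policy_evaluated")
        # policy_evaluated holds the DMARC-aligned dkim/spf results for this row
        passed = "pass" in (evaluated.findtext("dkim"), evaluated.findtext("spf"))
        total += count
        if not passed:
            failed += count
    return failed / total if total else 0.0


def next_policy(current: str, failure_rate: float, threshold: float) -> str:
    """Roll back to p=none once the (hypothetical) failure threshold is met."""
    return "none" if failure_rate >= threshold else current


REPORT = """<feedback>
  <record><row><source_ip>192.0.2.10</source_ip><count>990</count>
    <policy_evaluated><disposition>none</disposition><dkim>pass</dkim><spf>fail</spf></policy_evaluated>
  </row></record>
  <record><row><source_ip>192.0.2.11</source_ip><count>10</count>
    <policy_evaluated><disposition>reject</disposition><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
  </row></record>
  <record><row><source_ip>203.0.113.5</source_ip><count>500</count>
    <policy_evaluated><disposition>reject</disposition><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
  </row></record>
</feedback>"""

rate = legit_failure_rate(REPORT, {"192.0.2.10", "192.0.2.11"})
print(rate)                                # 0.01
print(next_policy("reject", rate, 0.005))  # none
```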

We move incrementally from a policy of “none” to “quarantine” and eventually “reject” so that changes happen in a controlled fashion. A staged rollout lets us adjust the process in response to the action items the data highlights at each phase. This gives us the opportunity to complete our due diligence while minimizing the overall risk of blocking legitimate email to our subscribers. I am happy to report that we are enforcing DMARC policies in 45 countries, with 43 of them publishing a “reject” policy.

The implications of being able to globally reject phishing emails targeting our subscribers and brand are enormous. Recently in Brazil we tracked a phishing campaign offering discounted iPhones in an attempt to steal credit card information. (screenshot below)

[Screenshot: phishing email offering discounted iPhones]

Thanks to our use of DMARC and the stellar implementation by my team in South America, we were already publishing a “reject” policy for our mailing domain in Brazil, r.grouponmail.com.br. As a result, we were able to proactively block around 50,000 phishing emails targeting Gmail, Hotmail, and Yahoo! addresses, which added another layer of protection for our subscribers. (data below)

[Data: phishing emails blocked in Brazil]

We will continue to roll out DMARC through the remaining countries to ensure our subscribers are able to benefit from the anti-phishing protection they deserve. Once the process is completed all Groupon email operations will be covered by DMARC. For Game of Thrones fans, DMARC can be thought of as a member of the Night’s Watch, silently standing guard on The Wall. DMARC protects the Groupon realm from phishing attempts and keeps our subscribers and brand safe in the process.


Groupon Selected as One of the Best Apps of 2014

at December 8th, 2014

We are all very excited that Google has named Groupon one of the Best Apps of 2014. We work very hard to make our app fun and delightful, and we are happy that people love it and consistently give us great reviews. We’ve recently refreshed the UI, added your reviews and tips for many merchants, and made significant architectural changes that improved startup times by 40%. There’s a lot more to come, so look forward to our releases in 2015!

Well done to all the teams that have contributed to this effort!


How do Groupon Customers Fare When it Comes to Gift Giving?

at November 24th, 2014

It’s that time of year! And personally, it’s my favorite time of year! I love what the season represents: family, togetherness, generosity, and opportunities to show appreciation for one another.

This month I thought I would step back and take some time for something that’s always fun: PRESENTS! As the gift-giving season is upon us, the Groupon Data Science team is here to tell you who the best Groupon gift givers are!

As a whole, the industry has been shifting toward online shopping and, more recently, toward shopping on mobile. Last year, mobile traffic accounted for more than 30% of site visits on Cyber Monday. At Groupon, mobile accounts for more than 50% of our transactions worldwide.

As more and more people decide to buy products on their phones, we thought it would be interesting to find out who the better gift givers are: iPhone or Android users?

First off, Groupon users spend 45% more online than your average US consumer! So make sure you cozy up to your Groupon-loving friends this season!

Not only do Groupon customers spend more money online, they are also more generous to others than to themselves! All customers spend more when buying a Groupon deal as a gift than when buying one for themselves. But as we’ll see, Groupon app users are the more generous gift givers.

Q: Who gets more into the spirit of gift giving?

iPhone users. The data suggests that iPhone users get a little more into the spirit, spending upwards of 50% more on a gifted Groupon deal than on a Groupon deal for themselves. If your iPhone-using friend spends $50 on average on a Groupon deal, you can expect them to spend $75 on a gift! But Android friends aren’t too shabby either: compared to all Groupon users, they are more generous overall when it comes to gift giving. And on a random Monday, the average Android user’s generosity surpasses that of the average iPhone user.

Q: The holiday season can be hectic, and taking care of oneself is important! Who takes care of themselves the best?

Android users. Android users spend 10-20% more on purchases during the holiday season. Cyber Monday seems to be the day when everyone goes for that upgrade and pays a few extra dollars to get something nice for themselves; it is the peak time for self-spend, especially for Android users. And with Groupon’s crazy Cyber Monday deals, why wouldn’t you treat yo self? Even Batman does.

Q: The most annoying gift giver buys items better suited to himself than to the recipient. Who is the biggest offender?

Neither. Neither Android users nor iPhone users are guilty of having the same purchase profile for themselves as they do for gift giving. Interestingly, each platform’s gift-giving patterns stay true to stereotypes: Android = more techie, iPhone = more experiences.

So back to the original question: who gives the best gifts? It probably depends on what you’re looking for: cool gadgets or fun experiences! Either way, Groupon’s got it all this holiday season, and we’re kicking it off a little early with these killer Black Friday deals!


On The Subject of Girls, Technology, and Marshmallow Or: how the Evolution of Girl Scouts and STEM is evident at Groupon

at November 14th, 2014

Groupon recently opened its green doors to some of the Girl Scouts’ best and brightest for our Scout Out Engineering event. For the second consecutive year, Groupon Engineering and the Groupon Employee Volunteer Program partnered with the Girl Scouts of Greater Chicago and Northwest Indiana to welcome 5th and 6th graders into the Chicago Groupon office for a morning of learning, fun, and tech engagement.

Scout Out Engineering introduces girls to engineering concepts through a combination of presentations and hands-on learning. Groupon’s goal is to excite these girls about technology and keep them interested in engineering and STEM education.

One tenet that makes the Girl Scouts great is their all-inclusive, ‘every girl’ approach. For the Girl Scouts, every girl should be able to participate in any activity regardless of her background or skillset. Last year, Groupon was advised to plan for girls with no internet in their homes, no experience with computers, and no idea who, or what, Groupon was as a tech company. With those guidelines in mind, we planned a hands-on, engineering-centered event that, for a tech company, was strangely void of computers.

If the focus of last year’s program was to introduce the idea of STEM education and emphasize its importance, then this year’s focus was to build on that foundation and actually do something about it.

In the six months leading up to our 2014 planning, ideas incubated and matured, technology advanced, and the profile of the ‘every girl’ evolved. In 2014 ‘every girl’ used a computer and a smartphone, and was exposed to some aspect of STEM education daily. The Girl Scouts encouraged Groupon to incorporate computers into the program (many of the girls may already have done some form of coding), and there were no limits on what technology the girls could be exposed to.

With these new guidelines, we designed a program with a tech-heavy core that better represented the work that happens here at Groupon. Hands-on computer learning took center stage. The focus on coding gave participants the chance to code alongside top engineers and to continue their learning outside Groupon’s green walls. A bridge-building activity became an opportunity for girls to work cross-functionally and employ a few of the key concepts that keep Groupon Engineering running. Girls learned about agile methodologies, iterated on their work, and closed the day with a real, live whiteboarding retrospective session (and, of course, pizza).

Scout Out Engineering at Groupon exposed girls to technology in an immediate and accessible way. It became an event for Groupon employees to use their talents to spark interest in subject matter that they are passionate about, and it gave everyone the opportunity to realize how essential empowering young girls can be. When it comes to STEM education at Groupon, there has always been an abundance of employee support and our support for the Scout Out Engineering event was no different. From the planning team, to speakers, to volunteers, Groupon Engineering was ready and willing to donate time, energy, and resources to teach these girls a thing or two about tech.



GNOME Foundation and Groupon product names (UPDATED)

at November 11th, 2014

UPDATE: There has been some recent confusion around Groupon’s intended use of a product name that the GNOME Foundation believes infringes on its trademarks. Although the GNOME Foundation directors notified us that they believed this was the case, we were not able to come to an agreement and had proceeded with the registration of our marks. We apologize for any distress this has caused the GNOME Foundation and the open source community.

We love open source at Groupon, and we have open-sourced a number of projects on Groupon’s GitHub. Our relationship with the open source community is more important to us than a product name.

After additional conversations with the open source community and the GNOME Foundation, we have decided to abandon our pending trademark applications for “Gnome.” We will choose a new name for our product going forward, and we will continue to work with the GNOME Foundation as we rebrand it.

Please see our joint statement on the GNOME Foundation’s website and below:

“Groupon has agreed to change its Gnome product name to resolve the GNOME Foundation’s concerns. Groupon is now abandoning all of its 28 pending trademark applications. The parties are working together on a mutually acceptable solution, a process that has already begun.”


Groupon’s Geekon project adds Apache Kafka Support to Facebook’s Presto exabyte-scale analytic SQL engine

at November 9th, 2014

Support for Apache Kafka, which started as a project at Groupon’s global Geekon hackathon, adds real-time querying capabilities to the Presto SQL query engine.

Presto, originally released by Facebook, is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to exabytes. Apache Kafka is a high-throughput distributed messaging system.

With the ability to query live data, Presto can now support use cases that were traditionally only possible with specialized tools such as Splunk.

Groupon Engineering plans to use Presto to analyze its real-time event data streams, replacing an existing legacy system. Presto will allow engineers and data analysts to correlate current (live) data from Apache Kafka with historical data stored in Apache Hadoop. This capability will let Groupon shut down a number of existing legacy systems and reduce operating costs while improving insight into our real-time data flows.

Groupon Engineering is engaged with the community to deliver excellence in open source development.

… and clearly, we are always hiring!


Groupon adopts Kill Bill, the open-source Payments Platform

at November 3rd, 2014

Groupon has always been a committed player in the open source community, both by releasing our tools and libraries to a larger audience and by using popular open source projects. So when we took a step back earlier this year to re-assess our global payments infrastructure, we naturally looked at what the community had to offer. We’re now pleased to announce that we have successfully integrated Kill Bill, the open source billing and payments platform, with a subset of our services, and we are planning a wider rollout.

Kill Bill provides a platform for building billing and payments infrastructure. It offers a framework for handling recurring subscriptions. It also offers unified APIs that support virtually any kind of payment gateway and payment method in the world, from wire transfers and credit card payments to cryptocurrencies and even Apple Pay.

While Kill Bill has been deployed in large-scale infrastructures before (such as at Ning), the Groupon environment is truly unique. Groupon, as the world’s largest marketplace of deals, is present in 45+ countries, with more than 240,000 active deals globally, and it supports over 120 payment methods. Our team focused on performance testing the system and made sure that every single payment handled is secure and reliable. As part of this process, we discovered the limits of some of the libraries we use, and we reported and helped fix bugs in Java 8, JRuby, ActiveRecord, and more. The community has been outstanding throughout this process; thanks to all of you!

We believe strongly in the exchange of ideas and cooperation between people. If this sounds good to you, we are hiring!


Local Hotspots and Travel Flows, Directly from the Data

at October 29th, 2014

It’s a crucial question in ecommerce: How likely is customer X to buy product Y? For Local, we must of course consider the physical locations of both X and Y. This is the location relevance problem, which is one of the most important ingredients in determining the best deals for each of our users. When we send out emails or return search query results, the deals that we display have to be relevant. To solve this problem we need to know our users’ propensity to travel for the different services that we offer, and having an accurate measurement of these travel patterns helps us to understand demand and thus optimize our sales force.

We cannot simply assume that users will want to stay in their home neighborhoods. People want to get out and explore, and we want them to check Groupon first. One approach is to determine location relevance based on simple distance, but this is an over-simplification. We know that people flock to local hotspots and avoid certain neighborhoods. They are also more likely to travel farther for a rare service, a pricey restaurant, and many leisure activities such as museums and waterparks. Fortunately, we can capture these trends directly from the data. Here’s how.

Getting the data

For this analysis, we require data pertaining to where our users are located and what they have purchased. For this we leverage the fact that users can voluntarily provide us with their locations in the form of zipcodes or full addresses. For each historical order data point, there are several variables that we want to track due to their importance in determining whether or not someone is willing to travel for a Local deal. For this analysis we consider the following:

  1. the location of the user
  2. the location of the merchant
  3. the service that the merchant provides
  4. the price of the deal

In order to compute useful empirical probabilities regarding how likely users are to travel, we need to group the data points into bins. For the locations, we partition the world into markets (metropolitan areas plus the surrounding suburbs), which are further divided into “submarkets.” For the deals, we have a hierarchical taxonomy with three levels of merchant attribution: the type of services offered (e.g. milkshakes or body wraps), the merchant type (e.g. “Bakery & Desserts” or “Spa Services”), and the general merchant category (e.g. “Food & Drink” or “Beauty / Wellness / Healthcare”). We further define price bins such as “$0-25” on the low end and “$100+” on the high end.
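Here is a minimal sketch of this binning, assuming half-open price ranges. Only the “$0-25” and “$100+” labels come from the text. The intermediate bin edges and the two taxonomy rows (including which merchant type each service belongs to) are hypothetical.

```python
# Hypothetical bin edges: only "$0-25" and "$100+" are named in the post.
PRICE_BINS = [(0, 25, "$0-25"), (25, 50, "$25-50"), (50, 100, "$50-100")]

# Hypothetical taxonomy rows: service -> (merchant type, merchant category).
TAXONOMY = {
    "Cupcake": ("Bakery & Desserts", "Food & Drink"),
    "Body Wrap": ("Spa Services", "Beauty / Wellness / Healthcare"),
}


def price_bin(price: float) -> str:
    """Map a deal price in dollars to its price-bin label (half-open ranges)."""
    if price < 0:
        raise ValueError("price must be non-negative")
    for low, high, label in PRICE_BINS:
        if low <= price < high:
            return label
    return "$100+"


def deal_bins(service: str, price: float) -> dict:
    """All three taxonomy levels plus the price bin for one deal."""
    merchant_type, category = TAXONOMY[service]
    return {"service": service, "merchant_type": merchant_type,
            "category": category, "price_bin": price_bin(price)}


print(deal_bins("Cupcake", 12.0))
```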

Thus for each user we have a market and a submarket, and for each deal we also have a market and a submarket, in addition to a service (with the associated merchant type and category) and a price bin. Then for each <service, price bin, deal location, user location> combination we can empirically estimate the probability that the user will be willing to travel for the type of deal in question, based on what has occurred in the past.
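Under one reading of this step, the probability for a flow is the share of orders for a given <service, price bin, deal location> that came from each user submarket. The sketch below computes that empirical conditional probability with counters. The normalization choice and the submarket names are assumptions for illustration.

```python
from collections import Counter


def travel_probabilities(orders):
    """Empirical P(user submarket | service, price bin, deal submarket).

    orders: iterable of (service, price_bin, deal_submarket, user_submarket)
    tuples, one per historical order.
    """
    deal_totals = Counter()
    flow_counts = Counter()
    for service, pbin, deal_loc, user_loc in orders:
        deal_totals[(service, pbin, deal_loc)] += 1
        flow_counts[(service, pbin, deal_loc, user_loc)] += 1
    # flow[:3] is the (service, price bin, deal submarket) part of each flow key
    return {flow: n / deal_totals[flow[:3]] for flow, n in flow_counts.items()}


orders = ([("Cupcake", "$0-25", "Lakeview", "Lakeview")] * 3
          + [("Cupcake", "$0-25", "Lakeview", "Evanston")])
probs = travel_probabilities(orders)
print(probs[("Cupcake", "$0-25", "Lakeview", "Evanston")])  # 0.25
```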

Based on these odds we can determine the most popular travel patterns, which will tell us where each city’s hotspots are located. We can further define an effective radius for each individual service, thereby determining how far users are typically willing to travel for, say, aerobics versus paintball.

Not so fast…

There are several subtleties to this analysis. For starters, many users and merchants have multiple locations in our database: a user may have several addresses registered with us, and a merchant may operate several locations. To work around this, we use the <user location, deal location> pair that corresponds to the shortest distance. In other words, if a user buys a deal that is closer to their home than to their workplace, we assume that they’re traveling from home. Similarly, if someone buys a deal that has multiple locations, we assume that they’re going to redeem it at the location closest to them.
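The shortest-distance rule can be written as a minimum over all <user location, deal location> pairs. The post does not say how distance is measured. This sketch assumes great-circle (haversine) distance in miles on (latitude, longitude) pairs, and the coordinates are hypothetical.

```python
import math
from itertools import product

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # clamp guards against h creeping above 1.0 through floating-point error
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))


def closest_pair(user_locations, deal_locations):
    """Pick the (user location, deal location) pair with the shortest distance."""
    return min(product(user_locations, deal_locations),
               key=lambda pair: haversine_miles(*pair))


home, work = (41.88, -87.63), (42.05, -87.68)      # hypothetical user addresses
deal_a, deal_b = (42.04, -87.69), (41.80, -87.60)  # hypothetical merchant locations
print(closest_pair([home, work], [deal_a, deal_b]))  # ((42.05, -87.68), (42.04, -87.69))
```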

A bigger problem is data sparsity. Given the extremely broad variety of services that we offer, some of our <service, price bin, deal location> combinations have too few data points. They therefore give a poor sampling of where the travel-willing users come from. For instance, Chicago and the surrounding suburbs contain 14 submarkets. To determine which submarkets’ users are buying mid-price downtown yoga deals, we therefore need far more than 14 data points to get a good sampling. We work around this problem by using geography-independent fallbacks based on our taxonomy. For instance, if we lack sufficient data at the <service, price bin> level, we collapse out the price bin and only consider the service. If we still lack sufficient data, we fall back a level in our taxonomy and use the merchant type, or the even more general merchant category.
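The fallback chain can be expressed as an ordered list of progressively coarser keys, stopping at the first one with enough orders. The minimum sample size, key layout and taxonomy row below are hypothetical placeholders.

```python
# Hypothetical values: the post does not state the minimum sample size.
MIN_ORDERS = 50
TAXONOMY = {"Yoga": ("Fitness Classes", "Health & Fitness")}  # hypothetical row


def choose_level(counts: dict, service: str, price_bin: str,
                 taxonomy: dict = TAXONOMY, min_orders: int = MIN_ORDERS) -> tuple:
    """Return the finest aggregation key that has at least min_orders orders.

    counts maps keys such as ("service", "Yoga") to historical order counts.
    """
    merchant_type, category = taxonomy[service]
    candidates = [
        ("service+price", service, price_bin),  # finest level
        ("service", service),                   # collapse out the price bin
        ("merchant_type", merchant_type),       # one level up the taxonomy
        ("category", category),                 # most general level
    ]
    for key in candidates:
        if counts.get(key, 0) >= min_orders:
            return key
    return candidates[-1]  # fall back to the coarsest level regardless


counts = {("service+price", "Yoga", "$25-50"): 9,
          ("service", "Yoga"): 30,
          ("merchant_type", "Fitness Classes"): 120}
print(choose_level(counts, "Yoga", "$25-50"))  # ('merchant_type', 'Fitness Classes')
```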

Another important issue is outlier deals. Especially amazing deals might draw users from a much wider radius than is typical, which would skew our results. To deal with this, we exclude the very top- and bottom-performing historical deals from our dataset.
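The post does not specify how deal performance is measured or where the cutoffs sit. The sketch below assumes that performance means order count and that the same fraction is trimmed from each end.

```python
def trim_outlier_deals(orders_per_deal: dict, trim_fraction: float = 0.05) -> set:
    """Drop the top and bottom trim_fraction of deals, ranked by order count."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ranked = sorted(orders_per_deal, key=orders_per_deal.get)
    k = int(len(ranked) * trim_fraction)  # number of deals dropped at each end
    return set(ranked[k:len(ranked) - k])


deals = {f"deal{i}": i for i in range(1, 21)}  # 20 hypothetical deals
kept = trim_outlier_deals(deals, trim_fraction=0.1)
print(len(kept))  # 16
```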

Results

For each <service, price bin, user location, deal location> combination, our result is the probability that a user from that location would be willing to travel for a deal of that service, price bin, and location. To be sure that we’re not just seeing noise and that these travel flows are actual organic tendencies, we say that a flow is important enough to be deemed “travel-worthy” if this probability reaches a threshold of 20% or more. This level was found to be aggressive enough to leave us with only the truly statistically significant flows, yet low enough to give us sufficient useful information on the travel patterns for each city and service.
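Applying the 20% threshold is a simple filter over the flow probabilities. The threshold comes from the text. The data layout is an assumption: a dict that maps (service, price bin, deal submarket, user submarket) tuples to probabilities, with hypothetical submarket names.

```python
TRAVEL_WORTHY_THRESHOLD = 0.20  # from the post: flows at or above 20% count


def travel_worthy(probs: dict, threshold: float = TRAVEL_WORTHY_THRESHOLD) -> dict:
    """Keep only flows whose empirical probability reaches the threshold."""
    return {flow: p for flow, p in probs.items() if p >= threshold}


probs = {
    ("Cupcake", "$0-25", "Lakeview", "Evanston"): 0.25,
    ("Cupcake", "$0-25", "Lakeview", "Oak Park"): 0.05,
}
print(list(travel_worthy(probs)))  # only the Evanston flow survives
```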

As expected, we find that users are indeed inclined to travel beyond their home neighborhoods, and that those travel propensities depend on where they live. For instance, the median distance Chicago users are willing to travel is about 5 miles. However, this median depends strongly on user submarket, and is under 2 miles for downtown users but approximately 12 miles for users from the South suburbs.

[Figure: Chicago travel distances by submarket]

Where are these users traveling, and for what? Our results tell us these hotspots as a function of service, and we find that they depend on a combination of merchant density and merchant quality. For more common services, such as steakhouses, we find that users generally travel from areas of lower to higher merchant density. However, users are also willing to travel for particularly great merchants, and this preference is more likely to dictate the travel patterns for more unique services, like museums.

For example, one service in our “Food & Drink” category is “Cupcake.” We can query our results for the travel-worthy patterns for this service in Chicago at the lowest price bin.

[Map: travel-worthy cupcake patterns in Chicago, lowest price bin]

Chicago’s cupcake hotspots, here marked with red dots and defined as submarkets having at least 3 travel patterns ending there, all contain regions with the highest densities of cupcake-providing merchants. Similarly, for steakhouses we find that the submarkets containing the Magnificent Mile and Naperville are the major hotspots, as any Chicagoan might expect.
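Under the definition above, a hotspot is a submarket with at least 3 travel-worthy patterns ending there. Counting hotspots might look like the sketch below. It assumes a pattern is identified by its origin submarket and ignores flows that start and end in the same submarket. Both choices, and the placeholder submarket names, are ours.

```python
from collections import defaultdict


def find_hotspots(worthy_flows, min_inbound: int = 3) -> set:
    """Submarkets where at least min_inbound travel-worthy patterns end.

    worthy_flows: iterable of (user_submarket, deal_submarket) pairs that
    already passed the travel-worthy threshold for one service and price bin.
    """
    origins = defaultdict(set)
    for user_loc, deal_loc in worthy_flows:
        if user_loc != deal_loc:  # assumption: staying in your own submarket is not travel
            origins[deal_loc].add(user_loc)
    return {loc for loc, users in origins.items() if len(users) >= min_inbound}


flows = [("A", "Lakeview"), ("B", "Lakeview"), ("C", "Lakeview"),
         ("A", "Loop"), ("B", "Loop"), ("Loop", "Loop")]
print(find_hotspots(flows))  # {'Lakeview'}
```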

Which <service, deal location> combinations draw the most travelers? In Chicago in particular, we find that people are flocking to a Murder Mystery dinner spot in the West suburbs, a Hawaiian restaurant in the North suburbs, and the Field Museum and Segway tours downtown.

Despite our strict travel-worthy threshold, we still must verify that these travel patterns reflect actual organic travel tendencies rather than gaps in our inventory. Fortunately, census data for each submarket shows a strong correlation between resident and business densities and the number of travel patterns that end there. This indicates that our hotspots reflect actual travel patterns and are not biased by Groupon’s offerings. Still, there are two behaviors that we need to stay on the lookout for:

  1. regions with low business density but high travel worthiness that are getting more than their fair share of deals
  2. regions with high business density but low travel worthiness that are getting less than their fair share of deals

In this way we can keep an eye on our inventory and troubleshoot as needed.

As mentioned above, we can also define an effective radius for each <service, price bin, location> combination, to determine how far users from said location are willing to travel for certain types of deals. We define this to be the 75th percentile of all of the relevant user-deal distances in our historical data set. By doing so we find that the lowest-radius services tend to include everyday fitness activities like aerobics classes, gym memberships, and spinning classes, whereas the highest-radius services tend to include weekend leisure activities such as white water rafting, off-roading, and skiing.
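The effective radius is the 75th percentile of user-deal distances. The sketch below computes it with the standard library’s statistics.quantiles. For brevity it groups by service only, whereas the text groups by <service, price bin, location>, and the trip distances are made up.

```python
import statistics
from collections import defaultdict


def effective_radii(trips) -> dict:
    """Map each service to the 75th percentile of its user-deal distances.

    trips: iterable of (service, distance_in_miles) pairs.
    """
    by_service = defaultdict(list)
    for service, miles in trips:
        by_service[service].append(miles)
    return {
        # quantiles(n=4) returns the three quartile cut points; index 2 is the 75th percentile
        service: statistics.quantiles(dists, n=4, method="inclusive")[2]
        for service, dists in by_service.items()
        if len(dists) >= 2  # skip services with too few trips to estimate
    }


trips = [("Aerobics", d) for d in (1, 2, 3, 4, 5)] + [("Skiing", d) for d in (10, 40, 60, 90)]
print(effective_radii(trips))  # {'Aerobics': 4.0, 'Skiing': 67.5}
```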

Next steps

This analysis was performed entirely with subscription and order data, and thus it was limited to a study of the interplay of merchant and user home locations. The expansion of our mobile business provides a huge opportunity for further tracking of travel patterns. Assuming users have given us their explicit consent to track it, GPS data gives us a much finer-grained picture of their behavior, thereby enabling us to learn where users are when they open the app and where they are when they place orders. Coupled with Gnome, we can further empower merchants to build stronger ties with customers who routinely travel to their neighborhoods.


Meet Groupon Engineering: Tim Macdonald

at October 23rd, 2014

Tim Macdonald

LR: How long have you been at Groupon?
TM: Just over a year; I started part-time work in September 2013, and then switched to full-time last January.

LR: What are some of the challenges and problems you get to solve in your work as a software engineer at Groupon?
TM: I’ve been on several teams, but so far I have spent most of my time on some interesting work analyzing past deals we’ve run and using that data to predict how new deals would do in the market. It’s of course both challenging and exciting to be working on a suite of new products. More specifically, there were some good problems related to parsing deals, that is, figuring out exactly what they’re selling and how that stacks up against other deals. From a more technical standpoint, I’ve learned a new language that I really love (Clojure), massively improved my skills with another (JavaScript), and generally become more involved with full-stack development.

LR: What do you enjoy most about working at Groupon?
TM: There are lots of great things about working at Groupon, but I think the best part is the people I get to work with. It amazes me that the same group of people making great technical decisions and giving great product pushback can always make me laugh or be up for a beer!

LR: You recently took first place at the 2014 U.S. Scottish Fiddle Championship! How did you get into fiddle playing?
TM: I’ve been playing the (classical) violin since I was four, when I went to see the Tegucigalpa Philharmonic and was entranced. When I was twelve, I was at a Scottish festival, saw the fiddle competition, and thought, “That’s awesome! Wait a second… I play the violin… I could totally learn how to play like that.” So I took private lessons for a year and a half and started going to a summer fiddle camp, and now here I am! For this contest, I was specifically trying to make a point about historically informed performance. All of my pieces were written before 1793, and I was playing on period instruments. The baroque violin is very different from the modern one.

Tim’s first-place finish qualifies him to compete in the 2015 Glenfiddich Fiddle Championship in Scotland next October when he’ll go up against other maestros from around the world.

Tune in to Tim’s winning performance below, check out this article featuring the fiddle aficionado himself and leave any and all accolades in the comments.


Meet Groupon Engineering Operations: Erica Geil

at October 22nd, 2014

Erica Geil has been at Groupon for over four years. “The thing I love most about my job is what I learn from the people I work with at Groupon. Everyone has different experiences, different interests and that keeps me engaged with the initiatives I oversee. I never have the same day twice,” Erica said. As Sr. Director, Global Engineering Operations, Erica provides input to the project management team responsible for driving global, cross-functional initiatives. She also oversees ETHOS, Groupon’s team focused on engineering training and culture. In addition, she gives guidance to Salesforce engineering and product development; these groups provide critical data to many Groupon tools and systems.

Groupon will be 6 years old in November, and Erica has seen the company grow exponentially during her time here. Groupon started as a daily deals business and is evolving to a pull model, where users can check Groupon first to discover an inventory of services, goods, and travel deals. Watch below to learn more about how Erica’s work in Engineering Operations is part of that growth.